A Scientific Human-Agent Reproduction Pipeline

Joschka Birk; Gregor Kasieczka; Siddharth Mishra-Sharma; Benjamin Nachman; Dennis Noll; Tanvi Wamorkar

doi:10.48550/arXiv.2604.18752

AG-2026.04-1288 · hep-ph · cross-listed: hep-ex

Abstract

Reproducing scientific analyses is essential for preserving knowledge, building extensible codebases, and deepening researcher understanding, yet the effort often outweighs its academic recognition. We argue that the reproduction of scientific data analyses is fundamentally a translation task: converting human-readable knowledge (papers, documentation) into machine-readable analysis code. This makes it uniquely well-suited for AI agents. We present SHARP (Scientific Human-Agent Reproduction Pipeline), a structured framework for reproducing scientific analyses through human-agent collaboration. SHARP decomposes a reproduction task into discrete steps, which an AI agent executes autonomously using specialized subagents for code generation, testing, and quality assurance. At defined checkpoints, the researcher reviews progress, provides feedback, and steers the analysis, keeping the human firmly in control of scientific judgment while the agent handles implementation. We demonstrate SHARP by reproducing a jet classification task in particle physics from a published paper. We evaluate the reproduction along three axes: analysis performance against the original results, code quality and faithfulness, and the nature of the human-agent conversation. The latter is evaluated with a novel framework for characterizing human-agent interactions. Our work highlights a practical model for AI-assisted scientific reproduction where the researcher's role shifts from writing code to understanding, evaluating, and directing, elevating human understanding rather than replacing it.
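
The control flow described in the abstract (discrete steps, subagents for code generation, testing, and quality assurance, and human review at defined checkpoints) can be sketched roughly as follows. This is an illustrative sketch only, not the SHARP implementation: the abstract does not specify any interfaces, so every name here (`Step`, `StepResult`, `run_step`, `run_pipeline`, the `max_rounds` limit) is hypothetical, and the subagents are plain string-returning stubs standing in for real agent calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types, not taken from the SHARP codebase.
Subagent = Callable[[str, str], str]  # (step description, prior artifact) -> new artifact
Reviewer = Callable[[str, str], str]  # (step name, artifact) -> feedback; "" means approve


@dataclass
class Step:
    name: str
    description: str
    checkpoint: bool = False  # pause for human review after this step


@dataclass
class StepResult:
    name: str
    artifact: str
    feedback: List[str] = field(default_factory=list)


def run_step(step: Step, subagents: Dict[str, Subagent], feedback: str = "") -> str:
    """Run the code, test, and QA subagents in order, threading the artifact through."""
    artifact = feedback
    for role in ("code", "test", "qa"):
        artifact = subagents[role](step.description, artifact)
    return artifact


def run_pipeline(
    steps: List[Step],
    subagents: Dict[str, Subagent],
    review: Reviewer,
    max_rounds: int = 3,  # assumed cap on review rounds per checkpoint
) -> List[StepResult]:
    results = []
    for step in steps:
        result = StepResult(step.name, run_step(step, subagents))
        if step.checkpoint:
            for _ in range(max_rounds):
                note = review(step.name, result.artifact)
                if not note:  # empty feedback: the researcher approves
                    break
                result.feedback.append(note)
                # Re-run the step with the researcher's feedback as input.
                result.artifact = run_step(step, subagents, feedback=note)
        results.append(result)
    return results


def stub(role: str) -> Subagent:
    """Stand-in for a real subagent: records which role handled which step."""
    def agent(description: str, prior: str) -> str:
        return f"{prior}[{role}:{description}]"
    return agent


subagents = {role: stub(role) for role in ("code", "test", "qa")}
steps = [
    Step("load", "load jets"),
    Step("train", "train classifier", checkpoint=True),
]

# Scripted reviewer: one round of feedback, then approval.
notes = iter(["check the loss", ""])
results = run_pipeline(steps, subagents, lambda name, artifact: next(notes))

for r in results:
    print(r.name, r.feedback)
```

In this sketch the reviewer is just a callable, so the same loop works with an interactive prompt or a scripted test double; the agent never advances past a checkpoint step until the reviewer returns empty feedback or the assumed round limit is reached.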

Submitted

20 April 2026

Version

v1

License

CC-BY-4.0

DOI

10.48550/arXiv.2604.18752

Summary

SHARP is an AI-assisted framework that breaks down scientific paper reproduction into discrete steps, with an AI agent handling code generation while researchers maintain control through checkpoint reviews.

Reproducing published analyses is reframed as a translation problem: converting human-readable papers and documentation into machine-readable analysis code. The authors argue that AI agents are well suited to this task, using specialized subagents for code generation, testing, and quality assurance.
The framework keeps humans in the decision-making loop at defined checkpoints rather than automating away scientific judgment, shifting the researcher's role from code-writing to understanding and directing the analysis.
A demonstration reproduces a particle-physics jet classification task from a published paper and is evaluated along three axes: analysis performance against the original results, code quality and faithfulness, and the nature of the human-agent conversation. The last is assessed with a novel framework for characterizing human-agent interactions.

curious · generated by claude-haiku-4-5
