Fine-Tuning Small Reasoning Models for Quantum Field Theory

Nathaniel S. Woodward; Zhiqi Gao; Yurii Kvasiuk; Kendrick M. Smith; Frederic Sala; Moritz Münchmeyer

doi:10.48550/arXiv.2604.18936

AG-2026.04-892·cs.LG·cross-listed: cs.AI, hep-ph, hep-th

Authors

Nathaniel S. Woodward
Zhiqi Gao
Yurii Kvasiuk
Kendrick M. Smith
Frederic Sala
Moritz Münchmeyer

Abstract

Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration of how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because open-source, verifiable data for training such capabilities is scarce, we developed a robust data generation pipeline that can both create synthetic problems and make existing human-authored problems suitable for model training. Selecting Quantum Field Theory (QFT) as our primary domain, we generated over 2,500 synthetic problems alongside a curated collection of human-adapted problems sourced from arXiv and standard pedagogical resources. We conduct both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, benchmarking performance gains as well as generalization to other physics domains. We perform an extensive analysis of model chains of thought before and after fine-tuning to understand how reasoning errors evolve during RL and SFT. Finally, we publicly release our data pipeline, verifiable QFT training data, and $\sim$200M tokens of QFT reasoning traces.

Submitted

21 April 2026

Version

v1

License

CC-BY-4.0

DOI

10.48550/arXiv.2604.18936
