Antigravity Q&A
What if models trained on physics derivations with all steps spelled out performed similarly to mathematics?
Answered by claude-haiku-4-5 · 4 May 2026
Answer
Your question touches on a key assumption in AI training: whether *spelling out all the steps* in physics derivations would let models perform as well on physics as they do on mathematics. The archive suggests the answer is more subtle than "yes."
What the data shows
When LLMs work on physics problems where every step is explicit and stays within a stable conceptual frame, they do reach near-ceiling performance [AG-2026.04-565]. That's the good news. But performance degrades sharply when tasks require reconstruction of *omitted* reasoning steps or when the problem demands reorganizing representations under global consistency constraints—the kinds of things physicists do routinely but don't always write down [AG-2026.04-565].
This difference is crucial. A mathematics derivation can be a self-contained chain: step A → step B → step C → answer. A physics derivation often involves *implicit structural constraints*—conservation laws that shape what's allowed, symmetries that constrain how terms can combine, physical intuition about which approximations are valid in which regimes. Writing out "all steps" in the mechanical sense doesn't capture these.
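As an illustration that is not drawn from the cited sources, take a textbook simple pendulum of length $L$ (an assumed example). A derivation with "all steps" written out might still show only the two equations below, leaving the regime condition unstated:

```latex
\ddot{\theta} = -\frac{g}{L}\sin\theta
\quad\Longrightarrow\quad
\ddot{\theta} = -\frac{g}{L}\,\theta ,
\qquad \text{valid only for } |\theta| \ll 1 ,
\quad \text{since } \sin\theta = \theta - \frac{\theta^{3}}{6} + O(\theta^{5}).
```

The condition $|\theta| \ll 1$ and the size of the dropped cubic term are the kind of information that usually goes unwritten. Without them, the second equation does not follow from the first.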
Why this matters for training
When researchers fine-tuned smaller models (7 billion parameters) on synthetic QFT problems with fully spelled-out derivations, performance improved measurably [AG-2026.04-892]. However, even the best current models solve only about 6% of full research-scale physics challenges [AG-2025.09-1139], while performance on mathematics at similar difficulty levels is stronger. The gap isn't simply that steps are missing: physics requires integrating conceptual knowledge across a problem in ways that step-by-step mechanics alone don't teach.
One promising direction is symbolic verification of intermediate steps in physics reasoning, which outperformed other test-time scaling approaches on TPBench [AG-2025.06-1078]. A plausible reason is that verification forces the model to justify each step against the actual structure of the problem rather than pattern-match to training examples.
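To make the idea of checking intermediate steps concrete, here is a minimal sketch using SymPy. It is not the method of [AG-2025.06-1078]; the function name `verify_chain` and the oscillator derivation are hypothetical, chosen only to show how an exact-equality check on consecutive steps behaves.

```python
import sympy as sp


def verify_chain(steps):
    """Check each transition steps[i] -> steps[i+1] for exact symbolic equality.

    Returns one boolean per transition. (Hypothetical helper for illustration.)
    """
    return [sp.simplify(prev - curr) == 0 for prev, curr in zip(steps, steps[1:])]


m, v, x, omega, theta = sp.symbols("m v x omega theta", positive=True)
p = m * v  # momentum written in terms of velocity

# Assumed example derivation: harmonic-oscillator energy rewritten via p = m*v.
energy_steps = [
    p**2 / (2 * m) + m * omega**2 * x**2 / 2,
    m * v**2 / 2 + m * omega**2 * x**2 / 2,  # valid rewrite of the line above
    m * v**2 + m * omega**2 * x**2 / 2,      # factor of 1/2 dropped: invalid
]

# A small-angle step is an approximation, not an identity, so it is flagged too.
small_angle_steps = [sp.sin(theta), theta]

energy_flags = verify_chain(energy_steps)
small_angle_flags = verify_chain(small_angle_steps)

print(energy_flags)       # [True, False]
print(small_angle_flags)  # [False]
```

The checker catches the dropped factor, but it also rejects the small-angle step, which is legitimate in the right regime. An exact-equality check therefore needs extra information about which approximations are allowed, which is the same tacit knowledge the answer describes.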
The short answer: fully explicit derivations help, but don't close the performance gap between physics and mathematics, because physics reasoning involves tacit constraints that written steps alone can't encode.
Sources · 8
- 64% · physics.comp-ph · Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs · AG-2026.04-565
- 62% · cs.LG · Fine-Tuning Small Reasoning Models for Quantum Field Theory · AG-2026.04-892
- 60% · physics.data-an · Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models · AG-2025.01-1147
- 59% · physics.data-an · Towards a Large Physics Benchmark · AG-2025.07-1634
- 59% · cs.AI · Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark · AG-2025.09-1139
- 58% · cs.LG · Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset · AG-2025.06-1078
- 58% · hep-ph · Foundation models for equation discovery in high energy physics · AG-2025.10-1233
- 58% · cs.LG · Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics · AG-2025.02-240
This is a research aid — not a peer review. Verify sources before citing.