What if models trained on physics derivations with all steps spelled out performed as well on physics as they do on mathematics?

Question

Antigravity · Accepted Answer

Your question touches on a key assumption in AI training: whether *spelling out all the steps* in physics derivations would let models perform as well on physics as they do on mathematics. The archive suggests the answer is more subtle than "yes."

What the data shows

When LLMs work on physics problems where every step is explicit and stays within a stable conceptual frame, they do reach near-ceiling performance [AG-2026.04-565]. That's the good news. But performance degrades sharply when a task requires reconstructing *omitted* reasoning steps or reorganizing representations under global consistency constraints, the kinds of things physicists do routinely but don't always write down [AG-2026.04-565].

This difference is crucial. A mathematics derivation can be a self-contained chain: step A → step B → step C → answer. A physics derivation often involves *implicit structural constraints*—conservation laws that shape what's allowed, symmetries that constrain how terms can combine, physical intuition about which approximations are valid in which regimes. Writing out "all steps" in the mechanical sense doesn't capture these.
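To make "implicit structural constraint" concrete, here is a minimal Python sketch of one simple example: dimensional consistency. This is my own illustration, not something taken from the cited work, and the `Dim` class and `can_add` helper are hypothetical names invented for it. Each quantity carries exponents of mass, length, and time, and terms may only be summed if those exponents match. A derivation rarely states this rule, yet every step has to respect it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents of (mass, length, time). Hypothetical helper for this sketch."""
    mass: int = 0
    length: int = 0
    time: int = 0

    def __mul__(self, other):
        # Multiplying quantities adds their dimension exponents.
        return Dim(self.mass + other.mass,
                   self.length + other.length,
                   self.time + other.time)

    def __pow__(self, n):
        # Raising a quantity to a power scales its exponents.
        return Dim(self.mass * n, self.length * n, self.time * n)


MASS = Dim(mass=1)
LENGTH = Dim(length=1)
TIME = Dim(time=1)
VELOCITY = LENGTH * TIME ** -1
ACCELERATION = LENGTH * TIME ** -2
ENERGY = MASS * VELOCITY ** 2


def can_add(*terms):
    """Terms may be summed only if they all share the same dimensions."""
    return len(set(terms)) == 1


kinetic = MASS * VELOCITY ** 2             # (1/2) m v^2; pure numbers are dimensionless
potential = MASS * ACCELERATION * LENGTH   # m g h
momentum = MASS * VELOCITY                 # m v

print(can_add(kinetic, potential))  # energy + energy is allowed
print(can_add(kinetic, momentum))   # energy + momentum is not
```

The point of the sketch is that the rule lives outside any individual step: a chain of algebra can look locally plausible and still add an energy to a momentum.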

Why this matters for training

When researchers fine-tuned smaller models (7 billion parameters) on synthetic QFT problems with full derivations spelled out, performance improved measurably [AG-2026.04-892]. However, even the best current models solve only about 6% of full research-scale physics challenges [AG-2025.09-1139], whereas performance on mathematics at similar difficulty levels is stronger. The gap isn't simply that steps are missing; it's that physics requires integrating conceptual knowledge across a problem in ways that step-by-step mechanics alone don't teach.

One promising direction: using symbolic verification to check intermediate steps in physics reasoning, which outperforms pure scaling approaches [AG-2025.06-1078]. This works because it forces the model to justify each step against the actual structure of the problem, not just pattern-match to training examples.
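As a rough illustration of what checking intermediate steps can look like, here is a small Python sketch. It is a stand-in, not the method from the cited work: instead of a computer algebra system, it tests whether consecutive steps of a derivation agree at random numerical points, using only the standard library. The function name `first_bad_step` and the example derivation (kinetic energy from p²/2m with p = mv) are assumptions made for this example.

```python
import random


def first_bad_step(steps, variables, trials=50, tol=1e-9, seed=0):
    """Return the index of the first step that disagrees with the one before it,
    or None if every consecutive pair agrees at all sampled points.

    Random sampling can miss a discrepancy in principle; a symbolic checker
    would give a proof rather than evidence.
    """
    rng = random.Random(seed)
    for i in range(1, len(steps)):
        for _ in range(trials):
            point = {name: rng.uniform(0.5, 2.0) for name in variables}
            prev = steps[i - 1](**point)
            curr = steps[i](**point)
            if abs(prev - curr) > tol * max(1.0, abs(prev)):
                return i
    return None


# Correct chain: p^2 / (2m) with p = m v, simplified to (1/2) m v^2.
good = [
    lambda m, v: (m * v) ** 2 / (2 * m),
    lambda m, v: m ** 2 * v ** 2 / (2 * m),
    lambda m, v: 0.5 * m * v ** 2,
]

# Faulty chain: the last step drops the square on v.
bad = good[:2] + [lambda m, v: 0.5 * m * v]

print(first_bad_step(good, ["m", "v"]))  # None: all steps agree
print(first_bad_step(bad, ["m", "v"]))   # 2: the final step breaks the chain
```

A verifier along these lines flags exactly where a derivation stops following from what came before, which is the kind of signal a pattern-matched answer cannot supply on its own.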

The short answer: fully explicit derivations help, but don't close the performance gap between physics and mathematics, because physics reasoning involves tacit constraints that written steps alone can't encode.