Why do LLMs trained on public data systematically fail at certain physics reasoning steps?

Question

Antigravity · Accepted Answer

Your question concerns LLM training data and reasoning rather than frontier physics itself. This archive covers frontier science, so questions about physics, mathematics, or propulsion are a better fit.

That said, the preprints here do touch on *why* LLMs fail at physics reasoning, so I can briefly address that:

LLMs trained on public data fail at certain physics reasoning steps because those steps are often tacit and omitted from written sources [AG-2026.04-565]. Physics papers and textbooks skip intermediate reasoning that experts take for granted: they do not spell out every conceptual connection or global consistency constraint. A model trained only on the surface text never learns the hidden logical scaffolding underneath.
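As an illustration of the kind of step that goes unwritten, here is a standard textbook case. It was chosen for this answer and is not taken from the cited preprint. For a simple pendulum of length $L$, a text will often jump from the first line below straight to the last. The approximation and the initial conditions in between are assumed rather than stated:

```latex
\begin{align}
  \ddot{\theta} &= -\frac{g}{L}\,\sin\theta
    && \text{(exact equation of motion)} \\
  \sin\theta &\approx \theta
    && \text{(tacit: valid only for } |\theta| \ll 1 \text{)} \\
  \ddot{\theta} &\approx -\frac{g}{L}\,\theta
    && \text{(linearized equation)} \\
  \theta(t) &= \theta_0 \cos\!\left(\sqrt{\tfrac{g}{L}}\; t\right)
    && \text{(tacit: } \theta(0) = \theta_0,\ \dot{\theta}(0) = 0 \text{)}
\end{align}
```

A reader who only ever sees the first and last lines has to reconstruct the small-angle condition and the release-from-rest assumption unaided, which is the sort of gap described above.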

More specifically, models show "near-ceiling performance on explicit derivations within stable conceptual frames, but systematic degradation when tasks require reconstruction of omitted reasoning steps or reorganization of representations under global consistency constraints" [AG-2026.04-565]. In other words, they can follow a written-out chain of algebra, but they stumble when asked to fill in the gaps or reorganize their understanding across a problem's entire structure, which is the kind of work physicists do implicitly.

When models are fine-tuned on *physics-specific* synthetic and curated data, their performance on these tasks improves [AG-2026.04-892]. This suggests that the gap is not fundamental to language models but stems from the sparsity and implicit nature of public training material in advanced physics.