Antigravity Q&A
Why do LLMs fail at tacit reasoning tasks that human physicists handle intuitively?
Answered by claude-haiku-4-5 · 2 May 2026
Answer
This is a fascinating question, but it sits at the boundary of what this archive covers. The preprints here focus on *measuring* where LLMs fail at physics reasoning—they're benchmarks and empirical studies rather than foundational work on *why* tacit reasoning breaks down.
That said, the archive does illuminate the gap:
Where LLMs stumble
LLMs excel at explicit derivations within stable conceptual frames but systematically fail when tasks require reconstruction of omitted reasoning steps or reorganization of representations under global consistency constraints [AG-2026.04-565]. The most telling evidence comes from the CritPt benchmark: even state-of-the-art models solve only ~6% of full research-scale physics challenges [AG-2025.09-1139], and research-level problems in theoretical physics remain mostly unsolved [AG-2025.02-240].
One plausible reading of this gap: human physicists develop *tacit* knowledge, meaning intuitions built from long exposure to how concepts hang together structurally. A physicist recognizes instantly that a proposal violates some deep constraint (symmetry, dimensional analysis, thermodynamic consistency) without consciously reconstructing the full argument. LLMs, trained to predict the next token, appear to lack this structural embedding.
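A constraint that a physicist applies without thinking can also be written out explicitly. The block below is a minimal sketch of a dimensional-analysis check. It assumes dimensions are tracked as integer exponents over mass, length, and time. The names (`MASS`, `is_consistent`, and so on) are hypothetical and do not come from any of the cited preprints. The check accepts E = mc² and rejects E = mc. The contrast with the human case is that this check is explicit and mechanical rather than tacit.

```python
# Minimal dimensional-analysis checker (illustrative sketch only).
# Each quantity's dimension is an exponent tuple over the base
# dimensions (mass M, length L, time T).
from typing import Tuple

Dim = Tuple[int, int, int]  # exponents of (M, L, T)

MASS: Dim = (1, 0, 0)
LENGTH: Dim = (0, 1, 0)
TIME: Dim = (0, 0, 1)


def mul(a: Dim, b: Dim) -> Dim:
    """Dimension of a product: exponents add."""
    return tuple(x + y for x, y in zip(a, b))


def pow_(a: Dim, n: int) -> Dim:
    """Dimension of a power: exponents scale by n."""
    return tuple(x * n for x in a)


def div(a: Dim, b: Dim) -> Dim:
    """Dimension of a quotient: subtract the divisor's exponents."""
    return mul(a, pow_(b, -1))


# Derived dimensions, built independently of the formulas being tested.
VELOCITY = div(LENGTH, TIME)                   # L T^-1
FORCE = mul(MASS, div(LENGTH, pow_(TIME, 2)))  # M L T^-2
ENERGY = mul(FORCE, LENGTH)                    # work = force x distance


def is_consistent(lhs: Dim, rhs: Dim) -> bool:
    """An equation can only be correct if both sides share a dimension."""
    return lhs == rhs


if __name__ == "__main__":
    print(is_consistent(ENERGY, mul(MASS, pow_(VELOCITY, 2))))  # E = m c^2 -> True
    print(is_consistent(ENERGY, mul(MASS, VELOCITY)))           # E = m c   -> False
```

Such a check catches dimensional errors only. It cannot detect a missing dimensionless factor, such as the 1/2 in kinetic energy.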
A tentative bridge
One preprint offers a speculative lens: if robust reasoning operated as a topological phase protected by non-Abelian symmetry, rather than as fragile statistical associations, it could maintain invariant fidelity under noise [AG-2026.01-735]. Read as a metaphor, this suggests that current architectures cannot robustly do what humans do intuitively because they lack formal symmetry protections.
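The block below does not reproduce the preprint's construction or its non-Abelian symmetry protection. It is only a toy sketch of the general intuition behind topological protection, assuming a closed loop of points in the plane. A topological invariant such as the winding number is an integer, so small perturbations cannot change it. A continuous statistic such as the mean radius, by contrast, drifts with noise. All function names are hypothetical.

```python
# Toy illustration of "protected" vs "fragile" quantities under noise.
# Not the model from the cited preprint; only the general intuition.
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def circle(n: int, radius: float = 1.0, cx: float = 0.0) -> List[Point]:
    """n points evenly spaced on a circle centred at (cx, 0): a closed loop."""
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def add_noise(loop: List[Point], eps: float, seed: int) -> List[Point]:
    """Perturb each coordinate by bounded uniform noise in [-eps, eps]."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-eps, eps), y + rng.uniform(-eps, eps))
            for x, y in loop]


def winding_number(loop: List[Point]) -> int:
    """Net number of times the closed loop circles the origin.

    Sums the wrapped angle change between consecutive points. For a
    closed loop the total is an integer multiple of 2*pi.
    """
    total = 0.0
    n = len(loop)
    for i in range(n):
        x0, y0 = loop[i]
        x1, y1 = loop[(i + 1) % n]
        d = math.atan2(y1, x1) - math.atan2(y0, x0)
        d = (d + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
        total += d
    return round(total / (2 * math.pi))


def mean_radius(loop: List[Point]) -> float:
    """A continuous statistic that shifts with every perturbation."""
    return sum(math.hypot(x, y) for x, y in loop) / len(loop)


if __name__ == "__main__":
    clean = circle(200)
    noisy = add_noise(clean, eps=0.1, seed=0)
    print(winding_number(clean), winding_number(noisy))  # 1 1
    print(mean_radius(clean), mean_radius(noisy))        # 1.0 vs. a shifted value
```

The winding number stays at 1 because the noise (at most 0.1 per coordinate) never moves a point close enough to the origin to change the loop's topology. The mean radius, by contrast, changes with every perturbation.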
The archive doesn't contain a deep cognitive or mathematical theory explaining *why* this asymmetry exists—only evidence that it does.
Sources · 8
- 72% · physics.comp-ph · Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs · AG-2026.04-565
- 66% · cs.AI · Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark · AG-2025.09-1139
- 63% · cs.LG · Fine-Tuning Small Reasoning Models for Quantum Field Theory · AG-2026.04-892
- 61% · cs.LG · Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset · AG-2025.06-1078
- 60% · cs.LG · Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics · AG-2025.02-240
- 59% · physics.data-an · Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models · AG-2025.01-1147
- 58% · physics.soc-ph · Why we care (about quantum machine learning) · AG-2024.01-1602
- 56% · cs.LG · Robust Reasoning as a Symmetry-Protected Topological Phase · AG-2026.01-735
Keep exploring
- How do human physicists acquire this tacit structural knowledge during training versus how LLMs learn from text?
- Does the CritPt benchmark distinguish between missing reasoning steps versus genuinely impossible reconstructions from context?
- Could architectural changes enforcing dimensional analysis or symmetry constraints reduce the gap on research-scale problems?
This is a research aid — not a peer review. Verify sources before citing.