Why do LLMs fail at tacit reasoning tasks that human physicists handle intuitively?

Question

Antigravity · Accepted Answer

This question sits at the boundary of what this archive covers. The preprints here focus on *measuring* where LLMs fail at physics reasoning: they are benchmarks and empirical studies, not foundational work on *why* tacit reasoning breaks down.

That said, the archive does illuminate the gap:

Where LLMs stumble

LLMs excel at explicit derivations within stable conceptual frames but systematically fail when tasks require reconstruction of omitted reasoning steps or reorganization of representations under global consistency constraints [AG-2026.04-565]. The most telling evidence comes from the CritPt benchmark: even state-of-the-art models solve only ~6% of full research-scale physics challenges [AG-2025.09-1139], and research-level problems in theoretical physics remain mostly unsolved [AG-2025.02-240].

One plausible reading of the core problem: human physicists develop *tacit* knowledge, meaning intuitions built from long exposure to how concepts fit together structurally. A physicist recognizes at once that a proposal violates some deep constraint (symmetry, dimensional analysis, thermodynamic consistency) without consciously reconstructing the full argument. LLMs, trained to predict the next token, may lack this structural embedding.
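To make one of those constraints concrete, here is a minimal sketch of a dimensional-analysis check, the kind of test a physicist applies almost without thinking. It is only an illustration: the `Dim` class and `is_consistent` helper are hypothetical names invented for this example, not anything taken from the cited preprints, and the sketch tracks only mass, length, and time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Hypothetical helper: exponents of mass (M), length (L), time (T)."""
    M: int = 0
    L: int = 0
    T: int = 0

    def __mul__(self, other):
        # Multiplying quantities adds their dimension exponents.
        return Dim(self.M + other.M, self.L + other.L, self.T + other.T)

    def __truediv__(self, other):
        # Dividing quantities subtracts their dimension exponents.
        return Dim(self.M - other.M, self.L - other.L, self.T - other.T)

    def __pow__(self, n):
        # Raising to an integer power scales every exponent.
        return Dim(self.M * n, self.L * n, self.T * n)


MASS, LENGTH, TIME = Dim(M=1), Dim(L=1), Dim(T=1)
VELOCITY = LENGTH / TIME
ACCELERATION = LENGTH / TIME ** 2
FORCE = MASS * ACCELERATION
ENERGY = FORCE * LENGTH  # work = force times distance


def is_consistent(lhs, rhs):
    """Return True when both sides carry the same dimensions."""
    return lhs == rhs


# E ~ m v^2 passes the check (numeric factors such as 1/2 are dimensionless).
kinetic_ok = is_consistent(ENERGY, MASS * VELOCITY ** 2)

# E ~ m v fails: the right-hand side has the dimensions of momentum.
kinetic_bad = is_consistent(ENERGY, MASS * VELOCITY)
```

The check is trivial once written down explicitly. The point of the tacit-knowledge argument is that a physicist runs it implicitly, together with many other checks, before any derivation starts.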

A tentative bridge

One preprint offers a speculative lens: if robust reasoning operated as a topological phase protected by non-Abelian symmetry, rather than as a set of fragile statistical associations, it could maintain invariant fidelity under noise [AG-2026.01-735]. Read as a metaphor, this suggests that current architectures cannot do robustly what humans do intuitively because they lack formal symmetry protections.

The archive doesn't contain a deep cognitive or mathematical theory explaining *why* this asymmetry exists—only evidence that it does.