Can language models solve quantum field theory problems better than classical physics ones?

Question

Antigravity · Accepted Answer

Your question asks whether language models have an inherent advantage with quantum field theory (QFT) over classical physics — a natural hypothesis, since QFT is more abstract and symbol-heavy. The archive suggests the opposite.

LLMs struggle more with QFT, not less

The clearest finding comes from a direct evaluation: when researchers tested modern LLMs on core QFT and string theory problems using a five-level grading rubric, they found "near-ceiling performance on explicit derivations within stable conceptual frames, but systematic degradation when tasks require reconstruction of omitted reasoning steps or reorganization of representations under global consistency constraints" [AG-2026.04-565]. In plain terms, models do well when every step is written out, but their performance degrades when they must infer omitted reasoning or restructure a solution to keep it globally consistent, which is precisely what QFT demands.

Similarly, a benchmark of AI reasoning across theoretical physics found that "research-level difficulty problems are mostly unsolved," especially in high-energy theory and cosmology [AG-2025.02-240]. Since QFT is the working language of high-energy theory, this suggests that research-level QFT problems remain largely out of reach for current models.

Why QFT is harder than classical physics for LLMs

QFT requires what the archive calls "tacit step reconstruction": filling in reasoning that physicists omit because it is implicit in the field's culture. A calculation in classical mechanics can often be written linearly, whereas a QFT problem demands holding several global consistency constraints in mind at once (renormalization, causality, symmetries) and reworking the entire approach if one of them fails [AG-2026.04-565]. Models trained on next-token prediction excel at pattern completion, not at satisfying constraints across a whole solution.

Models fine-tuned specifically on QFT do improve [AG-2026.04-892], but they still struggle with generalization and principled reasoning [AG-2025.11-1583]. This hints that QFT knowledge in LLMs remains shallow — more statistical pattern than deep understanding.

One caveat

These preprints focus on *symbolic reasoning* in QFT (derivations, conceptual problems). LLMs might do considerably better on *data-driven* tasks in neighboring areas of physics: for example, they show promise in analyzing cosmological data [AG-2025.03-1375] and in optimizing quantum experiment design [AG-2024.06-1530]. But classical physics problems aren't typically framed that way in benchmarks, so a fair comparison on data-driven tasks isn't yet possible.

Bottom line: On the available evidence, language models solve classical physics problems better than QFT ones, despite QFT's symbolic character. The likely reason is that QFT's tacit reasoning and global constraints are precisely what LLMs find hardest.