Antigravity Q&A
How much attention sparsity emerges if you remove the physics-inspired interaction matrix entirely?
Answered by claude-haiku-4-5 · 10 May 2026
Answer
The archive directly addresses your question. When researchers removed the physics-inspired interaction matrix from the Particle Transformer (ParT), they found that the attention stayed sparse: the sparsity arises almost entirely from the attention mechanism itself [AG-2025.11-1573].
Here's what they discovered: ParT exhibits strikingly sparse, nearly binary attention patterns. The model learns to focus sharply on a few key particles rather than spreading attention evenly. The natural question was whether this sparsity comes from the interaction matrix (a physics-based component that encodes domain knowledge about particle interactions) or from the transformer's attention computation. By systematically comparing models trained with and without this matrix, they showed that the interaction matrix plays only a secondary role [AG-2025.11-1573].
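In the ParT architecture as originally proposed, the interaction matrix U enters as an additive bias on the attention logits, softmax(QKᵀ/√d + U), so "removing" it amounts to setting U = 0. The NumPy sketch below shows how such an ablation could be measured, under loudly stated assumptions: random queries and keys stand in for trained ones, a random symmetric matrix stands in for the learned U, the function names are hypothetical, and the three sparsity summaries (normalized entropy, fraction of near-binary weights, mean peak weight) are illustrative choices, not necessarily the metrics used in AG-2025.11-1573. It demonstrates the measurement, not the paper's result.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def particle_attention(q, k, u=None):
    """Hypothetical helper: attention weights softmax(Q K^T / sqrt(d) + U).

    Passing u=None drops the pairwise interaction term, i.e. plain
    scaled dot-product attention (the "no interaction matrix" ablation).
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    if u is not None:
        logits = logits + u  # U only shifts the logits before the softmax
    return softmax(logits, axis=-1)


def sparsity_stats(w, eps=0.01):
    """Hypothetical summaries of how concentrated each attention row is."""
    n = w.shape[-1]
    # Shannon entropy per row, normalized to [0, 1] by log(n); 0 means one-hot.
    ent = -(w * np.log(w + 1e-12)).sum(axis=-1) / np.log(n)
    # Fraction of weights that are "nearly binary": below eps or above 1 - eps.
    near_binary = np.mean((w < eps) | (w > 1 - eps))
    return {
        "mean_norm_entropy": float(ent.mean()),
        "frac_near_binary": float(near_binary),
        "mean_max_weight": float(w.max(axis=-1).mean()),
    }


rng = np.random.default_rng(0)
n, d = 30, 16  # 30 particles, 16-dim queries/keys (arbitrary toy sizes)
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
# Hypothetical stand-in for the learned pairwise bias U (random, symmetric).
a = rng.normal(size=(n, n))
u = 0.5 * (a + a.T)

with_u = sparsity_stats(particle_attention(q, k, u))
without_u = sparsity_stats(particle_attention(q, k, None))
print("with U:   ", with_u)
print("without U:", without_u)
```

Because U only enters as a shift of the logits, comparing the two calls on the same Q and K isolates its effect on how concentrated the weights are; on untrained random inputs the numbers themselves carry no physics meaning.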
This is significant because it suggests the sparsity is not imposed by the physics structure built into the model; it is an emergent property of how the attention mechanism learns to solve the jet-tagging problem. Remarkably, even without explicit help from the interaction matrix, ParT can still identify physically meaningful features such as leptons in top-quark decays [AG-2025.11-1573]. This finding helps separate how much of the model's interpretability comes from learning dynamics alone versus built-in inductive bias.
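One generic way sparsity can arise with no U term at all is that softmax sharpens as its logits grow: multiplying QKᵀ/√d by a factor s pushes each row toward one-hot. The toy below assumes, purely for illustration, that training mostly grows the query/key norms, modeled by a hypothetical scalar `scale` applied to random Q and K. It is not the mechanism identified in AG-2025.11-1573, which this answer does not detail; it only shows that plain attention can become nearly binary by itself.

```python
import numpy as np


def softmax(x):
    # Numerically stable row-wise softmax.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_max_weight(logits):
    """Average of each row's largest attention weight (1.0 means one-hot)."""
    return float(softmax(logits).max(axis=-1).mean())


rng = np.random.default_rng(1)
n, d = 30, 16  # toy sizes: 30 particles, 16-dim queries/keys
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
base = q @ k.T / np.sqrt(d)  # scaled dot-product logits, no interaction term U

# Hypothetical knob: "scale" mimics query/key norms growing during training.
scales = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
peaks = [mean_max_weight(s * base) for s in scales]
for s, p in zip(scales, peaks):
    print(f"scale={s:5.1f}  mean max weight={p:.3f}")
```

The largest weight in a row equals 1 / Σᵢ exp(s·(xᵢ − x_max)), and every term with xᵢ < x_max shrinks as s grows, so the mean peak weight rises monotonically with the scale toward 1.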
Sources · 8
- 63% · hep-ph · Why Is Attention Sparse In Particle Transformer? · AG-2025.11-1573
- 54% · q-bio.NC · Random matrix theory of sparse neuronal networks with heterogeneous timescales · AG-2025.12-1023
- 53% · hep-th · Sparsity dependence of Krylov state complexity in the SYK model · AG-2024.07-1018
- 53% · cs.LG · Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive Bias · AG-2024.02-1718
- 53% · hep-th · Quantum chaos in the sparse SYK model · AG-2024.03-783
- 51% · quant-ph · Effective Many-body Interactions in Reduced-Dimensionality Spaces Through Neural Network Models · AG-2024.07-1767
- 51% · quant-ph · GQHAN: A Grover-inspired Quantum Hard Attention Network · AG-2024.01-1910
- 51% · quant-ph · Quantum Attention for Vision Transformers in High Energy Physics · AG-2024.11-1352
Keep exploring
- How does attention sparsity in ParT compare to sparsity in standard vision transformers without physics structure?
- Does the model learn different sparse patterns when trained on different particle physics tasks or datasets?
- What mechanisms cause the attention to focus on leptons specifically if the physics interaction matrix isn't guiding selection?
This is a research aid — not a peer review. Verify sources before citing.