AG-2025.09-1639·hep-ex·cross-listed: hep-ph
Event Tokenization and Masked-Token Prediction for Anomaly Detection at the Large Hadron Collider
Authors
- Ambre Visive
- Polina Moskvitina
- Clara Nellist
- Roberto Ruiz de Austri (Instituto de Física Corpuscular, IFIC-UV/CSIC, Paterna, Spain)
- Sascha Caron
Abstract
We propose a novel use of Large Language Models (LLMs) as unsupervised anomaly detectors in particle physics. Using lightweight LLM-like networks with encoder-based architectures trained to reconstruct background events via masked-token prediction, our method identifies anomalies through deviations in reconstruction performance, without prior knowledge of signal characteristics. Applied to searches for simultaneous four-top-quark production, this token-based approach shows competitive performance against established unsupervised methods and effectively captures subtle discrepancies in collider data, suggesting a promising direction for model-independent searches for new physics.
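As a rough illustration of the scoring idea in the abstract (mask a token, predict it, and treat poor reconstruction as a sign of anomaly), here is a minimal sketch. It is not the authors' implementation. The vocabulary size, the `MASK_ID` value, the function names, and the synthetic data are all assumptions made for the example, and a context-free frequency table stands in for the encoder network, which in practice would condition on the unmasked tokens.

```python
import numpy as np

VOCAB_SIZE = 8          # hypothetical number of event tokens
MASK_ID = VOCAB_SIZE    # hypothetical id reserved for the mask token


def fit_toy_predictor(background, alpha=1.0):
    """Build a stand-in predictor from background events (n_events x seq_len).

    It returns smoothed per-position token frequencies and ignores context,
    unlike a real encoder, which would use the unmasked tokens.
    """
    background = np.asarray(background)
    seq_len = background.shape[1]
    counts = np.full((seq_len, VOCAB_SIZE), alpha, dtype=float)
    for pos in range(seq_len):
        counts[pos] += np.bincount(background[:, pos], minlength=VOCAB_SIZE)
    probs = counts / counts.sum(axis=1, keepdims=True)

    def predict_proba(masked_tokens, pos):
        # masked_tokens is part of the interface but unused by this toy model.
        return probs[pos]

    return predict_proba


def anomaly_score(event, predict_proba):
    """Mean negative log-likelihood of the true token, masking one position at a time."""
    event = np.asarray(event)
    losses = []
    for pos, true_token in enumerate(event):
        masked = event.copy()
        masked[pos] = MASK_ID
        p = predict_proba(masked, pos)
        losses.append(-np.log(p[true_token]))
    return float(np.mean(losses))


rng = np.random.default_rng(0)
# Synthetic "background" that only ever uses tokens 0..3 (an assumption for the demo).
background = rng.integers(0, 4, size=(500, 5))
predict = fit_toy_predictor(background)

typical = np.array([1, 2, 0, 3, 1])
unusual = np.array([7, 7, 7, 7, 7])  # tokens never seen in the background sample
print(anomaly_score(typical, predict), anomaly_score(unusual, predict))
```

An event built from tokens that the background sample never contains receives a higher score than a background-like event. Selecting candidates then amounts to thresholding this score, with no signal model needed during training.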
Submitted
30 September 2025
Version
v1
License
CC-BY-4.0
DOI
10.48550/arXiv.2509.26218