Latent Representation Learning in Heavy-Ion Collisions with MaskPoint Transformer

Jing-Zong Zhang; Shuang Guo; Li-Lin Zhu; Lingxiao Wang; Guo-Liang Ma

doi:10.48550/arXiv.2510.06691

AG-2025.10-1293·hep-ph·cross-listed: cs.LG

Abstract

A central challenge in high-energy nuclear physics is to extract informative features from the high-dimensional final-state data of heavy-ion collisions (HIC) in order to enable reliable downstream analyses. Traditional approaches often rely on selected observables, which may miss subtle but physically relevant structures in the data. To address this, we introduce a Transformer-based autoencoder trained with a two-stage paradigm: self-supervised pre-training followed by supervised fine-tuning. The pretrained encoder learns latent representations directly from unlabeled HIC data, providing a compact and information-rich feature space that can be adapted to diverse physics tasks. As a case study, we apply the method to distinguish between large and small collision systems, where it achieves significantly higher classification accuracy than PointNet. Principal component analysis and SHAP interpretation further demonstrate that the autoencoder captures complex nonlinear correlations beyond individual observables, yielding features with strong discriminative and explanatory power. These results establish our two-stage framework as a general and robust foundation for feature learning in HIC, opening the door to more powerful analyses of quark-gluon plasma properties and other emergent phenomena. The implementation is publicly available at https://github.com/Giovanni-Sforza/MaskPoint-AMPT.
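
The self-supervised stage described above relies on hiding part of each event from the encoder. The following is a minimal sketch of that masking step only, not the authors' implementation. It assumes each event is a list of hypothetical `(pT, eta, phi)` tuples and uses an illustrative mask ratio of 0.6; neither the feature layout nor the ratio comes from the preprint, and the function name `mask_event` is invented for this example.

```python
import random


def mask_event(particles, mask_ratio=0.6, seed=None):
    """Split one event's particles into visible and masked subsets.

    particles  : list of per-particle feature tuples (layout is an assumption)
    mask_ratio : fraction of particles to hide (illustrative value)
    seed       : optional seed so the split is reproducible
    """
    rng = random.Random(seed)
    n_masked = int(round(len(particles) * mask_ratio))

    # Shuffle the indices, then hide the first n_masked of them.
    indices = list(range(len(particles)))
    rng.shuffle(indices)
    masked_idx = sorted(indices[:n_masked])
    visible_idx = sorted(indices[n_masked:])

    visible = [particles[i] for i in visible_idx]
    masked = [particles[i] for i in masked_idx]
    return visible, masked


# Toy event with 10 particles; the values are placeholders, not physics data.
event = [(0.5 + 0.1 * i, -1.0 + 0.2 * i, 0.3 * i) for i in range(10)]
visible, masked = mask_event(event, mask_ratio=0.6, seed=0)
print(len(visible), len(masked))
```

In a pre-training loop, the encoder would see only `visible`, and the hidden particles in `masked` would define the pretext target. How that target is posed (for example, reconstructing or discriminating the hidden points) is left open in this sketch.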

Submitted

8 October 2025

Version

v1

License

CC-BY-4.0

Summary

A neural network trained on unlabeled collision data learns to extract meaningful features from heavy-ion collisions, then uses these features to classify collision types better than previous methods.

The model uses a two-step learning approach: it first learns general patterns from raw, unlabeled data and is then fine-tuned on specific classification tasks, a strategy that improves performance at identifying different collision systems.
Unlike hand-picked observables used in traditional physics analyses, the learned features capture hidden nonlinear relationships in the collision data that humans might miss.
The method provides interpretable explanations (via SHAP analysis) for what the network learned, making it useful for both better predictions and discovering new physics insights about quark-gluon plasma.

Generated by claude-haiku-4-5.
