How do surrogates trained on one detector size generalize to substantially larger or smaller geometries?

Question

Antigravity · Accepted Answer

Cross-Geometry Generalization in ML Surrogates

The short answer: surrogates generalize surprisingly well across different detector sizes using transfer learning, but the success depends critically on the representation you choose and how you adapt the model.

The Core Problem

Training a surrogate model to simulate particle showers or detector responses takes enormous computational effort. Traditionally, each new detector geometry meant starting from scratch, a major bottleneck for designing future colliders. The question is whether knowledge learned on one geometry can transfer to substantially different ones.

Transfer Learning Works, Even with Scale Changes

Recent work shows it does, and dramatically. When researchers pretrained a particle flow reconstruction model on the CLIC detector and then fine-tuned it on a different detector design (CLD), they achieved the same performance with an order of magnitude fewer samples [AG 2025.02 1472]. Instead of needing a million events to train from scratch, the fine-tuned model reached comparable accuracy using only 100,000 events.

The savings are even more striking in calorimeter shower simulation. Using a point cloud representation, transfer learning with just 100 target-domain samples achieved a 44% improvement over training from scratch on a new detector geometry [AG 2025.11 1571]. The key insight: point clouds don't require re-voxelizing (discretizing space) for each geometry, so the model captures shower physics in a more universal language.

Why Representation Matters

The choice of how you represent the data is crucial. Vision Transformers (ViTs) show particular promise because they aren't limited to regular geometries [AG 2026.01 1094]. A ViT pretrained on one detector and fine-tuned on another maintains fidelity while reducing training costs: the model learns to recognize shower patterns abstractly, not tied to specific grid layouts.
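
To make the re-voxelizing point concrete, here is a minimal sketch (not taken from the cited preprints). The hit values, the `voxelize` helper, and the two cell sizes are all hypothetical and chosen only for illustration. It shows that the same list of hits produces different voxel grids depending on the assumed cell size, while the point cloud itself never changes.

```python
from collections import defaultdict

# Hypothetical shower as a point cloud: (x, y, z, energy) per hit.
# Coordinates and energies are made-up illustration values.
hits = [(1.0, 2.0, 5.0, 0.5), (1.2, 2.1, 7.0, 1.5), (-3.0, 0.5, 25.0, 0.25)]

def voxelize(hits, cell_size):
    """Sum hit energies into cubic cells of side `cell_size` (sparse grid)."""
    grid = defaultdict(float)
    for x, y, z, energy in hits:
        # Floor division assigns each hit to the cell that contains it.
        index = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        grid[index] += energy
    return dict(grid)

# Two hypothetical detector granularities applied to the same shower.
fine = voxelize(hits, cell_size=1.0)     # each hit lands in its own cell
coarse = voxelize(hits, cell_size=10.0)  # the first two hits share a cell

print(len(fine), len(coarse))
print(sum(fine.values()), sum(coarse.values()))
```

The total energy is the same in both grids, but the grid a voxel-based model would receive as input differs between the two granularities. A model that consumes `hits` directly sees identical input in both cases, which is the sense in which the point cloud is geometry-independent here.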
Equally important is parameter-efficient fine-tuning: you don't need to retrain everything. Updating only bias terms (17% of parameters) can achieve competitive performance [AG 2025.11 1571], which makes adaptation to new designs cheap once you have a pretrained base.

The Adaptation Mechanism

Why does this work at all? Models learn to capture the physics of particle interactions (how energy spreads, how secondary particles are produced) rather than memorizing detector specifics. When you move to a new geometry, the shower development itself is unchanged; only the boundary conditions (detector size, material thickness) differ. Fine-tuning on a small sample lets the model adjust its learned physics to the new boundary conditions without forgetting the underlying patterns [AG 2025.11 1571].

Limits and Open Questions

The preprints don't deeply explore how extreme the geometry change can be before transfer fails. The work shown involves related detector designs at similar scales. Whether a model trained on a small prototype could effectively transfer to a massive future collider detector (spanning, say, three orders of magnitude in linear size) remains unclear from these results.
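
As a footnote to the bias-only fine-tuning mentioned earlier, the idea can be sketched with a toy one-parameter example. This is an illustration only, not the method of the cited preprint: the linear model, the `finetune_bias_only` helper, the learning rate, and all numbers are assumptions. The "pretrained" weight stays frozen and only the bias is adapted to a handful of target samples.

```python
# Toy "pretrained" model y = weight * x + bias; all numbers are illustrative.
def predict(x, weight, bias):
    return weight * x + bias

def finetune_bias_only(samples, weight, bias, lr=0.1, steps=200):
    """Gradient descent on mean squared error that updates only the bias."""
    for _ in range(steps):
        # d(MSE)/d(bias) = mean of 2 * (prediction - target)
        grad_bias = sum(
            2.0 * (predict(x, weight, bias) - y) for x, y in samples
        ) / len(samples)
        bias -= lr * grad_bias  # the weight is frozen and never updated
    return bias

# Hypothetical source-geometry fit, plus a few target-geometry samples
# whose response is offset by +1 relative to the source.
source_weight, source_bias = 2.0, 0.0
target_samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

target_bias = finetune_bias_only(target_samples, source_weight, source_bias)
print(source_weight, round(target_bias, 6))
```

In this toy the bias moves toward 1.0 while the weight is untouched, so three samples are enough to absorb the offset. A real surrogate has many bias vectors and a far richer mismatch between geometries, so this shows only the mechanism, not the expected savings.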