Would deliberately training on adversarially chosen signal classes improve robustness to truly unexpected beyond-Standard-Model physics?

Question

Antigravity · Accepted Answer

The question points at a real tension in machine learning for physics discovery: a model made robust against the signals you can imagine is not automatically sensitive to the signals you cannot.

The short answer is yes, with important caveats [AG-2024.11-1226]. Adversarial training does improve robustness, but the benefit is limited to variations *near* the training distribution and does not extend to new physics that is truly orthogonal to it.

Why Adversarial Training Helps

When you train a classifier on Monte Carlo simulations of a specific signal, it inevitably learns subtle artifacts: approximations in the shower algorithm, biases in the event generator, even numerical quirks. A vanilla supervised model becomes *brittle* on real data that deviates slightly from these simulations [AG-2024.11-1226]. Training against white-box adversarial examples (inputs deliberately perturbed, using the model's own gradients, so that they fool the classifier) forces the model to smooth its decision boundary and stop leaning on those artifacts. The result is better generalization to slightly different data.

This matters for discovery because the resulting model relies less on simulation artifacts and more on genuine physical signatures [AG-2024.11-1226].
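To make the mechanism concrete, here is a minimal sketch of adversarial training on a toy problem. Everything in it is an assumption for illustration: the attack is FGSM (one common white-box attack, not necessarily the one used in the cited study), the classifier is plain logistic regression, and the two features are hypothetical stand-ins, with feature 0 playing a broad physical separation and feature 1 playing a tiny but very clean shift that only exists in simulation.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # White-box attack: shift x by eps along the sign of dLoss/dx = (p - y) * w.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def train(data, eps, lr=0.1, epochs=50):
    # Plain SGD on logistic loss; eps = 0 reduces to standard training.
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            xa = fgsm(w, b, x, y, eps)
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, xa)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, xa)]
            b -= lr * g
    return w, b

def make_toy_sample(n, rng):
    # Feature 0: broad "physical" separation between the two classes.
    # Feature 1: small, very clean shift standing in for a simulation artifact.
    data = []
    for i in range(n):
        y = i % 2
        s = 1.0 if y else -1.0
        data.append(([s + rng.gauss(0, 1), 0.1 * s + rng.gauss(0, 0.02)], y))
    return data

rng = random.Random(0)
sample = make_toy_sample(400, rng)
w_std, _ = train(sample, eps=0.0)
w_adv, _ = train(sample, eps=0.2)
print("artifact weight, standard   :", round(w_std[1], 3))
print("artifact weight, adversarial:", round(w_adv[1], 3))
```

In this toy setup the perturbation budget (0.2) is larger than the artifact's shift (0.1), so the adversarially trained model cannot profit from feature 1 and its weight stays near zero, while the standard model keeps increasing it. The same budget leaves the broad feature usable, which is the sense in which robustness holds only near the training distribution.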

The Hard Limit: Truly Unexpected Physics

However, adversarially training on *chosen* signal classes helps only if the new physics is at least somewhat similar to what you trained against. The archive contains multiple studies arguing that model-independent anomaly detection (training only on Standard Model backgrounds, with no signal assumption) is necessary for discovering truly unexpected BSM physics [AG-2025.05-1524], [AG-2024.09-1051], [AG-2024.06-1375].
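As a rough illustration of what "no signal assumption" means in practice, the sketch below fits a model to background events only and flags whatever looks unlike them. The detector (a sum of squared per-feature z-scores), the two Gaussian features, and both "signals" are hypothetical choices made for this example; the cited studies use far richer models.

```python
import random
import statistics

def fit_background(events):
    # Per-feature mean and standard deviation, estimated from background only.
    cols = list(zip(*events))
    return [statistics.fmean(c) for c in cols], [statistics.stdev(c) for c in cols]

def anomaly_score(event, model):
    # Sum of squared z-scores: large when an event is unlike the background.
    means, stds = model
    return sum(((x - m) / s) ** 2 for x, m, s in zip(event, means, stds))

def threshold_at(scores, false_positive_rate):
    # Score exceeded by roughly the requested fraction of background events.
    ranked = sorted(scores)
    index = int((1.0 - false_positive_rate) * len(ranked))
    return ranked[min(len(ranked) - 1, index)]

rng = random.Random(1)
background = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
model = fit_background(background)
cut = threshold_at([anomaly_score(e, model) for e in background], 0.01)

# Two hypothetical signals the detector never saw, displaced in different directions.
samples = {
    "background": background,
    "A": [[rng.gauss(4, 0.5), rng.gauss(0, 1)] for _ in range(200)],
    "B": [[rng.gauss(0, 1), rng.gauss(-4, 0.5)] for _ in range(200)],
}
flagged = {
    name: sum(anomaly_score(e, model) > cut for e in events) / len(events)
    for name, events in samples.items()
}
for name, fraction in flagged.items():
    print(f"{name}: fraction flagged = {fraction:.3f}")
```

Neither signal entered the fit, yet both are flagged at a high rate because the score only asks how far an event sits from the background. The price is that nothing here is tuned to any particular signal, so a dedicated classifier would typically beat it on the signal it was trained for.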

A more sophisticated approach is signal-aware contrastive learning: train on *multiple diverse* hypothesized BSM scenarios simultaneously, which builds a latent space sensitive to broad classes of new physics. The key finding is that this method "retains sensitivity to BSM models not present during training" through interpolation and extrapolation [AG-2026.03-1712]. So the diversity of the signal classes you train on matters more than perfection on any one class.
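For readers unfamiliar with contrastive objectives, here is a sketch of one standard choice, the supervised contrastive loss, which pulls embeddings with the same label together and pushes different labels apart. This is an assumption about the general family of method, not the specific loss of the cited work, and the function names, temperature value, and two-dimensional embeddings are all illustrative.

```python
import math

def normalize(v):
    # Project an embedding onto the unit sphere.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    # For each anchor, average the negative log-probability of picking one of
    # its same-label partners out of all other events in the batch.
    z = [normalize(e) for e in embeddings]
    total, anchors = 0.0, 0
    for i in range(len(z)):
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(len(z)) if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Hypothetical 2D embeddings for three classes: SM background and two BSM scenarios.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
clustered = supervised_contrastive_loss(points, ["SM", "SM", "BSM-A", "BSM-A", "BSM-B", "BSM-B"])
scrambled = supervised_contrastive_loss(points, ["SM", "BSM-A", "BSM-B", "SM", "BSM-A", "BSM-B"])
print("loss when classes form clusters:", clustered)
print("loss when classes are mixed    :", scrambled)
```

The loss is low when each class occupies its own region of the latent space and high when classes are mixed. Training an encoder to minimize it over many diverse BSM classes is what would produce a latent space where an unseen model can land between or beyond known clusters rather than on top of the background.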

The Practical Synthesis

The most robust strategy appears to be layering approaches: use adversarial training (or multi-background representation learning) to harden the model against simulation artifacts [AG-2024.01-1031], then apply model-agnostic anomaly detection or decorrelated classifiers to catch the genuinely unexpected [AG-2024.04-1123]. One study proposes combining multiple unsupervised anomaly detectors (DeepSVDD and DDD) precisely because no single approach works uniformly well across all signals [AG-2024.06-1375].
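One simple way to combine detectors whose raw scores live on different scales is to convert each score to a quantile against that detector's own background scores, then keep the largest quantile. The sketch below assumes that scheme purely for illustration; it is not the combination rule of the cited study, and the two reference score lists are made-up stand-ins rather than outputs of DeepSVDD or DDD.

```python
from bisect import bisect_left

def to_quantile(score, reference_sorted):
    # Fraction of background reference scores strictly below this score.
    return bisect_left(reference_sorted, score) / len(reference_sorted)

def combine(event_scores, references):
    # event_scores: one raw score per detector for a single event.
    # references: one sorted list of background scores per detector.
    # An event is as anomalous as its most alarmed detector says it is.
    return max(to_quantile(s, ref) for s, ref in zip(event_scores, references))

# Hypothetical background scores: detector 1 lives on [0, 1), detector 2 on [0, 1000).
references = [
    sorted(i / 100 for i in range(100)),
    sorted(float(10 * i) for i in range(100)),
]

seen_by_first = combine([0.995, 25.0], references)
seen_by_second = combine([0.055, 985.0], references)
seen_by_neither = combine([0.505, 505.0], references)
print(seen_by_first, seen_by_second, seen_by_neither)
```

Taking the maximum keeps an event that only one detector notices, which is the point of combining them. It also raises the background rate above that of either detector alone, so the final cut on the combined value has to be recalibrated on background events.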

In short: adversarial training on chosen signals improves robustness within a neighborhood of those signals, but discovering truly unexpected beyond-Standard-Model physics requires staying blind to what you're looking for.