Could training on adversarial examples actively hurt sensitivity to rare but physically plausible BSM signals?

Question

Antigravity · Accepted Answer

This is a sharp question that cuts to a real tension in modern machine learning for physics. Let me unpack what the archive shows.

The core risk is real, but subtle.

You're asking whether hardening a model against adversarial noise might inadvertently blind it to genuine rare signals. The archive addresses this from two angles:

First, adversarial training can *improve* generalization to real data. Training against white-box adversarial attacks, which perturb the model's weights or inputs to stress-test it, helps supervised classifiers generalize better to unseen data by steering them away from sharp local minima in the loss landscape [AG-2024.11-1226]. The intuition is that a model trapped in a sharp minimum has learned brittle, simulation-specific artifacts. Adversarial robustness forces the model toward smoother solutions that transfer better. This *should* help the classifier catch physically plausible signals outside its training distribution.
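
To make the sharp-versus-flat intuition concrete, here is a minimal sketch on a made-up one-dimensional loss. The function `toy_loss`, the radius `rho`, and the grid search are all illustrative assumptions, not the setup of the cited study. It compares the worst loss an adversary can reach by nudging the weight a small distance away from each minimum.

```python
def toy_loss(w):
    """Hypothetical 1-D loss: a sharp minimum at w = -2 and a flat one at w = +2."""
    sharp = 25.0 * (w + 2.0) ** 2   # high curvature
    flat = 0.25 * (w - 2.0) ** 2    # low curvature
    return min(sharp, flat)


def worst_case_loss(w, rho, steps=201):
    """Largest loss reachable by shifting the weight by at most rho (grid search)."""
    deltas = [-rho + 2.0 * rho * i / (steps - 1) for i in range(steps)]
    return max(toy_loss(w + d) for d in deltas)


# Both minima have zero loss, so ordinary training cannot tell them apart.
sharp_adv = worst_case_loss(-2.0, rho=0.2)
flat_adv = worst_case_loss(2.0, rho=0.2)
print(f"worst-case loss near sharp minimum: {sharp_adv:.3f}")
print(f"worst-case loss near flat minimum:  {flat_adv:.3f}")
```

An objective that minimizes the worst-case loss rather than the plain loss penalizes the sharp minimum far more heavily, which is the sense in which weight-space adversarial training prefers flat solutions.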

But here's the catch: that study focused on *weight-space* robustness in a supervised setting (Higgs classification). A rarer signal in a different channel or parameter space is a different beast.

The real vulnerability: anomaly detection without signal awareness.

The sharper concern emerges from work on unsupervised anomaly detection [AG-2025.05-1524]. Such methods (autoencoders, isolation forests, etc.) are agnostic to what signal you're hunting; they just learn to flag deviations from the Standard Model background. The problem: their sensitivity to rare BSM signals depends heavily on hyperparameters that cannot be tuned without a target signal, and the paper found that no single hyperparameter choice optimizes sensitivity across all BSM benchmarks. If you adversarially harden such a model, you risk *narrowing its anomaly detection surface* to only the background morphology, making it less likely to catch truly unexpected topologies.
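
A hedged sketch of what such a hyperparameter scan could look like follows. Everything in it is a stand-in: the detector is a simple k-nearest-neighbour distance score rather than an autoencoder or isolation forest, the "background" and "signal" samples are one-dimensional Gaussians, and the benchmark names are invented. It only shows the bookkeeping of scoring each benchmark at each hyperparameter value; it does not reproduce the paper's findings.

```python
import random

K_VALUES = (1, 5, 25)  # hypothetical hyperparameter grid


def knn_score(x, train, k):
    """Anomaly score: mean distance from x to its k nearest background points."""
    dists = sorted(abs(x - t) for t in train)
    return sum(dists[:k]) / k


def auc(bkg_scores, sig_scores):
    """Probability that a signal event outscores a background event (ties count half)."""
    wins = 0.0
    for s in sig_scores:
        for b in bkg_scores:
            wins += 1.0 if s > b else 0.5 if s == b else 0.0
    return wins / (len(sig_scores) * len(bkg_scores))


rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(200)]     # background used to fit the detector
test_bkg = [rng.gauss(0.0, 1.0) for _ in range(200)]  # held-out background
benchmarks = {                                        # toy stand-ins for BSM benchmarks
    "far_tail": [rng.gauss(6.0, 0.1) for _ in range(50)],
    "near_tail": [rng.gauss(2.5, 0.3) for _ in range(50)],
}

# Score every benchmark at every hyperparameter value.
results = {}
for k in K_VALUES:
    bkg_scores = [knn_score(x, train, k) for x in test_bkg]
    for name, sig in benchmarks.items():
        sig_scores = [knn_score(x, train, k) for x in sig]
        results[(name, k)] = auc(bkg_scores, sig_scores)

# The k that maximizes AUC for each benchmark, chosen with signal labels in hand.
best_k = {name: max(K_VALUES, key=lambda k: results[(name, k)]) for name in benchmarks}
print(results)
print(best_k)
```

Note that picking `best_k` requires signal labels, which a genuinely signal-agnostic search does not have. That is the tuning problem in miniature.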

The solution the archive points to: signal awareness.

The most relevant paper here is [AG-2026.03-1712]. Rather than pure anomaly detection, they train a contrastive latent space on *both* SM background *and* a diverse set of hypothesized BSM signals. The result: the model is hardened against simulation artifacts *while* retaining sensitivity to BSM models that were not in the training set, by extrapolating to novel topologies. This suggests that adversarial robustness and signal awareness are synergistic, not antagonistic.
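
For readers unfamiliar with contrastive latent spaces, here is a minimal sketch of a supervised-contrastive-style loss. The source does not say which loss the paper uses, so this form, the function name `supcon_loss`, the temperature, and the toy two-dimensional embeddings are all assumptions made for illustration. The loss is low when events sharing a label (SM, or a given BSM hypothesis) sit close together in the latent space and far from the other classes.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.5):
    """Supervised-contrastive-style loss averaged over anchors that have a positive."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Temperature-scaled similarity of the anchor to every other event.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        log_denom = math.log(sum(math.exp(l) for l in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy latent vectors: two events along one axis, two along the other.
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
aligned = supcon_loss(emb, ["SM", "SM", "BSM_A", "BSM_A"])  # labels match the clusters
mixed = supcon_loss(emb, ["SM", "BSM_A", "SM", "BSM_A"])    # labels cut across the clusters
print(f"aligned: {aligned:.4f}, mixed: {mixed:.4f}")
```

Minimizing a loss of this kind pulls same-label events together and pushes different labels apart, which is what gives the latent space a geometry that unseen signals can land in.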

Bottom line:

Adversarial training on background alone could hurt sensitivity to rare signals if it overshoots and erases the model's ability to recognize genuinely anomalous event shapes. But adversarial training applied *within a signal-aware framework*, where you've explicitly shown the model diverse BSM target morphologies, appears to enhance robustness and sensitivity together [AG-2024.11-1226, AG-2026.03-1712]. The risk is real mainly for pure anomaly detection; supervised or semi-supervised approaches that embed signal diversity seem to dodge it.