What mechanisms allow decorrelated classifiers to catch physics that adversarial training on chosen signals completely misses?

Question

Antigravity · Accepted Answer

Your question sits at the intersection of robustness and discovery in machine learning for physics. The short answer is that the two methods constrain the classifier differently: adversarial training makes it insensitive to perturbations you choose in advance, while decorrelation only requires its output to be independent of one protected variable for background events.

The core mechanism: what adversarial training optimizes for

When you adversarially train a classifier to be robust against perturbations in chosen signals (e.g., Higgs decay features), you're solving a min-max game: the model learns to ignore subtle variations that an adversary can exploit [AG-2024.11-1226]. But this robustness is *task-specific*. The classifier becomes insensitive to exactly the perturbations you trained it against, so it may also discard genuine physical information whenever that information correlates with your choice of protected signal.
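For reference, the min-max game is often written in the following textbook form. This is a sketch of the generic robust-training objective, not necessarily the formulation used in the cited preprint; the symbols (parameters $\theta$, classifier $f_\theta$, perturbation $\delta$, budget $\epsilon$, loss $\mathcal{L}$, data distribution $\mathcal{D}$) are assumptions introduced here for illustration.

```latex
\min_{\theta} \;
\mathbb{E}_{(x,\,y) \sim \mathcal{D}}
\left[
  \max_{\|\delta\| \le \epsilon}
  \mathcal{L}\bigl(f_\theta(x + \delta),\, y\bigr)
\right]
```

The inner maximization only ranges over the perturbation set you specify, which is why the resulting insensitivity is tied to that choice.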

How decorrelated classifiers escape this trap

A decorrelated classifier breaks the coupling between the classification decision and a protected variable (like the invariant mass used in signal detection) by using an optimal transport map [AG-2024.09-1051]. This enforces statistical independence: the classifier's output tells you about signal enrichment, but carries no information about the protected variable for background events.
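As a rough illustration of the idea, in one dimension the optimal transport map to a uniform target is the cumulative distribution function, so one simple approximation is to replace each score by its background quantile within a bin of the protected variable. The sketch below assumes this binned approach and toy data; the function name `decorrelate_scores`, the bin count, and all numbers are hypothetical and are not taken from the cited preprint.

```python
import numpy as np

def decorrelate_scores(scores, protected, bkg_scores, bkg_protected, n_bins=10):
    """Map each score to its background quantile inside its protected-variable bin.

    Binned approximation (an assumption of this sketch) to the 1D transport
    map that sends the conditional background score distribution to uniform.
    """
    # Equal-population bin edges in the protected variable, taken from background.
    edges = np.quantile(bkg_protected, np.linspace(0.0, 1.0, n_bins + 1))
    interior = edges[1:-1]
    bkg_bin = np.digitize(bkg_protected, interior)
    evt_bin = np.digitize(protected, interior)

    out = np.empty(len(scores), dtype=float)
    for b in range(n_bins):
        reference = np.sort(bkg_scores[bkg_bin == b])
        selected = evt_bin == b
        # Empirical background CDF evaluated at each event's score.
        ranks = np.searchsorted(reference, scores[selected], side="right")
        out[selected] = ranks / len(reference)
    return out

# Toy background: the raw score drifts upward with the protected variable.
rng = np.random.default_rng(0)
mass = rng.uniform(50.0, 150.0, size=20000)
raw_score = 0.01 * mass + rng.normal(0.0, 0.3, size=mass.size)

flat_score = decorrelate_scores(raw_score, mass, raw_score, mass)

raw_corr = np.corrcoef(raw_score, mass)[0, 1]
flat_corr = np.corrcoef(flat_score, mass)[0, 1]
print(f"correlation with mass: raw={raw_corr:.3f}, decorrelated={flat_corr:.3f}")
```

With the transformed score, a cut such as `flat_score > 0.9` keeps roughly the same background fraction in every bin of the protected variable, so the selection does not reshape the background spectrum. A vanishing linear correlation is only a necessary check for independence, not a proof of it.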

The payoff is practical: after enriching the sample with the (possibly biased) classifier, you fit a semiparametric mixture model directly to the protected variable in the signal-rich sample [AG-2024.09-1051]. This two-step process lets you detect signals that a single adversarially trained classifier might suppress entirely. Because the decorrelated classifier doesn't "know" which features matter for the final statistical test, it can't selectively blind itself to them.
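The second step can be sketched with a toy fit. The example below is a simplified, fully parametric stand-in for the semiparametric mixture model: it assumes the sample has already been selected with the decorrelated score, models the background as a cubic polynomial, and scans a Gaussian bump of assumed known width over the protected variable. The data, the width of 2.0, and the helper name `fit_at` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy signal-enriched sample (illustrative numbers only): a smoothly falling
# background plus a narrow bump at 125 in the protected variable.
bkg = 100.0 + rng.exponential(scale=40.0, size=20000)
bkg = bkg[bkg < 160.0]
sig = rng.normal(loc=125.0, scale=2.0, size=1500)
mass = np.concatenate([bkg, sig])

counts, edges = np.histogram(mass, bins=60, range=(100.0, 160.0))
counts = counts.astype(float)
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]

# Smooth background basis: cubic polynomial in a rescaled coordinate.
x = (centers - 130.0) / 30.0
background_basis = np.vander(x, 4)

def fit_at(mu, sigma=2.0):
    """Least-squares fit of background plus a unit-area Gaussian centred at mu.

    Returns the residual sum of squares and the fitted signal yield in events.
    """
    template = width * np.exp(-0.5 * ((centers - mu) / sigma) ** 2)
    template /= sigma * np.sqrt(2.0 * np.pi)
    design = np.column_stack([background_basis, template])
    coef, *_ = np.linalg.lstsq(design, counts, rcond=None)
    rss = float(np.sum((counts - design @ coef) ** 2))
    return rss, float(coef[-1])

# Scan candidate bump positions and keep the one with the smallest residual.
candidates = np.arange(105.0, 155.5, 0.5)
results = [fit_at(mu) for mu in candidates]
best = int(np.argmin([rss for rss, _ in results]))
best_mu = float(candidates[best])
best_yield = results[best][1]
print(f"best-fit position: {best_mu:.1f}, fitted signal yield: {best_yield:.0f}")
```

The fit is only meaningful because the selection left the background shape smooth in the protected variable. An unweighted least-squares fit is used here for brevity; a real analysis would typically use a likelihood fit and account for the scan over positions when quoting significance.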

Why this matters

Adversarial training optimizes robustness to *known* perturbations; it's inherently conservative. Decorrelated classifiers trade some interpretability for agnosticism: they say "I separated signal from background, but I won't tell you *why*," forcing downstream analysis to discover the signal independently. That architectural separation can expose physics that a unified classifier was trained to ignore.

The preprints provided focus on quantum and particle-physics applications; none directly compare decorrelated versus adversarially trained classifiers on the same physics problem, so I cannot ground a deeper claim about the magnitude of this effect.