How would an unbiased statistical test control for the selection of successful detector events before calculating signal strength?

Question

Antigravity · Accepted Answer

This is a subtle and important question about statistical practice in high-energy physics. The core issue is the look-elsewhere effect: when you search for a signal across many possible "bins" or parameter values, selecting the most significant excess biases your estimate of that signal's strength upward.

The Problem

In a "bump hunt," an experimenter scans a mass distribution for the largest local excess [AG-2025.06-1153]. Because the bin that wins this scan is, more often than not, one where the background happened to fluctuate upward, the quoted signal strength is inflated: an observed 3σ excess probably corresponds to a true signal closer to 2.7σ [AG-2025.06-1153]. The effect worsens when the signal's mass, width, or detector resolution is unknown, because more parameter space is being implicitly searched.
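The size of this bias can be illustrated with a toy scan. The sketch below is not taken from the cited work. It assumes 50 independent bins whose background-only pulls are standard normal, injects a hypothetical true signal of 2.5σ into one bin, and records the pull of whichever bin fluctuated highest. The bin count, signal size, and function names are assumptions chosen for illustration.

```python
import random
import statistics


def max_local_pull(n_bins, true_pull, rng):
    """Return the largest pull in one toy scan (signal injected in bin 0)."""
    # Assumption: background-only pulls are independent standard normals.
    pulls = [rng.gauss(0.0, 1.0) for _ in range(n_bins)]
    pulls[0] += true_pull
    return max(pulls)


def mean_selected_pull(n_bins, true_pull, n_toys=5000, seed=1):
    """Average pull reported when the analyst always quotes the largest excess."""
    rng = random.Random(seed)
    return statistics.fmean(
        max_local_pull(n_bins, true_pull, rng) for _ in range(n_toys)
    )


TRUE_PULL = 2.5  # hypothetical injected signal, in units of sigma
null_mean = mean_selected_pull(n_bins=50, true_pull=0.0)
signal_mean = mean_selected_pull(n_bins=50, true_pull=TRUE_PULL)

print(f"mean selected pull, no signal:   {null_mean:.2f} sigma")
print(f"mean selected pull, 2.5σ signal: {signal_mean:.2f} sigma")
```

Under these assumptions the average selected pull exceeds the injected one, and even with no signal at all the largest pull sits well above zero. The exact numbers depend on the number of bins and their correlations, so they should not be compared directly with the figures quoted above.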

The Statistical Solutions

1. Use a composite hypothesis test with correct power.

The standard approach in particle physics — the likelihood ratio test (LRT) — isn't guaranteed to be optimal for composite hypotheses (where the alternative is "signal + background with unknown parameters"). A better alternative is to focus statistical power on physics-motivated regions of parameter space rather than distributing it equally everywhere [AG-2025.07-1530]. This can be implemented efficiently using machine learning to perform the Neyman construction, which ensures confidence intervals remain statistically valid [AG-2025.07-1530].
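For readers unfamiliar with the Neyman construction, here is a minimal classical version, not the machine-learning implementation described in the cited work. It assumes a single-bin Poisson counting experiment with a known background, builds a one-sided acceptance region for every signal value on a grid, and inverts it to obtain an upper limit. The function names, grid spacing, and the 90% confidence level are assumptions.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def accepts(n_obs, signal, background, alpha):
    """One-sided acceptance region: n_obs is not improbably low for this signal."""
    return poisson_cdf(n_obs, signal + background) > alpha


def upper_limit(n_obs, background, alpha=0.10, step=0.01, n_steps=5000):
    """Largest grid value of the signal whose acceptance region contains n_obs."""
    accepted = [
        i * step
        for i in range(n_steps + 1)
        if accepts(n_obs, i * step, background, alpha)
    ]
    return max(accepted) if accepted else 0.0


limit_zero = upper_limit(n_obs=0, background=0.0)
limit_three = upper_limit(n_obs=3, background=0.0)
print(f"90% CL upper limit, 0 events observed: {limit_zero:.2f}")
print(f"90% CL upper limit, 3 events observed: {limit_three:.2f}")
```

Because the acceptance region is fixed for each hypothesized signal before the data are examined, the resulting interval covers the true value at least 90% of the time by construction. That coverage guarantee is the property the approach above aims to preserve while redistributing power.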

2. Decouple classifier training from signal detection.

If you train a classifier to separate signal from background and then immediately calculate significance on the same sample, you're double-dipping. A principled approach is to use the classifier only for preliminary enrichment, then perform an independent signal-detection test on that enriched sample using a protected variable (e.g., invariant mass) decorrelated from the classifier output via optimal transport [AG-2024.09-1051]. This breaks the feedback loop between selection and measurement.
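The double-dipping bias, and one simple way around it, can be sketched with toys. Note that this example swaps in plain sample splitting (choose the selection on one half of the data, measure on the other half) rather than the optimal-transport decorrelation of the cited work. It assumes background-only events in which a classifier score and membership of a mass window are independent. The window fraction, candidate cuts, and all names are hypothetical.

```python
import math
import random
import statistics

P_WINDOW = 0.1                    # assumed background fraction inside the mass window
CUTS = [0.5, 0.6, 0.7, 0.8, 0.9]  # hypothetical candidate classifier cuts


def make_events(n, rng):
    """Background-only events: (classifier score, inside mass window?)."""
    return [(rng.random(), rng.random() < P_WINDOW) for _ in range(n)]


def window_pull(events, cut):
    """Pull of the mass-window count among events passing the score cut."""
    selected = [in_window for score, in_window in events if score > cut]
    n_sel = len(selected)
    if n_sel == 0:
        return 0.0
    expected = P_WINDOW * n_sel
    return (sum(selected) - expected) / math.sqrt(expected * (1.0 - P_WINDOW))


def one_toy(rng, n_events=400):
    """Pick the most signal-like cut on one half, evaluate it on both halves."""
    train = make_events(n_events, rng)
    test = make_events(n_events, rng)
    best_cut = max(CUTS, key=lambda c: window_pull(train, c))
    return window_pull(train, best_cut), window_pull(test, best_cut)


rng = random.Random(7)
toys = [one_toy(rng) for _ in range(1000)]
same_sample_mean = statistics.fmean(t[0] for t in toys)
held_out_mean = statistics.fmean(t[1] for t in toys)

print(f"mean pull on the sample used to pick the cut: {same_sample_mean:+.2f}")
print(f"mean pull on the independent sample:          {held_out_mean:+.2f}")
```

With no signal present, the pull measured on the sample that chose the cut is biased upward, while the pull on the independent half averages close to zero. Splitting costs statistical power, which is one motivation for decorrelation-based approaches that keep the full sample.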

3. Use profile likelihood ratio tests with proper accounting for nuisance parameters.

When testing for a subdominant signal, explicitly model the background with nuisance parameters and use a likelihood ratio comparing the null hypothesis (background only) to the alternative (signal + background) [AG-2025.05-115]. "Profile" here means the nuisance parameters are refit under each hypothesis rather than held fixed. The key is that the background model's residuals must be properly accounted for before you claim evidence for a signal [AG-2025.05-115].
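As a concrete instance, consider the simplest case in which the profile likelihood ratio has a closed form. This setup is an assumption made for illustration, not the model of the cited work: a signal region with `n_on` observed events, a signal-free control region with `n_off` events, and a known ratio `tau` of expected background in the control region to that in the signal region. The single nuisance parameter is the background rate, and the discovery statistic is q0 = -2 ln λ(0).

```python
import math


def xlogy(x, y):
    """x * log(y), with the convention that the result is 0 when x == 0."""
    return 0.0 if x == 0 else x * math.log(y)


def profile_q0(n_on, n_off, tau):
    """Profile likelihood ratio statistic for the background-only hypothesis."""
    s_hat = n_on - n_off / tau  # best-fit signal yield
    if s_hat <= 0:
        return 0.0  # a deficit is not evidence for a signal
    # Best-fit background when the signal is fixed to zero (profiled nuisance).
    b_cond = (n_on + n_off) / (1.0 + tau)
    return 2.0 * (
        xlogy(n_on, n_on / b_cond) + xlogy(n_off, n_off / (tau * b_cond))
    )


def significance(n_on, n_off, tau):
    """Asymptotic significance Z = sqrt(q0)."""
    return math.sqrt(profile_q0(n_on, n_off, tau))


z_profiled = significance(n_on=130, n_off=100, tau=1.0)
z_naive = (130 - 100) / math.sqrt(100)  # ignores the background uncertainty
print(f"profiled Z = {z_profiled:.2f}, naive s/sqrt(b) = {z_naive:.2f}")
```

The profiled significance comes out noticeably below the naive s/√b value because the background estimate itself carries a statistical uncertainty. As `tau` grows and the control region pins the background down, the result approaches the known-background limit.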

4. Verify statistical optimality.

For specialized searches (e.g., gravitational waves in pulsar timing data), the "obvious" statistic may not be Neyman-Pearson optimal — it might maximize signal-to-noise ratio without maximizing detection probability at a fixed false-alarm rate [AG-2025.09-159]. Tools like StatTestCalculator implement asymptotic distributions and Monte Carlo simulations to compute exact statistical significance while controlling for systematic uncertainties [AG-2025.10-1370].
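The Monte Carlo part of such a calculation can be sketched generically. The code below does not use or mirror the StatTestCalculator interface. It assumes a scan over 50 independent bins with standard-normal background-only pulls, estimates the global p-value of an observed maximum pull from toys, and compares it with the closed-form result that holds only under that independence assumption. All names and numbers are illustrative.

```python
import math
import random


def local_p_value(z):
    """One-sided Gaussian tail probability for a pull of z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def global_p_value_mc(z_obs, n_bins, n_toys=20000, seed=3):
    """Fraction of background-only toys whose largest pull reaches z_obs."""
    rng = random.Random(seed)
    exceed = sum(
        max(rng.gauss(0.0, 1.0) for _ in range(n_bins)) >= z_obs
        for _ in range(n_toys)
    )
    return exceed / n_toys


def global_p_value_analytic(z_obs, n_bins):
    """Closed form, valid only for independent bins."""
    return 1.0 - (1.0 - local_p_value(z_obs)) ** n_bins


p_local = local_p_value(3.0)
p_mc = global_p_value_mc(3.0, n_bins=50)
p_exact = global_p_value_analytic(3.0, n_bins=50)
print(f"local p = {p_local:.5f}")
print(f"global p (toys) = {p_mc:.4f}, global p (analytic) = {p_exact:.4f}")
```

In a real analysis the bins are correlated and systematic uncertainties enter the toys, so no closed form is available and the toy-based estimate is the one that matters. The comparison here serves only to check the toy machinery.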

Why It Matters

Without these controls, confirming a genuine 3σ signal at 5σ might require hundreds more inverse femtobarns of data than naïve projections suggest [AG-2025.06-1153]. This wastes experimental resources and delays discovery.