Why does marginalization over model uncertainty work better than simply averaging results across different waveform codes?

Antigravity · Accepted Answer

Marginalization vs. Simple Averaging: Why It Matters for Waveform Uncertainty

The key difference is that marginalization *propagates uncertainty through your inference*, while averaging just splits the difference after the fact.

Here's the concrete problem: when you fit a waveform model to numerical relativity simulations, you get best-fit coefficients, but those coefficients carry real uncertainty [AG-2025.03-858]. If you treat them as fixed and exact, you're claiming your model is more accurate than it actually is. When gravitational-wave data arrive with a high signal-to-noise ratio, the statistical errors shrink until the modeling error dominates, and this overconfidence can significantly bias your inferred astrophysical parameters [AG-2024.10-417].

Simple averaging takes results from multiple models (or multiple configurations of the same model) and combines them, usually with equal weights or with weights based on how well each model fits the data. The problem: you're mixing parameter estimates that were each computed under the *assumption* that one specific model was correct. The resulting average doesn't properly account for the fact that all models are imperfect [AG-2024.09-482].
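To make the averaging step concrete, here is one common way to write it. The notation ($\theta$ for the astrophysical parameters, $d$ for the data, $M_k$ for the $k$-th waveform model, $w_k$ for its weight, $\mathcal{Z}_k$ for its evidence) is my own shorthand, not taken from the cited papers:

$$
p_{\mathrm{avg}}(\theta \mid d) = \sum_{k=1}^{K} w_k \, p(\theta \mid d, M_k),
\qquad \sum_{k=1}^{K} w_k = 1,
$$

with either $w_k = 1/K$ (equal weights) or, as one evidence-based choice,

$$
w_k \propto \mathcal{Z}_k = \int \mathcal{L}(d \mid \theta, M_k)\, \pi(\theta)\, d\theta .
$$

Each term $p(\theta \mid d, M_k)$ is conditioned on $M_k$ being exactly right, so its width reflects only the detector noise and not the error in $M_k$ itself.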

Marginalization over model uncertainty instead treats the model's fitting coefficients as unknown variables and samples over their full probability distribution during inference [AG-2025.03-858, AG-2024.10-417]. This is done by constructing a prior on the coefficients that reflects their true uncertainty, for example by requiring that the model stay within a predefined mismatch threshold when compared to reference numerical relativity surrogates [AG-2025.03-858]. The posterior you recover then naturally incorporates the model's limitations as part of the statistical uncertainty.
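A toy sketch of the idea, not the method of any cited paper: assume a one-parameter "waveform" `theta * (1 + c) * s`, where `c` stands in for a fitting coefficient. The template, the numerical values, the Gaussian prior on `c` (a stand-in for a mismatch-based prior), and the helper names `log_like` and `summarize` are all assumptions made for illustration. The script compares freezing `c` at its best-fit value against sampling it on a grid and integrating it out.

```python
import numpy as np

# Toy setup: every value here is an illustrative assumption.
t = np.linspace(0.0, 1.0, 200)
s = np.sin(2.0 * np.pi * t)      # fixed template shape
sigma = 0.05                     # noise standard deviation per sample (high SNR)
theta_true = 1.0                 # parameter we want to infer
c_true = 0.03                    # true value of the fitting coefficient
c_fit, sigma_c = 0.0, 0.03       # best-fit coefficient and its 1-sigma uncertainty

# Zero-noise data keeps the example deterministic.
y = theta_true * (1.0 + c_true) * s

theta = np.linspace(0.8, 1.3, 2001)   # uniform grid, i.e. a flat prior on theta
c = np.linspace(-0.15, 0.15, 601)     # +/- 5 sigma_c around the best fit

def log_like(theta_vals, c_vals):
    """Gaussian log-likelihood of y under theta * (1 + c) * s (broadcasts)."""
    amp = theta_vals * (1.0 + c_vals)
    chi2 = np.sum(y * y) - 2.0 * amp * np.sum(y * s) + amp**2 * np.sum(s * s)
    return -0.5 * chi2 / sigma**2

def summarize(weights):
    """Mean and standard deviation of theta under unnormalized grid weights."""
    w = weights / weights.sum()
    mean = np.sum(w * theta)
    return mean, np.sqrt(np.sum(w * (theta - mean) ** 2))

# (a) Coefficient frozen at its best-fit value.
ll_fixed = log_like(theta, c_fit)
mean_fixed, std_fixed = summarize(np.exp(ll_fixed - ll_fixed.max()))

# (b) Coefficient given a Gaussian prior, then integrated out on the grid.
ll_joint = log_like(theta[:, None], c[None, :])
log_prior_c = -0.5 * ((c - c_fit) / sigma_c) ** 2
log_post = ll_joint + log_prior_c[None, :]
post = np.exp(log_post - log_post.max())
mean_marg, std_marg = summarize(post.sum(axis=1))

# Offset from the true value, in units of each posterior's own width.
bias_fixed = abs(mean_fixed - theta_true) / std_fixed
bias_marg = abs(mean_marg - theta_true) / std_marg

print(f"fixed c:        theta = {mean_fixed:.3f} +/- {std_fixed:.3f} ({bias_fixed:.1f} sigma off)")
print(f"marginalized c: theta = {mean_marg:.3f} +/- {std_marg:.3f} ({bias_marg:.1f} sigma off)")
```

In this toy the coefficient is exactly degenerate with `theta`, so the point estimate barely moves; what changes is that the marginalized posterior widens enough to cover the true value instead of excluding it. Real waveform models are not this degenerate, so treat the numbers as qualitative only.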

Why does this work better? Because marginalization weights different configurations of the model according to how well they fit *your specific data*, rather than averaging pre-computed answers. On high signal-to-noise events, this approach "significantly reduces biases in the recovered parameters" [AG-2024.10-417]. One team showed their method uses 30% fewer computational resources than model averaging while more faithfully recovering true parameters [AG-2024.09-482].
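The weighting can be written out explicitly. Using my own notation ($c$ for the fitting coefficients, $\pi(c)$ for their prior, $\theta$ for the astrophysical parameters, $d$ for the data), the marginalized posterior is

$$
p(\theta \mid d) = \int p(\theta \mid d, c)\, p(c \mid d)\, dc,
\qquad
p(c \mid d) \propto \pi(c) \int \mathcal{L}(d \mid \theta, c)\, \pi(\theta)\, d\theta .
$$

So each configuration $c$ contributes in proportion to its prior plausibility times how well it explains this particular data set, and those weights come out of the same inference run rather than being chosen afterwards.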

The principle extends beyond waveforms: it's also used in pulsar-timing gravitational-wave searches, where spike-and-slab priors enable proper model averaging in a single pass rather than a two-step workflow that can introduce "circular analysis" bias [AG-2024.09-163].
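For readers who haven't met spike-and-slab priors, here is a generic toy sketch; it is not the pulsar-timing pipeline from the cited work. The prior puts probability `1 - p_slab` on the amplitude being exactly zero (the spike) and `p_slab` on a uniform range `[0, a_max]` (the slab). The template, the numbers, and the function name `inclusion_probability` are assumptions for illustration. A single pass over the amplitude grid yields the posterior weight of the "signal present" model.

```python
import numpy as np

# Toy setup: every value here is an illustrative assumption.
t = np.linspace(0.0, 1.0, 200)
s = np.sin(2.0 * np.pi * t)   # fixed template shape
sigma = 1.0                   # noise standard deviation per sample
a_max = 2.0                   # upper edge of the uniform slab
p_slab = 0.5                  # prior probability that a signal is present

def inclusion_probability(y):
    """Posterior probability that the amplitude came from the slab, not the spike."""
    a = np.linspace(0.0, a_max, 4001)
    resid = y[None, :] - a[:, None] * s[None, :]
    log_like = -0.5 * np.sum(resid**2, axis=1) / sigma**2
    like = np.exp(log_like - log_like.max())   # rescaled; the factor cancels below

    # Slab evidence: likelihood averaged over the uniform prior (trapezoid rule).
    da = a[1] - a[0]
    z_slab = np.sum(0.5 * (like[1:] + like[:-1])) * da / a_max
    # Spike evidence: likelihood evaluated at amplitude exactly zero (a[0] == 0).
    z_spike = like[0]

    return p_slab * z_slab / (p_slab * z_slab + (1.0 - p_slab) * z_spike)

# Zero-noise data with and without a signal keeps the example deterministic.
p_with_signal = inclusion_probability(1.0 * s)
p_without_signal = inclusion_probability(np.zeros_like(s))

print(f"P(signal | data with signal)    = {p_with_signal:.3f}")
print(f"P(signal | data without signal) = {p_without_signal:.3f}")
```

The posterior on the amplitude is then a mixture of the spike and the slab posterior, weighted by this inclusion probability, which is what lets the two models be averaged within one run.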