What happens to Bayes factors if the two models absorb the same systematic differently?

Question

Antigravity · Accepted Answer

On Bayes Factors When Systematics Are Absorbed Across Model Components

This is a subtle and important question, and the archive has a direct answer. When two competing models absorb the same systematic effect differently (say, one pushes it into a foreground model, the other into a signal model), the Bayes factor can become unreliable as a discriminator.

Here's why: the Bayes factor is the ratio of the marginal likelihoods (evidences) that the two models assign to the data. If both models fit the data equally well in aggregate, because they have simply redistributed where the systematic lives, their marginal likelihoods will be nearly identical and the Bayes factor will be close to 1. But that agreement masks a critical problem: the posterior estimates for individual components can be badly biased in one or both models, even though the overall fit is good [AG 2025.02 241].

This is the essence of what researchers call "category II" model comparison: models with accurate and predictive components may not be separable through standard Bayesian comparison, because interactions between components allow systematics to hide [AG 2025.02 241]. A minor foreground mismodeling, for example, can leak into signal recovery, and the Bayes factor won't catch it.

Why this matters: You could publish a result claiming two models are equally probable when in fact one of them gives a biased answer for the quantity you actually care about.

The fix: The BaNTER (Bayesian Null Test Evidence Ratio) framework addresses this by validating each model component separately, testing whether accurate components remain separable, before trusting the composite model comparison [AG 2025.02 241]. Alternatively, you can explicitly model theory or systematic errors as part of your Bayesian framework, which improves parameter recovery and lets you weight models by their demonstrated accuracy rather than by marginal likelihood alone [AG 2024.09 482].
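To make the redistribution argument concrete, here is a minimal toy sketch in Python. It is not taken from the cited papers and is not the BaNTER procedure. Everything in it is an assumption chosen for illustration: the three template shapes, the amplitudes, the noise level, and the shared Gaussian prior width. Both hypothetical models are linear and contain the same three templates, so the evidence is analytic. The only difference is which component is credited with the systematic template: Model A files it under the foreground, Model B under the signal.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

# Hypothetical templates, chosen only for illustration.
signal_t = np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)  # narrow "signal" feature
foreground_t = (1.0 + x) ** -2.5                   # smooth "foreground"
systematic_t = np.sin(2.0 * np.pi * 3.0 * x)       # oscillatory "systematic"

a_true, b_true, c_true = 1.0, 5.0, 0.5  # assumed amplitudes
sigma = 0.1                             # assumed noise standard deviation
tau = 10.0                              # assumed prior std on every amplitude

y = (a_true * signal_t + b_true * foreground_t + c_true * systematic_t
     + sigma * rng.standard_normal(x.size))


def log_evidence(X, y, sigma, tau):
    """Log marginal likelihood of y = X w + noise with prior w ~ N(0, tau^2 I)."""
    n = y.size
    cov = sigma**2 * np.eye(n) + tau**2 * (X @ X.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def posterior_mean(X, y, sigma, tau):
    """Posterior mean of the amplitudes w under the same linear Gaussian model."""
    k = X.shape[1]
    precision = X.T @ X / sigma**2 + np.eye(k) / tau**2
    return np.linalg.solve(precision, X.T @ y / sigma**2)


# Model A: signal = [signal_t]; foreground = [foreground_t, systematic_t].
X_a = np.column_stack([signal_t, foreground_t, systematic_t])
# Model B: signal = [signal_t, systematic_t]; foreground = [foreground_t].
X_b = np.column_stack([signal_t, systematic_t, foreground_t])

log_bf = log_evidence(X_a, y, sigma, tau) - log_evidence(X_b, y, sigma, tau)

w_a = posterior_mean(X_a, y, sigma, tau)
w_b = posterior_mean(X_b, y, sigma, tau)

# Recovered signal component: only the columns each model labels as "signal".
signal_a = X_a[:, :1] @ w_a[:1]
signal_b = X_b[:, :2] @ w_b[:2]

true_signal = a_true * signal_t
rms_a = np.sqrt(np.mean((signal_a - true_signal) ** 2))
rms_b = np.sqrt(np.mean((signal_b - true_signal) ** 2))

print(f"log Bayes factor (A vs B): {log_bf:.3e}")
print(f"RMS error of recovered signal, model A: {rms_a:.4f}")
print(f"RMS error of recovered signal, model B: {rms_b:.4f}")
```

Because the two design matrices differ only in column order and share one prior, the marginal likelihoods coincide and the log Bayes factor is zero up to floating-point error. Model B's recovered signal nonetheless carries the full systematic. This is a deliberately extreme case: in realistic analyses the components are not exact relabelings of each other, so the Bayes factor would be close to 1 rather than exactly 1, but the mechanism is the same.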