Why does a broader prior reduce evidence more than it increases model flexibility?

Question

Antigravity · Accepted Answer

Your question touches on a fundamental tension in Bayesian inference: the prior penalty. Let me unpack why a broader prior can *hurt* evidence even though it seems like it should help.

The Core Tension

When you widen your prior, you're saying "the parameter could plausibly be almost anywhere in a larger region." But the Bayesian evidence (the probability of your *data* given the model) is the likelihood averaged over that entire region, weighted by the prior. A broader prior spreads the same total probability mass over a larger space, so less of it sits on the parameter values the data actually favor [AG-2026.04-1269].
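To make the dilution concrete, here is a minimal sketch for a single parameter. The symbols are introduced here for illustration and are not taken from the cited preprints: $\theta$ is the parameter, $\mathcal{L}(\theta) = p(D \mid \theta, M)$ the likelihood, $\pi(\theta)$ the prior, and $\Delta$ the width of an assumed uniform prior.

```latex
Z \equiv p(D \mid M) = \int \mathcal{L}(\theta)\,\pi(\theta)\,d\theta,
\qquad
\pi(\theta) = \frac{1}{\Delta}\ \text{on an interval of width } \Delta
\;\Longrightarrow\;
Z = \frac{1}{\Delta}\int_{\text{interval}} \mathcal{L}(\theta)\,d\theta .
```

If the likelihood is concentrated well inside the interval, the integral barely changes as $\Delta$ grows, so under these assumptions $Z$ falls roughly as $1/\Delta$.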

Here's the key insight: evidence rewards *predictive power*, not flexibility. If your model makes a narrow, specific prediction that the data then confirms, that's strong evidence. But if your model could have predicted almost anything (because the prior is broad), then the fact that it happened to match the data is less surprising—and thus counts as weaker evidence.

Think of it like a weather forecaster. A forecaster who predicts "tomorrow will be between −50°C and +50°C" is so vague that any actual temperature is unsurprising. A forecaster who predicts "tomorrow will be 22–24°C" has made a much bolder claim, so matching that narrow prediction is genuinely informative.
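The forecaster analogy can be turned into rough arithmetic. The sketch below assumes each forecast is a uniform distribution over its stated range and that the observed temperature is 23°C; both are illustrative assumptions, and `uniform_density` is a hypothetical helper.

```python
def uniform_density(x, low, high):
    """Probability density of a uniform forecast on [low, high] at x."""
    if low <= x <= high:
        return 1.0 / (high - low)
    return 0.0

observed = 23.0  # assumed observed temperature in °C

vague = uniform_density(observed, -50.0, 50.0)  # density 1/100 per °C
sharp = uniform_density(observed, 22.0, 24.0)   # density 1/2 per °C

ratio = sharp / vague
print(f"vague forecast density: {vague:.3f}")
print(f"sharp forecast density: {sharp:.3f}")
print(f"sharp forecast is favored by a factor of {ratio:.0f}")
```

The flip side is that the sharp forecast assigns zero density to anything outside 22–24°C, so a bold prediction is only rewarded when it turns out to be right.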

The Automatic Occam's Razor

This is not an accident. It's a *feature* of Bayesian inference called the "automatic Occam's razor" [AG-2026.04-1269]. The Bayesian formalism naturally penalizes models that require fine-tuned parameter values to match observations. A broader prior effectively requires more fine-tuning, because the region of parameter space that reproduces the observed data becomes a smaller fraction of the total prior "volume".

Mathematically, evidence involves integrating the likelihood (how well the model fits) weighted by the prior over the whole parameter space. A model with a very broad prior has to do much better in the likelihood to compete with a narrower model, because most of its prior volume is "wasted" on regions that don't matter.
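A small numerical sketch of that integral may help. It assumes a toy setup that is not from the cited preprints: one observation `y = 1.0` with Gaussian noise `sigma = 0.5`, and a uniform prior centered on zero whose width is varied. The function names are hypothetical.

```python
import math

def likelihood(theta, y=1.0, sigma=0.5):
    """Gaussian likelihood of one observation y given parameter theta."""
    z = (y - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def evidence(prior_width, n_grid=20000):
    """Evidence under a uniform prior on [-prior_width/2, +prior_width/2].

    Uses a midpoint rule to approximate Z = integral of likelihood * prior.
    """
    low = -prior_width / 2.0
    step = prior_width / n_grid
    prior_density = 1.0 / prior_width
    total = 0.0
    for i in range(n_grid):
        theta = low + (i + 0.5) * step
        total += likelihood(theta) * prior_density * step
    return total

# The best-fit likelihood is identical in all three cases; only the prior changes.
for width in (10.0, 20.0, 40.0):
    print(f"prior width {width:5.1f} -> evidence {evidence(width):.5f}")
```

In this toy setup every prior already contains essentially all of the likelihood, so each doubling of the width halves the evidence while the quality of the best fit stays the same.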

A Real Example

This plays out in cosmology. When dark energy models use uniform priors on equation-of-state parameters with no physical justification, the broader prior dilutes evidence for time-varying dark energy [AG-2025.09-1398]. Switching to theory-informed priors (ones shaped by realistic quintessence models) actually *sharpens* the evidence for or against the model, because the priors are no longer arbitrarily broad [AG-2025.09-1398]. The tension with a cosmological constant decreased from 3.1σ to 1.3σ once theory-informed priors replaced uniform ones. The data didn't change; only the prior did, and that alone shifted the conclusion.

Why Flexibility Doesn't Save You

You might hope that "flexibility" (the ability to fit many datasets) would help. But in Bayesian inference, flexibility is a *liability* for evidence unless it's justified by theory [AG-2026.04-1269]. A flexible model with a broad prior will fit almost any dataset tolerably well—but that generality makes each particular fit less impressive. The evidence doesn't reward "could have fit that" scenarios; it rewards "did predict that tightly."

The only way to gain real evidence is either:

1. Narrow your prior on principled grounds (e.g., using theory-informed priors), or

2. Let the data be so constraining that the improvement in fit outweighs the dilution from a broad prior (this requires very high-quality observations).
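Both routes can be seen in a toy model comparison. The sketch below is an illustration under stated assumptions, not the method of the cited preprints: a fixed model M0 with `theta = 0` competes with a flexible model M1 whose `theta` has a uniform prior of width `prior_width` centered on zero. The data are summarized by a sample mean `ybar = 0.5` from `n` observations with noise `sigma = 1.0`, and `ybar` is held fixed as `n` grows. All names and numbers are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x, mean, sd):
    """Normal probability density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log_bayes_factor(ybar, n, sigma, prior_width):
    """ln(Z1 / Z0) for M1: theta ~ Uniform(-W/2, W/2) versus M0: theta = 0."""
    s = sigma / math.sqrt(n)            # standard error of the sample mean
    z0 = normal_pdf(ybar, 0.0, s)       # evidence of the fixed model
    half = prior_width / 2.0
    # Likelihood mass that falls inside the prior range, divided by the width.
    mass = normal_cdf((half - ybar) / s) - normal_cdf((-half - ybar) / s)
    z1 = mass / prior_width             # evidence of the flexible model
    return math.log(z1) - math.log(z0)

for n in (4, 16, 64, 256):
    broad = log_bayes_factor(0.5, n, 1.0, prior_width=20.0)
    narrow = log_bayes_factor(0.5, n, 1.0, prior_width=2.0)
    print(f"n={n:4d}  ln B (broad prior) = {broad:7.2f}  ln B (narrow prior) = {narrow:7.2f}")
```

With few observations the broad-prior model loses to the fixed model despite fitting better, and it only wins once the data become precise enough. The narrower prior reaches a positive log Bayes factor earlier, and at large `n` its advantage over the broad prior approaches the log of the width ratio.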

The provided preprints address this in quantum state reconstruction [AG-2024.06-1565] and cosmological model selection [AG-2025.09-1398, AG-2025.12-612], all converging on the same lesson: broader priors weaken evidence unless they're balanced by a proportionally better likelihood fit.