What happens to the Occam penalty if you use an informative hierarchical prior instead?

Question

Antigravity · Accepted Answer

On Hierarchical Priors and the Occam Penalty

This is a sharp technical question about Bayesian model selection. The short answer: an informative hierarchical prior changes the Occam penalty, but in a principled way that depends on how much information the prior encodes.

The automatic Occam's razor

In standard Bayesian inference, the model evidence (the denominator in Bayes' theorem) naturally penalizes complexity. The evidence is the likelihood integrated against the prior over all possible parameter values, and a model with more free parameters spreads its prior probability mass over a larger region of parameter space. If the data don't strongly constrain those parameters, much of that mass lands where the likelihood is low and the evidence shrinks. This is the automatic Occam's razor [AG 2026.04 1269]. It emerges without any explicit penalty term; it's built into the mathematics of integration.

What a hierarchical prior does

A hierarchical prior doesn't eliminate this mechanism; instead, it redistributes where the penalty applies. With a hierarchical structure (say, placing a prior on the parameters of the prior itself), you're essentially saying: "I have beliefs about what parameter values are likely before seeing the data." If your hierarchical prior is informative (concentrated on a narrow range of values), it preselects certain regions of parameter space. This has two effects:

1. Reduced effective degrees of freedom: The evidence integral is no longer spread over the full parameter space; the hierarchy has already "decided" that some regions are implausible. The Occam penalty becomes gentler for dimensions the prior has already constrained [AG 2026.02 276].

2. Sensitivity to prior misspecification: If your hierarchical prior is wrong, meaning it confidently excludes the true parameter values, you can bias your inference badly. This is why careful prior choice matters in hierarchical population analyses [AG 2026.02 276].

Concrete example

In gravitational wave astronomy, inferring the spin population of merging black holes is vulnerable to prior choice. A uniform spin prior can bias results toward aligned spins simply because the prior doesn't encode what we expect. A better hierarchical prior, conditioned on measurable spin combinations, softens the Occam penalty in the directions where it was misplaced while sharpening it where the data can actually constrain us [AG 2026.02 276].

The key insight: the Occam penalty doesn't vanish with a hierarchical prior; it's redirected by your prior's assumptions. You're trading one form of penalty (a broad parameter space) for another (a mismatch between the data and your prior beliefs). This is why hierarchical inference requires thoughtful prior design; the penalty is only automatic and benign if your hierarchy reflects genuine domain knowledge.
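
The trade-off described above can be checked numerically with a toy model. What follows is a minimal sketch, not the analysis from the cited work: it assumes a one-parameter Gaussian model in which the sample mean `ybar` has known variance `se2`, the parameter `theta` is drawn from `Normal(mu, tau2)`, and the hyperparameter `mu` has its own `Normal(hyper_mean, hyper_var)` prior. All names and numbers here are hypothetical choices for illustration. Under these assumptions the evidence has a closed form, because integrating out `mu` and `theta` simply adds the three variances.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of Normal(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_evidence(ybar, se2, hyper_mean, hyper_var, tau2):
    """Log marginal density of the sample mean under a toy hierarchy.

    Assumed model (illustrative only):
        ybar  | theta ~ Normal(theta, se2)
        theta | mu    ~ Normal(mu, tau2)
        mu            ~ Normal(hyper_mean, hyper_var)
    Integrating out mu and theta gives
        ybar ~ Normal(hyper_mean, se2 + tau2 + hyper_var).
    """
    return log_normal_pdf(ybar, hyper_mean, se2 + tau2 + hyper_var)


# Hypothetical data summary: observed mean 1.0 with variance 0.01.
ybar, se2 = 1.0, 0.01

# Broad hierarchy: prior mass spread over a wide range of theta.
broad = log_evidence(ybar, se2, hyper_mean=0.0, hyper_var=100.0, tau2=1.0)

# Informative hierarchy centred near the value the data prefer.
informed = log_evidence(ybar, se2, hyper_mean=1.0, hyper_var=0.04, tau2=0.04)

# Equally informative hierarchy that confidently excludes that value.
misplaced = log_evidence(ybar, se2, hyper_mean=-4.0, hyper_var=0.04, tau2=0.04)

for name, value in [("broad", broad), ("informed", informed), ("misplaced", misplaced)]:
    print(f"{name:>9}: log evidence = {value:.3f}")
```

In this sketch the well-placed informative hierarchy gets the highest evidence, the broad one pays the usual Occam penalty for its width, and the misplaced informative one does far worse than either. That ordering is the point made above: narrowing the prior removes the width penalty only by exposing the model to a penalty for being wrong.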