Calibrating Bayesian Generative Machine Learning for Bayesiamplification

Sebastian Bieringer; Sascha Diefenbacher; Gregor Kasieczka; Mathias Trabs

doi:10.48550/arXiv.2408.00838


AG-2024.08-911 · cs.LG · cross-listed: cs.AI, hep-ph

Abstract

Recently, combinations of generative and Bayesian machine learning have been introduced in particle physics for both fast detector simulation and inference tasks. These neural networks aim to quantify the uncertainty on the generated distribution originating from limited training statistics. The interpretation of a distribution-wide uncertainty however remains ill-defined. We show a clear scheme for quantifying the calibration of Bayesian generative machine learning models. For a Continuous Normalizing Flow applied to a low-dimensional toy example, we evaluate the calibration of Bayesian uncertainties from either a mean-field Gaussian weight posterior, or Monte Carlo sampling network weights, to gauge their behaviour on unsteady distribution edges. Well calibrated uncertainties can then be used to roughly estimate the number of uncorrelated truth samples that are equivalent to the generated sample and clearly indicate data amplification for smooth features of the distribution.

Submitted

1 August 2024

Version

v1

License

CC-BY-4.0

Summary

This paper develops a method to check whether Bayesian generative neural networks correctly quantify their own uncertainty about generated data, showing how well-calibrated uncertainties can reveal when synthetic data effectively amplifies training information.

The authors propose a scheme for quantifying the calibration of Bayesian generative models, addressing the so far ill-defined interpretation of a distribution-wide uncertainty.
They evaluate two sources of Bayesian uncertainty, a mean-field Gaussian weight posterior and Monte Carlo sampling of the network weights, for a Continuous Normalizing Flow on a low-dimensional toy example, with attention to behaviour at unsteady distribution edges.
Well-calibrated uncertainties give a rough estimate of how many uncorrelated truth samples a generated sample is equivalent to, and indicate data amplification for smooth features of the distribution.
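
As a rough illustration of the two ideas above, the sketch below checks calibration by comparing nominal and empirical interval coverage, and converts an uncertainty on a bin probability into an equivalent number of uncorrelated samples. It is not the authors' procedure: the function names `empirical_coverage` and `equivalent_sample_size`, the Gaussian toy posterior, and the binomial variance assumption for a bin probability are all assumptions made for this example.

```python
import numpy as np


def empirical_coverage(posterior_draws, truth, nominal_levels):
    """Fraction of trials whose central interval contains the truth.

    posterior_draws: array of shape (n_trials, n_draws), one row of
        posterior draws of the same quantity per independent trial.
    truth: the known true value of that quantity.
    nominal_levels: iterable of central interval levels, e.g. 0.68.
    """
    posterior_draws = np.asarray(posterior_draws, dtype=float)
    coverage = []
    for level in nominal_levels:
        lower = np.quantile(posterior_draws, 0.5 - level / 2.0, axis=1)
        upper = np.quantile(posterior_draws, 0.5 + level / 2.0, axis=1)
        coverage.append(np.mean((lower <= truth) & (truth <= upper)))
    return np.array(coverage)


def equivalent_sample_size(p_mean, p_std):
    """Number of uncorrelated samples giving the same uncertainty.

    Assumes binomial statistics for a bin probability p:
    var(p_hat) = p (1 - p) / N, so N = p (1 - p) / var(p_hat).
    """
    return p_mean * (1.0 - p_mean) / p_std**2


# Toy setup (all values hypothetical): a bin with true probability 0.2,
# estimated from 1000 uncorrelated samples.
rng = np.random.default_rng(0)
true_p, n_train = 0.2, 1000
sigma = np.sqrt(true_p * (1.0 - true_p) / n_train)

n_trials, n_draws = 2000, 500
centres = rng.normal(true_p, sigma, size=n_trials)

# Calibrated by construction: posterior width matches the estimator spread.
calibrated = centres[:, None] + sigma * rng.standard_normal((n_trials, n_draws))
# Overconfident: posterior width is half of what it should be.
overconfident = centres[:, None] + 0.5 * sigma * rng.standard_normal((n_trials, n_draws))

levels = np.array([0.68, 0.95])
cov_calibrated = empirical_coverage(calibrated, true_p, levels)
cov_overconfident = empirical_coverage(overconfident, true_p, levels)
n_equiv = equivalent_sample_size(true_p, sigma)

print("nominal levels:        ", levels)
print("calibrated coverage:   ", cov_calibrated)
print("overconfident coverage:", cov_overconfident)
print("equivalent sample size:", n_equiv)
```

For a calibrated model the empirical coverage should track the nominal levels, while an overconfident posterior falls below them. Only when the coverage check passes does the equivalent sample size carry meaning; a value above the training set size would then point to amplification for that feature.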

curious · generated by claude-haiku-4-5
