Uncertainty Quantification From Scaling Laws in Deep Neural Networks

Ibrahim Elsharkawy; Yonatan Kahn; Benjamin Hooberman

doi:10.48550/arXiv.2503.05938

AG-2025.03-962 · cs.LG · cross-listed: cs.AI, hep-ex, hep-ph, hep-th

Abstract

Quantifying the uncertainty from machine learning analyses is critical to their use in the physical sciences. In this work we focus on uncertainty inherited from the initialization distribution of neural networks. We compute the mean $\mu_{\mathcal{L}}$ and variance $\sigma_{\mathcal{L}}^2$ of the test loss $\mathcal{L}$ for an ensemble of multi-layer perceptrons (MLPs) with neural tangent kernel (NTK) initialization in the infinite-width limit, and compare empirically to the results from finite-width networks for three example tasks: MNIST classification, CIFAR classification and calorimeter energy regression. We observe scaling laws as a function of training set size $N_\mathcal{D}$ for both $\mu_{\mathcal{L}}$ and $\sigma_{\mathcal{L}}$, but find that the coefficient of variation $\epsilon_{\mathcal{L}} \equiv \sigma_{\mathcal{L}}/\mu_{\mathcal{L}}$ becomes independent of $N_\mathcal{D}$ at both infinite and finite width for sufficiently large $N_\mathcal{D}$. This implies that the coefficient of variation of a finite-width network may be approximated by its infinite-width value, and may in principle be calculable using finite-width perturbation theory.
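The quantities named in the abstract can be illustrated with a minimal sketch. It assumes test losses have already been collected into an array with one row per training set size and one column per initialization seed. The helper names (`ensemble_stats`, `power_law_exponent`) and the toy numbers are hypothetical and are not taken from the preprint. The toy losses are constructed so that the mean and the standard deviation share one power law, which is the situation in which their ratio is independent of the training set size.

```python
import numpy as np

def ensemble_stats(losses):
    """Mean, standard deviation, and coefficient of variation over seeds.

    losses: array of shape (n_sizes, n_seeds); each column is one
    initialization seed, each row one training set size (assumed layout).
    """
    losses = np.asarray(losses, dtype=float)
    mu = losses.mean(axis=1)
    sigma = losses.std(axis=1, ddof=1)  # sample standard deviation over seeds
    return mu, sigma, sigma / mu

def power_law_exponent(n_train, values):
    """Slope of a straight-line fit in log-log space."""
    slope, _intercept = np.polyfit(np.log(n_train), np.log(values), 1)
    return slope

# Toy data (illustrative only, not results from the paper):
# mean loss ~ N^(-1/2), with a fixed 10% relative spread across three seeds.
n_train = np.array([1e2, 1e3, 1e4, 1e5])
z = np.array([-1.0, 0.0, 1.0])            # zero mean, unit sample std
toy_mu = 2.0 * n_train ** -0.5
toy_losses = toy_mu[:, None] * (1.0 + 0.1 * z[None, :])

mu, sigma, eps = ensemble_stats(toy_losses)
print("exponent of mu:   ", power_law_exponent(n_train, mu))
print("exponent of sigma:", power_law_exponent(n_train, sigma))
print("coefficient of variation per size:", eps)
```

In practice each column would come from retraining the same architecture with a different random seed, and the fit would be restricted to the range of training set sizes where a power law actually holds.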

Submitted

7 March 2025

Version

v1

License

CC-BY-4.0
