AG-2024.05-1213·physics.data-an·cross-listed: hep-ex, hep-ph, math.ST
Statistical divergences in high-dimensional hypothesis testing and a modern technique for estimating them
Authors
- Jeremy J. H. Wilkinson
- Christopher G. Lester
Abstract
Hypothesis testing on high-dimensional data is notoriously difficult without direct access to the likelihood functions of the competing models. This paper argues that statistical divergences can be used to quantify the difference between the population distributions of observed data and competing models, justifying their use as the basis of a hypothesis test. We go on to point out how modern techniques for functional optimization let us estimate many divergences using samples from the two distributions alone, without the need for population likelihood functions. We use a physics-based example to show how the proposed two-sample test can be implemented in practice, and discuss the steps required to mature the ideas presented into an experimental framework. The code used has been made available for others to use.
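To make the idea of estimating a divergence from samples alone concrete, here is a minimal sketch. It is not the authors' code or method. It assumes the Donsker-Varadhan variational bound on the Kullback-Leibler divergence, KL(P||Q) >= E_P[T] - log E_Q[exp(T)], and maximizes that bound over a deliberately small, hypothetical family of critic functions T (linear and quadratic terms of each coordinate) by plain gradient ascent. The function names, the feature choice, the step sizes, and the permutation-based calibration of the test are all illustrative assumptions.

```python
import numpy as np

def features(x):
    """Hypothetical critic features: linear and quadratic terms per coordinate."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, x**2], axis=1)

def dv_objective(theta, phi_p, phi_q):
    """Donsker-Varadhan bound E_P[T] - log E_Q[exp(T)] with T = phi @ theta."""
    t_q = phi_q @ theta
    shift = t_q.max()  # subtract the max before exponentiating, for stability
    log_mean_exp = shift + np.log(np.mean(np.exp(t_q - shift)))
    return np.mean(phi_p @ theta) - log_mean_exp

def estimate_kl(x_p, x_q, steps=3000, lr=0.05):
    """Maximize the bound over theta by gradient ascent (concave in theta)."""
    phi_p, phi_q = features(x_p), features(x_q)
    theta = np.zeros(phi_p.shape[1])
    mean_p = phi_p.mean(axis=0)
    for _ in range(steps):
        t_q = phi_q @ theta
        w = np.exp(t_q - t_q.max())
        w /= w.sum()                       # exponentially tilted weights on Q samples
        theta += lr * (mean_p - w @ phi_q)  # gradient of the bound w.r.t. theta
    return dv_objective(theta, phi_p, phi_q)

def permutation_p_value(x_p, x_q, n_perm=20, steps=1000, seed=0):
    """Assumed calibration: compare the statistic with label-shuffled replicas."""
    rng = np.random.default_rng(seed)
    observed = estimate_kl(x_p, x_q, steps=steps)
    pooled = np.concatenate([x_p, x_q])
    n = len(x_p)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null.append(estimate_kl(pooled[idx[:n]], pooled[idx[n:]], steps=steps))
    exceed = sum(s >= observed for s in null)
    return observed, (1 + exceed) / (1 + n_perm)

# Usage: two one-dimensional Gaussian samples whose means differ by one.
rng = np.random.default_rng(1)
x_p = rng.normal(1.0, 1.0, size=(2000, 1))
x_q = rng.normal(0.0, 1.0, size=(2000, 1))
kl_hat = estimate_kl(x_p, x_q)
print("estimated KL:", kl_hat)
```

For these two unit-variance Gaussians the true divergence is 0.5, so the printed estimate can be checked against a known value. In a realistic high-dimensional setting the fixed feature map would be replaced by a flexible function approximator, which is where the functional optimization mentioned in the abstract comes in.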
Submitted
10 May 2024
Version
v1
License
CC-BY-4.0
DOI
10.48550/arXiv.2405.06397