Statistical divergences in high-dimensional hypothesis testing and a modern technique for estimating them

Jeremy J. H. Wilkinson; Christopher G. Lester

doi:10.48550/arXiv.2405.06397

AG-2024.05-1213 · physics.data-an · cross-listed: hep-ex, hep-ph, math.ST

Abstract

Hypothesis testing in high-dimensional data is a notoriously difficult problem without direct access to competing models' likelihood functions. This paper argues that statistical divergences can be used to quantify the difference between the population distributions of observed data and competing models, justifying their use as the basis of a hypothesis test. We go on to point out how modern techniques for functional optimization let us estimate many divergences, without the need for population likelihood functions, using samples from two distributions alone. We use a physics-based example to show how the proposed two-sample test can be implemented in practice, and discuss the steps required to mature the ideas presented into an experimental framework. The code used has been made available for others to use.
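To make the idea of estimating a divergence from samples alone more concrete, here is a minimal sketch. It is not the authors' released code and may not match their choice of divergence or optimizer. It assumes the Donsker-Varadhan variational bound on the Kullback-Leibler divergence, KL(P||Q) >= E_P[T] - log E_Q[exp(T)], and maximises that bound over a small, hypothetical family of test functions T(x) = a*x + b*x**2 by plain gradient ascent. The function names, the feature family, and the Gaussian toy data are all assumptions chosen for illustration; a practical estimator would typically use a more flexible function class such as a neural network.

```python
import numpy as np


def features(x):
    """Hypothetical test-function family: T(x) = a*x + b*x**2."""
    return np.stack([x, x**2], axis=1)


def dv_kl_estimate(p_samples, q_samples, steps=2000, lr=0.05):
    """Estimate KL(P||Q) from samples by maximising the Donsker-Varadhan
    bound E_P[T] - log E_Q[exp(T)] over the parameters of T."""
    phi_p = features(p_samples)
    phi_q = features(q_samples)
    mean_phi_p = phi_p.mean(axis=0)
    theta = np.zeros(phi_p.shape[1])

    for _ in range(steps):
        t_q = phi_q @ theta
        # Softmax weights over the Q samples (shifted for numerical stability).
        w = np.exp(t_q - t_q.max())
        w /= w.sum()
        # Gradient of the bound: E_P[phi] minus the exp(T)-weighted mean of phi under Q.
        grad = mean_phi_p - w @ phi_q
        theta += lr * grad

    # Evaluate the bound at the fitted parameters.
    t_q = phi_q @ theta
    log_mean_exp = t_q.max() + np.log(np.mean(np.exp(t_q - t_q.max())))
    return float((phi_p @ theta).mean() - log_mean_exp)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data (an assumption, not the paper's physics example):
    # P = N(1, 1) and Q = N(0, 1), for which the true KL(P||Q) is 0.5.
    p = rng.normal(1.0, 1.0, size=20000)
    q = rng.normal(0.0, 1.0, size=20000)
    print("estimated KL(P||Q):", dv_kl_estimate(p, q))
```

Only samples from the two distributions enter the estimate; neither density is ever evaluated. Because the bound is optimised and evaluated on the same samples, the result is slightly optimistic, so a held-out evaluation set would be a sensible refinement before using such an estimate as a test statistic.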

Submitted

10 May 2024

Version

v1

License

CC-BY-4.0
