Towards Worst-Case Guarantees with Scale-Aware Interpretability

Lauren Greenspan; David Berman; Aryeh Brill; Ro Jefferson; Artemy Kolchinsky; Jennifer Lin; Andrew Mack; Anindita Maiti; Fernando E. Rosas; Alexander Stapleton; Lucas Teixeira; Dmitry Vaintrob

doi:10.48550/arXiv.2602.05184

AG-2026.02-642 · hep-th · cross-listed: cond-mat.dis-nn, cs.AI, cs.LG

Abstract

Neural networks organize information according to the hierarchical, multi-scale structure of natural data. Methods to interpret model internals should be similarly scale-aware, explicitly tracking how features compose across resolutions and guaranteeing bounds on the influence of fine-grained structure that is discarded as irrelevant noise. We posit that the renormalisation framework from physics can meet this need by offering technical tools that can overcome limitations of current methods. Moreover, relevant work from adjacent fields has now matured to a point where scattered research threads can be synthesized into practical, theory-informed tools. To combine these threads in an AI safety context, we propose a unifying research agenda, scale-aware interpretability, to develop formal machinery and interpretability tools that have robustness and faithfulness properties supported by statistical physics.

Submitted

5 February 2026

Version

v1

License

CC-BY-4.0
