Can you use clustering results to constrain physics models, or mainly just flag potential anomalies for follow-up?


Antigravity · Accepted Answer

Clustering results can do both. They are particularly powerful for *constraining* physics models when combined with forward modeling, and are not limited to flagging anomalies.

Flagging anomalies: the traditional strength

Clustering excels at anomaly detection. The k-means approach in Cluster Scanning [AG-2024.02-1209] uses clustering to separate potentially anomalous clusters from background, then tests whether a resonant signal appears in the enriched region. This is model-independent: you don't assume beforehand what new physics looks like. Similarly, unsupervised methods are "ideal for anomaly detection tasks" because they learn from (mostly) background data without signal labels and let the algorithm find deviations [AG-2024.03-1202]. The ARGOS metric [AG-2025.11-1368] addresses a practical follow-up question: how do you pick the *best* anomaly detector for your data?
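To make the "cluster first, then look for a bump" logic concrete, here is a toy sketch in the spirit of Cluster Scanning. It is not the published method or its code. It assumes synthetic events with two hypothetical substructure-like features plus a mass variable, a flat background mass spectrum, a fixed signal window, and a naive sideband estimate. The function names `kmeans`, `bump_significance`, and `toy_scan` are illustrative only. The clustering uses only the features, never the mass, so the mass spectrum in each cluster remains available for the bump test.

```python
import numpy as np


def assign(x, centroids):
    """Index of the nearest centroid for every row of x."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d2.argmin(axis=1)


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with initial centroids drawn at random from the data."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(x, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(x, centroids)


def bump_significance(mass, lo, hi, side):
    """Naive local significance of an excess in [lo, hi) over a sideband estimate."""
    observed = np.count_nonzero((mass >= lo) & (mass < hi))
    n_side = (np.count_nonzero((mass >= lo - side) & (mass < lo))
              + np.count_nonzero((mass >= hi) & (mass < hi + side)))
    expected = n_side * (hi - lo) / (2.0 * side)  # assumes background is flat in mass
    return (observed - expected) / np.sqrt(max(expected, 1.0))


def toy_scan(k=4, seed=1):
    """Cluster on the features only, then test for a mass bump inside each cluster."""
    rng = np.random.default_rng(seed)
    # Hypothetical toy events: two features plus an invariant-mass-like variable.
    x_bkg = rng.normal(0.0, 1.0, size=(20000, 2))
    m_bkg = rng.uniform(0.0, 10.0, size=20000)   # smooth (flat) background spectrum
    x_sig = rng.normal(5.0, 0.3, size=(400, 2))  # signal localized in feature space
    m_sig = rng.normal(5.0, 0.2, size=400)       # narrow resonance at m = 5
    x = np.vstack([x_bkg, x_sig])
    m = np.concatenate([m_bkg, m_sig])

    _, labels = kmeans(x, k, seed=seed)
    inclusive_z = bump_significance(m, 4.5, 5.5, 1.0)
    per_cluster_z = [bump_significance(m[labels == j], 4.5, 5.5, 1.0) for j in range(k)]
    return inclusive_z, per_cluster_z


inclusive_z, per_cluster_z = toy_scan()
print(f"inclusive z = {inclusive_z:.1f}, best cluster z = {max(per_cluster_z):.1f}")
```

The cluster holding the signal shows a much larger local excess than the inclusive spectrum because clustering removes most of the background beneath the bump. Quoting the best of k clusters introduces a trials factor (look-elsewhere effect), which a real analysis would have to account for.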

Constraining models: the emerging frontier

But clustering goes further when you embed it in a predictive framework. In cosmology, "clustering" usually refers to the measured spatial clustering of galaxies (correlation functions or power spectra) rather than the output of a clustering algorithm, and these statistics directly constrain competing models of galaxy formation. For example, researchers compared a halo occupation distribution (HOD) model with a semi-analytical model (SAM) using clustering measurements of BOSS-CMASS galaxies, finding that they could distinguish between the two models and recover velocity fields to 1–3% accuracy [AG-2025.05-291]. This works because the clustering signature encodes the underlying physics, for instance how galaxies populate dark matter halos.
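To show what "how galaxies populate dark matter halos" means in an HOD model, here is a minimal sketch of one widely used five-parameter occupation form. It is the form implemented, for example, as `Zheng07Cens`/`Zheng07Sats` in halotools, and it is not necessarily the HOD used in [AG-2025.05-291]. The parameter values are illustrative placeholders rather than fits to BOSS-CMASS, and halo masses are assumed to be in Msun/h. The step that turns occupations into predicted clustering (halo mass function, halo bias and profiles, or populating an N-body simulation) is omitted.

```python
import math


def mean_ncen(mass, log_mmin=13.0, sigma_logm=0.5):
    """<N_cen>(M): smoothed step from 0 to 1, centred on log10 M = log_mmin."""
    return 0.5 * (1.0 + math.erf((math.log10(mass) - log_mmin) / sigma_logm))


def mean_nsat(mass, log_m0=13.0, log_m1=14.0, alpha=1.0):
    """<N_sat>(M): power law in (M - M0) / M1, zero at or below the cutoff mass M0."""
    m0, m1 = 10.0 ** log_m0, 10.0 ** log_m1
    if mass <= m0:
        return 0.0
    return ((mass - m0) / m1) ** alpha


# Mean occupation per halo across a range of (illustrative) halo masses.
for log_m in (12.0, 13.0, 14.0, 15.0):
    m = 10.0 ** log_m
    print(f"log10 M = {log_m:4.1f}: <N_cen> = {mean_ncen(m):.3f}, <N_sat> = {mean_nsat(m):.3f}")
```

Fitting parameters such as `log_mmin`, `sigma_logm`, and `alpha` to measured clustering is what lets the data constrain galaxy-halo physics. Some analyses additionally multiply the satellite term by the central occupation.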

More broadly, forward modeling (predicting observable distributions from first-principles physics, then using machine learning to invert the mapping from observables back to parameters) lets clustering data constrain cosmological and astrophysical parameters directly [AG-2025.06-079]. The clustering measurement then enters a likelihood, explicit or learned, for your model parameters.
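As a minimal sketch of the "clustering measurement enters a likelihood" step, the example below uses an explicit Gaussian likelihood rather than the machine-learning inversion described above. The forward model is a toy power-law correlation function, xi(r) = (r / r0)^(-gamma). The data are synthetic, with assumed independent 10% errors per bin, and the priors are flat over the grid. The names `xi_model` and `grid_posterior` are hypothetical. In a real pipeline, `xi_model` would be replaced by simulations or an ML emulator, and the covariance would not be diagonal.

```python
import numpy as np


def xi_model(r, r0, gamma):
    """Toy forward model: power-law two-point correlation function."""
    return (r / r0) ** (-gamma)


def grid_posterior(r, xi_obs, sigma, r0_grid, gamma_grid):
    """Normalized posterior on a parameter grid (Gaussian likelihood, flat priors)."""
    logl = np.empty((len(r0_grid), len(gamma_grid)))
    for i, r0 in enumerate(r0_grid):
        for j, gamma in enumerate(gamma_grid):
            resid = (xi_obs - xi_model(r, r0, gamma)) / sigma  # independent bins assumed
            logl[i, j] = -0.5 * np.sum(resid ** 2)
    post = np.exp(logl - logl.max())  # subtract the maximum for numerical stability
    return post / post.sum()


rng = np.random.default_rng(0)
r = np.logspace(0.0, 1.3, 15)                  # separations from 1 to ~20 (toy units)
true_r0, true_gamma = 5.0, 1.8
sigma = 0.1 * xi_model(r, true_r0, true_gamma)  # assumed 10% errors per bin
xi_obs = xi_model(r, true_r0, true_gamma) + rng.normal(0.0, sigma)

r0_grid = np.linspace(3.0, 7.0, 81)
gamma_grid = np.linspace(1.4, 2.2, 81)
post = grid_posterior(r, xi_obs, sigma, r0_grid, gamma_grid)

# Marginal posterior means: sum out the other parameter, then average over the grid.
r0_mean = np.sum(post.sum(axis=1) * r0_grid)
gamma_mean = np.sum(post.sum(axis=0) * gamma_grid)
print(f"r0 ~ {r0_mean:.2f}, gamma ~ {gamma_mean:.3f}")
```

A grid is used only because it is transparent with two parameters. With more parameters, an MCMC sampler or simulation-based inference would replace the grid, but the role of the clustering data vector stays the same.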

Why this matters

The key difference: anomaly detection finds *something odd* and says "investigate this region." Model constraining uses clustering as a high-fidelity measurement with quantified uncertainties, and that measurement directly narrows the space of allowed theories. Particle physics leans toward the former; cosmology increasingly uses the latter.