Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

Jeff Shen; Joshua S. Speagle; Shirley Ho

doi:10.48550/arXiv.2604.25786


AG-2026.04-1835 · astro-ph.GA · cross-listed: astro-ph.IM

Abstract

Large-scale spectroscopic surveys have collectively observed millions of stars across the Milky Way, but each derives stellar labels using independent pipelines with distinct modelling assumptions, introducing systematic offsets that obscure signals in chemical space and hinder large-scale Galactic archaeology. We present a unified deep-learning framework that delivers atmospheric parameters, chemical abundances for 20 elements, distances, and ages (all on a single, self-consistent scale) for an arbitrary number of spectroscopic surveys simultaneously. Our approach uses a Transformer model that ingests spectra of arbitrary wavelength range and resolution, trained end-to-end as a single model across all surveys, eliminating the need for post-hoc recalibration. We apply this framework to spectra from APOGEE DR17, GALAH DR3, DESI DR1, and Gaia RVS DR3, spanning resolutions from R ~ 2,000 to 28,000 and wavelengths from the optical to the near-infrared. On high-resolution APOGEE spectra the model achieves precisions of 18 K in T_eff, 0.04 dex in log g, 0.015 dex in [Fe/H], and < 0.03 dex across all abundances; on lower-resolution DESI spectra, typical precisions are 51 K, 0.09 dex, 0.04 dex, and ~0.06 dex, respectively. Cross-survey comparisons demonstrate that labels for the same stars observed by different surveys are consistent within model uncertainties; we further validate against external distance catalogs and open-cluster metallicities and ages. The resulting homogeneous catalog enables Galactic archaeology at unprecedented scale and consistency, and the framework is readily extensible to forthcoming spectroscopic surveys such as SDSS-V, WEAVE, and 4MOST. The catalog is publicly available at https://doi.org/10.5281/zenodo.19830515.
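The claim that labels for the same star are "consistent within model uncertainties" can be checked with a standard normalized-difference statistic. The sketch below is an illustration, not the authors' validation code. For a star with label x1 ± σ1 from one survey and x2 ± σ2 from another, z = (x1 − x2) / sqrt(σ1² + σ2²). If the errors are roughly Gaussian and independent (an assumption, since one model produces both labels and its errors may be correlated), z across many matched stars should have a robust scatter near 1. The helper names and the [Fe/H] values below are hypothetical toy numbers.

```python
import math
from statistics import median


def normalized_differences(labels_a, sigmas_a, labels_b, sigmas_b):
    """Hypothetical helper: z = (a - b) / sqrt(sa^2 + sb^2) for each matched star.

    Assumes the two surveys' label errors are independent and Gaussian.
    """
    if not (len(labels_a) == len(sigmas_a) == len(labels_b) == len(sigmas_b)):
        raise ValueError("all inputs must have the same length")
    return [
        (a - b) / math.hypot(sa, sb)  # hypot(sa, sb) == sqrt(sa**2 + sb**2)
        for a, sa, b, sb in zip(labels_a, sigmas_a, labels_b, sigmas_b)
    ]


def robust_scatter(z):
    """Robust standard deviation: 1.4826 * median absolute deviation."""
    m = median(z)
    return 1.4826 * median(abs(v - m) for v in z)


# Toy [Fe/H] labels for four stars seen by two surveys (made-up values).
z = normalized_differences(
    [0.00, 0.10, -0.50, -1.00], [0.03] * 4,
    [0.05, 0.05, -0.50, -1.10], [0.04] * 4,
)
print([round(v, 3) for v in z])      # → [-1.0, 1.0, 0.0, 2.0]
print(round(robust_scatter(z), 4))   # → 1.4826
```

On a real cross-match, a robust scatter well above 1 would point to underestimated uncertainties, and a median z far from 0 would point to a residual offset between surveys; four stars are far too few to conclude either.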

Submitted

28 April 2026

Version

v1

License

CC-BY-4.0

Summary

A deep-learning model trained on multiple stellar surveys simultaneously produces consistent stellar properties (temperature, composition, distance, age) across surveys with different instruments, resolutions, and wavelength ranges, addressing the systematic offsets between survey pipelines that hinder large-scale Galactic archaeology.

Astronomers have millions of stellar spectra from different surveys, but each survey's pipeline can return different values for the same star. A single Transformer model addresses this by learning from all surveys at once rather than recalibrating their outputs separately afterward.
The model handles spectra spanning resolutions from R ~ 2,000 to 28,000 and wavelengths from the optical to the near-infrared in a single framework, reaching precisions of 18 K in temperature and 0.015 dex in iron abundance on high-resolution APOGEE spectra.
This consistency supports Galactic archaeology: chemical patterns in stars observed by different surveys can be compared on one scale, reducing the risk that apparent differences are pipeline artifacts rather than real signals of how the Milky Way assembled.
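A single model accepting spectra of very different resolution and wavelength coverage needs some way to turn variable-length inputs into one tensor format. This page does not describe the preprint's tokenization, so the sketch below assumes one common Transformer design purely for illustration: each pixel becomes a token carrying its flux, flux error, and sinusoidal features of log-wavelength, and batches are zero-padded with a boolean mask so attention can ignore padded positions. The function names, the log-wavelength encoding, its frequency range, and the toy spectra are all hypothetical.

```python
import numpy as np


def encode_wavelength(wavelength, n_freq=4, max_freq=1e5):
    """Hypothetical encoding: sin/cos features of ln(wavelength).

    In ln(lambda), a fixed Doppler shift is a fixed offset, so spectra of any
    resolution share one continuous coordinate. Returns (n_pix, 2 * n_freq).
    """
    x = np.log(np.asarray(wavelength, dtype=float))[:, None]   # (n_pix, 1)
    freqs = np.geomspace(1.0, max_freq, n_freq)[None, :]         # (1, n_freq)
    return np.concatenate([np.sin(freqs * x), np.cos(freqs * x)], axis=1)


def spectrum_to_tokens(wavelength, flux, flux_err, n_freq=4):
    """One token per pixel: [flux, flux_err, wavelength features]."""
    wl_feats = encode_wavelength(wavelength, n_freq)
    # column_stack turns the 1-D flux arrays into columns next to wl_feats.
    return np.column_stack([flux, flux_err, wl_feats])  # (n_pix, 2 + 2*n_freq)


def pad_batch(token_arrays):
    """Zero-pad variable-length token arrays; mask is True for real pixels."""
    n_max = max(t.shape[0] for t in token_arrays)
    n_feat = token_arrays[0].shape[1]
    batch = np.zeros((len(token_arrays), n_max, n_feat))
    mask = np.zeros((len(token_arrays), n_max), dtype=bool)
    for i, t in enumerate(token_arrays):
        batch[i, : t.shape[0]] = t
        mask[i, : t.shape[0]] = True
    return batch, mask


# Toy spectra (synthetic flat flux; pixel counts are illustrative only):
# a short optical spectrum and a longer near-infrared one.
optical = spectrum_to_tokens(np.linspace(3600, 9800, 50), np.ones(50), 0.05 * np.ones(50))
infrared = spectrum_to_tokens(np.linspace(15100, 17000, 200), np.ones(200), 0.01 * np.ones(200))
batch, mask = pad_batch([optical, infrared])
print(batch.shape, mask.sum(axis=1))  # → (2, 200, 10) [ 50 200]
```

Because every pixel carries its own wavelength coordinate, nothing in this input format depends on a survey's pixel grid; how the real model aggregates tokens into labels is not covered by this sketch.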

Generated by claude-haiku-4-5.
