Why does the two-phase learning process prioritize learning symmetries over noise during compression?

Question

Antigravity · Accepted Answer

The provided preprints don't directly answer this. The closest match, [AG-2025.04-1127] on two-phase learning, describes *that* a compression phase exists and proposes that it reflects "principled forgetting," but it doesn't explain *why* symmetries are prioritized over noise during that phase.

Here's what the archive does say. In the first (curve-fitting) phase, networks fit the training data rapidly, including both signal and noise. The second (compression) phase then selectively forgets, and [AG-2025.04-1127] argues that this compressed representation is what enables generalization. Separately, [AG-2025.04-1387] shows that when symmetries *are* present in the data, VAEs self-organize their latent space around those symmetry directions, effectively compressing the data into fewer meaningful variables. Taken together, these suggest (though neither preprint states it outright) that symmetries are lower-dimensional, more stable structures than random noise, so during compression they survive while noise washes out.
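To make the "low-dimensional structure survives, noise washes out" intuition concrete, here is a minimal sketch under my own assumptions, not a method from either preprint. It swaps the VAE for plain linear PCA, and it uses a single shared direction `u` as a stand-in for "low-dimensional stable structure" rather than a true symmetry. The function name `compress_rank1` and all parameter values are hypothetical.

```python
import numpy as np


def compress_rank1(n_samples=2000, dim=20, signal_std=3.0, noise_std=1.0, seed=0):
    """Toy rank-1 compression: how much structure vs. noise survives?"""
    rng = np.random.default_rng(seed)

    # Assumption: the "structure" is one unit direction shared by all samples.
    u = rng.normal(size=dim)
    u /= np.linalg.norm(u)

    signal = rng.normal(scale=signal_std, size=(n_samples, 1)) * u  # rank-1 part
    noise = rng.normal(scale=noise_std, size=(n_samples, dim))      # isotropic part
    x = signal + noise

    # "Compression": keep only the top principal direction of the centered data.
    x_centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    top = vt[0]

    alignment = abs(top @ u)  # |cosine| between learned and true direction
    signal_kept = np.sum((signal @ top) ** 2) / np.sum(signal ** 2)
    noise_kept = np.sum((noise @ top) ** 2) / np.sum(noise ** 2)
    return alignment, signal_kept, noise_kept


if __name__ == "__main__":
    alignment, signal_kept, noise_kept = compress_rank1()
    print(f"alignment={alignment:.3f}  signal kept={signal_kept:.3f}  "
          f"noise kept={noise_kept:.3f}")
```

In this toy, the retained direction lines up with `u`, so nearly all of the structured energy passes through the bottleneck, while isotropic noise keeps only about `1/dim` of its energy. That matches the intuition above, but it says nothing about whether real networks compress this way.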

The mechanism (why symmetries specifically persist) likely connects to the renormalization-group analogy that [AG-2025.04-1127] mentions: in physics, coarse-graining procedures preferentially preserve universal (scale-invariant, symmetry-respecting) features while irrelevant details fade. But a direct theoretical explanation of this priority in neural networks isn't present in these preprints.
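The coarse-graining intuition can be illustrated with a toy block-averaging step. This is only an analogy I'm adding, not a renormalization-group calculation and not something taken from the preprints. The constant offset stands in for a large-scale, translation-invariant feature; the helper names `coarse_grain` and `demo` and the sizes are hypothetical.

```python
import random
import statistics


def coarse_grain(values, block):
    """Average non-overlapping blocks of length `block` (any remainder is dropped)."""
    n_blocks = len(values) // block
    return [statistics.fmean(values[i * block:(i + 1) * block])
            for i in range(n_blocks)]


def demo(n=4096, block=16, seed=0):
    rng = random.Random(seed)

    # Assumption: the large-scale feature is a constant offset at every site.
    signal = [1.0] * n
    # Fine-scale detail: independent unit-variance Gaussian noise at each site.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

    coarse_signal = coarse_grain(signal, block)
    coarse_noise = coarse_grain(noise, block)

    return (
        statistics.fmean(coarse_signal),     # offset after coarse-graining
        statistics.pvariance(noise),         # noise variance before
        statistics.pvariance(coarse_noise),  # noise variance after
    )


if __name__ == "__main__":
    offset, var_before, var_after = demo()
    print(f"offset={offset:.3f}  noise variance: {var_before:.3f} -> {var_after:.3f}")
```

Averaging leaves the constant offset untouched, while the variance of independent noise falls by roughly a factor of `block`. That is the sense in which coarse-graining "prefers" features that look the same at every scale.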