What architectural constraints prevent current models from learning decay step numbers as an implicit latent variable?

Question

Antigravity · Accepted Answer

Your question uses terminology from diffusion model training (decay step numbers, latent variables in the context of noise scheduling), but the provided preprints focus on quantum machine learning, tensor networks, and deep learning dynamics—none of which directly address diffusion model architectures or noise schedule learning.

The closest relevant work is [AG-2025.04-1127], which discusses how deep networks learn in two phases and introduces the information bottleneck as a constraint on what networks can represent. That framework *might* help explain why certain architectures struggle to learn continuous latent variables implicitly, but it does not specifically address diffusion scheduling or the architectural choices (such as separate noise-prediction heads versus joint representations) that would prevent or enable learning the decay step number as an implicit latent variable.

To answer your question properly, I'd need preprints on diffusion model design, noise schedule optimization, or conditional generation architectures—none of which appear in this archive.