What happens if molecular dynamics explores geometries far outside the original training set?

Question

Antigravity · Accepted Answer

Extrapolation and Breakdown in Machine-Learned Molecular Dynamics

Your question touches on a real and well-studied danger in computational chemistry: when a neural network trained on a narrow set of molecular geometries is asked to predict energies and forces far from that training distribution, its predictions become unreliable.

The Core Problem

Machine-learned potential energy surfaces (trained on data from, say, equilibrium geometries or a narrow temperature range) can produce nonsensical forces when a molecular dynamics trajectory wanders into chemical space the model has never seen. This is especially acute because the predicted forces themselves drive the dynamics, so an errant prediction does not just give a wrong answer; it can steer the simulation into even worse regions, creating a feedback loop of error [AG 2024.05 2377].

Strategies in the Archive

The preprints suggest two complementary approaches:

1. Smarter training and representation. Transfer learning helps: train first on cheap, approximate classical data to capture broad trends, then refine on expensive quantum data [AG 2024.06 1821]. Atom-level pretraining with quantum mechanics data makes the learned features more robust to distribution shifts: the hidden states become more Gaussian-like, which improves generalization to novel compounds outside the original dataset [AG 2024.05 2377].

2. Interpolation rather than extrapolation. A more radical approach replaces direct energy interpolation with wave function interpolation. By storing a small training set of accurate quantum many-body states at different geometries, one can infer valid quantum states (and their energies) at new geometries via variational optimization, with provable convergence [AG 2024.02 1771, AG 2024.03 1949]. This sidesteps the extrapolation problem because the inferred wave function remains physically valid: it is a linear combination of true quantum states, not a neural network guess.
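To make the wave function interpolation strategy more concrete, here is a minimal sketch under heavy assumptions. A made-up 100 x 100 real symmetric matrix H(x) = H0 + x·H1 stands in for a molecular Hamiltonian along a single coordinate x, and full diagonalization stands in for the expensive quantum calculation. The estimate at a new geometry is a Rayleigh-Ritz projection onto the span of the stored training states (the idea often called eigenvector continuation); this is a generic illustration, not necessarily the exact algorithm of the cited preprints. The model, the training geometries, and all function names (`hamiltonian`, `build_basis`, `interpolated_ground_state`, `interpolated_force`) are hypothetical.

```python
import numpy as np

# Hypothetical toy model: the scalar x stands in for a nuclear geometry, and
# H(x) = H0 + x * H1 is a real symmetric matrix standing in for the Hamiltonian.
N = 100
rng = np.random.default_rng(seed=0)
H0 = np.diag(np.arange(N, dtype=float))   # unperturbed levels 0, 1, 2, ...
A = rng.standard_normal((N, N))
H1 = (A + A.T) / (2.0 * np.sqrt(N))       # random symmetric coupling, equal to dH/dx


def hamiltonian(x):
    return H0 + x * H1


def exact_ground_state(x):
    """Stand-in for the expensive reference calculation: full diagonalization."""
    energies, vectors = np.linalg.eigh(hamiltonian(x))
    return energies[0], vectors[:, 0]


def build_basis(train_xs):
    """Store exact ground states at the training geometries and orthonormalize them.

    QR preserves the span of the stored states, so the variational answer is
    unchanged, but it avoids working with an ill-conditioned overlap matrix.
    """
    states = np.column_stack([exact_ground_state(x)[1] for x in train_xs])
    q, _ = np.linalg.qr(states)
    return q


def interpolated_ground_state(x, basis):
    """Rayleigh-Ritz: lowest eigenpair of H(x) projected onto the stored states."""
    h_small = basis.T @ hamiltonian(x) @ basis   # k x k projected Hamiltonian
    energies, coeffs = np.linalg.eigh(h_small)
    psi = basis @ coeffs[:, 0]                   # unit norm because basis is orthonormal
    return energies[0], psi


def interpolated_force(x, basis):
    """Force -dE/dx of the interpolated energy.

    The basis does not depend on x, so the Hellmann-Feynman theorem holds exactly
    for the projected problem: dE/dx = <psi|dH/dx|psi> = <psi|H1|psi>.
    """
    _, psi = interpolated_ground_state(x, basis)
    return -(psi @ H1 @ psi)


train_xs = [0.0, 0.5, 1.0, 1.5, 2.0]   # five stored "reference calculations"
basis = build_basis(train_xs)

for x in [0.25, 0.75, 1.25, 1.75]:
    e_exact, _ = exact_ground_state(x)
    e_interp, _ = interpolated_ground_state(x, basis)
    print(f"x={x:.2f}  exact E={e_exact:.6f}  interpolated E={e_interp:.6f}  "
          f"force={interpolated_force(x, basis):+.4f}")
```

Because the estimate minimizes the energy over a subspace of the full space, it can never fall below the exact ground-state energy of the toy Hamiltonian. Because the stored states do not depend on x, forces at any number of new geometries come from the same few reference calculations without further diagonalizations of the full matrix.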
One study demonstrated wave function interpolation on nonadiabatic dynamics: only 22 quantum chemistry reference calculations were needed to reliably infer forces at 12,000 geometries along a real trajectory [AG 2024.03 1949].

Why It Matters

The difference is practical: a model that breaks down outside its training domain is unusable for real chemistry, where you don't know a priori which configurations matter. Wave function interpolation offers a path to systematically improvable models with built-in safeguards.
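The claim that wave function interpolation is systematically improvable and carries built-in safeguards can be illustrated on the same kind of toy model (again a hypothetical H(x) = H0 + x·H1, not a real molecule). Adding training states one at a time gives nested subspaces, so the variational energy at a fixed target geometry can only go down. The residual norm ||H psi - E psi|| is a computable error bar: for a unit-norm psi and a symmetric H, some exact eigenvalue lies within that distance of E. The training schedule, the target geometry, and the function name `ritz_with_residual` are illustrative assumptions.

```python
import numpy as np

# Same kind of hypothetical toy model: x stands in for a nuclear coordinate.
N = 100
rng = np.random.default_rng(seed=0)
H0 = np.diag(np.arange(N, dtype=float))
A = rng.standard_normal((N, N))
H1 = (A + A.T) / (2.0 * np.sqrt(N))


def hamiltonian(x):
    return H0 + x * H1


def exact_ground_state(x):
    energies, vectors = np.linalg.eigh(hamiltonian(x))
    return energies[0], vectors[:, 0]


def ritz_with_residual(x, train_xs):
    """Variational energy at x from stored training states, plus its residual norm."""
    states = np.column_stack([exact_ground_state(t)[1] for t in train_xs])
    basis, _ = np.linalg.qr(states)
    h = hamiltonian(x)
    energies, coeffs = np.linalg.eigh(basis.T @ h @ basis)
    e, psi = energies[0], basis @ coeffs[:, 0]
    # For a unit-norm psi and symmetric H, some exact eigenvalue of H lies within
    # ||H psi - e psi|| of e: an error bar that needs no new reference calculation.
    residual = np.linalg.norm(h @ psi - e * psi)
    return e, residual


schedule = [0.0, 2.0, 1.0, 0.5, 1.5]   # training geometries, added one at a time
target = 1.25

# Nested training sets give nested subspaces, so the energy can only decrease.
for k in range(1, len(schedule) + 1):
    e, r = ritz_with_residual(target, schedule[:k])
    print(f"{k} stored states: E={e:.6f}  residual={r:.2e}")

# The residual is typically much larger far outside the training range,
# which flags geometries where the interpolated state should not be trusted.
for x in [target, 6.0]:
    e, r = ritz_with_residual(x, schedule)
    print(f"x={x:.2f}: E={e:.6f}  residual={r:.2e}")
```

In a simulation, a large residual at a new geometry would be the natural trigger to run one more reference calculation and add its state to the stored set, rather than trusting an extrapolated answer.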