How much labeled LIGO-Virgo training data would fine-tuning actually need to work?

Question

Antigravity · Accepted Answer

The preprints offer concrete evidence on the practical data requirements for transfer learning in gravitational-wave science, though mostly in relative rather than absolute terms.

The dramatic efficiency gains from pretraining

The key finding is that self-supervised pretraining on unlabeled data makes fine-tuning converge faster and reach lower error, which in practice reduces the labeled data needed. Here's what the evidence shows:

GraviBERT [AG-2025.12-634] is the most direct answer: on in-domain data, self-supervised pretraining reduced mean absolute error (MAE) by up to 31% and accelerated convergence by ~6.6×. More strikingly, when adapting to a new detector's noise profile (domain adaptation), the pretrained model converged up to 15× faster on small target datasets and reduced estimation errors by up to 47%. The implication is that you can get by with far fewer labeled LIGO-Virgo examples if the model has already learned general feature representations.
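To make figures like "31% lower MAE" and "6.6× faster convergence" concrete, here is a minimal sketch of one common way to compute them from your own training runs. The function names and the steps-to-target definition of speedup are assumptions for illustration; the preprint's exact definitions are not given in this answer.

```python
def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to the baseline (0.31 means 31% lower)."""
    return (baseline_error - new_error) / baseline_error


def steps_to_reach(curve, target):
    """1-based index of the first step whose validation error is <= target, else None."""
    for step, err in enumerate(curve, start=1):
        if err <= target:
            return step
    return None


def convergence_speedup(scratch_curve, pretrained_curve, target):
    """Ratio of steps needed from scratch to steps needed with pretraining.

    Assumed definition: both runs are compared at the same target validation
    error. Returns None if either run never reaches the target.
    """
    scratch_steps = steps_to_reach(scratch_curve, target)
    pretrained_steps = steps_to_reach(pretrained_curve, target)
    if scratch_steps is None or pretrained_steps is None:
        return None
    return scratch_steps / pretrained_steps
```

Each curve is a list of validation errors logged once per step or epoch, so the speedup is only meaningful when both runs use the same logging interval and the same validation set.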

The Whisper-based approach [AG-2024.12-621] demonstrates that even pretraining on *unrelated audio data* transfers usefully: the authors fine-tuned the OpenAI Whisper encoder, originally trained on speech, to detect gravitational waves and classify detector glitches. This suggests the relevant representations are broad enough that labeled GW data isn't strictly necessary for the initial training phase.

A practical takeaway

Transfer learning also helps when waveform models change. GraviBERT achieved up to 44% MAE reductions and 15× training speedups when transferring between different gravitational-wave approximants, with R² scores exceeding 0.9 compared to 0.74–0.87 from scratch [AG-2025.12-634].

The pattern is consistent: pretraining yields roughly 6–15× faster convergence and 31–47% lower error during fine-tuning. Note that these are convergence and accuracy gains, not a measured 6–15× reduction in labeled examples, and the preprints don't give absolute numbers such as "you need 100 labeled events" versus "1000 labeled events". The absolute requirement would depend on your specific task (detection, classification, parameter estimation) and signal-to-noise ratio.
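Since the absolute number is task-dependent, one way to estimate it for your own setup is to fine-tune on a few increasing labeled-subset sizes and read the requirement off the resulting learning curve. The sketch below assumes you have already measured validation error at each size; the function name and the log-linear interpolation are illustrative choices, not something taken from the preprints.

```python
import math


def min_labeled_for_target(sizes, errors, target):
    """Estimate the smallest labeled-set size whose validation error reaches `target`.

    sizes  : increasing labeled-set sizes you actually fine-tuned on
    errors : validation error measured at each of those sizes
    Interpolates linearly in log(size) between the two measurements that
    bracket the target. Returns None if the target is never reached within
    the measured range (no extrapolation is attempted).
    """
    if errors[0] <= target:
        return sizes[0]
    points = list(zip(sizes, errors))
    for (n0, e0), (n1, e1) in zip(points, points[1:]):
        if e1 <= target < e0:
            frac = (e0 - target) / (e0 - e1)
            log_n = math.log(n0) + frac * (math.log(n1) - math.log(n0))
            return round(math.exp(log_n))
    return None
```

Running this once on a pretrained model's curve and once on a from-scratch curve, with the same target error, gives a direct estimate of how many labeled examples pretraining saves for your task.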