AG-2026.04-1917·quant-ph·cross-listed: cs.LG
Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders
Authors
- Emma Andrews
- Sahan Sanjaya
- Prabhat Mishra
Abstract
Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, for example by inserting carefully crafted noise, the model can be induced to make mistakes. Quantum machine learning models are also vulnerable to such adversarial attacks, especially in image classification using variational quantum classifiers. While there are promising defenses against these adversarial perturbations, such as training with adversarial samples, they face practical limitations: they cannot be applied when training with adversarial samples is not possible, and they can overfit the model to one type of attack. In this paper, we propose an adversarial-training-free defense framework that uses a quantum autoencoder to purify adversarial samples through reconstruction. Moreover, our defense framework provides a confidence metric to identify potentially adversarial samples that cannot be purified by the quantum autoencoder. Extensive evaluation demonstrates that our defense framework significantly outperforms the state of the art in prediction accuracy (up to 68%) under adversarial attacks.
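The abstract names two mechanisms: purification by reconstruction and a confidence metric for flagging inputs that cannot be purified. The following is a minimal classical sketch of that control flow. It rests on assumptions that do not come from the paper. Images are amplitude-encoded. The trained quantum autoencoder is idealized as an orthogonal projection onto a "clean" subspace followed by renormalization, so this is not a circuit simulation. The confidence metric is taken to be the fidelity between the input state and its reconstruction, with a hypothetical threshold of 0.9. All helper names (`amplitude_encode`, `make_projection_reconstructor`, `purify`) are illustrative and are not the authors' code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def amplitude_encode(pixels: Sequence[float]) -> Vector:
    """Assumed encoding: amplitudes proportional to pixel values, unit L2 norm."""
    norm = math.sqrt(sum(p * p for p in pixels))
    if norm == 0:
        raise ValueError("cannot encode an all-zero image")
    return [p / norm for p in pixels]


def fidelity(a: Vector, b: Vector) -> float:
    """Fidelity |<a|b>|^2 between two real, normalized state vectors."""
    overlap = sum(x * y for x, y in zip(a, b))
    return overlap * overlap


def make_projection_reconstructor(basis: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a trained autoencoder (not the paper's model).

    Projects a state onto span(basis) and renormalizes. `basis` must be
    orthonormal and represents the subspace the autoencoder is assumed to
    preserve for clean data.
    """
    def reconstruct(state: Vector) -> Vector:
        projected = [0.0] * len(state)
        for b in basis:
            coeff = sum(x * y for x, y in zip(b, state))
            projected = [p + coeff * y for p, y in zip(projected, b)]
        norm = math.sqrt(sum(p * p for p in projected))
        if norm == 0:
            return projected  # input is orthogonal to the clean subspace
        return [p / norm for p in projected]

    return reconstruct


def purify(state: Vector,
           reconstruct: Callable[[Vector], Vector],
           threshold: float = 0.9) -> Tuple[Vector, float, bool]:
    """Return (purified state, confidence, flagged).

    Confidence is the assumed fidelity between input and reconstruction.
    Inputs below `threshold` (a hypothetical value) are flagged as possibly
    adversarial and not reliably purified.
    """
    purified = reconstruct(state)
    confidence = fidelity(state, purified)
    return purified, confidence, confidence < threshold


if __name__ == "__main__":
    # Toy 2-qubit example: assume clean images live in span{|00>, |01>}.
    rec = make_projection_reconstructor([[1.0, 0.0, 0.0, 0.0],
                                         [0.0, 1.0, 0.0, 0.0]])
    # Small perturbation: high confidence, not flagged.
    print(purify(amplitude_encode([3, 4, 0.5, 0]), rec))
    # Large perturbation: low confidence, flagged.
    print(purify(amplitude_encode([1, 1, 3, 0]), rec))
```

Under this idealization the confidence reduces to the squared norm of the input's projection onto the clean subspace. A mildly perturbed input therefore scores close to 1 and is passed on to the classifier in purified form. An input that is mostly outside the subspace scores low and is flagged instead.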
Submitted
30 April 2026
Version
v1
License
CC-BY-4.0
DOI
10.48550/arXiv.2604.28176
Summary
A quantum autoencoder can clean up adversarially corrupted data before feeding it to quantum classifiers, providing both better accuracy and a way to flag suspicious inputs without needing to train on attacks.
- Quantum machine learning models fall prey to adversarial attacks (carefully crafted noise designed to fool them), just like classical neural networks do.
- The team's defense uses a quantum autoencoder as a filter that reconstructs clean versions of attacked data, so it never needs to be trained on adversarial examples, including for new attack types.
- The method improves prediction accuracy under attack by up to 68% over the state of the art and flags inputs the autoencoder can't reliably clean, helping practitioners know when to distrust predictions.
curious · generated by claude-haiku-4-5