Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders

Emma Andrews; Sahan Sanjaya; Prabhat Mishra

doi:10.48550/arXiv.2604.28176

AG-2026.04-1917·quant-ph·cross-listed: cs.LG

Abstract

Machine learning models can learn from data samples to carry out various tasks efficiently. Adversarially manipulated data samples, such as those with carefully crafted noise inserted, can cause the model to make mistakes. Quantum machine learning models are also vulnerable to such adversarial attacks, especially in image classification using variational quantum classifiers. While there are promising defenses against these adversarial perturbations, such as training with adversarial samples, they face practical limitations. For example, they are not applicable in scenarios where training with adversarial samples is not possible, and they can overfit the models to one type of attack. In this paper, we propose an adversarial-training-free defense framework that utilizes a quantum autoencoder to purify the adversarial samples through reconstruction. Moreover, our defense framework provides a confidence metric to identify potentially adversarial samples that cannot be purified by the quantum autoencoder. Extensive evaluation demonstrates that our defense framework can significantly outperform the state of the art in prediction accuracy (up to 68%) under adversarial attacks.

Submitted

30 April 2026

Version

v1

License

CC-BY-4.0

Summary

A quantum autoencoder can clean up adversarially corrupted data before feeding it to quantum classifiers, providing both better accuracy and a way to flag suspicious inputs without needing to train on attacks.

Quantum machine learning models fall prey to adversarial attacks (carefully crafted noise designed to fool them), just like classical neural networks do.
The team's defense uses a quantum autoencoder as a filter: it reconstructs purified versions of attacked inputs before classification, which avoids training the classifier on adversarial samples for each attack type.
The authors report outperforming the state of the art in prediction accuracy (up to 68%) under adversarial attacks, and the framework flags inputs the autoencoder cannot reliably purify, helping practitioners know when to distrust predictions.

curious · generated by claude-haiku-4-5
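
The purify-then-classify idea described above can be illustrated with a small classical simulation. This is a minimal sketch under stated assumptions, not the authors' circuit or training procedure: states are real amplitude vectors, the "trained encoder" is replaced by an SVD that finds the subspace occupied by clean states, and the confidence score is taken to be the fidelity between the input and its reconstruction (equivalently, the probability that the discarded "trash" qubits would be measured in |0...0>). The function names, the threshold values, and the toy classifier are all hypothetical.

```python
import numpy as np

def normalize(v):
    """Scale a real amplitude vector to unit norm."""
    return v / np.linalg.norm(v)

def fit_latent_basis(clean_states, latent_dim):
    """Orthonormal rows spanning the dominant subspace of the clean states.

    Stand-in for training the encoder (an assumption of this sketch): a unitary
    that rotates this subspace onto 'trash qubits = |0...0>' would compress
    clean states without loss.
    """
    _, _, vt = np.linalg.svd(np.asarray(clean_states), full_matrices=False)
    return vt[:latent_dim]

def purify(state, basis):
    """Encode, discard the out-of-subspace component, decode.

    Returns (reconstruction, confidence). The confidence is the squared norm
    of the latent part, which equals the input/reconstruction fidelity.
    """
    latent = basis @ state
    confidence = float(latent @ latent)
    if confidence < 1e-12:
        return None, 0.0  # nothing survives the projection
    return basis.T @ latent / np.sqrt(confidence), confidence

def defend(state, basis, classify, threshold=0.9):
    """Classify the purified state, or return None as the label when flagged."""
    recon, confidence = purify(state, basis)
    if recon is None or confidence < threshold:
        return None, confidence
    return classify(recon), confidence

# Toy usage: 3-qubit states (8 amplitudes) whose clean data lies in a 2-D subspace.
rng = np.random.default_rng(0)
subspace, _ = np.linalg.qr(rng.normal(size=(8, 2)))  # columns form an orthonormal basis
clean = np.array([normalize(subspace @ rng.normal(size=2)) for _ in range(20)])
basis = fit_latent_basis(clean, latent_dim=2)

# Random noise stands in for an adversarial perturbation here.
attacked = normalize(clean[0] + 0.2 * rng.normal(size=8))
classify = lambda s: int(s @ subspace[:, 0] >= 0)  # hypothetical classifier
label, confidence = defend(attacked, basis, classify, threshold=0.5)
print(label, round(confidence, 3))
```

In this toy setting the reconstruction is never further from the clean state than the attacked input is, because the projection removes only the component outside the clean subspace. Any perturbation that stays inside that subspace passes through unchanged and is not detected by the confidence score.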
