AG-2024.02-2054 · quant-ph · cross-listed: cs.AI, cs.CL
Quantum Transformer: Accelerating model inference via quantum linear algebra
Authors
- Naixu Guo
- Zhan Yu
- Matthew Choi
- Yizhan Han
- Aman Agrawal
- Kouhei Nakaji
- Alán Aspuru-Guzik
- Patrick Rebentrost
Abstract
Powerful generative artificial intelligence from large language models (LLMs) harnesses extensive computational resources for inference. In this work, we investigate the transformer architecture, a key component of these models, under the lens of fault-tolerant quantum computing. We develop quantum subroutines to construct the building blocks in the transformer, including the self-attention, residual connection with layer normalization, and feed-forward network. As an important subroutine, we show how to efficiently implement the Hadamard product and element-wise functions of matrices on quantum computers. Our algorithm prepares an amplitude encoding of the transformer output, which can be measured for prediction or use in the next layer. We find that the matrix norm of the input sequence plays a dominant role in the quantum complexity. With numerical experiments on open-source LLMs, including for bio-informatics applications, we demonstrate the potential of a quantum speedup for transformer inference in practical regimes.
Submitted
26 February 2024
Version
v1
License
CC-BY-4.0
DOI
10.48550/arXiv.2402.16714