Quantum Transformer: Accelerating model inference via quantum linear algebra

Naixu Guo; Zhan Yu; Matthew Choi; Yizhan Han; Aman Agrawal; Kouhei Nakaji; Alán Aspuru-Guzik; Patrick Rebentrost

doi:10.48550/arXiv.2402.16714

AG-2024.02-2054 · quant-ph · cross-listed: cs.AI, cs.CL

Abstract

Powerful generative artificial intelligence from large language models (LLMs) harnesses extensive computational resources for inference. In this work, we investigate the transformer architecture, a key component of these models, through the lens of fault-tolerant quantum computing. We develop quantum subroutines to construct the building blocks in the transformer, including the self-attention, residual connection with layer normalization, and feed-forward network. As an important subroutine, we show how to efficiently implement the Hadamard product and element-wise functions of matrices on quantum computers. Our algorithm prepares an amplitude encoding of the transformer output, which can be measured for prediction or used in the next layer. We find that the matrix norm of the input sequence plays a dominant role in the quantum complexity. With numerical experiments on open-source LLMs, including for bioinformatics applications, we demonstrate the potential of a quantum speedup for transformer inference in practical regimes.
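
For orientation, the building blocks named in the abstract can be sketched classically. The NumPy example below is a minimal illustration of one single-head transformer block (self-attention, residual connection with layer normalization, feed-forward network), followed by the unit-norm vector that an amplitude encoding of one output row would represent. It is not the paper's quantum algorithm. The function names, the ReLU activation, the omission of learned layer-norm parameters, the random weights, and the dimensions are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Layer normalization without learned scale/shift (a simplification).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def transformer_block(S, Wq, Wk, Wv, W1, W2):
    # S: (n, d) input sequence; all weight matrices are hypothetical.
    d = Wq.shape[1]
    Q, K, V = S @ Wq, S @ Wk, S @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))   # self-attention weights, shape (n, n)
    H = layer_norm(S + A @ V)           # residual connection + layer norm
    F = np.maximum(H @ W1, 0.0) @ W2    # feed-forward network (ReLU assumed)
    return layer_norm(H + F)            # second residual + layer norm

def amplitude_vector(v):
    # Classical stand-in for an amplitude encoding: rescale to unit 2-norm.
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
n, d, d_ff = 4, 8, 16                   # illustrative sizes only
S = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
W1 = rng.normal(size=(d, d_ff)) / np.sqrt(d)
W2 = rng.normal(size=(d_ff, d)) / np.sqrt(d_ff)

out = transformer_block(S, Wq, Wk, Wv, W1, W2)
psi = amplitude_vector(out[-1])         # normalized output for the last token
input_norm = np.linalg.norm(S)          # Frobenius norm of the input sequence
```

Here `input_norm` is the Frobenius norm of the input; the abstract says only that "the matrix norm of the input sequence" dominates the quantum complexity, so which norm is meant is an assumption in this sketch.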

Submitted

26 February 2024

Version

v1

License

CC-BY-4.0
