Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers

Won-Gi Paeng; Daesuk Kwon; Kyungwon Jeong; Honggyo Suh

doi:10.48550/arXiv.2405.04620


AG-2024.05-1163 · hep-ph · cross-listed: cs.AI, cs.CL, cs.LG, cs.NE



Abstract

In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of Path Integral formalism. In this perspective, the attention mechanism is recast as a process that integrates all possible transition paths leading to future token states, with temporal evolution governed by the Feed-Forward Network. By systematically mapping each component of the Transformer to its counterpart in the Path Integral formulation, we obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments. These segments are recurrently processed across Transformer layers, enabling more effective long-term information retention. We validate the effectiveness of this approach through the Passkey retrieval task and a summarization task, demonstrating that the proposed method preserves historical information while exhibiting memory usage that scales linearly with sequence length. This contrasts with the non-linear memory growth typically observed in standard attention mechanisms. We expect that this quantum-inspired generalization of the Transformer architecture will open new avenues for enhancing both the efficiency and expressiveness of future Transformer models.
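The abstract's memory argument rests on condensing earlier context into a bounded, memory-like state that is carried forward instead of re-attending over the whole history. The following is a generic, hypothetical sketch of segment-wise causal attention with a fixed-size carried memory. It is not the authors' path-integral construction: the function names, the "keep the last `memory_len` outputs" condensation rule, and the absence of learned query/key/value projections are all illustrative assumptions. It only shows why the per-segment attention-score buffer stays bounded while full attention over n tokens needs an n × n score matrix.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attend(queries, context, n_mem):
    """Single-head dot-product attention with no learned projections.

    `context` is memory + current segment; keys and values are both the raw
    context vectors (a simplifying assumption). Query i sees every memory
    slot plus segment tokens 0..i (causal mask).
    """
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        visible = context[: n_mem + i + 1]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, visible)) for c in range(d)])
    return out


def segment_recurrent_attention(tokens, segment_len, memory_len):
    """Hypothetical segment-recurrent pass with a fixed-size carried memory.

    Returns all outputs and the largest score matrix (queries x keys)
    materialized for any single segment.
    """
    memory, outputs, peak_scores = [], [], 0
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start : start + segment_len]
        context = memory + segment
        peak_scores = max(peak_scores, len(segment) * len(context))
        seg_out = causal_attend(segment, context, len(memory))
        outputs.extend(seg_out)
        # Illustrative "condensation" (an assumption, not the paper's rule):
        # carry only the last memory_len output states into the next segment.
        memory = seg_out[-memory_len:] if memory_len > 0 else []
    return outputs, peak_scores


random.seed(0)
d = 4
tokens = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(64)]
outputs, peak = segment_recurrent_attention(tokens, segment_len=8, memory_len=4)
print(len(outputs), peak, len(tokens) ** 2)  # 64 96 4096
```

With 8-token segments and 4 carried memory slots, the largest score matrix in this sketch is 8 × 12 = 96 entries regardless of sequence length, compared with 64² = 4096 entries for full attention over the same 64 tokens; only the list of stored outputs grows, and it grows linearly.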

Submitted

7 May 2024

Version

v1

License

CC-BY-4.0

