What is a Transformer?
The neural network architecture behind modern AI, using self-attention to process entire sequences in parallel rather than word by word.
Full Definition
The transformer is the neural network architecture introduced in the 2017 paper 'Attention Is All You Need' (Vaswani et al.). Its core innovation is the self-attention mechanism, which allows every token in a sequence to attend to every other token simultaneously, capturing long-range dependencies without processing words one by one as older recurrent networks did. This parallelism makes training on massive datasets feasible with modern GPUs and TPUs. Transformers power virtually all modern LLMs, diffusion models (via their denoising networks), and multimodal models. The architecture consists of stacked encoder and/or decoder layers, each containing multi-head attention sublayers and feed-forward sublayers, connected by residual connections and layer normalization.
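The self-attention step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full transformer: the function `scaled_dot_product_attention` and the toy inputs are our own for demonstration, and a real model would derive Q, K, and V from learned linear projections of the token embeddings and run many such heads in parallel.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(QK^T / sqrt(d_k)) V, the core of self-attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token-to-token scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                               # each token: weighted mix of all tokens

# Toy example: 3 tokens with 4-dimensional embeddings (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
# In a real transformer, Q, K, V come from separate learned projections of X;
# here we pass X directly for simplicity.
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): same shape as the input sequence
```

Note that every output row depends on every input row at once; that all-pairs computation is a single matrix multiply, which is why the whole sequence can be processed in parallel on a GPU.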
Tools that use Transformer
ChatGPT
The most widely used AI assistant with 900M+ weekly users
Claude
Best-in-class reasoning with 1M token context window
Gemini
Google's AI assistant with deep Workspace integration and 1M token context
GitHub Copilot
AI coding assistant with 4.7M+ paying subscribers