What is a Transformer?
The neural network architecture behind modern AI, using self-attention to process entire sequences in parallel rather than word by word.
Full Definition
The transformer is the neural network architecture introduced in the 2017 paper 'Attention Is All You Need' (Vaswani et al.). Its core innovation is the self-attention mechanism, which allows every token in a sequence to attend to every other token simultaneously, capturing long-range dependencies without processing words one by one as older recurrent networks did. This parallelism makes training on massive datasets feasible with modern GPUs and TPUs. Transformers power virtually all modern LLMs, diffusion models (via their denoising networks), and multimodal models. The architecture consists of stacked encoder and/or decoder layers, each containing multi-head attention sublayers and feed-forward sublayers, connected by residual connections and layer normalization.
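The self-attention step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full transformer: the function `scaled_dot_product_attention` and the toy inputs are our own for demonstration, and a real model would derive Q, K, and V from learned linear projections of the token embeddings and run many such heads in parallel.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(QK^T / sqrt(d_k)) V, the core of self-attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token-to-token scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                               # each token: weighted mix of all tokens

# Toy example: 3 tokens with 4-dimensional embeddings (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
# In a real transformer, Q, K, V come from separate learned projections of X;
# here we pass X directly for simplicity.
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): same shape as the input sequence
```

Note that every output row depends on every input row at once; that all-pairs computation is a single matrix multiply, which is why the whole sequence can be processed in parallel on a GPU.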
Tools that use Transformer
ChatGPT
The most widely used AI assistant with 900M+ weekly users
Claude
Best-in-class reasoning with 1M token context window
Gemini
Google's AI assistant with deep Workspace integration and 1M token context
GitHub Copilot
AI coding assistant with 4.7M+ paying subscribers