Glossary

What is Text-to-Speech (TTS)?

AI technology that converts written text into natural-sounding spoken audio, enabling voice cloning and expressive narration at scale.

Full Definition

AI-powered text-to-speech (TTS) systems convert written text into synthetic speech that is increasingly hard to distinguish from a human voice. Modern neural TTS systems typically pair an acoustic model such as Tacotron or FastSpeech, which predicts a spectrogram from the input text, with a neural vocoder (including diffusion-based vocoders) that renders the spectrogram as a waveform, producing high-fidelity audio with natural prosody, pacing, and emotion. Voice cloning, which generates speech in a specific person's voice from as little as a few seconds of reference audio, has become commercially available through platforms like ElevenLabs, Murf AI, and Descript. Applications include audiobook production, podcast creation, e-learning narration, accessibility features such as screen readers, interactive voice response (IVR) systems, and dubbing content for localization. Concerns about voice fraud and non-consensual voice cloning have led to research on audio watermarking and to platform usage policies.