Text To Speech
Speech synthesis is the part of the system that renders text in an audio-friendly form. This covers both what you hear and how the mouth moves. Multiple systems work together to create the audio through Text To Speech (TTS) and to map phonemes (the units of speech sound that determine how the mouth moves) to mouth movement.
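
To make the phoneme mapping concrete, here is a minimal sketch of how timed phonemes could be turned into mouth-shape keyframes. The phoneme symbols, the shape names, and the `mouth_shapes` function are all hypothetical and chosen for illustration; they are not the system's actual set or API.

```python
# Hypothetical phoneme-to-mouth-shape table. The symbols and shape names are
# illustrative assumptions, not the set the real system uses.
PHONEME_TO_MOUTH_SHAPE = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "lip_to_teeth", "V": "lip_to_teeth",
    "AA": "open", "AE": "open",
    "OW": "rounded", "UW": "rounded",
    "IY": "wide", "EH": "wide",
}


def mouth_shapes(timed_phonemes, default="neutral"):
    """Map (phoneme, start_seconds) pairs to (mouth_shape, start_seconds) keyframes.

    Consecutive phonemes that share a shape collapse into a single keyframe,
    and phonemes missing from the table fall back to `default`.
    """
    keyframes = []
    for phoneme, start in timed_phonemes:
        shape = PHONEME_TO_MOUTH_SHAPE.get(phoneme, default)
        if keyframes and keyframes[-1][0] == shape:
            continue  # the mouth is already in this shape
        keyframes.append((shape, start))
    return keyframes


# Rough phonemes for "mama", with made-up start times in seconds.
print(mouth_shapes([("M", 0.00), ("AA", 0.08), ("M", 0.20), ("AA", 0.28)]))
```

Collapsing repeated shapes keeps the keyframe list short, which matters when the animation has to stay in step with audio that is still being generated.
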
info
There is a tradeoff between speed and quality. Higher-quality voices are achievable, but the added latency makes the conversation feel like a badly lagging video call, which can be disruptive.
Quality | Latency | Example |
---|---|---|
Low | Very Fast | |
Middle | Fast | |
High | Some | |
Ultra | High | |
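
One way to act on this tradeoff is to cap the requested quality at whatever latency the conversation can tolerate. The sketch below uses only the labels from the table above; the `pick_tier` function is a hypothetical helper, not part of the system.

```python
# Tiers copied from the table above, ordered from fastest to slowest.
TIERS = [
    ("Low", "Very Fast"),
    ("Middle", "Fast"),
    ("High", "Some"),
    ("Ultra", "High"),
]


def pick_tier(requested_quality, max_latency):
    """Return the (quality, latency) tier closest to the request that stays
    within the tolerated latency label.

    Hypothetical helper: it relies only on the ordering of the table rows.
    """
    qualities = [quality for quality, _ in TIERS]
    latencies = [latency for _, latency in TIERS]
    if requested_quality not in qualities:
        raise ValueError(f"unknown quality: {requested_quality!r}")
    if max_latency not in latencies:
        raise ValueError(f"unknown latency: {max_latency!r}")
    # Quality and latency rise together, so the lower index satisfies both limits.
    index = min(qualities.index(requested_quality), latencies.index(max_latency))
    return TIERS[index]


print(pick_tier("Ultra", "Fast"))
```

With these rows, asking for `Ultra` while tolerating only `Fast` latency falls back to the `Middle` tier.
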