
Text To Speech

Speech Synthesis is the part of the system that renders output in an audible form. This covers both what you hear and how the mouth moves. Several systems work together: Text To Speech (TTS) creates the audio, and a phoneme mapping determines how the mouth moves.
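As a rough illustration of the phoneme mapping idea, the sketch below turns a list of timed phonemes into mouth-shape keyframes. Everything in it is an assumption made for illustration: the `PHONEME_TO_MOUTH_SHAPE` table, the shape labels, and the `TimedPhoneme` and `MouthKeyframe` types are hypothetical and are not taken from this system.

```python
from dataclasses import dataclass

# Hypothetical lookup table: phoneme symbol -> mouth shape label.
# The symbols and labels are illustrative, not this system's actual set.
PHONEME_TO_MOUTH_SHAPE = {
    "AA": "open",
    "IY": "wide",
    "UW": "round",
    "M": "closed",
    "B": "closed",
    "P": "closed",
    "F": "teeth_on_lip",
    "V": "teeth_on_lip",
}


@dataclass
class TimedPhoneme:
    phoneme: str
    start_ms: int
    end_ms: int


@dataclass
class MouthKeyframe:
    shape: str
    start_ms: int
    end_ms: int


def phonemes_to_keyframes(phonemes, default_shape="neutral"):
    """Map timed phonemes to mouth shapes, merging back-to-back repeats."""
    keyframes = []
    for p in phonemes:
        # Unknown phonemes fall back to a neutral mouth shape.
        shape = PHONEME_TO_MOUTH_SHAPE.get(p.phoneme, default_shape)
        previous = keyframes[-1] if keyframes else None
        if previous and previous.shape == shape and previous.end_ms == p.start_ms:
            # Same shape with no gap: extend the previous keyframe.
            previous.end_ms = p.end_ms
        else:
            keyframes.append(MouthKeyframe(shape, p.start_ms, p.end_ms))
    return keyframes
```

In a setup like this, the timed phonemes would come from the same step that produces the audio, so the mouth shapes stay aligned with what is heard.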

info

There is a tradeoff between speed and quality. We can achieve higher quality voices, but the added latency makes it feel like you are on a badly lagging video call, which can be disruptive.

| Quality | Latency   | Example |
|---------|-----------|---------|
| Low     | Very Fast |         |
| Middle  | Fast      |         |
| High    | Some      |         |
| Ultra   | High      |         |
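One way to act on this tradeoff is to pick the highest quality level whose latency is still acceptable for the situation. The sketch below is a minimal illustration that uses only the quality and latency labels from the table above; the function name `best_quality_within` and the idea of ordering the latency labels from fastest to slowest are assumptions, not part of this system's API.

```python
# Quality levels paired with their latency labels, in ascending quality,
# as read from the table above.
QUALITY_TIERS = [
    ("Low", "Very Fast"),
    ("Middle", "Fast"),
    ("High", "Some"),
    ("Ultra", "High"),
]

# Assumed ordering of the latency labels, from least to most delay.
LATENCY_ORDER = ["Very Fast", "Fast", "Some", "High"]


def best_quality_within(max_latency):
    """Return the highest quality level whose latency does not exceed max_latency."""
    # Raises ValueError if max_latency is not one of the known labels.
    limit = LATENCY_ORDER.index(max_latency)
    chosen = None
    for quality, latency in QUALITY_TIERS:
        if LATENCY_ORDER.index(latency) <= limit:
            chosen = quality
    return chosen
```

A live conversation might call `best_quality_within("Fast")` to stay responsive, while pre-rendered audio could accept `"High"` latency in exchange for the best voice.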