Ultra-lightweight English Text-to-Speech (~9M parameters, ~20 MB on disk)
This Space runs efficiently on CPU and synthesizes high-quality audio faster than real time.
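
"Faster than real time" is usually expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where an RTF below 1 means the audio is generated faster than it plays back. The sketch below shows one way such a figure could be measured. The names `real_time_factor` and `fake_synthesize` and the 24 kHz sample rate are assumptions made for illustration; they are not this model's API or its actual output rate.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Return synthesis time / audio duration. RTF < 1 is faster than real time."""
    start = time.perf_counter()
    samples = synthesize(text)  # hypothetical callable returning mono PCM samples
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> Sequence[float]:
    # Stand-in for a real model (assumption): one second of silence at 24 kHz.
    return [0.0] * 24_000


if __name__ == "__main__":
    rtf = real_time_factor(fake_synthesize, "Hello, world.", sample_rate=24_000)
    print(f"RTF: {rtf:.6f}")
```

To measure a real model, replace `fake_synthesize` with a function that wraps the model's inference call and pass the model's true sample rate. Averaging over several sentences after a warm-up run gives a steadier number than a single call.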