project 04 • 2023
Voice Synthesis Research
“Teaching machines to speak with personality.”
the story
Research project exploring few-shot voice cloning. The challenge was generating natural-sounding speech from just a few minutes of audio samples.
Experimented with several architectures, including Tacotron variants and diffusion-based models. Published findings on improving prosody in synthesized speech.
Ethical considerations were a major part of this work: we built in safeguards against misuse.
spent way too long making it say 'hello world' naturally
tools I used
PyTorch • Tacotron • WaveNet • CUDA • Librosa
key insight:
“Voice carries emotion, not just words. That's the hard part.”