project 04 • 2023

Voice Synthesis Research

“Teaching machines to speak with personality.”

the story

A research project exploring few-shot voice cloning. The challenge was generating natural-sounding speech from just a few minutes of audio. Experimented with several architectures, including Tacotron variants and diffusion-based models, and published findings on improving prosody in synthesized speech. Ethical considerations were a major part of this work: we built in safeguards against misuse.

spent way too long making it say 'hello world' naturally

tools I used

PyTorch, Tacotron, WaveNet, CUDA, Librosa

key insight:

“Voice carries emotion, not just words. That's the hard part.”

