project 042023

Voice Synthesis Research

Teaching machines to speak with personality.

the story

Research project exploring few-shot voice cloning. The challenge was generating natural-sounding speech from only a few minutes of audio samples. We experimented with various architectures, including Tacotron variants and diffusion-based models, and published findings on improving prosody in synthesized speech. Ethical considerations were a major part of this work: we built in safeguards against misuse.

spent way too long making it say 'hello world' naturally

tools I used

PyTorch, Tacotron, WaveNet, CUDA, Librosa

key insight:

Voice carries emotion, not just words. That's the hard part.

