Three months ago, Baidu showed off DeepVoice, a text-to-speech system that could produce speech eerily similar to a human voice, almost in real time. But it could only learn one voice at a time, and it required many hours of audio to do so. Fast forward to today, and the company has just released DeepVoice 2, which can learn a person's voice from just 30 minutes of audio. A single system can also imitate hundreds of different speakers.
DeepVoice 2 learns the traits shared across hundreds of speakers to build a general model of the human voice, then tweaks that model to craft different voices without any human aid. Baidu is targeting voice-driven digital assistants, as well as ebooks in which each character could be read in a distinct voice, giving ebook lovers a unique experience.
However, Baidu is not the only company experimenting with this technology. Google has published research on WaveNet, a vocoder that made huge gains in audio quality over traditional speech systems, and Lyrebird, a Canadian startup, showed a system that could imitate the voices of famous figures based on one minute of audio data.