Baidu’s text-to-speech system sounds close to a human

by Tarvin Gill May 26, 2017

Three months ago, Baidu showed off DeepVoice, a text-to-speech system that could produce speech eerily similar to a human voice in almost real time. However, it could learn only one voice at a time and required many hours of audio to build a sample. Fast forward to today, and the company has just released DeepVoice 2, which can learn a person's voice from just 30 minutes of audio. A single system can also imitate hundreds of different speakers.

DeepVoice 2 learns the common traits shared across hundreds of speakers to build a human voice, then tweaks that voice to craft different characters without any human aid. Baidu is targeting digital assistants that use voice commands, as well as ebooks, where distinct voices could portray the different characters and give ebook lovers a unique experience.

However, Baidu is not the only one experimenting with this technology. Google has published research on WaveNet, a vocoder that made huge gains in audio quality over traditional speech systems, and Lyrebird, a Canadian startup, showed a system that could imitate the voices of famous figures based on one minute of audio data.

Tarvin Gill

Tarvin's passion for technology was sparked at the age of 10, and he has never looked back. Interested in the latest tech and obsessed with video games, he is always trying to get the newest gadgets in his hands and endlessly tinkering with his gaming setup.
