Baidu’s text-to-speech system sounds close to a human

by Tarvin Gill

Three months ago, Baidu showed off DeepVoice, a text-to-speech system that could produce speech eerily similar to a human voice in almost real time. However, it could only learn one voice at a time and required many hours of audio to build a sample. Fast forward to today, and the company has just released DeepVoice 2, which can learn a person's voice from just 30 minutes of audio. A single system can now imitate hundreds of different speakers.

DeepVoice 2 learns the common traits shared across hundreds of speakers to build a human voice, then tweaks that voice to craft different characters without any human aid. Baidu is targeting voice-driven digital assistants and ebooks, where each character could be read in a distinct voice, giving ebook lovers a unique experience.

However, Baidu is not the only one experimenting with this technology. Google has published research on WaveNet, a vocoder that made huge gains in audio quality over traditional speech systems, and Lyrebird, a Canadian startup, showed a system that could imitate the voices of famous figures based on one minute of audio data.
