Baidu’s text-to-speech system sounds close to a human

Three months ago, Baidu showed off DeepVoice, a text-to-speech system that could produce speech eerily similar to a human voice in almost real time. However, it could only learn one voice at a time and required many hours of audio to do so. Fast forward to today, and the company has just released DeepVoice 2, which can learn a person's voice from just 30 minutes of audio, and a single system can imitate hundreds of different speakers.

DeepVoice 2 learns the common traits shared across hundreds of speakers to build a model of the human voice, then tweaks it to craft different voices without any human aid. Baidu is targeting voice-driven digital assistants as well as ebooks, where distinct voices could represent different characters and give ebook lovers a unique experience.

However, Baidu is not the only one experimenting with this technology. Google has published research on WaveNet, a vocoder that made huge gains in audio quality over traditional speech systems, and Lyrebird, a Canadian startup, showed a system that could imitate the voices of famous figures based on one minute of audio data.
