Baidu’s text-to-speech system sounds close to a human

Three months ago, Baidu showed off DeepVoice, a text-to-speech system that could produce speech eerily similar to a human voice in almost real time. However, it could only learn one voice at a time and required many hours of audio to build a sample. Fast forward to today, and the company has just released DeepVoice 2, which can learn a person’s voice from just 30 minutes of audio, while a single system can imitate hundreds of different speakers.

DeepVoice 2 learns the traits shared across hundreds of speakers to build a human voice, then tweaks that voice to craft different characters without any human aid. Baidu is targeting voice-driven digital assistants, as well as ebooks, where each character could be read in a distinct voice, giving ebook lovers a unique experience.

However, Baidu is not the only company experimenting with this technology. Google has published research on WaveNet, a vocoder that made huge gains in audio quality over traditional speech systems, and Lyrebird, a Canadian startup, has shown a system that could imitate the voices of famous figures based on one minute of audio data.
