Naver develops AI voice synthesizing tech capable of expressing emotions

By Park Sae-jin Posted : November 14, 2019, 14:55 Updated : November 14, 2019, 14:55

[Courtesy of Naver]

SEOUL -- Naver, a top web portal operator in South Korea, has developed an artificial intelligence-based voice synthesis technology that uses a real person's voice to create a synthesized robot voice capable of expressing emotions. Such technology can be adopted by call centers and audiobook makers to give customers more realistic services.

On Thursday, Naver unveiled natural end-to-end speech synthesis (NES) through the website of Clova, an AI voice assistant service. "Everyone can make voice fonts easily and conveniently," Naver Clova Voice research head Kim Jae-min was quoted as saying. Naver plans to add more features such as the voices of popular figures and various emotions to NES.

Naver said that AI-synthesized voices can be created from about 40 minutes of voice recordings (about 400 sentences). Similar technologies developed by tech companies so far have needed to analyze at least 40 hours of actual voice recordings to create an artificial voice.

NES can control the emotions of the artificial voice to make it sound happy or sad. Synthesized voice technology can be useful in the service sector, electronic books and other fields. In November last year, Naver released an audiobook service featuring the synthesized voice of Yoo In-na, a 36-year-old actress who served as a radio DJ. The service was built with Hybrid DNN Text-to-Speech (HDTS), a technology that converts text into synthesized voices.