Google’s Tacotron 2 Text To Speech AI Produces Sounds Indistinguishable From Human Speech

Dec 27, 2017

Google has been up to a lot when it comes to experimenting in the field of Artificial Intelligence, and today the tech giant has taken yet another step forward. Google touts that the latest version of its AI-powered speech synthesis system, Tacotron 2, comes very close to human speech. It has also uploaded some speech samples from Tacotron 2 so that listeners can experience the technology for themselves.

Uses two deep neural networks for output

Tacotron 2 is the second generation of Google’s text-to-speech technology, and it relies on two deep neural networks to produce its output. The first network translates the text into a spectrogram (pdf), a visual representation of audio frequencies over time. That spectrogram is then fed into WaveNet, a system developed by Alphabet’s AI research lab DeepMind, which reads the chart and produces the corresponding audio.
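The two-stage flow described above can be sketched in a few lines. Note that these are hypothetical stand-in functions for illustration only (the real models are large neural networks); the shapes below assume an 80-band mel spectrogram, which is what the Tacotron 2 paper describes.

```python
# Illustrative sketch of the two-stage text-to-speech pipeline:
# text -> mel spectrogram -> waveform. The function bodies are toy
# stand-ins, NOT Google's actual models.

import numpy as np

N_MELS = 80          # Tacotron 2 predicts 80-band mel spectrograms
FRAMES_PER_CHAR = 5  # rough stand-in for the learned text-to-frame alignment
HOP_SAMPLES = 256    # assumed audio samples generated per spectrogram frame

def text_to_spectrogram(text: str) -> np.ndarray:
    """Stage 1 stand-in: a sequence-to-sequence network would map
    characters to an (n_mels, n_frames) spectrogram; here we just
    return random values of the right shape."""
    n_frames = len(text) * FRAMES_PER_CHAR
    return np.random.rand(N_MELS, n_frames)

def vocoder(spectrogram: np.ndarray) -> np.ndarray:
    """Stage 2 stand-in: a WaveNet-style vocoder conditions on the
    spectrogram and emits a chunk of waveform per frame; here we
    emit silence of the corresponding length."""
    n_frames = spectrogram.shape[1]
    return np.zeros(n_frames * HOP_SAMPLES)

spec = text_to_spectrogram("Hello world")   # (80, 55) for 11 characters
audio = vocoder(spec)                       # 55 * 256 = 14080 samples
print(spec.shape, audio.shape)
```

The key point the sketch captures is the division of labor: one model decides *what* the speech should look like in time-frequency terms, and a second model turns that picture into actual audio samples.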


Text-to-speech is not a new technology, of course; Mac users, for one, have had it for quite some time. However, Google claims that its text-to-speech technology is superior to most and is almost indistinguishable from human speech.

Responds to punctuation too

Tacotron 2 uses context to correctly pronounce even identically spelled words such as ‘read’ (present tense) and ‘read’ (past tense). It also responds to punctuation in the text and can learn to stress particular words when they are written in caps.

In a post, Quartz’s Dave Gershgorn explained how Tacotron 2 works:


The system is Google’s second official generation of the technology, which consists of two deep neural networks. The first network translates the text into a spectrogram (pdf), a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet’s AI research lab DeepMind, which reads the chart and generates the corresponding audio elements accordingly.

You can check out all the comparative audio samples by clicking on this link. There are two audio samples for each piece of text, and Google has not made it clear which one is generated by Tacotron 2 and which one is human speech. But if you dig a little deeper and view the file source, you can figure out which sample comes from Tacotron 2.

Impressive much?

After listening to the samples and identifying the Tacotron 2 clips via the source code, we can say that Google has achieved some impressive results here. The voice is very close to human speech: not utterly human, but close enough, and better than other text-to-speech technologies that sometimes sound too mechanical. It also takes note of the punctuation in the text and changes its pacing accordingly.