Anyone who has relied on audio translation will have noticed how different the translated voice sounds from the original. The most obvious change is a swap from a male voice to a female one, or vice versa.
Google's translation team has been working to minimize these voice changes, and its audio translator can now keep the voice and tone close to those of the original speaker.
Some differences are still noticeable, but they are much smaller than those produced by other translation engines.
How does it all work?
Google's AI translator converts the audio input directly into the audio output, without an intermediate text step.
Traditional translation systems convert the audio into text, translate that text, and then synthesize new audio from the translated text. Somewhere in the middle, the original voice is lost and a new, distinctly different one is used in its stead.
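The traditional three-stage pipeline can be sketched as below to show where the voice goes missing. This is only an illustration: the `Audio` type and the three stage functions are hypothetical stubs invented for this example, not part of any real translation engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Audio:
    samples: List[float]  # toy stand-in for waveform samples
    speaker: str          # whose voice the audio carries


def speech_to_text(audio: Audio) -> str:
    # Hypothetical recognizer stub: it returns plain text, so nothing
    # about the speaker's voice is carried forward from here on.
    return "hola mundo"


def translate_text(text: str) -> str:
    # Hypothetical text translator stub backed by a tiny lookup table.
    return {"hola mundo": "hello world"}.get(text, text)


def text_to_speech(text: str) -> Audio:
    # Hypothetical synthesizer stub: it only ever sees text, so it has
    # to fall back on its own built-in voice.
    samples = [float(len(word)) for word in text.split()]
    return Audio(samples=samples, speaker="synthetic-default")


def cascade_translate(audio: Audio) -> Audio:
    text = speech_to_text(audio)         # audio -> text
    translated = translate_text(text)    # text -> translated text
    return text_to_speech(translated)    # translated text -> new audio


source = Audio(samples=[0.1, 0.2, 0.3], speaker="original-speaker")
result = cascade_translate(source)
print(result.speaker)  # synthetic-default
```

Because only a string passes between the stages, the last stage has no way to recover the original speaker's voice.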
Google has built a new end-to-end speech-to-speech translation system named 'Translatotron'. It comprises three components:
- A model trained to map audio spectrograms of the input language to spectrograms of the output language.
- A component that converts the output spectrograms into an audio waveform.
- A component that layers the original speaker's voice back onto the final output.
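To make the data flow between the three components concrete, here is a toy sketch in Python. Every function is a hypothetical placeholder using simple arithmetic on nested lists; none of it is Google's implementation, and a real system would use trained neural networks at each stage. The order in which the speaker's voice is applied is also an assumption made for this sketch.

```python
from typing import List

# Toy spectrogram: a list of frames, each frame a list of frequency-bin values.
Spectrogram = List[List[float]]


def map_spectrogram(source: Spectrogram) -> Spectrogram:
    # Placeholder for the trained spectrogram-to-spectrogram model.
    # Here it simply reverses the frame order to produce a "different" output.
    return [list(frame) for frame in reversed(source)]


def speaker_profile(source: Spectrogram) -> List[float]:
    # Placeholder for capturing the speaker's voice: the average value
    # of each frequency bin across all frames.
    n_frames = len(source)
    return [sum(bin_values) / n_frames for bin_values in zip(*source)]


def apply_voice(target: Spectrogram, profile: List[float]) -> Spectrogram:
    # Placeholder for layering the original voice onto the output:
    # add the speaker profile to every frame.
    return [[v + p for v, p in zip(frame, profile)] for frame in target]


def to_waveform(spec: Spectrogram) -> List[float]:
    # Placeholder for the spectrogram-to-waveform step: one sample per frame.
    return [sum(frame) for frame in spec]


source_spec = [[1.0, 2.0], [3.0, 4.0]]
target_spec = map_spectrogram(source_spec)
voiced_spec = apply_voice(target_spec, speaker_profile(source_spec))
waveform = to_waveform(voiced_spec)
print(waveform)  # [12.0, 8.0]
```

Note that no text appears anywhere in this flow: audio features go in, and audio comes out.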
What difference will this make?
This is good news for audio translation: the system produces more nuanced translations and also leaves less room for error. With fewer steps in the translation process, there are fewer chances for mistakes to happen.