AI-powered voices: A game-changer for productivity and language learning

AI voices are becoming more and more realistic. What do they mean for translation and interpreting? Will they replace us? (TL;DR: No.😉)

AI voices can enhance proofreading, preparation, language learning, and interpreting practice. In this article, I’ll show you how – and share the best free and paid tools to help you find what works for you.

A brief introduction to AI voices

What are AI voices?

AI voices are how virtual assistants – and some videos on the internet – “speak” to you. If you’ve ever gotten a response from Alexa, or watched an Instagram reel with text-to-speech (TTS) narration, you’ve heard an AI voice.

AI voices are generated through speech synthesis, a complicated process involving text analysis, prosodic analysis, and speech generation.

Speech synthesis has been around for a long time, but is rapidly evolving alongside AI.

While in the past synthetic voices sounded stilted – think of your built-in screen reader – today, they’re more human-like than ever. That’s because AI-powered speech synthesis does a better job of incorporating tone, rhythm, emotion, and even those uh’s, um’s, and hm’s that we interpreters try so hard to avoid!

Today’s AI-powered synthetic voices are also multilingual. Most AI TTS tools offer a wide selection of languages, making them a great resource for translators, interpreters, and language learners. (More on that below!)

Why AI text-to-speech should be on your radar

Originally developed for accessibility – helping people with visual impairments access written content and navigate computers – TTS is now used by businesses and individuals in broader ways.

There’s talk of AI voices replacing voice professionals like actors, narrators, announcers, and yes, interpreters – but exploring TTS should convince you that they’re no replacement for human experts. 😌

Remember, AI can’t replicate human emotions or handle unpredictable situations – which is why 99% of people prefer human customer support.

And when it comes to speech synthesis, AI isn’t perfect. Sometimes the pronunciation is wrong, the pacing is awkward, and/or the intonation is off.

Also, because AI doesn’t always grasp the meaning of what’s being said, it isn’t 100% trustworthy, especially in high-stakes situations. And you shouldn’t trust AI with confidential information.

However, AI voices are fun – and can be useful in many circumstances.

Auditory learner? Use text-to-speech to boost understanding and retention. Multitasker? Consume text-based content on the go, or while driving, exercising, or cooking. Productivity geek? Process information more efficiently. Tired eyes? Switch from reading to listening for a screen break.

How translators, interpreters and language learners can benefit from AI voices

AI voices can enhance your experience and boost your productivity in a variety of ways:

  • Language learning. Practice your listening skills in different languages and dialects by getting AI to read real-world texts (like news articles) to you.

  • Translation. Hear what your translation sounds like with help from an AI voice and make adjustments to the pacing and rhythm of the text as needed.

  • Interpreting practice. Practice simultaneous interpretation with AI as your study buddy. First, ask it to write a speech on a topic (check out this blog post for AI speechwriting tips). Then, have it read it to you while you interpret and record yourself.

  • Interpreting preparation/research. Give AI a list of terms and ask it to generate content using that terminology. Then, have it read the content to you to help with pronunciation* and retention. Alternatively, ask AI to generate a podcast based on all your reference materials for an overview. (I recommend NotebookLM, covered here.)

*AI sometimes mispronounces words. Unsure about pronunciation? Check on Forvo or YouGlish.

Free tools with text-to-speech features

Did you know your favorite AI assistants have text-to-speech features? Here’s what you can do – for free – with each AI model.

ChatGPT’s Read Aloud and Voice Mode features

ChatGPT has two voice interaction features: Read Aloud and Voice Mode. Both are available on web and mobile (Android, iOS).

Use the Read Aloud feature to get ChatGPT to generate content in multiple languages and accents and read it to you.

First, ask ChatGPT to generate a speech. Specify the topic, word count, language, and accent/language variant. For example, “Generate a 300-word speech about the game Settlers of Catan in EN-US.”

Pro tip: Specify the language and country with codes like EN-US, ES-MX, or PT-BR (an ISO language code followed by an ISO country code).
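If you reuse this kind of prompt often, a small script can assemble it for you. Here’s a minimal sketch in Python that rebuilds the example prompt above from a topic, a word count, and a language code. The function name `build_speech_prompt` and the pattern check are my own assumptions for illustration, not part of ChatGPT; the check only confirms the code has the two-letter, hyphen, two-letter shape, not that the language or country actually exists.

```python
import re

# Shape check for tags such as EN-US or PT-BR: two letters, a hyphen, two letters.
# This is an illustrative assumption; it does not verify that the codes are real.
TAG_PATTERN = re.compile(r"^[A-Za-z]{2}-[A-Za-z]{2}$")


def build_speech_prompt(topic: str, word_count: int, language_tag: str) -> str:
    """Return a speech-generation prompt in the style of the example above."""
    if not TAG_PATTERN.match(language_tag):
        raise ValueError(f"Expected a tag like EN-US, got {language_tag!r}")
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    # Upper-case the tag so "en-us" and "EN-US" produce the same prompt.
    return (
        f"Generate a {word_count}-word speech about {topic} "
        f"in {language_tag.upper()}."
    )


print(build_speech_prompt("the game Settlers of Catan", 300, "en-us"))
# prints: Generate a 300-word speech about the game Settlers of Catan in EN-US.
```

Paste the printed line into ChatGPT (or any other assistant), then change the topic, length, or code to produce practice speeches in other languages and variants.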

Click the speaker icon (Read Aloud button) in the interaction menu below the response. To pause the reading, click the button again.

Read aloud icon in ChatGPT

To voice chat with ChatGPT, click the microphone icon near the input box. Press X when done. ChatGPT transcribes the conversation and lets you replay its responses. You can interrupt ChatGPT to change the language or subject or ask questions at any time.

The Voice Mode allows you to interact with ChatGPT using your voice

Use Voice Mode to send messages using voice input for faster, easier communication.

💡 Tip: Before using Voice Mode, protect your privacy. Click your profile picture and choose Settings. Select Data Controls. Turn off “Improve the model for everyone.” Turn off “Include audio recordings” on your phone. To change ChatGPT’s voice, select Speech > Voice, and choose from the nine options.

Gemini’s Listen and Live features

Gemini (Google’s AI) can read text out loud (web, iOS, Android) as well as have live conversations (mobile only).

To use Gemini’s Listen feature, ask it to generate a speech (specify the topic, language and word count) or paste in a text. Then, click the three dots at the bottom (More button) and choose Listen. To pause, click the More button, then Pause.

Activate the text-to-speech Listen feature to hear the content read aloud.

To voice chat with Gemini via Gemini Live, open the mobile app and tap the sparkle icon (Live button) on the bottom right. You can interrupt Gemini to change the language or subject at any time. Gemini Live transcribes the conversation and lets you listen to its responses.

Access Gemini Live commands easily on your mobile device thanks to the app

To change the AI voice, tap your profile picture, then choose Settings > Gemini’s Voice and choose from 10 options.

Perplexity’s voice conversation feature

Perplexity (available on iOS and Android) supports voice conversations and is free for a limited number of queries.

One neat thing: This tool displays a live transcription of what the AI voice is saying as it speaks – which neither ChatGPT nor Gemini currently does.

Words appear in real time on Perplexity

To use voice interaction on Perplexity, tap the sound wave button on the bottom right, then press and hold the mic icon to speak. Press X when done.

Search hands-free with voice input functionality with Perplexity

NotebookLM’s podcast feature

NotebookLM (covered in this blog post) lets you create a “podcast” from multiple sources (including links, Word docs, Google Docs, audio and video files, and PDFs).

To use this feature, upload your sources. Under Audio Overview, click “Generate” to generate a podcast in English or “Customize” for another language.

(To create a podcast in another language, instruct NotebookLM that the hosts should ONLY use that language. More on multilingual podcasts here.)

Paid apps with advanced text-to-speech

While the free tools mentioned above can only read aloud their generated content, paid AI text-to-speech tools will read any text, in almost any language, in a myriad of voices.

With 200+ AI voices covering 50+ languages and support for 20+ file formats, NaturalReader (web, iOS, Android, and Chrome extension) is the market leader and the TTS tool governments and Ivy League schools use for accessibility.

Two neat features in NaturalReader: Change the reading speed (great for language learners) or “train” it to pronounce tricky words correctly.

ElevenLabs (web, iOS, Android) features 1000+ voices in 29 languages (and variants), and reads any text you type or paste.

Like NotebookLM, it turns written content into podcasts – and you can edit the podcast script.

See what these tools can do for you

Ready to explore the best free and paid text-to-speech tools available today? See how they might support your language learning, proofreading, assignment prep, practice sessions, and research.

While exploring all their nifty features, note their limitations. Issues like faulty pronunciation, high latency, and weird pacing/intonation make AI voices less effective than human voices – making us, at least for now, irreplaceable.
