Perfect Pronunciation Tips and Tricks

A few simple changes to your script can improve your TTS voice's pronunciation and naturalness.

By adding common punctuation marks and descriptive language, you can make your synthetic voice sound more natural and improve the overall delivery.

Adding Pauses

  • Adding a period (.), also called a full stop, creates a long pause in the speech, so it is best used at the end of a thought or sentence. Writing text for audio output follows slightly different rules than writing text for reading, so adjust your pausing accordingly.
  • If you need a shorter pause, use a comma (,) to separate clauses and emphasize part of a sentence.
  • Using a dash (-), a double dash (--), or an em dash (—) can add emphasis and introduce longer pauses. Adding an ellipsis (...) can also introduce longer pauses, which can convey hesitation or nervousness. Use them between words and experiment until the appropriate emotion is conveyed.

"Um... well, I’m not sure..."

"We - need - to hurry."

Adding Emotion

  • Carefully positioned question marks (?) and exclamation marks (!) change the nuance and emotion of the voice model's delivery.
  • Dialogue tags are another good way to steer the TTS voice toward a specific emotion. You can use basic tags such as said, replied, asked, or answered, or more emotion-heavy ones such as laughed, sighed, shouted angrily, or asked, confused.

"I don’t know," she said.

"Are you sure this is the right way?" he asked.

"Yes, that makes sense," she replied slowly.

"This is amazing!" he laughed.

"I guess there's no other choice...", she sighed.

"Stop doing that!" ,she shouted angrily.

"Wait, what do you mean?" he asked, confused.


If a specific word is being mispronounced, you can also change its pronunciation using the Dictionary.


What's Next

Find out more about the Dictionary