Voices & Text-to-Speech

Basics

The latest generation of AI models can transform written text into high-quality speech and offers many ways to make that speech expressive. However, making synthetic speech sound convincing for a given use case takes specific knowledge (for example, an understanding of machine learning) and careful work: annotation, parameter tuning, content editing to work around shortcomings in a voice, and additional processing of the speech produced.

AudioStack offers a wide variety of voice models from various providers, along with guardrails that make the production of synthetic speech robust and scalable.

AudioStack also makes it easy to create a frontend for your users to interact with synthetic speech, assures quality for well-known TTS challenges such as the pronunciation of names, and lets you connect directly to an established source to minimise the failure rate.

Want to Browse Available Voices?

The AudioStack voice library gives examples of the range of voices available, across a wide variety of providers and languages. To browse the voice library, click here.

Code Example

Let's get started generating speech using some of the available voices from the AudioStack library. We recommend creating a script first and then rendering speech from it. Below, we'll render the script with 4 different voices and download the resulting TTS files.

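import audiostack

# Replace the placeholder with your own AudioStack API key
audiostack.api_key = "APIKEY"

# Adjacent string literals are joined into one string, which avoids the
# stray indentation that backslash continuations leave inside the text
script = audiostack.Content.Script.create(
    scriptText=(
        "Our Text-to-speech provides harmonious access to more than 8 external TTS providers. "
        "Our single interface ensures no matter the provider your script content will be "
        "synthesized to the highest quality."
    )
)
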
print("Response from creating script", script.response)

# Store scriptId for later
scriptId = script.scriptId

# Let's do 4 requests, 1 for each voice
for v in ["sara", "joanna", "conrad", "liam"]:
    item = audiostack.Speech.TTS.create(scriptItem=script, voice=v)
    print(item.response)

# We'll get our files with the list method
tts_files = audiostack.Speech.TTS.list(scriptId=scriptId)
print(tts_files.response)

for tts in tts_files:
    print("getting", tts.speechId)
    item = audiostack.Speech.TTS.get(tts.speechId)
    # We'll download each file
    item.download(fileName=item.speechId)

# We'll list the rendered files
tts_files = audiostack.Speech.TTS.list(scriptId=scriptId)
print(tts_files.response)

for tts in tts_files:
    # At this point we can delete the files (not needed anymore)
    item = audiostack.Speech.TTS.get(tts.speechId)
    r = item.delete()
    print(r)

print("Cost for this session: ", audiostack.credits_used_in_this_session())
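
The loop over voices above stops at the first request that raises, so a single failing voice would prevent the remaining ones from being rendered. The following is a minimal sketch of one way to keep going and report failures afterwards. The helper name `render_voices` and its `render` parameter are hypothetical and not part of the AudioStack SDK; the helper only wraps whatever callable you pass in.

```python
from typing import Callable, Dict, Iterable, Tuple


def render_voices(
    render: Callable[[str], object],
    voices: Iterable[str],
) -> Tuple[Dict[str, object], Dict[str, Exception]]:
    """Call `render(voice)` for each voice and collect results and errors.

    `render` is a hypothetical callable supplied by the caller; this helper
    makes no assumptions about what it returns.
    """
    results: Dict[str, object] = {}
    failures: Dict[str, Exception] = {}
    for voice in voices:
        try:
            results[voice] = render(voice)
        except Exception as exc:
            # Record the error and continue with the remaining voices
            failures[voice] = exc
    return results, failures
```

With the script created earlier, this could be called as `render_voices(lambda v: audiostack.Speech.TTS.create(scriptItem=script, voice=v), ["sara", "joanna", "conrad", "liam"])`, after which `failures` maps each voice that could not be rendered to the exception it raised.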