Text-to-Speech

Basics

The latest generation of AI models can turn written text into high-quality speech and offers many ways to make that speech expressive. Making synthetic speech sound convincing for a given use case, however, takes specialist knowledge (for example, an understanding of machine learning) and careful work: annotating the text, tuning parameters, editing content to work around shortcomings of a voice, and post-processing the generated speech.

AudioStack offers a wide range of voice models from various providers, along with guardrails that make the production of synthetic speech robust and scalable.

AudioStack also makes it easy to build a frontend through which your users can interact with synthetic speech, to assure quality on well-known TTS challenges such as the pronunciation of names, and to connect directly to an established source to minimise the failure rate.
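The kind of content editing mentioned above, for instance respelling names that a voice mispronounces, can be illustrated with a small preprocessing step. This is a minimal sketch in plain Python and not an AudioStack feature: the `RESPELLINGS` table and the `apply_respellings` helper are hypothetical names introduced here, and the respellings are made up for illustration.

```python
import re

# Hypothetical lookup table: written form -> phonetic respelling.
# These entries are illustrative only.
RESPELLINGS = {
    "Siobhan": "Shiv-awn",
    "Nguyen": "Win",
}


def apply_respellings(text, respellings):
    """Replace whole-word occurrences of each key with its respelling."""
    if not respellings:
        return text
    # Longest keys first, so a longer name wins over a shorter prefix.
    keys = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: respellings[m.group(1)], text)


edited = apply_respellings("Please welcome Siobhan Nguyen.", RESPELLINGS)
print(edited)
```

The edited text, rather than the original, would then be used as the script content. Matching on word boundaries keeps the replacement from touching longer words that merely contain a listed name.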

Code example

We recommend first creating a script and then rendering speech from it. Below, we render the script with 4 different voices and download the resulting TTS files.

import audiostack
import os

# Set the AUDIO_STACK_DEV_KEY environment variable to your own API key
audiostack.api_key = os.environ["AUDIO_STACK_DEV_KEY"]

# Adjacent string literals are joined, so no stray indentation ends up in the text
script = audiostack.Content.Script.create(
    scriptText=(
        "Our Text-to-speech provides harmonious access to more than 8 external TTS providers. "
        "Our single interface ensures no matter the provider your script content will be "
        "synthesized to the highest quality."
    )
)
  
print("Response from creating script", script.response)

# Store the scriptId for later
scriptId = script.scriptId

# Make 4 requests, one for each voice
for v in ["sara", "joanna", "conrad", "liam"]:
    item = audiostack.Speech.TTS.create(scriptItem=script, voice=v)
    print(item.response)

# We'll get our files with the list method
tts_files = audiostack.Speech.TTS.list(scriptId=scriptId)
print(tts_files.response)

for tts in tts_files:
    print("getting", tts.speechId)
    item = audiostack.Speech.TTS.get(tts.speechId)
    # Download each file, named after its speechId
    item.download(fileName=item.speechId)

# List the rendered files again
tts_files = audiostack.Speech.TTS.list(scriptId=scriptId)
print(tts_files.response)

for tts in tts_files:
    # At this point we can delete the files (not needed anymore)
    item = audiostack.Speech.TTS.get(tts.speechId)
    r = item.delete()
    print(r)

print("Cost for this session: ", audiostack.credits_used_in_this_session())
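Script text assembled from multi-line strings or files often carries stray indentation and line breaks. The following is a minimal sketch of a helper that collapses such whitespace before the text is passed as `scriptText`. `normalize_script_text` is a hypothetical name, not part of the AudioStack SDK, and the sketch makes no assumption about how the service itself treats extra whitespace.

```python
import re


def normalize_script_text(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()


# A triple-quoted string keeps the line break and the indentation of line two.
raw = """Our single interface ensures no matter the provider
         your script content will be synthesized to the highest quality."""

clean = normalize_script_text(raw)
print(clean)
```

Passing `clean` instead of `raw` keeps the script content independent of how the source code happens to be indented.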