How to use SSML Harmonisation with multiple voice providers

Creating a Custom Audio Program with AudioStack Using SSML

In this tutorial, we'll create a custom audio program using the AudioStack API and SSML (Speech Synthesis Markup Language). We will define a script with various SSML tags, convert it to speech, create an audio mix, and then encode the mix into an MP3 file.

Custom SSML tags

We created our own custom SSML tags, which work across most providers. Where a provider does not support a tag, it fails gracefully instead of causing errors. This lets you quickly experiment with different voice providers.

Prerequisites

  1. Python installed on your system.
  2. An API key from AudioStack, available at http://platform.audiostack.ai
  3. The audiostack library installed. You can install it using pip:
    pip install audiostack
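
The steps below read the API key from the AUDIOSTACK_API_KEY environment variable. As a minimal sketch, a small helper can fail early with a readable message when that variable is missing. The helper name load_api_key is hypothetical and is not part of the audiostack library.

```python
import os


def load_api_key(env=None, var_name="AUDIOSTACK_API_KEY"):
    """Return the API key from the environment, or raise a readable error.

    Hypothetical helper for illustration; not part of the audiostack library.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your AudioStack API key."
        )
    return key
```

It could then be used as audiostack.api_key = load_api_key() in place of the direct os.environ lookup.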
    

Steps to Create the Audio Program

  1. Import Required Libraries: Start by importing the necessary libraries.

    import audiostack
    import os
    
  2. Set Up Your API Key: Assign your AudioStack API key to the api_key variable.

    audiostack.api_key = os.environ['AUDIOSTACK_API_KEY']
    

    This reads the key from the AUDIOSTACK_API_KEY environment variable, so set that variable to your actual API key before running the script.

  3. Define the SSML Script: Create a string variable with your SSML content.

    script_text = """
    <as:section>
        This is a normal sentence.
        <as:break time="2s"/>
        This sentence has a <as:prosody pitch="x-high">high pitch</as:prosody>.
        <as:break strength="strong"/>
        Check the <as:spell-out characters="FAQ"/> section on our website.
        She won the <as:ordinal number='1st'/> place in the competition.
        My phone number is <as:telephone number="555 536269"/>.
    </as:section>
    """
    

    This script uses various SSML tags:

    • <as:break> to insert pauses.
    • <as:prosody> to change the pitch of the speech.
    • <as:spell-out> to spell out characters.
    • <as:ordinal> to read out numbers as ordinals.
    • <as:telephone> to format telephone numbers.
  4. Create the Script: Use the AudioStack API to create a script from your SSML text.

    print("Creating your script...")
    script = audiostack.Content.Script.create(scriptText=script_text)
    print(script)
    
  5. Generate Speech: Convert the script to speech. We are going to use three voices from three different providers, so the loop runs once per voice.

    print("Generating speech...")
    for voice_name in ["wren", "bernardo", "bernadette"]:
        speech = audiostack.Speech.TTS.create(scriptItem=script, voice=voice_name)
        print(speech)

  6. Create the Mix: Combine the speech with a mastering preset to create an audio mix. This code belongs inside the loop from step 5, so that each voice gets its own mix.

        print("Creating your mix...")
        mix = audiostack.Production.Mix.create(
            speechItem=speech,
            masteringPreset="podcast",
            public=True,
        )
        print(mix)

  7. Encode the Mix: Encode the mix into the desired format (MP3). Keep this inside the loop as well to encode the mix for every voice.

        encoder = audiostack.Delivery.Encoder.encode_mix(productionItem=mix, preset="mp3")
        encoder.download(file=".")

    This will download the encoded audio file to your current directory.
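
Before sending one script to several providers, it can help to list which as: tags it actually uses, so they can be checked against the SSML tags docs. The following is a minimal sketch using only the standard library. The function name list_as_tags and the regular expression are assumptions made for illustration and are not part of the audiostack library.

```python
import re

# Matches opening or self-closing tags in the "as:" namespace, e.g. <as:break .../>.
# Closing tags such as </as:section> are not matched because of the slash.
AS_TAG_PATTERN = re.compile(r"<as:([A-Za-z][\w-]*)")


def list_as_tags(script_text):
    """Return the sorted, de-duplicated names of the as: tags used in a script."""
    return sorted(set(AS_TAG_PATTERN.findall(script_text)))


sample = """
<as:section>
    This is a normal sentence.
    <as:break time="2s"/>
    Check the <as:spell-out characters="FAQ"/> section on our website.
</as:section>
"""
print(list_as_tags(sample))  # ['break', 'section', 'spell-out']
```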

Full Code

Here is the complete code to create a custom audio program with SSML:

import audiostack
import os

# Set up your API key
audiostack.api_key = os.environ['AUDIOSTACK_API_KEY']

# Define the SSML script
script_text = """
<as:section>
    This is a normal sentence.
    <as:break time="2s"/>
    This sentence has a <as:prosody pitch="x-high">high pitch</as:prosody>.
    <as:break strength="strong"/>
    Check the <as:spell-out characters="FAQ"/> section on our website.
    She won the <as:ordinal number='1st'/> place in the competition.
    My phone number is <as:telephone number="555 536269"/>.
</as:section>
"""

# Create the script from the SSML text
print("Creating your script...")
script = audiostack.Content.Script.create(scriptText=script_text)
print(script)

# Generate speech from the script
print("Generating speech...")
for voice_name in ["wren", "bernardo", "bernadette"]:
    speech = audiostack.Speech.TTS.create(scriptItem=script, voice=voice_name)
    print(speech)

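    # Create an audio mix from the speech
    print("Creating your mix...")
    mix = audiostack.Production.Mix.create(
        speechItem=speech,
        masteringPreset="podcast",
        public=True,
    )
    print(mix)

    # Encode the mix as MP3 and download it to the current directory
    encoder = audiostack.Delivery.Encoder.encode_mix(productionItem=mix, preset="mp3")
    encoder.download(file=".")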

Explanation

  • Creating the Script: We first create a script using the provided SSML text. This script is then used as input for the text-to-speech (TTS) process. The key addition here is our SSML harmonisation functionality, which lets the same custom tags be used across providers.
  • Generating Speech: The script is converted to speech using the TTS API. We use three voices here: wren from ElevenLabs, bernardo from Cereproc, and bernadette from PlayHT.
  • Creating the Mix: The speech is combined with a mastering preset to create a polished audio mix.
  • Encoding the Mix: The final mix is encoded into a specific format (MP3 in this case) and downloaded.
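
The first three stages above can be sketched as one reusable function. This is an illustration under stated assumptions, not the official way to structure the code: render_voices is a hypothetical name, and the client argument is assumed to be the imported audiostack module with api_key already set. Passing the module in as an argument makes it possible to try the function with a stand-in object. Only the calls already shown in this tutorial are used.

```python
def render_voices(client, script_text, voices, mastering_preset="podcast"):
    """Create one script, then one speech item and one mix per voice.

    `client` is assumed to be the imported `audiostack` module.
    Returns a dict mapping each voice name to its mix.
    """
    # One script is shared by every voice; the as: tags stay the same.
    script = client.Content.Script.create(scriptText=script_text)

    mixes = {}
    for voice_name in voices:
        speech = client.Speech.TTS.create(scriptItem=script, voice=voice_name)
        mixes[voice_name] = client.Production.Mix.create(
            speechItem=speech,
            masteringPreset=mastering_preset,
            public=True,
        )
    return mixes
```

A call such as render_voices(audiostack, script_text, ["wren", "bernardo", "bernadette"]) would return one mix per voice, each of which can then be encoded as in step 7.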

This tutorial demonstrates the basic workflow for creating a custom audio program with AudioStack using SSML, from defining the script to downloading the final audio file.


What’s Next

Read the SSML tags docs in more detail.