Using the API to generate fully produced audio

An end-to-end example of generating high-quality audio with the API

Introduction

AudioStack offers a number of ways to produce audio. One challenge you'll run into as you mix speech with sound design (or music) is that the result won't sound good out of the box. For this reason we built our own Mastering service, which enhances the quality of your audio.

AudioStack's Speech (Voices) and Sound Designs (Sound Templates) vary widely in quality, frequency range and dynamics.

Check out AudioStack's voice library and sound library to see what we offer. We are constantly expanding the sources for these.

  • Our Voices are from several providers and vary in tonalities, dynamics and frequency ranges.

  • Music and Sound Effects (sfx) come at different loudnesses (RMS) and have diverse tonal qualities.

Why do I want to mix audio with AudioStack?

To manage the diversity and variability of our different Voices and Sound Designs, it is necessary to standardize (master) the quality of our mixes to produce great sounding audio.

AudioStack's Auto Mixing Service analyses each component in each mix and applies an individualized mastering signal chain to enhance the output quality. These signal chains will be different based on the music, sound effects and voice characteristics.

[Callout] This is an out of the box solution that improves sound quality by applying an audio treatment tailored to enhance each individual mix.

How to mix audio via AudioStack's Python SDK

Prerequisites

1. Set up your API connection

First, we'll set up our API connection and write a short script.

import audiostack

audiostack.api_key = "<your API Key>"  # fill me in!

2. Create your script

script_text = """
<as:section name="main" soundsegment="main">
Audio mixing and mastering enhance the overall quality of a music or audio production by balancing and refining its elements, resulting in a polished and professional sound that captivates and engages the audience.
</as:section>
"""

script = audiostack.Content.Script.create(
    scriptText=script_text, scriptName="my_test_script", projectName="ams_tutorial"
)

print(script)

[Callout] If you see an error like Exception: Access denied. Could not find a valid organization., please check your API key is valid.
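A common cause of that error is simply forgetting to replace the placeholder key. A tiny sanity check can catch this before any request is made (note: `looks_like_placeholder` is our own hypothetical helper, not part of the SDK):

```python
def looks_like_placeholder(key: str) -> bool:
    """Return True if the key still looks like the '<your API Key>' placeholder."""
    key = key.strip()
    return not key or (key.startswith("<") and key.endswith(">"))

api_key = "<your API Key>"
if looks_like_placeholder(api_key):
    # Warn early instead of waiting for an "Access denied" error from the API.
    print("Set a real AudioStack API key before running this script.")
```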

3. Create some speech and mix in some music

We'll use two different voices from our voice library (for variance). We'll keep the preset and the sound templates constant.

For each voice we will:

  1. Render the script into speech using AudioStack's text-to-speech.
  2. Mix in the background music.

voices = ["sara", "quincy"]  # Use the API Alias from our voice library

for voice in voices:
    # Creates text to speech
    speech = audiostack.Speech.TTS.create(
        scriptItem=script,
        voice=voice,
    )
    print(speech)
    # Mixes speech and sound design.
    mix = audiostack.Production.Mix.create(
        speechItem=speech,
        soundTemplate="funky_summer",
        masteringPreset="balanced",
    )
    print(mix)
    # Downloads the final mix.
    mix.download(fileName=f"ams_tutorial_{voice}_balanced_funky_summer")

Have a listen to the two files created. For each audio file, the Auto Mixing Service (AMS) optimised the mix between the speech and music to create a quality output.

What is in a mix?

Let's break down these calls.

To create a mix, we need:

  • speechItem - the output speech from our text-to-speech process, i.e. what is returned from audiostack.Speech.TTS.create()
  • soundTemplate - a sound template (or sound design) is a collection of sound effects, background music and various other effects.
  • masteringPreset - you'll often want to make sure your audio sounds good in your target listening environment. We offer a range of presets for this; see here
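If you want to compare more voices, presets, or sound templates, the loop from the previous step generalises naturally. Here's a sketch of how you might enumerate the combinations (the `mix_jobs` helper and its filename scheme are our own illustration, not part of the SDK):

```python
from itertools import product

def mix_jobs(voices, presets, templates, prefix="ams_tutorial"):
    """Enumerate every (voice, preset, template) combination with an output file name."""
    return [
        (voice, preset, template, f"{prefix}_{voice}_{preset}_{template}")
        for voice, preset, template in product(voices, presets, templates)
    ]

# Example: two voices, one preset, one sound template -> two mixes to render.
for voice, preset, template, file_name in mix_jobs(
    ["sara", "quincy"], ["balanced"], ["funky_summer"]
):
    print(file_name)
    # For each job you would call Speech.TTS.create(...) and
    # Production.Mix.create(...) exactly as in the loop above,
    # then mix.download(fileName=file_name).
```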

Calling Production.Mix.create returns a Production.Mix object. In this example we combine three ingredients: the speech, a sound template, and a mastering preset.

We can then download and listen to the mix using the download call.

What next?

To find out more, check out these resources: