Comparing Two Voices by the Length of Their Speech

Introduction

In this tutorial, we will compare two different text-to-speech voices by measuring the length of the generated speech in seconds using the AudioStack API. We'll walk through the process of setting up the API, defining the text to be converted to speech, generating the speech, and finally comparing the lengths of the two voices.

This helps you pick a suitable voice, since some voices speak faster than others. The length measurement is built on our proprietary model.

🚧

Error message

If you get the following error message, please reach out to support and tell us which voice you're using. We're constantly updating our voices, and our ML model sometimes lags behind those updates. Please email [email protected] with the error message and your code.

Exception: prediction cannot be calculated. Errors listed as follows:
	Unable to find a profile for the selected alias
 

Prerequisites

Before we begin, make sure you have the following:

  1. An AudioStack API key.
  2. Python installed on your machine.
  3. The audiostack Python package installed.

You can install the audiostack package using pip:

pip install audiostack

Step 1: Set Up the API Key

First, we need to set up our API key. This key is necessary to authenticate our requests to the AudioStack API. We will store the API key in an environment variable for security reasons.

import audiostack
import os

# Set up your API key
audiostack.api_key = os.environ['AUDIOSTACK_API_KEY']

Make sure you have your API key stored in the AUDIOSTACK_API_KEY environment variable. You can set this variable in your terminal session or in your Python script directly (not recommended for security reasons).
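If the variable is missing, os.environ['AUDIOSTACK_API_KEY'] raises a bare KeyError. The following is a minimal sketch of one way to fail early with a clearer message. The helper name load_api_key is hypothetical and not part of the audiostack package; it only uses the standard library.

```python
import os


def load_api_key(env_var: str = "AUDIOSTACK_API_KEY") -> str:
    """Return the API key stored in env_var, or raise a readable error.

    load_api_key is a hypothetical helper for this tutorial, not an
    audiostack function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script."
        )
    return key


# Intended usage (assumes the audiostack package is installed):
# import audiostack
# audiostack.api_key = load_api_key()
```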

Step 2: Define the Text to Convert to Speech

Next, we define the text that we want to convert to speech. The text is structured using the AudioStack markup to define sections and subsections. This helps in organizing the text for different sound segments.

long_text = """
<as:section name="section_1" soundSegment="intro">
    Who was your footballing hero?
</as:section>

<as:section name="section_2" soundSegment="main">
    <as:sub name="sub_1"> Puyol from Barcelona and Maldini from Milan </as:sub>
    <as:sub name="sub_2"> The Barca the dream team that won the first European cup was my team and the hero
was I'd say Ronald Koeman, who was the centre back for Barcelona at the time but I had so many heroes
at that team which was Michael Laudrup, you've got Pep Guardiola, you've got so many
but Ronald Koeman was always the one for me. </as:sub>
</as:section>
"""
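When you later compare lengths, it can help to know how many words are actually spoken once the markup is set aside. The sketch below strips the as:section and as:sub tags with a regular expression and splits the rest into words. The helper name spoken_words is hypothetical, and the regular expression assumes the markup looks like the example above; it is not an AudioStack parser.

```python
import re


def spoken_words(marked_up_text: str) -> list:
    """Return the words left after removing <as:...> and </as:...> tags.

    Hypothetical helper: assumes tags follow the pattern used in this
    tutorial and contain no '>' inside attribute values.
    """
    plain = re.sub(r"</?as:[^>]*>", " ", marked_up_text)
    return plain.split()


sample = '<as:section name="section_1" soundSegment="intro"> Who was your footballing hero? </as:section>'
print(len(spoken_words(sample)), "words")
```

Calling len(spoken_words(long_text)) gives a rough word count for the full script.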

Step 3: Generate Speech for Each Voice

We will generate speech for two different voices and measure the length of the generated speech. The audiostack.Speech.Predict.predict function will be used to convert the text to speech and return the data, including the length of the audio.

# Voices to compare
voices = ["sara", "jeff"]
for v in voices:
    # Request the speech data for this voice and print its length in seconds
    data = audiostack.Speech.Predict.predict(text=long_text, voice=v)
    print(v, data.length)

Here, we have chosen two voices: "sara" and "jeff". The predict function is called for each voice, and the length of the generated audio is printed.
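If one voice fails (for example with the "prediction cannot be calculated" error described earlier), the loop above stops at that voice. The following is a rough sketch of one way to keep going and collect the lengths in a dictionary. The function predict_lengths and its predict_length parameter are hypothetical names introduced here; the exact exception class raised by the library is not documented in this tutorial, so the sketch catches the generic Exception as an assumption.

```python
from typing import Callable, Dict, Iterable


def predict_lengths(
    text: str,
    voices: Iterable[str],
    predict_length: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return {voice: length_in_seconds}, skipping voices that raise an error.

    predict_length is any callable taking (text, voice) and returning a
    length in seconds. It is injected so this sketch runs without network
    access.
    """
    lengths: Dict[str, float] = {}
    for voice in voices:
        try:
            lengths[voice] = float(predict_length(text, voice))
        except Exception as err:  # assumption: exact exception type unknown
            print(f"Skipping {voice}: {err}")
    return lengths


# Intended usage with the call shown in this tutorial:
# import audiostack
# def audiostack_length(text, voice):
#     return audiostack.Speech.Predict.predict(text=text, voice=voice).length
# lengths = predict_lengths(long_text, ["sara", "jeff"], audiostack_length)
```

Passing the prediction call in as an argument keeps the error handling separate from the API call, so the same loop can be reused for other voices or texts.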

Output

When you run the script, you will see the length of the speech generated for each voice. For example:

sara 27.06657932827256
jeff 21.695155698420557

This output shows that the speech generated with the voice "sara" is approximately 27.07 seconds long, while the speech generated with the voice "jeff" is approximately 21.70 seconds long.
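To put the two numbers side by side, the short sketch below computes the absolute and relative difference from the example output above. The values are copied from that output; the variable names are illustrative only.

```python
# Lengths in seconds, copied from the example output above
lengths = {"sara": 27.06657932827256, "jeff": 21.695155698420557}

shortest = min(lengths, key=lengths.get)  # voice with the shortest audio
longest = max(lengths, key=lengths.get)   # voice with the longest audio

difference = lengths[longest] - lengths[shortest]
percent_shorter = 100 * difference / lengths[longest]

print(
    f"{shortest} is {difference:.2f} s shorter than {longest} "
    f"({percent_shorter:.1f}% shorter)"
)
# prints: jeff is 5.37 s shorter than sara (19.8% shorter)
```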

Conclusion

In this tutorial, we demonstrated how to use the AudioStack API to compare the lengths of speech generated by two different voices. This can be useful for various applications, such as selecting the most suitable voice for a specific purpose based on the length of the output. By following the steps outlined, you can easily adapt this process to compare other voices or texts.