Reduce Speech Duration

Reduce the duration of your speech file to fit within a fixed duration.

Often, our users need to "condense" a Speech file so that it fits alongside their other assets. Common use cases include meeting a 10- or 15-second maximum duration for advertising, such as a "tag" at the end of a commercial.
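To get a feel for how much condensing a clip needs, you can compare its measured duration with the slot it has to fit. The helper below, `reduction_needed`, is a hypothetical illustration and not part of the AudioStack SDK. Its `speed_if_only_faster` figure is the speed-up that would be required if all of the time were saved by speaking faster; because reduction can also trim silence and padding, the actual speed change may well be smaller.

```python
def reduction_needed(current_seconds: float, max_seconds: float) -> dict:
    """Hypothetical helper: how far a clip overshoots a fixed-length slot."""
    if current_seconds <= 0 or max_seconds <= 0:
        raise ValueError("durations must be positive (seconds)")
    # Seconds that must be removed (zero if the clip already fits).
    overshoot = max(0.0, current_seconds - max_seconds)
    return {
        "overshoot_s": round(overshoot, 2),
        # Share of the current duration that has to go.
        "percent_to_remove": round(100 * overshoot / current_seconds, 1),
        # Speed factor needed if only speaking rate were changed.
        "speed_if_only_faster": round(max(1.0, current_seconds / max_seconds), 3),
    }


# A 17.4 s read going into a 15 s tag slot:
print(reduction_needed(17.4, 15))
```

For a 17.4-second read and a 15-second slot, this reports 2.4 seconds to remove (13.8% of the clip), or a 1.16x speed-up if speaking rate were the only lever.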

To do this, use the speech/TTS/reduce endpoint, passing the ID of an existing Speech file and a targetLength in seconds:

tts = audiostack.Speech.TTS.reduce(speechId=speechId, targetLength=5)

Our specialised engine takes your speech and intelligently removes silence, reduces punctuation, speeds up speech, and removes padding to achieve the desired duration while maintaining the "naturalness" of speech.
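How the engine balances these techniques is not documented here. Purely as a mental model, the toy sketch below (our own illustration, not AudioStack's algorithm) represents a clip as a list of speech and pause segments and applies three of the steps named above in an assumed order: strip leading and trailing padding, shorten interior pauses down to an assumed 0.15-second floor, then speed up the remaining speech, capped at an assumed 1.3x. Punctuation handling and any naturalness modeling are left out.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    kind: str        # "speech" or "pause"
    duration: float  # seconds


def _total(items: List[Segment]) -> float:
    return sum(s.duration for s in items)


def toy_reduce(segments: List[Segment], target: float,
               min_pause: float = 0.15, max_speed: float = 1.3
               ) -> Tuple[List[Segment], float]:
    """Toy illustration only; NOT AudioStack's reduction engine.

    Returns the shortened segments and the speed factor applied to speech.
    The result may still exceed `target` if the caps are reached.
    """
    segs = list(segments)

    # Step 1: remove padding (pauses at the very start and end).
    while segs and segs[0].kind == "pause":
        segs.pop(0)
    while segs and segs[-1].kind == "pause":
        segs.pop()

    # Step 2: shrink interior pauses proportionally, never below min_pause.
    excess = _total(segs) - target
    reclaimable = sum(max(0.0, s.duration - min_pause)
                      for s in segs if s.kind == "pause")
    if excess > 0 and reclaimable > 0:
        keep = 1.0 - min(1.0, excess / reclaimable)  # share of slack kept
        segs = [replace(s, duration=min_pause + (s.duration - min_pause) * keep)
                if s.kind == "pause" and s.duration > min_pause else s
                for s in segs]

    # Step 3: speed up speech to cover whatever is still over, capped.
    excess = _total(segs) - target
    speed = 1.0
    speech = sum(s.duration for s in segs if s.kind == "speech")
    if excess > 0 and speech > 0:
        speed = min(max_speed, speech / max(speech - excess, 1e-9))
        segs = [replace(s, duration=s.duration / speed)
                if s.kind == "speech" else s for s in segs]
    return segs, speed


clip = [Segment("pause", 0.5), Segment("speech", 4.0), Segment("pause", 1.0),
        Segment("speech", 4.0), Segment("pause", 0.5)]
shortened, speed = toy_reduce(clip, target=8.0)
print(round(_total(shortened), 3), round(speed, 3))
```

With a 10-second clip and an 8-second target, the padding goes first, the interior pause drops to the 0.15-second floor, and the speech only needs to be sped up by roughly 2%. Ordering the cheap, inaudible savings (silence) before speed changes is the design choice this sketch is meant to show.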

How to reduce your speech length to fit in a fixed duration using the Python SDK

Reducing the duration of your speech is straightforward. Try running the following example, replacing "APIKEY" with your own API key:

import audiostack
import os

audiostack.api_key = "APIKEY"

def tts_reduce_examples(req):
    scriptText = f"""
    <as:section name="main" soundsegment="main"> 
    {req["text"]}
    </as:section>"""

    print("Generating your script")
    script = audiostack.Content.Script.create(scriptText=scriptText, scriptName="test")

    print("Synthesizing speech")
    tts = audiostack.Speech.TTS.create(scriptItem=script, voice=req["voice"], speed=req["speed"])

    print("Reducing length to meet target")
    tts = audiostack.Speech.TTS.reduce(speechId=tts.speechId, targetLength=req["targetLength"])
    
    timelineProperties = {
        "forceLength": req["targetLength"],
        "speechStart": 0,
        "fadeIn": 0,
        "fadeOut": 0,
    }

    print("Applying auto mixing and mastering")

    mix = audiostack.Production.Mix.create(
        speechItem=tts,
        exportSettings={"ttsTrack": True},
        masteringPreset="",
        timelineProperties=timelineProperties,
    )

    print("Preparing for download")
    encoder = audiostack.Delivery.Encoder.encode_mix(
        productionItem=mix,
        preset="custom",
        sampleRate=44100,
        bitDepth=16,
        public=False,
        format="wav",
        channels=2,
        loudnessPreset="podcast"
    )

    encoder.download(fileName=req["name"])
    return encoder

if __name__ == "__main__":

    examples = [
        {
            "name": "reduced_example1",
            "text": "Parrots are highly intelligent and social birds known for their vibrant plumage and remarkable ability to mimic sounds, including human speech. These colorful avian companions are found in tropical regions around the world and are known for their playful and affectionate nature, often forming strong bonds with their human caregivers. Some parrot species, like the African grey parrot, are renowned for their exceptional problem-solving skills and complex vocalizations, making them some of the most captivating and cherished pets in the avian kingdom.",
            "voice": "Bryer",
            "speed": 1.00,
            "targetLength": 20
        },
        {
            "name": "reduced_example2",
            "text": "Parrots exhibit an astonishing diversity of colors and patterns, with some species showcasing brilliant blues, radiant reds, and vivid greens in their plumage, creating a mesmerizing spectacle in their tropical habitats. These avian marvels possess a remarkable capacity for learning and communication, often forming tight-knit social bonds within their flocks and demonstrating an uncanny ability to mimic a wide range of sounds, from melodies to everyday noises. Parrots play a crucial role in their ecosystems as seed dispersers, aiding in the propagation of various plant species by consuming fruits and excreting seeds in new locations, contributing to the biodiversity of their environments.",
            "voice": "Cosmo",
            "speed": 1.20,
            "targetLength": 25
        },
    ]

    for req in examples:
        res = tts_reduce_examples(req)
        print(res)
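The encoder settings in the example (44,100 Hz sample rate, 16-bit depth, 2 channels, WAV format) make it easy to estimate how large each downloaded file will be. The sketch below assumes the output is uncompressed PCM with the canonical 44-byte WAV header; `estimate_wav_bytes` is a hypothetical helper, not part of the SDK, and real files may carry extra metadata chunks.

```python
WAV_HEADER_BYTES = 44  # canonical PCM WAV header; real files may add chunks


def estimate_wav_bytes(seconds: float, sample_rate: int = 44100,
                       bit_depth: int = 16, channels: int = 2) -> int:
    """Hypothetical helper: approximate size of an uncompressed PCM WAV file."""
    # Bytes per second = samples/s * bytes per sample * number of channels.
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return WAV_HEADER_BYTES + round(seconds * bytes_per_second)


for target in (20, 25):
    print(target, estimate_wav_bytes(target))
```

If each mix comes out at exactly its target length, the 20- and 25-second examples would be about 3,528,044 and 4,410,044 bytes (roughly 3.5 MB and 4.4 MB).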
