Introduction to AudioStack concepts

This section covers the concepts you need to understand to use AudioStack. We link to other material along the way, but after reading it you should have a solid understanding of the concepts behind AudioStack.

What can AudioStack do?

The AudioStack platform brings together powerful functionality across the end-to-end audio production process to empower you to create beautiful, professional-sounding audio in seconds.

API vs user interfaces

This guide focuses on the AudioStack application programming interface (API). AudioStack also provides focused user interfaces that make the API simple to use. The documentation for these workflows can be found here.

The API is built around Audioform. This technology lets you describe and build fully featured pieces of audio containing voices, sounds and music, professionally mixed, using content from the Content libraries.

The API is divided into two parts: Content libraries management and Audioform generation.

Content libraries

An Audioform combines text and different kinds of content into a single piece of audio. This content is available in pre-populated (AudioStack) and user-defined libraries:

Voice library (AudioStack): generate speech from voices and text
Sound template library (AudioStack and user-defined): use sound beds in your audio
Sound effect library (AudioStack): bring your audio to life by adding sound effects
Media file library (user-defined):
- use pre-existing voice recordings to clone voices
- use pre-existing voice recordings and change the voice of the recording
- re-use pre-existing mixes and sounds

Audioform

Audioform is the combination of two concepts:

The Audioform format: a description of a piece of audio that uses elements from our libraries
The Audioform service: the service you use to build an Audioform

Audioform format

An Audioform is written in JSON. It is a JSON object that contains the following key information:

a header describing the Audioform version
an assets property containing elements from our Content libraries, as well as TTS (text-to-speech) and STS (speech-to-speech) assets to generate speech
a production property containing the arrangement and various mixing options. The arrangement is the layout of all the assets over time.
a delivery property containing the delivery settings, such as loudness and audio file format.
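As a rough illustration of that structure, the sketch below assembles an Audioform-shaped object in Python and checks that the four top-level properties are present. Only the names header, assets, production and delivery come from the description above; every nested field name and value (`version`, `arrangement`, `loudness`, `format`, the asset entries) is a hypothetical placeholder, not the real Audioform schema.

```python
import json

# The four top-level properties named in the text. Everything nested inside
# them is a HYPOTHETICAL placeholder, not the real Audioform schema.
REQUIRED_KEYS = ("header", "assets", "production", "delivery")


def make_audioform(assets: dict, arrangement: list, loudness: float, file_format: str) -> dict:
    """Assemble an Audioform-shaped dict (illustrative field names only)."""
    return {
        "header": {"version": "placeholder-version"},    # hypothetical version field
        "assets": assets,                                # library elements + TTS/STS assets
        "production": {"arrangement": arrangement},      # layout of the assets over time
        "delivery": {"loudness": loudness, "format": file_format},  # hypothetical keys
    }


def missing_keys(audioform: dict) -> list:
    """Return the top-level properties absent from an Audioform-shaped dict."""
    return [key for key in REQUIRED_KEYS if key not in audioform]


if __name__ == "__main__":
    form = make_audioform(
        assets={"intro_music": {"kind": "sound_template", "id": "example-template"}},
        arrangement=[{"asset": "intro_music", "start": 0.0}],
        loudness=-16.0,      # placeholder value
        file_format="mp3",   # placeholder value
    )
    print(json.dumps(form, indent=2))
    print("missing:", missing_keys(form))
```

Keeping the structure in a plain dict means it serialises directly to the JSON object the service expects, once the placeholder fields are replaced with the documented ones.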

An Audioform defines two kinds of assets to generate speech:

a TTS asset uses a voice from the Voice library and a text to produce spoken words in that voice
an STS asset uses a voice from the Voice library and a voice recording stored in the Media file library to produce the recording's spoken words in that voice.
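The difference between the two speech assets is which inputs they combine: TTS pairs a voice with text, while STS pairs a voice with an existing recording. The sketch below models that distinction with two dataclasses. The field names `voice`, `text` and `recording_id`, and the identifiers used in the example, are hypothetical and do not reflect the real Audioform asset schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class TTSAsset:
    """Text-to-speech: a Voice library voice plus the text it should speak."""
    voice: str  # hypothetical: identifier of a voice in the Voice library
    text: str   # the words to be spoken


@dataclass
class STSAsset:
    """Speech-to-speech: a Voice library voice plus a Media file library recording."""
    voice: str         # hypothetical: identifier of a voice in the Voice library
    recording_id: str  # hypothetical: identifier of a file in the Media file library


def describe(asset) -> str:
    """Summarise which inputs a speech asset combines."""
    if isinstance(asset, TTSAsset):
        return f"TTS: speak {asset.text!r} with voice {asset.voice}"
    if isinstance(asset, STSAsset):
        return f"STS: re-voice recording {asset.recording_id} with voice {asset.voice}"
    raise TypeError(f"not a speech asset: {type(asset).__name__}")


if __name__ == "__main__":
    assets = {
        "greeting": TTSAsset(voice="example-voice", text="Hello and welcome."),
        "testimonial": STSAsset(voice="example-voice", recording_id="example-recording"),
    }
    for name, asset in assets.items():
        print(name, "->", describe(asset), asdict(asset))
```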

Audioform service

The Audioform service is the set of AudioStack endpoints that allow you to:

submit an Audioform to be built
retrieve the build result and download the audio file
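The two steps above suggest a submit-then-retrieve workflow. The sketch below shows one common way to drive such a workflow, assuming the build runs asynchronously so its result has to be polled for. The `submit` and `get_result` callables are hypothetical stand-ins for the real service calls, which are not specified here; `get_result` is assumed to return `None` while the build is still running.

```python
import time
from typing import Callable, Optional


def build_audioform(
    audioform: dict,
    submit: Callable[[dict], str],
    get_result: Callable[[str], Optional[dict]],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Submit an Audioform, then poll until its build result is available.

    `submit` and `get_result` are HYPOTHETICAL stand-ins for the real service
    calls: `submit` returns a build identifier, and `get_result` returns None
    while the build is running, or a result dict once it has finished.
    """
    build_id = submit(audioform)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_result(build_id)
        if result is not None:
            return result  # the build result; the audio file is downloaded from here
        time.sleep(poll_interval)
    raise TimeoutError(f"build {build_id} not ready after {timeout} s")


if __name__ == "__main__":
    # Fake in-memory service: the build is "ready" on the third poll.
    polls = {"count": 0}

    def fake_submit(form: dict) -> str:
        return "build-1"

    def fake_get_result(build_id: str) -> Optional[dict]:
        polls["count"] += 1
        return {"build_id": build_id, "status": "done"} if polls["count"] >= 3 else None

    print(build_audioform({}, fake_submit, fake_get_result, poll_interval=0.01))
```

Passing the service calls in as functions keeps the polling logic independent of any particular client library and makes it easy to test against a fake service, as in the usage example.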

The service also includes helper endpoints, such as the Advertising Brief endpoint, which takes an ad script or an ad description and turns it into one or more Audioforms.

Content services

The Content services are a collection of AudioStack endpoints that give you access to the libraries:

Sound template service
Voice service
Sound effect service
Files service

These endpoints let you list the contents of the libraries, mark elements as favourites, and customise the content to your needs.