Architecture Deep Dive

This guide provides an overview of the core units of AudioStack's API.

AudioStack allows you to organize content in a way that is optimized for programmatically creating professional-sounding audio.


AudioStack is organized along the lines of a “traditional” audio production process, which has four parts: Content, Speech, Production and Delivery.

First, create the text that would normally be spoken and recorded by a voice actor. The Speech functionality then generates speech from that text (text-to-speech, aka TTS) using any of a wide range of synthetic voices. Next, select background music for your content from a growing list of professional sound designs across several genres. Once you have made your selection, our sound design engine does its magic: it mixes and masters the speech and music, applying EQ, filters, fades, and effects to enhance the sound of the overall track.
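To make the order of the four stages concrete, here is a minimal sketch in Python. Every function, class, voice name and sound design name in it is hypothetical and is not part of the AudioStack API; each stage is a stub that only records what it would do, so the flow from Content to Delivery is easy to follow.

```python
# Hypothetical model of the four-stage flow: Content -> Speech -> Production -> Delivery.
# None of these functions are AudioStack API calls; each stage only records its step.
from dataclasses import dataclass, field


@dataclass
class Job:
    text: str
    steps: list[str] = field(default_factory=list)


def content(job: Job) -> Job:
    # Content: the text that a voice actor would normally read.
    job.steps.append("content")
    return job


def speech(job: Job, voice: str) -> Job:
    # Speech: text-to-speech with a chosen synthetic voice (name is hypothetical).
    job.steps.append(f"speech:{voice}")
    return job


def production(job: Job, sound_design: str) -> Job:
    # Production: mix speech with background music, then master the track.
    job.steps.append(f"production:{sound_design}")
    return job


def delivery(job: Job, fmt: str) -> str:
    # Delivery: hand back a finished file in the requested format.
    job.steps.append(f"delivery:{fmt}")
    return f"output.{fmt}"


job = content(Job("Welcome to our podcast."))
job = speech(job, voice="narrator")
job = production(job, sound_design="ambient")
print(delivery(job, fmt="mp3"))  # output.mp3
print(job.steps)  # ['content', 'speech:narrator', 'production:ambient', 'delivery:mp3']
```

Each stage takes the output of the previous one, which mirrors how a traditional production moves from script to recording to mix to final file.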

Content is organized in scripts. You create a script the same way you would write a script for a movie. Afterwards, you'll receive a scriptId that you can keep using to refer to that script.

You can also upload and manage existing media files. See Files & Folders for further information.

For more complex use cases, you can create a project. A project can contain multiple modules, which in turn hold your scripts.

Speech is exactly what it sounds like. It lets you create beautiful speech and control settings such as the speed of the voice. If you just want to use the best text-to-speech voices on the market through an easy-to-use API, you've come to the right place!

For speech, the key variable is the voice. Our voice library contains more than 600 voices from 11 different providers, including our own in-house cloned voices.

Production means that we create a full audio asset: text to audio, not simply text to speech. This allows you to improve the quality of your speech, change the output format (e.g. mp3, alexa, wav), and adjust other aspects of your audio.

Optionally, you can add a background sound to your audio from our sound library. Some of these sounds offer more advanced features, which you'll discover as you explore the API.

Finally, a file (e.g. an .mp3 or .wav file) is created. To make it sound like a professional audio production, various effects usually need to be applied. This is commonly referred to as post-processing or mastering.
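As a rough illustration of what two of these post-processing steps do, the sketch below applies a linear fade to a music track and mixes it under speech, working on plain lists of float samples in the range [-1.0, 1.0]. This shows the general technique only; it is not AudioStack's sound design engine, and the gain values are arbitrary examples.

```python
# Generic sketch of two post-processing steps: a linear fade and a simple mix.
# Not AudioStack's engine; samples are floats in [-1.0, 1.0], gains are arbitrary.


def linear_fade(samples: list[float], fade_in: int, fade_out: int) -> list[float]:
    """Ramp the first `fade_in` samples up from 0 and the last `fade_out` samples down to 0."""
    n = len(samples)
    out = list(samples)
    for i in range(min(fade_in, n)):
        out[i] *= i / fade_in
    for i in range(min(fade_out, n)):
        out[n - 1 - i] *= i / fade_out
    return out


def mix(speech: list[float], music: list[float], music_gain: float = 0.3) -> list[float]:
    """Sum speech with attenuated music, padding the shorter track with silence and clipping to [-1, 1]."""
    n = max(len(speech), len(music))
    speech = speech + [0.0] * (n - len(speech))
    music = music + [0.0] * (n - len(music))
    return [max(-1.0, min(1.0, s + music_gain * m)) for s, m in zip(speech, music)]


music = linear_fade([1.0] * 8, fade_in=4, fade_out=4)
print(music)  # [0.0, 0.25, 0.5, 0.75, 0.75, 0.5, 0.25, 0.0]
print(mix([0.5, 0.5, 0.5], music, music_gain=0.5))
# [0.5, 0.625, 0.75, 0.375, 0.375, 0.25, 0.125, 0.0]
```

Keeping the music gain below 1.0 leaves the speech in front, and clipping guards against the summed signal exceeding full scale. Real mastering chains add further steps such as EQ and compression.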

Core resources

AudioStack relies on a hierarchical data model that makes it easy for you to administer content and keep workloads low while keeping your data and your users’ data safe. The latter is important because content can be sensitive, most musical assets are subject to licensing, and voice data (as well as any voice model imitating that voice) is usually considered personally identifiable information (for more information, see our security and ethics section).


When you sign up for AudioStack, an organisation is created. You receive an API key that grants access to the organisation and all data and information it contains. Unless you are on the enterprise plan, you can only create a single organisation and cannot share any information outside it. For this reason, your organisation is usually named after your company or your user name.


Should you need more than one organisation (typically when your application lets your users create audio, rather than you creating audio for your application and users), you will need to sign up for our enterprise solution. This enables you to issue API keys, create any number of organisations for your users, share certain resources between your users' organisations, and take care of invoicing, reporting and other challenges.


Within the organisation, you can create any number of projects that can help you to keep your content organised. Find out more about projects here.


Within a project, you can create any number of modules. A module can be used, for example, to group different versions of the same content together by theme. Find out more about modules here.


As mentioned above, a script is an annotated piece of written content that will ultimately be rendered into an audio file. Within a module, you can create any number of scripts.
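The hierarchy described in this section (organisation, projects, modules, scripts) can be pictured as a tree. The Python sketch below models it with dataclasses; the class names, field names and example values are invented for illustration and are not the AudioStack data model or SDK.

```python
# Hypothetical illustration of the hierarchy: organisation -> projects -> modules -> scripts.
# Class and field names are invented for this sketch, not AudioStack's data model.
from dataclasses import dataclass, field


@dataclass
class Script:
    script_id: str
    text: str


@dataclass
class Module:
    # e.g. a thematic group holding different versions of the same content
    name: str
    scripts: list[Script] = field(default_factory=list)


@dataclass
class Project:
    name: str
    modules: list[Module] = field(default_factory=list)


@dataclass
class Organisation:
    # An API key grants access to everything inside its organisation.
    name: str
    projects: list[Project] = field(default_factory=list)

    def all_scripts(self) -> list[Script]:
        """Walk the tree and collect every script the organisation owns."""
        return [s for p in self.projects for m in p.modules for s in m.scripts]


org = Organisation("Acme Audio")
podcast = Project("podcast")
episode1 = Module("episode-1")
episode1.scripts.append(Script("s1", "Intro, version A"))
episode1.scripts.append(Script("s2", "Intro, version B"))
podcast.modules.append(episode1)
org.projects.append(podcast)
print([s.script_id for s in org.all_scripts()])  # ['s1', 's2']
```

Because every resource hangs off exactly one organisation, access control can be decided at the top of the tree: whoever holds the organisation's key can reach everything below it.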


A script can be divided into script sections. A script section not only helps to organise content but also makes it possible to change parameter settings within a script (e.g. you need to create a new script section to switch speakers).
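Since switching speakers requires a new script section, a dialogue naturally splits into one section per run of lines by the same speaker. The sketch below assumes a script given as hypothetical (speaker, text) pairs and uses `itertools.groupby`, which groups only consecutive items with the same key; the `ScriptSection` class is invented for illustration and is not an AudioStack type.

```python
# Minimal sketch, assuming a script is a list of (speaker, text) lines.
# A new section starts at every speaker change; ScriptSection is hypothetical.
from dataclasses import dataclass
from itertools import groupby


@dataclass
class ScriptSection:
    speaker: str
    text: str


def split_into_sections(lines: list[tuple[str, str]]) -> list[ScriptSection]:
    sections = []
    # groupby merges only *consecutive* lines with the same speaker,
    # which matches the "new section at each speaker change" rule.
    for speaker, group in groupby(lines, key=lambda line: line[0]):
        sections.append(ScriptSection(speaker, " ".join(text for _, text in group)))
    return sections


dialogue = [
    ("host", "Welcome back."),
    ("host", "Today we talk about audio."),
    ("guest", "Thanks for having me."),
    ("host", "Let's get started."),
]
for section in split_into_sections(dialogue):
    print(section)
```

This dialogue yields three sections (host, guest, host): the host's first two lines share one section, and the host's return after the guest starts a new one.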

What’s Next

Check out our Quick Start Guides to get up and running 🚀