FAQs

Find out more about what the AudioStack API can do 🚀

What does AudioStack do?

AudioStack lets you create professional-sounding audio in a matter of seconds.

With AudioStack, you do not need any audio expertise to create high-quality audio :musical-note:

How long does it take to learn AudioStack?

It generally takes a developer around ten minutes to run the most basic use case (check our example from Get Started :eyes:). It probably takes another hour to understand the concepts of AudioStack and for it to feel a bit more natural. If you have questions, don't be shy: we're eager to help. Just send an email to [email protected].

Who uses AudioStack? Engineers or non-technical people?

Anyone who knows some Python or JavaScript and wants to create professional-sounding audio in a matter of seconds can use AudioStack pretty easily. At this point, a good chunk of AudioStack users are not very technical, and we are always working on making it easier and more accessible to everyone while giving more advanced developers all the controls they need.

What kind of software do people build with AudioStack?

Starting with audio advertisements, AudioStack is currently used to personalise children's audio stories and audio toys, to add audio personalisation features to fitness, wellbeing, and meditation apps, to enrich podcast audio, to add voice-overs to sales videos, to make conversational AI avatars speak, and to make smart speaker skills more engaging. Stay tuned, it's just the beginning! :wink:

What SDKs do you provide?

AudioStack currently offers SDKs in JavaScript and Python. Find out more.

What can you not do with AudioStack?

The goal of AudioStack is to enable everyone with little or no audio expertise to programmatically create high-quality audio that can compete with a professional production. This also means that it is not designed to give you the full control you might want over a musical creation that is supposed to win a Grammy. :sweat-smile: However, if you are struggling to realize a project, we'd love to help or hear about your feature requests. Just send us an email at [email protected].

What if I want a component that AudioStack doesn't yet have?

If you have a use case that is not handled by AudioStack's built-in components, you can build your own custom component to solve that use case. We welcome contributions and you can also submit a feature request. 💪

Is AudioStack secure? Where's my data stored?

We take security seriously at AudioStack. If you need further information, please let us know.

How does it compare to Text-to-Speech services and providers?

AudioStack goes beyond text-to-speech. Our product is created by audio enthusiasts who know all about making audio sound good. The cloud technology giants provide amazing Text-To-Speech voices, and we offer those through AudioStack as well.

However, the focus of AudioStack is to make it easy for developers to use such voices (among others) to build innovative audio solutions. This includes:

  • a wide voice library (including voices for niche use-cases)
  • guardrails to handle text content so it sounds good when converted to speech
  • audio processing that makes the difference between a pure speech track (think GPS or smart speaker) and an engaging, fully produced audio piece (think radio ad or podcast)
  • connectors that make it easy to integrate audio creation into an existing product or project and make audio creation scalable.

We have also already thought through and solved some very complicated problems that developers tend to encounter when building audio, and we offer tools for them that developers love us for. We don't want to brag, but AudioStack often makes it possible to build in a few days what takes a development team multiple months when plugging directly into a Text-To-Speech provider.

Do you offer streaming audio?

At this time we don't support streaming. You will have to download your audio files.

I can't seem to be able to download my files. Why?

In order to download your audio files you have two options:

Private (standard)
The file is private, so please make sure to send your API key in the x-api-key header.

curl --location 'https://v2.api.audio/speech/tts' --header 'Content-Type: application/json' --header 'x-api-key: <your-api-key>' --data '{ "scriptId":"e94059bb-3677-41d3-a409-632dc5509dd7", "voice":"joanna" }'

The response contains: "url": "https://v2.api.audio/file/ea8616ec-d863-494f-955c-20b4c7c91d3c"

You then need a second request, with the key in the header, to get the file itself.

curl --location 'https://v2.api.audio/file/ea8616ec-d863-494f-955c-20b4c7c91d3c' --header 'x-api-key: <your-api-key>'
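The same two-step flow can be sketched in Python using only the standard library. This is a minimal sketch rather than the official SDK: the endpoint, header name, and body fields are copied from the curl examples above, while the function names are hypothetical and the exact nesting of "url" in the response is an assumption, so the helper searches the whole response for it.

```python
import json
import urllib.request

API_BASE = "https://v2.api.audio"  # base URL taken from the curl examples


def build_tts_request(api_key, script_id, voice):
    """Build (but do not send) the POST request to /speech/tts."""
    body = json.dumps({"scriptId": script_id, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/speech/tts",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def find_url(payload):
    """Return the first "url" string found anywhere in the decoded JSON.

    Assumption: the nesting of "url" in the response is not documented here,
    so this helper searches the whole structure.
    """
    if isinstance(payload, dict):
        if isinstance(payload.get("url"), str):
            return payload["url"]
        children = payload.values()
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_url(child)
        if found is not None:
            return found
    return None


def download_private_file(api_key, script_id, voice, out_path):
    """Run both requests and save the audio to out_path (hypothetical helper)."""
    with urllib.request.urlopen(build_tts_request(api_key, script_id, voice)) as resp:
        file_url = find_url(json.load(resp))
    if file_url is None:
        raise ValueError("no 'url' field found in the TTS response")
    # Second request: the file is private, so the key goes in the header again.
    file_request = urllib.request.Request(file_url, headers={"x-api-key": api_key})
    with urllib.request.urlopen(file_request) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())
    return file_url
```

Calling download_private_file("<your-api-key>", "<script-id>", "joanna", "speech.wav") would perform both requests; the output filename and audio format here are placeholders, not something the API guarantees.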

Public
If you add "public": true to the body of the tts POST request, you'll be able to download the file without any authentication.

Example:
curl --location 'https://v2.api.audio/speech/tts' --header 'Content-Type: application/json' --header 'x-api-key: <your-api-key>' --data '{ "scriptId":"e94059bb-3677-41d3-a409-632dc5509dd7", "voice":"joanna", "public": true }'

The "public": true parameter makes the returned URL public. The response contains: "url": "https://v2.api.audio/public-file/12cf08f7-cf63-4ae2-93d4-ae272c91e298"
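If you build the request body in Python, note that the Python value True is serialized as the JSON literal true that the API expects. The sketch below shows that, plus a small check for whether a returned URL is public. The function names are hypothetical, and the /public-file/ path check is inferred from the example URLs in this FAQ rather than from a documented guarantee.

```python
import json
from urllib.parse import urlparse


def build_tts_payload(script_id, voice, public=False):
    """Return the JSON body for /speech/tts, adding "public" only when requested."""
    payload = {"scriptId": script_id, "voice": voice}
    if public:
        payload["public"] = True  # json.dumps writes this as the JSON literal true
    return json.dumps(payload)


def is_public_url(url):
    """Guess whether a returned file URL is public.

    Assumption: public files live under /public-file/ and private ones under
    /file/, as in the example responses shown in this FAQ.
    """
    return urlparse(url).path.startswith("/public-file/")
```

A URL for which is_public_url returns False should be fetched with the x-api-key header, as in the private option.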

How many languages do you support?

We currently support 60+ human languages. More are coming soon. :eyes:

How easy is it to implement the API?

You can get up and running with our API in less than 30 minutes. We are continuously focusing on making that experience easier. You can check how to get started here.

Is the audio output in real-time? How fast and how scalable is it?

Depending on what you want to do, it can take from a few seconds to several minutes to produce a piece of audio. Combined with the strategy of pre-producing certain parts of the creation process, this is normally quick enough for the vast majority of the use cases we encounter.

We are prepared even for cases where this is not fast enough, such as conversational use cases (e.g. AI chatbots) or cases where users dynamically create audio and need really quick feedback. For these we offer a synchronous Text-To-Speech API with response times <=1s, as well as some other tools. In terms of scaling, even your most ambitious use case should not be an issue. :sunglasses:

How does "clone your own voice" work? Should the customer record their voice and send it to you?

We have developed a process that allows your users to record their voices in a way that we can use to build a voice model. The resulting model is then available through AudioStack, so you can use it in your audio creation process.

Who is behind AudioStack?

AudioStack is an audio creation environment developed by the AudioStack team. Find out more about us here.

I have another question :owlbert-thinking:

Great! We'd love to answer it. Send us an email at [email protected], so we can direct you to the right person. 👋