How to Clone any Voice

The AudioStack API offers several options for cloning a voice, depending on your needs.

This wider range of voice cloning options lets you choose the right quality for your use case.

When you clone a voice, you create a custom voice that is available in the AudioStack API only to the user who created it. To do this, you may use any files uploaded within your organisation.

How can I upload my files for cloning?

There are different options available for uploading your files, depending on whether you're integrating with the API, uploading the files as a developer, or offering voice cloning to people who would prefer to record using AudioStack's simple recording workflow in the Platform:

Using the API

Using the Platform

Using the Python SDK

Which type of cloning should I use?

Standard Cloning

For better quality and more control over the voice, use voice_engine_2. This engine requires at least 20 minutes of data and an audio file in which the speaker agrees to have their voice cloned by AudioStack. It is not instant: it typically takes up to a few hours for your voice to become available in the library.

For non-English voices, we recommend using at least 45 minutes of data.

1200 credits will be charged upon successful voice creation.

Instant Cloning

If you want to clone a voice instantly or with only a little data, use voice_engine_3. Here, you can create a clone with just a few minutes of recordings (although the quality will improve with more data).

300 credits will be charged upon successful voice creation.
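The choice between the two engines can be sketched as a small helper. This is only an illustration of the thresholds and credit costs stated above: suggest_engine and EngineChoice are hypothetical names, not part of the AudioStack SDK, and the helper ignores the consent recording that voice_engine_2 also requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineChoice:
    engine: str
    credits: int
    note: str


def suggest_engine(total_minutes: float, english: bool = True) -> EngineChoice:
    """Suggest an engine from the amount of recorded audio (hypothetical helper)."""
    # voice_engine_2 needs at least 20 minutes of data.
    if total_minutes >= 20:
        note = ""
        # 45 minutes is a recommendation for non-English voices, not a hard limit.
        if not english and total_minutes < 45:
            note = "at least 45 minutes is recommended for non-English voices"
        return EngineChoice("voice_engine_2", 1200, note)
    # Otherwise fall back to instant cloning, which works with a few minutes.
    return EngineChoice("voice_engine_3", 300, "quality improves with more data")
```

For example, suggest_engine(5) points to voice_engine_3 at 300 credits, while suggest_engine(30) points to voice_engine_2 at 1200 credits.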

How to clone a voice


Create your voice in two steps.

First, make a POST request with the following payload:

  {
    "fileIds": ["file_id1", "file_id2", "file_id3"],
    "alias": "my-instant-clone",
    "engine": "voice_engine_3",
    "metadata": {
      "gender": "female"
    }
  }

If the files are correct and the alias is globally unique, you will receive a response with a status code of 202, meaning your voice is being created.
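A minimal sketch of this first step, using only the Python standard library. The endpoint URL is not given here, so it is passed in as a parameter. The x-api-key header name is an assumption, and build_clone_request and submit_clone_request are hypothetical helper names rather than SDK functions.

```python
import json
import urllib.request


def build_clone_request(endpoint_url, api_key, file_ids, alias, engine, gender):
    """Build (but do not send) the POST request that starts voice creation."""
    payload = {
        "fileIds": list(file_ids),
        "alias": alias,
        "engine": engine,
        "metadata": {"gender": gender},
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API key is sent in an "x-api-key" header.
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def submit_clone_request(request):
    """Send the request and report whether the API answered with 202."""
    with urllib.request.urlopen(request) as response:
        return response.status == 202
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any credits are spent.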

Then, make a GET request to check the status of your voice. If the status is succeeded, your voice is ready to be used in AudioStack!

You'll notice that the state contains a discardedFiles field. This field lists the IDs of the files that were not used in the voice cloning process, along with the reason each was discarded.
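The status check can be wrapped in a simple polling loop. The sketch below assumes the state is returned as a JSON object with a "status" key. The GET call itself is supplied by the caller as fetch_state, because the endpoint is not given here. wait_for_voice is a hypothetical helper, and only the succeeded status comes from this page, so any failure statuses must be passed in explicitly.

```python
import time


def wait_for_voice(fetch_state, poll_seconds=30.0, max_attempts=120,
                   failure_statuses=(), sleep=time.sleep):
    """Poll fetch_state() until the voice status is "succeeded".

    fetch_state is a caller-supplied function that performs the GET request
    and returns the parsed state as a dict.
    """
    for attempt in range(max_attempts):
        state = fetch_state()
        status = state.get("status")
        if status == "succeeded":
            return state
        # Failure status names are not listed here, so they are opt-in.
        if status in failure_statuses:
            raise RuntimeError(f"voice creation ended with status {status!r}")
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    raise TimeoutError("voice was not ready after the allowed number of checks")
```

Once the call returns, state.get("discardedFiles") shows which files were left out of the clone. With voice_engine_2, where creation can take up to a few hours, a longer poll_seconds is the more sensible choice.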

Engine Details

Limit                                       voice_engine_2   voice_engine_3
Max. number of files                        500              25
Min. single file duration                   1.5 seconds      1.5 seconds
Min. total audio duration                   20 minutes       1.5 seconds
Max. total file size                        500 MB           10 MB
SSML support                                Yes              No
Avg. synthesis latency for 500 characters   4 seconds        20 seconds
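The limits above can be checked locally before uploading. The sketch below assumes that the first value in each row applies to voice_engine_2 and the second to voice_engine_3, which matches the 20-minute minimum described for standard cloning, and that 1 MB means 1,000,000 bytes. check_files and EngineLimits are hypothetical names, and the API's own validation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineLimits:
    max_files: int
    min_file_seconds: float
    min_total_seconds: float
    max_total_bytes: int


# Assumed mapping of the limits to engines; 1 MB is taken as 1,000,000 bytes.
LIMITS = {
    "voice_engine_2": EngineLimits(500, 1.5, 20 * 60, 500 * 1_000_000),
    "voice_engine_3": EngineLimits(25, 1.5, 1.5, 10 * 1_000_000),
}


def check_files(engine, files):
    """Return a list of problems for files given as (duration_seconds, size_bytes)."""
    limits = LIMITS[engine]
    problems = []
    if len(files) > limits.max_files:
        problems.append(f"too many files: {len(files)} > {limits.max_files}")
    short = sum(1 for duration, _ in files if duration < limits.min_file_seconds)
    if short:
        problems.append(f"{short} file(s) shorter than {limits.min_file_seconds} seconds")
    total_seconds = sum(duration for duration, _ in files)
    if total_seconds < limits.min_total_seconds:
        problems.append(f"total duration too short: {total_seconds} seconds")
    total_bytes = sum(size for _, size in files)
    if total_bytes > limits.max_total_bytes:
        problems.append(f"total size too large: {total_bytes} bytes")
    return problems
```

An empty list means the files pass these local checks. The API can still discard individual files for other reasons, which it reports in discardedFiles.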