📣 New Feature Launch: AutoFix 📣

🛠️ We are thrilled to introduce AutoFix, a powerful new feature designed to enhance the quality of Text-to-Speech (TTS) output by automatically detecting and fixing hallucinations and artifacts generated by TTS models across all 15 providers in AudioStack.

Why use AutoFix?

As we continue to onboard more niche TTS providers, some models may generate unwanted audio hallucinations or artifacts due to rapid development and technical limitations.

Traditionally, users have had to manually review and regenerate these problematic assets, which is both time-consuming and cumbersome. AutoFix streamlines this process by automatically detecting and regenerating faulty assets, saving you time and ensuring high-quality audio outputs.

Examples of Issues AutoFix Resolves:

AutoFix eliminates various types of hallucinations and artifacts, such as 👻 ghostly sounds, 🧟 distorted voices, and 🚌 background noise. Listen to some examples here

How Does AutoFix Work?

  • Quality Scoring: AutoFix evaluates TTS assets for speech quality and background noise using an internal scoring system.
  • Automatic Regeneration: If an asset is flagged for hallucinations or artifacts, AutoFix automatically regenerates it, ensuring cleaner, higher-quality output.
  • Consistent Results: This automated process reduces manual quality assurance and improves the reliability of your TTS assets.
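The detect-and-regenerate flow described above can be sketched in a few lines. This is only an illustration of the general pattern, not AudioStack's implementation: the scoring system is internal, so `generate`, `score`, `threshold`, and `max_attempts` are all hypothetical names and values supplied by the caller.

```python
from typing import Callable, Tuple


def generate_with_retries(
    generate: Callable[[], bytes],
    score: Callable[[bytes], float],
    threshold: float = 0.8,   # hypothetical pass mark, not an AudioStack value
    max_attempts: int = 3,    # hypothetical retry budget
) -> Tuple[bytes, float, int]:
    """Regenerate an asset until its quality score passes or attempts run out.

    Returns the best asset seen, its score, and the number of attempts used.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    best_asset, best_score = b"", float("-inf")
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        asset = generate()          # hypothetical: produce one TTS asset
        quality = score(asset)      # hypothetical: higher means cleaner audio
        if quality > best_score:
            best_asset, best_score = asset, quality
        if quality >= threshold:    # good enough, stop regenerating
            break
    return best_asset, best_score, attempts
```

Keeping the best-scoring attempt means that even when no attempt passes the threshold, the caller still receives the cleanest asset produced.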

Pricing

AutoFix is available at a cost of 💰5 production credits per minute of audio. You can easily activate AutoFix by setting "useAutofix": true in the JSON body of your API call.

Example API call:

curl --location 'https://v2.api.audio/speech/tts' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'x-api-key: <your key here>' \
--data '{
    "scriptId": "<your script id here>",
    "voice": "jeremy",
    "useAutofix": true
}'
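For Python users, the same request can be prepared with the standard library. This is a minimal sketch: the endpoint, headers, and body fields are taken from the curl example, while the helper names are made up for illustration. The cost estimate assumes the 5 credits per minute are pro-rated linearly, which is an assumption; actual billing may round differently.

```python
import json
import urllib.request

TTS_URL = "https://v2.api.audio/speech/tts"   # endpoint from the curl example
AUTOFIX_CREDITS_PER_MINUTE = 5                # price quoted in this announcement


def build_tts_request(api_key: str, script_id: str, voice: str,
                      use_autofix: bool = True) -> urllib.request.Request:
    """Build (but do not send) the TTS request with AutoFix switched on."""
    payload = {"scriptId": script_id, "voice": voice, "useAutofix": use_autofix}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "x-api-key": api_key,
        },
        method="POST",
    )


def estimate_autofix_credits(duration_seconds: float) -> float:
    """Estimate AutoFix cost, assuming linear pro-rating (an assumption)."""
    return AUTOFIX_CREDITS_PER_MINUTE * duration_seconds / 60
```

To send the request, pass the returned object to `urllib.request.urlopen`. A 90-second asset would come to an estimated 7.5 credits under the linear assumption.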
🚧

Disclaimer

While AutoFix significantly reduces artifacts in generated assets, we strongly encourage users to manually review the quality of all assets before publishing.


🗣️STS (Speech to Speech) Voices Update

We have just added more STS voices to our library, bringing the total number of voices that support STS to an astonishing 287 🤯.

This makes 🦜 AudioStack's STS library the biggest and most diverse. STS is delivering incredible value to creative customers by producing lifelike speech with unparalleled naturalness!

📘

One thing to note:

The accent of the source speaker is transferred to the resulting STS asset, so we recommend choosing a voice whose accent is close to your own for the speech transfer (e.g. if your accent is American, choose a voice whose accent is tagged as American).
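If you keep your own list of voices, matching on the accent tag can be sketched as follows. The field names `alias`, `accent`, and `supports_sts` are hypothetical and are not taken from the AudioStack API; adapt them to however your voice metadata is stored.

```python
from typing import Dict, List


def sts_voices_for_accent(voices: List[Dict], accent: str) -> List[str]:
    """Return aliases of STS-capable voices whose accent tag matches `accent`.

    All dictionary keys used here are hypothetical placeholders.
    """
    wanted = accent.strip().lower()
    return [
        voice["alias"]
        for voice in voices
        if voice.get("supports_sts") and voice.get("accent", "").lower() == wanted
    ]


# Made-up sample data, for illustration only.
sample_voices = [
    {"alias": "voice_a", "accent": "American", "supports_sts": True},
    {"alias": "voice_b", "accent": "British", "supports_sts": True},
    {"alias": "voice_c", "accent": "American", "supports_sts": False},
]
matches = sts_voices_for_accent(sample_voices, "american")
```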


🎙️ More Azure Multilingual Voices!

We’re excited to announce that our Text-to-Speech (TTS) library just got an upgrade! 🌍🎙️

We’ve added 22 new multilingual voices that speak 50+ languages 🥳 to the library, making it more diverse and flexible for all your projects! 🗣️✨ Whether you need voices for different languages, accents, or styles, we’ve got you covered.

Try them out and let us know what you think! 😄 Here are their aliases:

‘Wei-wei 🇨🇳’, ‘Denisa 🇪🇸’, ‘Sunny 🇰🇷’, ‘Ezra 🇺🇸’, ‘Branson 🇺🇸’, ‘Marieta 🇺🇸’, ‘Meral 🇺🇸’, ‘Hale 🇺🇸’,
‘Haoyu 🇨🇳’, ‘Christophe 🇨🇳’, ‘German 🇧🇷’, ‘Tan 🇫🇷’, ‘Antonette 🇺🇸’, ‘Thanos 🇺🇸’, ‘Sharita 🇺🇸’,
‘Deno 🇺🇸’, ‘Michae 🇺🇸’, ‘Paden 🇺🇸’, ‘Furkan 🇺🇸’, ‘Chrishawn 🇺🇸’, ‘Sura 🇺🇸’, ‘Deven 🇺🇸’

Happy creating with AudioStack! 🎶

In the last release of the AudioStack Platform, we added lots of new functionality.

SonicSell:

  • You can now specify the accent you'd like a voice to have. Simply select the language of your script and choose an appropriate accent from the dropdown.
  • We added the audioform ID (used to identify your asset) to all ad cards, to make it easier to work out if you're editing the right version.

Recording Booth:

  • Made it easier to record without a script, and added a "Save As" button to make it easier to find your recorded files.

Platform:

  • Report an Issue with the click of a button. Our team will be notified so we can troubleshoot issues more easily.
  • Clarified the acceptable file names in our file upload modal, to improve the UX of file upload.

Speech Playground:

  • Added option to customise the asset name so that when you share an asset, the recipient can tell what it is.

Developers:

  • We renamed the "AudioStack for Developers" page to "API Key", based on feedback, to make it easier to find your API key.

In this release, we have added small new features to several workflows as well as bug fixes and UX improvements across the platform.

Platform Updates

  • Workshop: You can now undo and redo changes.
  • Workshop: Improved UX of the asset length. This will either be autodetected, in which case the estimated length will be shown, or it can be selected by the user, in which case a character limit is displayed.
  • Workshop: Pronunciation tips have been added to help you to add expression to your TTS using punctuation.
  • Workshop and SonicSell: The customise button has been changed to Advanced Controls and repositioned to make its function clearer.
  • Workshop and SonicSell: Increased the amount of default fade out time on audio ads based on customer feedback.
  • Voice Library: We added tags to voice cards to make it clearer which voices can be used for Speech-to-speech.
  • Recording Booth: Made it possible to easily view your saved file when recording is completed.
  • Platform: We added a button to easily report when something has gone wrong in your session. You can use this to report bugs directly to the development team.

Bug Fixes 🐛

  • Resources/Content: Fixed overflow of the private sound table in the sound library modal
  • Resources/Content: Fixed the folder select dropdown overflowing its box
  • Resources/Content: Fixed a bug where speech-to-speech assets were downloading with the wrong file extension
  • Media File Uploads: Added forward slashes when joining folder names with file names to avoid path issues
  • Workshop: Disabled the preview button when a template sample is missing
  • Workshop: Fixed Safari clipboard permissions
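The media file upload fix above concerns joining folder and file names with forward slashes. If you build upload paths on your side, a similar approach can be sketched like this. The function name is made up for illustration and this is not the platform's own code.

```python
import posixpath


def build_upload_path(folder: str, file_name: str) -> str:
    """Join a folder and a file name using forward slashes only.

    Backslashes are converted and stray slashes trimmed, so the result does
    not depend on the operating system that builds the path.
    """
    folder = folder.replace("\\", "/").strip("/")
    file_name = file_name.replace("\\", "/").lstrip("/")
    # posixpath.join always uses "/" regardless of the host platform.
    return posixpath.join(folder, file_name) if folder else file_name
```

Using `posixpath` rather than `os.path` avoids backslash separators when the code runs on Windows.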

Platform Updates

  • Dictionary: Improved layout for the dictionary, to make it easier to use and understand.
  • SonicSell: Made it easier to go back to SonicSell when opening your results in Workshop.
  • SonicSell: Voice selection will now exclude less relevant voices such as whispering and child voices.
  • Save to files: Easily upload created assets from your workflows to the Resources/Content files area.

😿 As of September 2, 2024, the following voices from our provider messner will be retired: cynthia, senior_ralph, albert, shelly. If you are actively using these voices, please reach out to us for alternative voice suggestions to replace them in your content.

Please note that a number of voices from our provider Eleven Labs have been deprecated today and will no longer be available for use. The voices are: ‘russ’, ‘violet’, ‘steve’, ‘aubrey’, ‘cesar’, ‘kennard’, ‘austin’, ‘talia’, ‘katie’, ‘cole’, ‘nigel’, ‘doug’, ‘duncan’, ‘clarion’.

If you have any questions, please get in touch at support [at] audiostack [dot] ai.
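To check whether your content relies on any of the voices listed in the two notices above, a small lookup is enough. The voice names come from those notices; the function name and the idea of passing in your own list of voices in use are assumptions for illustration.

```python
from typing import Iterable, List

# Voice aliases named in the retirement and deprecation notices above.
RETIRED_VOICES = {
    # Retired as of September 2, 2024
    "cynthia", "senior_ralph", "albert", "shelly",
    # Deprecated Eleven Labs voices
    "russ", "violet", "steve", "aubrey", "cesar", "kennard", "austin",
    "talia", "katie", "cole", "nigel", "doug", "duncan", "clarion",
}


def find_retired_voices(voices_in_use: Iterable[str]) -> List[str]:
    """Return the sorted subset of `voices_in_use` that is being retired."""
    return sorted({voice.lower() for voice in voices_in_use} & RETIRED_VOICES)
```

For example, `find_retired_voices(["jeremy", "russ", "Shelly"])` flags `russ` and `shelly` for replacement.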

Our latest release brings AudioStack's workflows to a whole new level when it comes to stability. We've added:

Platform Features

  • You can now download stems for an audio asset in Workshop.
  • It's now possible to create an audio asset with no sound template in Workshop.
  • You can now view assets created in SonicSell or Workshop in the Library using the dropdown menu.
  • We added status messages to SonicSell, so you can keep track of how your ad generation is going.
  • Improved error messages in SonicSell, so it's easier to tell when something's gone wrong.

UX Improvements

  • We renamed Dashboard to Home (it makes for a better German translation!)

API Improvements

  • Improved /voice/select and /sound/select endpoints to make sure that users always receive a large, varied selection of voices or sound designs.
  • Audioform (asset) generation process is now asynchronous, improving performance and scalability.
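With asynchronous generation, a client typically checks back until the asset is ready. The sketch below shows one generic way to poll. The status values `"done"` and `"failed"` and the `fetch_status` callable are hypothetical placeholders, not the documented API, so check the API reference for the real status check.

```python
import time
from typing import Callable


def wait_until_finished(
    fetch_status: Callable[[], str],          # hypothetical status lookup
    poll_seconds: float = 2.0,
    timeout_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `fetch_status` until it reports a final state or the timeout hits."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in ("done", "failed"):      # hypothetical final states
            return status
        if clock() >= deadline:
            raise TimeoutError("asset generation did not finish in time")
        sleep(poll_seconds)                   # wait before checking again
```

The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting.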

Bug Fixes:

  • Fixed a missing audio processing module on the produce one section route, leading to better reliability when generating speech.
  • Corrected usage of the logging module for SES, reducing crashes when generating speech.
  • Fixed a bug with asset length being incorrectly set in Workshop.

We're pleased to announce the addition of 15 new voices to the AudioStack voice library, from our provider WellSaid Labs.

The voices are:

  • conversational abbi
  • promo antony
  • promo fiona
  • conversational hannah
  • conversational issa
  • conversational jack
  • conversational jay
  • promo jay
  • conversational jimmy
  • conversational lorenzo
  • conversational lulu
  • conversational oliver
  • conversational shelby
  • promo shelby
  • promo terra

  • In the voice library, you can now filter voices by technology, to easily find which voices can use speech-to-speech or TTS.
  • We also added support for organisations who wish to compare different (human) voice actors' sample recordings inside the AudioStack library - please contact us if you're interested in adding this for your organisation.
  • We levelled up our file browsing experience in the Resources/Content page, to better support a large volume of files.

New voices added 💯

  • We added a few new voices from Eleven Labs. Try them out!
  • Daymon & Deena (great for natural, conversational speech)
  • Laureen & Will (great for upbeat, expressive narration).

You can try them out in the library or directly in the API.

Better multilingual STS

  • STS now uses a multilingual model, leading to better quality for non-English speech. Try it out in German or Hindi.