How to Detect and Fix Imperfect Audio

AutoFix is a powerful new feature designed to enhance the quality of Text-to-Speech (TTS) output by automatically detecting and fixing hallucinations and artifacts generated by TTS models across all 15 providers in AudioStack.

As we continue to onboard more niche TTS providers, some models may generate unwanted audio hallucinations or artifacts due to rapid development and technical limitations.

Traditionally, users have had to manually review and regenerate these problematic assets, which is both time-consuming and cumbersome. AutoFix streamlines this process by automatically detecting and regenerating faulty assets, saving you time and ensuring high-quality audio outputs.

Examples of Issues AutoFix Resolves:

AutoFix eliminates various types of hallucinations and artifacts, such as 👻 ghostly sounds, 🧟 distorted voices, and 🚌 background noise. Listen to some examples here.

How Does AutoFix Work?

  • Quality Scoring: AutoFix evaluates TTS assets for speech quality and background noise using an internal scoring system.
  • Automatic Regeneration: If an asset is flagged for hallucinations or artifacts, AutoFix automatically regenerates it, ensuring cleaner, higher-quality output.
  • Consistent Results: This automated process reduces manual quality assurance and improves the reliability of your TTS assets.
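The steps above can be pictured as a score-and-retry loop. The following is only an illustrative sketch and not AutoFix's actual implementation: the scoring system is internal, so the `score` callable, the `threshold` value, and the `max_attempts` limit are all hypothetical placeholders.

```python
from typing import Callable, Tuple


def autofix_sketch(
    generate: Callable[[], bytes],
    score: Callable[[bytes], float],
    threshold: float = 0.8,   # hypothetical quality cutoff, not an AudioStack value
    max_attempts: int = 3,    # hypothetical retry limit, not an AudioStack value
) -> Tuple[bytes, float, int]:
    """Regenerate an asset until its score clears the threshold.

    Returns the best asset seen, its score, and the number of attempts used.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    best_asset, best_score = b"", float("-inf")
    for attempt in range(1, max_attempts + 1):
        asset = generate()          # stand-in for a TTS render
        quality = score(asset)      # stand-in for the internal quality score
        if quality > best_score:
            best_asset, best_score = asset, quality
        if quality >= threshold:    # clean enough, stop regenerating
            break
    return best_asset, best_score, attempt


# Toy usage: the first take is flagged, the second one passes.
takes = iter([b"noisy", b"clean"])
fake_scores = {b"noisy": 0.4, b"clean": 0.95}
asset, quality, attempts = autofix_sketch(lambda: next(takes), fake_scores.__getitem__)
```

Keeping the best-scoring take means that even if no attempt clears the threshold, the caller still receives the cleanest asset produced so far.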

Pricing

AutoFix is available at a cost of 💰 5 production credits per minute of audio. You can activate AutoFix by setting "useAutofix": true in the JSON body of your API call.
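To budget for AutoFix, the per-minute rate can be turned into a quick estimate. This sketch assumes credits are charged pro rata by the second; how partial minutes are actually rounded is not stated here, so treat the result as an approximation. The function name is hypothetical.

```python
AUTOFIX_CREDITS_PER_MINUTE = 5  # rate quoted in the pricing section


def estimate_autofix_credits(duration_seconds: float) -> float:
    """Estimate AutoFix credits for a clip, assuming pro-rata billing by the second."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return AUTOFIX_CREDITS_PER_MINUTE * duration_seconds / 60


# A 90-second clip is 1.5 minutes of audio.
credits = estimate_autofix_credits(90)
```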

Example API call:

curl --location 'https://v2.api.audio/speech/tts' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'x-api-key: <your key here>' \
--data '{
    "scriptId": "<your script id here>",
    "voice": "jeremy",
    "useAutofix": true
}'
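For callers who prefer Python, the same request can be assembled with the standard library. This is a minimal sketch that mirrors the curl call above: the helper name `build_tts_request` is hypothetical, and the request is only built, not sent, since the response format is not described here.

```python
import json
import urllib.request


def build_tts_request(
    api_key: str,
    script_id: str,
    voice: str,
    use_autofix: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) a TTS request with AutoFix toggled via useAutofix."""
    payload = {
        "scriptId": script_id,
        "voice": voice,
        "useAutofix": use_autofix,  # serialized as JSON true/false
    }
    return urllib.request.Request(
        "https://v2.api.audio/speech/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "x-api-key": api_key,
        },
        method="POST",
    )


req = build_tts_request("<your key here>", "<your script id here>", "jeremy")
# To send it, pass the request to urllib.request.urlopen(req) with a valid key.
```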

🚧

Disclaimer

While AutoFix significantly reduces artifacts in generated assets, we strongly encourage users to manually review the quality of all assets before publishing.