[Audio] Bug fixes and usability enhancements
by Peadar Coyle
Some bug fixes and usability enhancements
- We added better filtering to our sound templates. Thanks to Pascal, who pointed out this bug 👷
- We made some security improvements to enhance customer trust!
- We fixed a bug in our Audio Engine so that uploaded audio is assigned the correct MIME type 🐛
Voice Intelligence Layer updates
We're continuously working on our Voice Intelligence Layer.
- We fixed some ElevenLabs errors and artefacts that occurred when script texts are long 🐛 Thanks to our customers for pointing this out 💯
- Long scripts can now be processed, because each section is split into several smaller sections when necessary. This means longer scripts are processed faster and without timeouts 💯
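To illustrate the idea of fragmenting a long section, here is a minimal sketch that splits text at sentence boundaries into fragments under a character budget. The function name `fragment_text` and the 200-character default are hypothetical, and this is not a description of how the AudioStack backend actually does it.

```python
import re


def fragment_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into fragments of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole rather than cut
    in the middle (an assumption made for this sketch).
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    fragments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The current fragment is full: close it and start a new one.
            fragments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        fragments.append(current)
    return fragments
```

Each fragment could then be synthesised separately and the results joined in order, which is one way shorter requests can avoid timeouts.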
SSML Harmonisation - Break time update
by Peadar Coyle
SSML Harmonisation
A question customers commonly ask is: "How do I make sure the same SSML works across providers?"
So today we're VERY excited to launch our first (of many) features 🎉 🎂
You can run this on ANY provider and it will either work or fail. So your code will work with ANY provider and ANY voice. For example:
body = {
    "projectName": "__TEST",
    "scriptText": '<as:section> hello <as:break time="4s"/> worlds </as:section>',
}
We showed this in beta to our customers and got this quote:
"This will save us so much time. 10% of our bugs are due to things like this" - Software Developer at an Advertising company
How do I get started?
To get started, simply change
<break time="150ms"/>
to <as:break time="150ms"/>
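If you have many existing scripts, the rename can be automated. The following is a minimal sketch using a regular expression; the helper name `harmonise_breaks` is hypothetical and not part of the AudioStack SDK.

```python
import re

# Matches an opening "<break" tag that is not already namespaced.
# The lookahead avoids touching unrelated tags such as "<breakfast>".
_BREAK_TAG = re.compile(r"<break(?=[\s/>])")


def harmonise_breaks(script_text: str) -> str:
    """Rewrite plain <break .../> tags to <as:break .../>."""
    return _BREAK_TAG.sub("<as:break", script_text)


print(harmonise_breaks('hello <break time="150ms"/> world'))
```

Running the helper twice is safe, because tags that already start with `<as:break` are not matched again.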
Other reading
If you want a refresher on SSML, Wikipedia is a great place to start.
Documentation updates
by Peadar Coyle
Better previews with sharing
by Peadar Coyle
30 New Sound Templates; Better examples
by Peadar Coyle
New Sound Templates
- Added 30 NEW sound templates
Ballsy Rock
Chopper Horns
Chopper Strings
Clap Together
Cool Industries
Dark And Unsettling
Design Grove
Epic Brass Trap
Epic Pace
Ethereal Dream
Fascinating Technology
Friendly Electro Pop
Friendly Fantasy
Garage Banger
Indie Disco
Laid Back Funk
Piano Artist
Power Sports
Pulsing Out Score
Rainbow Rock
Relaxed Vacation
RelaXmas
Rockabilly Mischief
Stepping Up
Sunny Skies
Surfing Dog
Trailer Beast
Vintage Swing
Welcoming Piano
Wild Beast
Have fun sampling these :)
Examples
We've fleshed out our examples.
We added a Video Voice Over example for the video voice-over use case.
We also added a news summarizer to our GitHub.
We added a beautiful example using our new Sound Template functionality
- Autotagger is enabled.
Media improvements
We fixed some UX problems (thanks to our customers for pointing this out) in the Media endpoints.
You can now do a lot more with media files, namely, place the name of the media file directly in the script using the name="" attribute.
You can also use the id="" attribute to create a placeholder that can be overridden in the mastering call:
import os

import audiostack

audiostack.api_key = os.environ["AUDIO_STACK_DEV_KEY"]

# Upload a media file; the response carries the id of the uploaded file.
response = audiostack.Content.Media.create(filePath="default.wav")
print(response)
mediaId = response.mediaId

# Reference the media file by name and give it an id to use as a placeholder.
scriptText = """
<as:section name="intro" soundsegment="intro">
hello world <as:media name="default.wav" id="file1"/>
</as:section>
"""
script = audiostack.Content.Script.create(scriptText=scriptText)
print(script)

speech = audiostack.Speech.TTS.create(scriptItem=script, voice="sara")
print(speech)

mastering = audiostack.Production.Mix.create(
    speechItem=speech,
    # Uncomment to override the "file1" placeholder with another uploaded file:
    # mediaFiles={
    #     "file1": "<any media id of an uploaded file>"
    # }
)
print(mastering)
Intern showcase
Our intern William built something cool 🆒
Have a look at this news article summarizer: https://github.com/aflorithmic/news_article_summarizer
Leveraging Beautiful Soup 😍, OpenAI 💯 and our AudioStack API, it takes a URL and produces a beautiful audio summary. Have a play!
Multilingual voices in our Frontend
We updated our library

You can see all of the multilingual voices - plus what languages they support.
Bug fixes
As we build out the AudioStack we're constantly looking for improvements.
- We fixed the trailing silence issue.
New Voices; Voice improvements
by Peadar Coyle
New voices: WellSaid Labs, ElevenLabs
Exciting news! We've been hard at work collaborating with our providers to bring you more hyper-realistic English voices 🥳 We're thrilled to introduce 35 new voices from ElevenLabs, including 14 American accents 🇺🇸, 19 British accents 🇬🇧, and 2 Australian accents 🇦🇺. Additionally, we've added 41 voices from WellSaid Labs, featuring 28 American English 🇺🇸, 6 British English 🇬🇧, 2 Mexican English 🇲🇽, 1 South African English 🇿🇦, and 4 Australian English voices 🇦🇺. Head over to our Library (library.audiostack.ai) to explore and try them all out!
Suggestions to try:
Finch, 11L, US
Elora, 11L, GB
Conversational_greg, WSL, AU
Narration_Issa, WSL, ZA
Narration_lorenzo, WSL, MX
**Stay tuned as there’s more coming in the next month!**
https://library.audiostack.ai/
Voice improvements
We've enhanced our msnr voices: punctuation symbols now produce pauses of specific lengths, and break tags are enabled. This should make speech sound more natural, with better pronunciation and improved flow. Some examples:
Full break is 600 milliseconds.
Comma is 300 milliseconds,
Colon is 300 milliseconds:
Semicolon is 400 milliseconds;
Question mark is 400 milliseconds?
Exclamation mark is 600 milliseconds!
Break tag with three seconds
Break tag with thousand milliseconds
Break tag with thousand milliseconds using single quote
Break tag with thousand milliseconds using single quote and no space
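The pause lengths listed above can be summarised as a small lookup table. This is only an illustrative sketch: `PAUSE_MS` and `total_pause_ms` are hypothetical names, it assumes that "Full break" refers to the full stop, and it assumes pauses simply add up, which may not match how the voices actually behave.

```python
# Pause length in milliseconds per punctuation mark, taken from the examples above.
# Assumption: "Full break" is read here as the full stop (".").
PAUSE_MS = {
    ".": 600,
    ",": 300,
    ":": 300,
    ";": 400,
    "?": 400,
    "!": 600,
}


def total_pause_ms(text: str) -> int:
    """Estimate the total punctuation pause time for a piece of text."""
    return sum(PAUSE_MS.get(char, 0) for char in text)


print(total_pause_ms("Hello, world. Ready?"))
```

A rough estimate like this can help when budgeting the length of a script that has to fit a fixed ad slot.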
Voice improvements
Special characters are supported again in the API: & and %. These are both normalised for English and German voices!
Ever wanted to make your audio perfect for podcasts or YouTube?
A commonly requested feature is making your audio perfect for a particular platform. Our audio experts have been hard at work on this 🎧 💯
So we're excited to launch these new mixing features 😍
Some customers wanted more control over the loudness of their mixes. Different platforms have different loudness standards, so there is now a new parameter in the encoder endpoint. It can be used as follows:
encoder = audiostack.Delivery.Encoder.encode_mix(
    productionItem=mix,
    preset="wav",
    loudnessPreset="podcast",  # or spotify, applePodcast, youtube ...
)
Some customers mentioned our voices were a bit harsh (as we were making them loud and excited for advertising). So we implemented a mastering preset:
# `speech` is the item returned by audiostack.Speech.TTS.create
mix = audiostack.Production.Mix.create(
    speechItem=speech,
    soundTemplate="",
    masteringPreset="podcast",
    timelineProperties={"padding": 1.5},
)
Try it out! You'll need to update your AudioStack SDK in Python to v0.0.9
What are the specs?
If you run the following command, you'll get back these specs:
audiostack.Delivery.Encoder.list_presets()
"spotify": "-16 LUFS Loudness Integrated and -2 dB True Peak",
"podcast": "-16 LUFS Loudness Integrated and -3 dB True Peak",
"applePodcast": "-16 LUFS Loudness Integrated and -1 dB True Peak",
"youtube": "-14 LUFS Loudness Integrated and -1 dB True Peak",
"lowVol": "-20 LUFS Loudness Integrated and -5 dB True Peak"
To improve the user experience, we also improved the descriptions of some of these presets, which you can see below.
"mp3": "mp3 format 320k (320 CBR)",
"wav": "wav format at 48 kHz sample rate and 16 bits per sample",
"ogg": "ogg format at 320k",
"flac": "flac format at 48 kHz sample rate and 16 bits per sample",
"mp3_very_low": "mp3 lowest quality (~64 kbps VBR)",
"mp3_low": "mp3 low quality (~115 kbps VBR)",
"mp3_medium": "mp3 medium quality (~165 kbps VBR)",
"mp3_high": "mp3 high quality (~190 kbps VBR)",
"mp3_very_high": "mp3 very high quality (~245 kbps VBR)",
"mp3_alexa": "mp3 format mono at 48kHz sample rate",
"mp3_alexa_48br": "mp3 format mono at 48 bit rate and 24kHz sample rate"
We're constantly working on iterating and improving our experience. Let us know what you think.
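If you want to compare the loudness presets programmatically, the description strings listed above can be parsed into numbers. The sketch below works on a hard-coded copy of those descriptions, because the exact shape of the `list_presets()` response is not shown here; `LOUDNESS_PRESETS` and `parse_loudness` are hypothetical names.

```python
import re

# Copied from the loudness preset descriptions listed above.
LOUDNESS_PRESETS = {
    "spotify": "-16 LUFS Loudness Integrated and -2 dB True Peak",
    "podcast": "-16 LUFS Loudness Integrated and -3 dB True Peak",
    "applePodcast": "-16 LUFS Loudness Integrated and -1 dB True Peak",
    "youtube": "-14 LUFS Loudness Integrated and -1 dB True Peak",
    "lowVol": "-20 LUFS Loudness Integrated and -5 dB True Peak",
}

_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?) LUFS .*? (-?\d+(?:\.\d+)?) dB True Peak")


def parse_loudness(description: str) -> tuple[float, float]:
    """Return (integrated loudness in LUFS, true peak in dB) from a description."""
    match = _PATTERN.search(description)
    if match is None:
        raise ValueError(f"Unrecognised loudness description: {description!r}")
    return float(match.group(1)), float(match.group(2))


for name, description in LOUDNESS_PRESETS.items():
    lufs, true_peak = parse_loudness(description)
    print(f"{name}: {lufs} LUFS, {true_peak} dB true peak")
```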
SonicSell v0.4
SonicSell Beta
If you want an invite to the SonicSell beta email [email protected] or [email protected] and we'll get you invited
Automatic Language Detection:
Users no longer have to select the flag at the top right to select an output language.
We have implemented automatic language detection and it will automatically adjust your output to your input language.
This feature works for English, French, German, Spanish, Italian and Portuguese. 🇬🇧 🇫🇷 🇪🇸 🇩🇪 🇵🇹 🇮🇹

Advanced Mode:
Mood and Tone Indicator (BETA): Users can now suggest what the mood of the ad should be, and what the tone of the ad should be.
Multi-Versions: Users can produce up to 3 versions of the same ad in 1 go.

Virality Link:
Earlier, users would have to download the audio asset and manually share it with their friends and colleagues.
This was a high-friction process and wouldn’t work for us from a “word of mouth marketing” perspective.
Now, users can simply share a link that leads to an AudioStack page, where they can listen to the shared ad.
There’s a button there called “Try it now”, which currently links to a Typeform where users can request access.
We are working on implementing the SonicSell trial flow on the website right now (we expect to have it by next week).
Once that’s shipped, we will link this button directly to that trial flow.

Improved UI/UX:
Earlier, users could only truly generate and work with 1 ad. Not anymore.
Now, you can bulk generate 100s of ads in the same screen, and dynamically customise them as you go (without losing any progress of the other ads built in that session). Of course, once the session is refreshed, all that progress will be lost, since we are not saving the audioforms on a user level yet.
Additionally, users now also have the option to only “edit the ad” and not “generate” it at the same time. They can edit, see what they like and don’t like, and then generate as they please.
Also, now, if a request fails, we show them a beautiful failed card with a button to try to regenerate.
Migrated from WebSocket to HTTP
Earlier, if you were on the app for 30 mins, the websocket would time out which would result in failed generate requests.
Now, that’s no longer the case. With the migration to HTTP, we’ve seen a massive reduction in time-outs and failed requests, improving your user experience! 🚀
Voice Intelligence Layer improvements
We made some huge 🚀 improvements to the voice intelligence layer.
You can see a material improvement in correctness on our internal data set compared to the NeMo project. We used this on a range of complex normalisation challenges. We hope this gives you a much better user experience.
We also made a few more improvements
- SSML tags are now removed before any processing is applied and restored afterwards. Previously, leaving them in caused some edge cases and poor behaviour.
- Bug: break tags no longer have normalisation incorrectly applied.
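One way to keep tags out of text processing is to split the script into tag and non-tag segments and only transform the non-tag segments. This is a minimal sketch of that idea, not the actual Voice Intelligence Layer implementation; `transform_outside_tags` is a hypothetical helper.

```python
import re
from typing import Callable

# The capturing group makes re.split keep the tags in the result list.
_TAG_SPLIT = re.compile(r"(<[^>]+>)")


def transform_outside_tags(text: str, transform: Callable[[str], str]) -> str:
    """Apply transform to the plain text only, leaving tags untouched."""
    parts = _TAG_SPLIT.split(text)
    # With one capturing group, plain text sits at even indices and tags at odd ones.
    return "".join(
        part if index % 2 else transform(part)
        for index, part in enumerate(parts)
    )


script = 'wait 4 seconds <as:break time="4s"/> ok'
print(transform_outside_tags(script, lambda s: s.replace("4", "four")))
```

Here the digit inside the break tag is preserved while the digit in the spoken text is rewritten, which is the behaviour the two fixes above aim for.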
Feature enhancement: We also shipped improvements to the ElevenLabs voices. If you enable the voiceIntelligence layer, it detects the language and pronounces numbers correctly. For our 🇩🇪 customers this is super valuable.
Here's an example. Try it with the Wren voice from ElevenLabs.
"scriptText": "<as:section name='intro' soundsegment='intro'> heute ist der 12.12.2012</as:section>"
Padding Parameters
We added two new padding parameters:
- timelineProperties: {padding : seconds} adds padding between each section
- sectionProperties: {sectionName : {padding : seconds}} adds padding after a named section
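The two parameters above are plain dictionaries, so they can be assembled with a small helper. This is a sketch only: `build_padding_kwargs` is a hypothetical function, and passing both dictionaries as keyword arguments to `audiostack.Production.Mix.create` is an assumption based on the parameter names.

```python
from typing import Optional


def build_padding_kwargs(
    timeline_padding: Optional[float] = None,
    section_padding: Optional[dict[str, float]] = None,
) -> dict:
    """Build the timelineProperties / sectionProperties dictionaries.

    timeline_padding: seconds of padding between each section.
    section_padding: maps a section name to seconds of padding after it.
    """
    values = [timeline_padding or 0.0, *(section_padding or {}).values()]
    if any(seconds < 0 for seconds in values):
        raise ValueError("padding must not be negative")

    kwargs: dict = {}
    if timeline_padding is not None:
        kwargs["timelineProperties"] = {"padding": timeline_padding}
    if section_padding:
        kwargs["sectionProperties"] = {
            name: {"padding": seconds} for name, seconds in section_padding.items()
        }
    return kwargs


print(build_padding_kwargs(timeline_padding=1.5, section_padding={"intro": 2.0}))
```

The result could then be unpacked into the mix call, for example `Mix.create(speechItem=speech, **kwargs)`, assuming the call accepts both parameters.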
Loudness presets
We added a range of loudness presets - here's a simple example.
# test list encoder and loudness presets
a = audiostack.Delivery.Encoder.list_presets()
print(a)
# test wav encoding with spotify loudness
encoded = audiostack.Delivery.Encoder.encode_mix(
    productionId="PRODUCTION_ID",
    preset="wav",
    loudnessPreset="spotify",
)
DCO MVP
A few weeks ago we shipped the following digital creative optimisation (DCO) MVP, featuring a fictional audio campaign for a fictional gym and tonic brand!
You can see in this campaign that it personalises the audio based on some specific parameters, so you can better connect with your customers.