Exciting news! We've been hard at work collaborating with our providers to bring you more hyper-realistic English voices 🥳 We're thrilled to introduce 35 new voices from ElevenLabs, including 14 American accents 🇺🇸, 19 British accents 🇬🇧, and 2 Australian accents 🇦🇺. Additionally, we've added 41 voices from WellSaid Labs, featuring 28 American English 🇺🇸, 6 British English 🇬🇧, 2 Mexican English 🇲🇽, 1 South African English 🇿🇦, and 4 Australian English voices 🇦🇺. Head over to our Library (library.audiostack.ai) to explore and try them all out!
Suggestions to try:
Finch (ElevenLabs, US)
Elora (ElevenLabs, GB)
Conversational_greg (WellSaid Labs, AU)
Narration_Issa (WellSaid Labs, ZA)
Narration_lorenzo (WellSaid Labs, MX)
Stay tuned, as there’s more coming in the next month!
We've enhanced our msnr voices with punctuation-driven pause lengths and enabled break tags. This adds naturalness and enhanced pronunciation, as well as improved speech flow. Some examples:
Full break is 600 milliseconds.
Comma is 300 milliseconds,
Colon is 300 milliseconds:
Semicolon is 400 milliseconds;
Question mark is 400 milliseconds?
Exclamation mark is 600 milliseconds!
Break tag with three seconds
Break tag with one thousand milliseconds
Break tag with one thousand milliseconds using single quotes
Break tag with one thousand milliseconds using single quotes and no space
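The pause lengths above can be sketched as data, and break tags can be mixed into a script. A minimal sketch in Python; note that the `<break time="1000ms"/>` syntax is an assumption modelled on common SSML conventions, so check the exact tag form your voice supports:

```python
# Approximate pause lengths (in milliseconds) applied for each
# punctuation mark, taken from the examples above.
PAUSE_MS = {".": 600, ",": 300, ":": 300, ";": 400, "?": 400, "!": 600}

# A script mixing punctuation-driven pauses with an explicit break tag.
# The tag syntax below is an assumption based on SSML conventions.
script = (
    "Welcome back. "                 # ~600 ms pause
    "First, a quick announcement; "  # ~300 ms, then ~400 ms
    'then the main story. <break time="1000ms"/> And we are back!'
)

print(script)
```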
Voice improvements
Special characters are supported again in the API: & and %. These are both normalised for English and German voices!
Ever wanted to make your audio perfect for podcasts or YouTube?
A commonly requested feature is tailoring your audio to a specific platform. Our audio experts have been hard at work on this 🎧 💯
So we're excited to launch these new mixing features 😍
Some customers wanted more control over the loudness of their mixes. Different platforms have different loudness standards, so there is now a new loudnessPreset parameter in the encoder endpoint.
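As a sketch, the new parameter slots into the existing encoder call. The parameter names below come from the encoder examples in these notes; PRODUCTION_ID is a placeholder:

```python
# Parameters for the encoder endpoint; loudnessPreset is the new field.
params = {
    "productionId": "PRODUCTION_ID",  # replace with a real production id
    "preset": "wav",
    "loudnessPreset": "spotify",  # targets -16 LUFS / -2 dB true peak
}

# With the AudioStack SDK installed, the call would look like:
# import audiostack
# encoded = audiostack.Delivery.Encoder.encode_mix(**params)
# encoded.download()
print(params)
```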
Try it out! You'll need to update your AudioStack SDK in Python to v0.0.9
What are the specs?
If you run the following command you'll get back these presets:
audiostack.Delivery.Encoder.list_presets()
{
    "spotify": "-16 LUFS Loudness Integrated and -2 dB True Peak",
    "podcast": "-16 LUFS Loudness Integrated and -3 dB True Peak",
    "applePodcast": "-16 LUFS Loudness Integrated and -1 dB True Peak",
    "youtube": "-14 LUFS Loudness Integrated and -1 dB True Peak",
    "lowVol": "-20 LUFS Loudness Integrated and -5 dB True Peak"
}
Also, to improve the user experience, we've clarified the descriptions of some of these features; you can see them below.
{
    "mp3": "mp3 format 320k (320 CBR)",
    "wav": "wav format at 48 kHz sample rate and 16 bits per sample",
    "ogg": "ogg format at 320k",
    "flac": "flac format at 48 kHz sample rate and 16 bits per sample",
    "mp3_very_low": "mp3 lowest quality (~64 kbps VBR)",
    "mp3_low": "mp3 low quality (~115 kbps VBR)",
    "mp3_medium": "mp3 medium quality (~165 kbps VBR)",
    "mp3_high": "mp3 high quality (~190 kbps VBR)",
    "mp3_very_high": "mp3 very high quality (~245 kbps VBR)",
    "mp3_alexa": "mp3 format mono at 48 kHz sample rate",
    "mp3_alexa_48br": "mp3 format mono at 48 kbps bit rate and 24 kHz sample rate"
}
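Choosing one of these presets is a single field on the encode request. A sketch with a placeholder production id, using preset names from the list above:

```python
# A few encoder presets and their documented meanings (from the list above).
ENCODER_PRESETS = {
    "wav": "wav format at 48 kHz sample rate and 16 bits per sample",
    "mp3_high": "mp3 high quality (~190 kbps VBR)",
    "mp3_alexa": "mp3 format mono at 48 kHz sample rate",
}

request = {"productionId": "PRODUCTION_ID", "preset": "mp3_high"}

# With the SDK installed:
# import audiostack
# encoded = audiostack.Delivery.Encoder.encode_mix(**request)
print(request)
```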
We're constantly working on iterating and improving our experience. Let us know what you think.
Automatic Language Detection:
Users no longer have to click the flag at the top right to select an output language.
We've implemented automatic language detection, which automatically matches your output to your input language.
This feature works for English, French, German, Spanish, Italian and Portuguese. 🇬🇧 🇫🇷 🇪🇸 🇩🇪 🇵🇹 🇮🇹
Advanced Mode:
Mood and Tone Indicator (BETA): users can now suggest the mood and the tone of the ad.
Multi-Versions: users can produce up to 3 versions of the same ad in one go.
Virality Link:
Previously, users had to download the audio asset and manually share it with their friends and colleagues.
This was a high-friction process, and didn’t help us from a “word of mouth marketing” perspective.
Now, users can simply share a link that leads to an AudioStack page, where they can listen to the shared ad.
There’s a “Try it now” button, which currently links to a Typeform where users can request access.
We are implementing the SonicSell trial flow on the website right now (expected next week); once that ships, we will link this button directly to it.
Improved UI/UX:
Previously, users could only really generate and work with one ad at a time. Not anymore.
Now you can bulk-generate hundreds of ads on the same screen and dynamically customise them as you go, without losing progress on the other ads built in that session. Of course, once the session is refreshed, that progress is lost, since we are not yet saving audioforms at the user level.
Additionally, users now also have the option to only “edit the ad” and not “generate” it at the same time. They can edit, see what they like and don’t like, and then generate as they please.
Also, if a request fails, we now show a failed card with a button to regenerate.
Migrated from WebSocket to HTTP
Previously, if you were on the app for 30 minutes, the WebSocket would time out, resulting in failed generate requests.
Now, that’s no longer the case. With the migration to HTTP, we’ve seen a massive reduction in time-outs and failed requests, improving your user experience! 🚀
Voice Intelligence Layer improvements
We made some huge 🚀 improvements to the voice intelligence layer.
You can see a material improvement in correctness on our internal data set compared to the NeMo project, which we evaluated on a range of complex normalisation challenges. We hope this enables a much better user experience.
We also made a few more improvements
SSML tags are now removed before any processing is applied and restored afterwards; previously this caused some edge cases and poor behaviour.
Bug fix: break tags no longer have normalisation incorrectly applied.
Feature enhancement: we also shipped improvements to the ElevenLabs voices. If you enable the voiceIntelligence layer, it will detect the language and pronounce numbers correctly. For our 🇩🇪 customers this is super valuable.
Here's an example; try it with the Wren voice from ElevenLabs.
"scriptText": "<as:section name='intro' soundsegment='intro'> heute ist der 12.12.2012</as:section>"
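A sketch of how that script could be synthesised so the date is expanded correctly. The Wren voice comes from the notes above; the exact SDK call shape (and the voiceIntelligence parameter placement) is an assumption:

```python
# German script containing a date that the voice intelligence layer
# should expand and pronounce correctly ("12.12.2012").
script_text = (
    "<as:section name='intro' soundsegment='intro'>"
    " heute ist der 12.12.2012"
    "</as:section>"
)

# With the SDK installed, a sketch of the synthesis call
# (parameter names other than voice are assumptions):
# import audiostack
# script = audiostack.Content.Script.create(scriptText=script_text)
# speech = audiostack.Speech.TTS.create(scriptItem=script, voice="Wren",
#                                       voiceIntelligence=True)
print(script_text)
```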
Padding Parameters
We added two new padding parameters:
timelineProperties: {padding: seconds} adds padding between each section
sectionProperties: {sectionName: {padding: seconds}} adds padding after a named section
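Put together as a request fragment, the two shapes look like this. Where exactly these dicts are passed (e.g. on the mix request) is an assumption, and the section name and values are illustrative:

```python
# Padding payloads built from the parameter shapes above.
timeline_properties = {"padding": 1.0}  # 1 second between each section

section_properties = {
    "intro": {"padding": 0.5},  # half a second after the "intro" section
}

payload = {
    "timelineProperties": timeline_properties,
    "sectionProperties": section_properties,
}
print(payload)
```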
Loudness presets
We added a range of loudness presets - here's a simple example.
import audiostack

audiostack.api_key = "APIKEY"  # your API key

# List encoder and loudness presets
a = audiostack.Delivery.Encoder.list_presets()
print(a)

# Encode a wav with Spotify loudness normalisation
encoded = audiostack.Delivery.Encoder.encode_mix(
    productionId="PRODUCTION_ID",
    preset="wav",
    loudnessPreset="spotify",
)
DCO MVP
A few weeks ago we shipped the following digital creative optimisation MVP, featuring a fictional audio campaign for a fictional gym and tonic brand!
You can see in this campaign that it personalises the audio based on some specific parameters, so you can better connect with your customers.
We've updated the permissions on our voices, so more of our premium, high-quality voices are now enabled on all plans. This was a much-requested feature. 💯
Improved script sections functionality
You can now create a single section of a text-to-speech resource.
One problem that a lot of our customers have noted is that it's hard to tell how long a particular voice will take to read a given text, as speech rates vary. So we built a proprietary ML model on our customer data to ship this feature.
Furthermore, this will keep getting better as our usage of voices grows.
Many of our customers asked how to upload custom sound beds or custom sound templates. You can see this above
Voice pipeline improvements
We've made some improvements to our voice pipeline, so our voice cloning engine works 3x better. We're constantly working on these improvements to our infrastructure.
Mastering engine performance improvements
We improved the reliability of our mastering engine so you can produce more beautiful audio at scale.
Through our partnership with Eleven Labs, we bring you cutting-edge voices equipped with the groundbreaking feature of Multilingual Synthetic Models. With an extensive selection of languages such as English, Spanish, German, Italian, French, Portuguese, Polish, and Hindi, our MultiLingual voices ensure an immersive and localized experience. Now, effortlessly generate scripts in different languages or combine them using the following code example to leverage the power of Eleven Labs' voices within Audiostack.ai:
import audiostack

audiostack.api_key = "APIKEY"  # fill in your API key

script = """
<as:section name="main" soundsegment="main">
Un homme armé d’un couteau a semé la terreur jeudi 8 juin au matin dans un parc sur les bords du lac d’Annecy, blessant grièvement plusieurs enfants, avant d’être interpellé. Le Monde fait le point sur ce que l’on sait.
Das Brandenburger Tor ist eine bekannte Sehenswürdigkeit in Berlin und in der Geschichte Deutschlands von großer Bedeutung. Das Tor wurde im 18. Jahrhundert erbaut und war damals das Eingangstor zur Stadt. Es wurde von dem preußischen König Friedrich Wilhelm II gebaut.
</as:section>
"""

names = ["aspen"]
presets = ["musicenhanced", "balanced", "voiceenhanced"]
templates = ["solution_zen_30"]

script = audiostack.Content.Script.create(scriptText=script, scriptName="multilingual_test", projectName="multilingual_test")

for name in names:
    # Create text-to-speech
    speech = audiostack.Speech.TTS.create(
        scriptItem=script,
        voice=name,
        speed=1
    )
    for template in templates:
        for preset in presets:
            mix = audiostack.Production.Mix.create(
                speechItem=speech,
                soundTemplate=template,
                masteringPreset=preset,
                public=True
            )
            print(mix)
            mix.download(fileName=f"multilingual_{name}_{template}_{preset}")
            encoded = audiostack.Delivery.Encoder.encode_mix(productionId=mix.productionId, preset="mp3")
            print(encoded)
            encoded.download()
Immerse your audience in a truly global audio experience, breaking language barriers with AudioStack's Multilingual Models powered by ElevenLabs. They are available in all our Paid Plans; try them here: Audiostack.
SonicSell V3
We made a number of changes to SonicSell (you'll need to reach out to us to get access; it's still in Beta).
This v3 has the following updates:
German frontend (change the flag in the top right corner) 🇩🇪
German voices
German prompting
Generative script creation
Ad database
700 voices now selectable
Length & character estimation
Coming next:
Advanced mode
Multi-version creation (on by default)
Mood & tone input
More languages
Dynamic parameters / versioning
Time-out and other fixes (if an ad is not created within 30 seconds, please refresh or start a new tab)
We refactored and redesigned our Voice Intelligence Layer to use just a voiceIntelligence boolean, rather than specifying the inner workings of the dictionary or normaliser. We did this to improve the developer experience.
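In practice this means the layer is toggled with a single flag on the TTS request. A sketch, assuming the boolean is passed alongside the documented TTS parameters:

```python
# TTS request with the simplified voice intelligence switch.
tts_request = {
    "voice": "Wren",
    "voiceIntelligence": True,  # replaces the old dictionary/normaliser options
}

# With the SDK installed:
# import audiostack
# speech = audiostack.Speech.TTS.create(scriptItem=script, **tts_request)
print(tts_request)
```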
Exciting news! Our innovative AI audio ad tool, SonicSell, is now available for all paid AudioStack users. SonicSell generates radio-ready ads in just 30 seconds, leveraging AI, synthetic voices, and generative music. Experience the future of efficient, high-quality audio ad creation today. We can’t wait to hear your feedback.
We implemented a much requested customer feature 💯
You can now send your audio content to the Julep Podcast network; you just need your API key and you're ready to roll.
Noise gate
We've been working hard on our audio intelligence and have just added a noise gate, which will help with noisy voices. This should make your audio experience even better.
Performance enhancements
We've been hard at work on the performance of our system. More on that soon.
Effective from June 5th, our partner WellSaid Labs will be retiring the voice Narration_roxy from their platform, and subsequently from the AudioStack Voice Library. We recommend trying Narration_fiona as the closest alternative.
We'll be adding some new voices in the near future
We've also been heavily involved in making our systems more stable and enhancing our security and control systems.
Validation
We have added a validation route in mastering: https://docs.audiostack.ai/reference/validatemix
This allows you to validate your mastering request before sending it (and consuming credits), which is great for use cases involving user-defined start and end values.
We've been working hard on making our voices easier to discover. We've invested in the following key features which you can see on https://library.audiostack.ai/
These include
Tagging and metadata - all our 600+ voices are tagged and up to date, so you can pick the perfect voice for your audio.
Search and discoverability; it was our most requested feature!
You can filter by language, provider, and various other tags.
The website is also 👌 and beautiful, with a responsive design.
Image: Showing the library with the search functionality
Here's a demo example in some code. Run this and listen to how awesome the audio sounds 🎧
We've added enhanced plugins; you'll see some presets below, aimed at making your audio sound superb.
We're working hard on adding more sound templates and will be adding more to this in the future.
import audiostack

audiostack.api_key = "APIKEY"  # add your API key here

script = """
<as:section name="main" soundsegment="main">
Are you ready to explore the vibrant city of Barcelona? Do you want to experience the culture, the nightlife, and the beauty of this incredible city?
Then we've got just the thing for you!
Join our travel agency for an unforgettable trip to Barcelona. Experience the bustling city streets, the stunning beaches, and the charming architecture that Barcelona is known for. Get lost in the vibrant nightlife,
explore the world-renowned museums, or simply soak in the local culture.
</as:section>
"""

names = ["Wren", "jollie", "aspen", "monica"]
presets = ["musicenhanced", "balanced", "voiceenhanced"]
templates = ["your_take_30", "listen_up_30", "future_focus_30"]

script = audiostack.Content.Script.create(scriptText=script, scriptName="test", projectName="ams_tests_2")

for name in names:
    # Create text-to-speech
    speech = audiostack.Speech.TTS.create(
        scriptItem=script,
        voice=name,
        speed=100
    )
    for template in templates:
        for preset in presets:
            mix = audiostack.Production.Mix.create(
                speechItem=speech,
                soundTemplate=template,
                masteringPreset=preset,
            )
            print(mix)
            mix.download(fileName=f"V4_{name}_{template}_{preset}")