almost 2 years ago

Improved Documentation; Bug fixes and performance enhancements

by Peadar Coyle

Media improvements

We fixed some UX problems in the Media endpoints (thanks to our customers for pointing these out).

You can now do a lot more with media files. For example, you can place the name of a media file directly in the script using the name="" attribute.
You can also use the id="" attribute to create a placeholder that can be overwritten in the mastering call:

import audiostack
import os

audiostack.api_key = os.environ["AUDIO_STACK_DEV_KEY"]

response = audiostack.Content.Media.create(filePath="default.wav")
print(response)
mediaId = response.mediaId


script = """
<as:section name="intro" soundsegment="intro">
  hello world <as:media name="default.wav" id="file1"/>
</as:section>
"""

script = audiostack.Content.Script.create(scriptText=script)
print(script)
speech = audiostack.Speech.TTS.create(scriptItem=script, voice="sara")
print(speech)

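mastering = audiostack.Production.Mix.create(
    speechItem=speech,
    # To overwrite the placeholder at mastering time, map its id to the
    # mediaId of any uploaded file:
    # mediaFiles={
    #     "file1": "<any media id of an uploaded file>"
    # }
)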
print(mastering)
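Before passing mediaFiles to the mastering call, it can help to confirm that every key matches a placeholder id in the script. The following is a minimal sketch in plain Python, not part of the AudioStack SDK; the helper names media_placeholder_ids and check_media_overrides, and the media id "example-media-id", are hypothetical.

```python
import re

# Matches the id="..." attribute inside an <as:media .../> tag.
MEDIA_ID_PATTERN = re.compile(r'<as:media\b[^>]*\bid="([^"]+)"')


def media_placeholder_ids(script_text: str) -> list[str]:
    """Return the id of every <as:media> placeholder, in script order."""
    return MEDIA_ID_PATTERN.findall(script_text)


def check_media_overrides(script_text: str, media_files: dict[str, str]) -> None:
    """Raise ValueError if media_files names an id the script does not define."""
    known = set(media_placeholder_ids(script_text))
    unknown = sorted(set(media_files) - known)
    if unknown:
        raise ValueError(f"No <as:media> placeholder with id: {', '.join(unknown)}")


script_text = """
<as:section name="intro" soundsegment="intro">
  hello world <as:media name="default.wav" id="file1"/>
</as:section>
"""

print(media_placeholder_ids(script_text))  # ['file1']
check_media_overrides(script_text, {"file1": "example-media-id"})  # passes silently
```

A check like this only inspects the script text locally; whether the media id itself exists is still decided by the API.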

Intern showcase

Our intern William built something cool 🆒

https://github.com/aflorithmic/news_article_summarizer have a look here at this news article summarizer - leveraging Beautiful Soup 😍, OpenAI 💯 and our AudioStack API, it takes a URL and produces a beautiful audio summary. Have a play!

Multilingual voices in our Frontend

We updated our Library.

You can now see all of the multilingual voices, plus the languages they support.

Bug fixes

As we build out AudioStack, we're constantly looking for improvements.

We fixed the trailing silence issue.

almost 2 years ago

New Voices; Voice improvements

by Peadar Coyle

New voices from WellSaid Labs and ElevenLabs

Exciting news! We've been hard at work collaborating with our providers to bring you more hyper-realistic English voices 🥳 We're thrilled to introduce 35 new voices from ElevenLabs, including 14 American accents 🇺🇸, 19 British accents 🇬🇧, and 2 Australian accents 🇦🇺. Additionally, we've added 41 voices from WellSaid Labs, featuring 28 American English 🇺🇸, 6 British English 🇬🇧, 2 English Mexican 🇲🇽 , 1 English South African 🇿🇦, and 4 English Australian voices 🇦🇺. Head over to our Library (library.audiostack.ai) to explore and try them all out!

Suggestions to try:

Finch, 11L, US

Elora, 11L, GB

Conversational_greg, WSL, AU

Narration_Issa, WSL, ZA

Narration_lorenzo, WSL, MX

Stay tuned as there’s more coming in the next month!

https://library.audiostack.ai/

Voice improvements

Our msnr voices now pause for a set length at each punctuation symbol, and they support break tags. This should make speech sound more natural, with better pronunciation and flow. Some examples:

Full break is 600 milliseconds.
Comma is 300 milliseconds,
Colon is 300 milliseconds:
Semicolon is 400 milliseconds;
Question mark is 400 milliseconds?
Exclamation mark is 600 milliseconds!
Break tag with three seconds
Break tag with thousand millisecond
Break tag with thousand milliseconds using single quote
Break tag with thousand milliseconds using single quote and no space
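As a rough illustration of how these pause lengths add up over a script, the sketch below sums them for a piece of text. It is plain Python rather than anything in the SDK, the function name estimate_pause_ms is hypothetical, and it assumes "full break" refers to the full stop.

```python
# Pause lengths in milliseconds, taken from the examples above.
# Reading "full break" as the full stop (".") is an assumption.
PAUSE_MS = {".": 600, ",": 300, ":": 300, ";": 400, "?": 400, "!": 600}


def estimate_pause_ms(text: str) -> int:
    """Sum the pause lengths of every punctuation symbol in text."""
    return sum(PAUSE_MS.get(char, 0) for char in text)


print(estimate_pause_ms("Hello, world. Ready?"))  # 300 + 600 + 400 = 1300
```

This counts every matching character, so dates or decimals such as 12.12.2012 would be over-counted. Treat the result as an upper-bound estimate, not as what the voice engine does.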

Voice improvements

Special characters are supported again in the API: & and %. These are both normalised for English and German voices!

almost 2 years ago

8th July - Make your audio perfect for YouTube, Spotify or Podcasts

by Peadar Coyle

Ever wanted to make your audio perfect for podcasts or YouTube?

A commonly requested feature is making your audio perfect for a particular platform. Our audio experts have been hard at work on this 🎧 💯

So we're excited to launch these new mixing features 😍

Some customers wanted more control over the loudness of their mixes. Different platforms have different loudness standards, so there is now a new loudnessPreset parameter in the encoder endpoint. It can be used as follows:

encoder = audiostack.Delivery.Encoder.encode_mix(
  productionItem=mix,
  preset="wav",
  loudnessPreset="podcast"      #spotify, applePodcast, youtube ...
)

Some customers mentioned that our voices were a bit harsh (we had been making them loud and excited for advertising), so we implemented a podcast mastering preset:

mix = audiostack.Production.Mix.create(
  speechItem=speech,  # the speech item returned by Speech.TTS.create
  soundTemplate="",
  masteringPreset="podcast",
  timelineProperties={"padding": 1.5},
)

Try it out! You'll need to update your AudioStack SDK in Python to v0.0.9

What are the specs?

If you run the following command, you'll get back these presets:

audiostack.Delivery.Encoder.list_presets()

"spotify": "-16 LUFS Loudness Integrated and -2 dB True Peak",  
"podcast": "-16 LUFS Loudness Integrated and -3 dB True Peak",  
"applePodcast": "-16 LUFS Loudness Integrated and -1 dB True Peak",  
"youtube": "-14 LUFS Loudness Integrated and -1 dB True Peak",  
"lowVol": "-20 LUFS Loudness Integrated and -5 dB True Peak"

To improve the user experience, we also clarified the descriptions of the encoder presets, which you can see below.

"mp3": "mp3 format 320k (320 CBR)", 
"wav": "wav format at 48 kHz sample rate and 16 bits per sample", 
"ogg": "ogg format at 320k", 
"flac": "flac format at 48 kHz sample rate and 16 bits per sample", 
"mp3_very_low": "mp3 lowest quality (~64 kbps VBR)", 
"mp3_low": "mp3 low quality (~115 kbps VBR)", 
"mp3_medium": "mp3 medium quality (~165 kbps VBR)", 
"mp3_high": "mp3 high quality (~190 kbps VBR)", 
"mp3_very_high": "mp3 very high quality (~245 kbps VBR)", 
"mp3_alexa": "mp3 format mono at 48kHz sample rate", 
"mp3_alexa_48br": "mp3 format mono at 48 bit rate and 24kHz sample rate"

We're constantly working on iterating and improving our experience. Let us know what you think.
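The loudness preset descriptions listed above are plain strings, so comparing presets programmatically means parsing them. Here is a minimal sketch, assuming the descriptions keep the exact wording shown above; the dictionary is copied by hand here rather than fetched from the API, and parse_loudness is a hypothetical helper name.

```python
import re

# Descriptions copied from the list_presets() output shown above.
LOUDNESS_PRESETS = {
    "spotify": "-16 LUFS Loudness Integrated and -2 dB True Peak",
    "podcast": "-16 LUFS Loudness Integrated and -3 dB True Peak",
    "applePodcast": "-16 LUFS Loudness Integrated and -1 dB True Peak",
    "youtube": "-14 LUFS Loudness Integrated and -1 dB True Peak",
    "lowVol": "-20 LUFS Loudness Integrated and -5 dB True Peak",
}

SPEC_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?) LUFS .* (-?\d+(?:\.\d+)?) dB True Peak")


def parse_loudness(description: str) -> tuple[float, float]:
    """Return (integrated loudness in LUFS, true peak in dB)."""
    match = SPEC_PATTERN.search(description)
    if match is None:
        raise ValueError(f"Unrecognised loudness description: {description!r}")
    return float(match.group(1)), float(match.group(2))


specs = {name: parse_loudness(text) for name, text in LOUDNESS_PRESETS.items()}
loudest = max(specs, key=lambda name: specs[name][0])
print(loudest, specs[loudest])  # youtube (-14.0, -1.0)
```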

almost 2 years ago

7th July - Sonic Sell v0.4, Voice Intelligence Layer improvements, padding parameters

by Peadar Coyle

SonicSell v0.4

🚧
SonicSell Beta
If you want an invite to the SonicSell beta email [email protected] or [email protected] and we'll get you invited

Automatic Language Detection:
Users no longer have to select the flag at the top right to select an output language.
We have implemented automatic language detection and it will automatically adjust your output to your input language.
This feature works for English, French, German, Spanish, Italian and Portuguese. 🇬🇧 🇫🇷 🇪🇸 🇩🇪 🇵🇹 🇮🇹

Advanced Mode:

Mood and Tone Indicator (BETA): Users can now suggest what the mood of the ad should be, and what the tone of the ad should be.

Multi-Versions: Users can produce up to 3 versions of the same ad in 1 go.

Virality Link:

Earlier, users would have to download the audio asset and manually share it with their friends and colleagues.
This process had a lot of friction, and wouldn’t work for us from a “word of mouth marketing” perspective.
Now, users can simply share a link that leads to an AudioStack page, where they can listen to the shared ad.
There’s a button there called “Try it now”, which is currently linked to a Typeform where users can request access.
We are working on implementing the SonicSell trial flow on the website right now (we will have it by next week).
Once that’s shipped, we will link this button directly to that trial flow.

Improved UI/UX:
Earlier, users could only truly generate and work with 1 ad. Not anymore.
Now, you can bulk generate 100s of ads in the same screen, and dynamically customise them as you go (without losing any progress of the other ads built in that session). Of course, once the session is refreshed, all that progress will be lost, since we are not saving the audioforms on a user level yet.
Additionally, users now also have the option to only “edit the ad” and not “generate” it at the same time. They can edit, see what they like and don’t like, and then generate as they please.
Also, if a request fails, we now show a failed card with a button to try to regenerate.

Migrated from WebSocket to HTTP:
Earlier, if you were on the app for 30 mins, the websocket would time out which would result in failed generate requests.
Now, that’s no longer the case. With the migration to HTTP, we’ve seen a massive reduction in time-outs and failed requests, improving your user experience! 🚀

Voice Intelligence Layer improvements

We made some huge 🚀 improvements to the voice intelligence layer.

On our internal data set, correctness improved materially compared to the NEMO project. We tested this on a range of complex normalisation challenges. We hope this gives you a much better user experience.

We also made a few more improvements:

SSML tags are now removed before any processing is applied and restored afterwards. Leaving them in place was causing some edge cases and poor behaviour.
Bug: Break tags no longer have normalisation incorrectly applied.

Feature enhancement: We also shipped improvements to the Eleven Labs voices. If you enable voiceIntelligence, our voice intelligence layer detects the language and pronounces numbers correctly. For our 🇩🇪 customers this is super valuable.

Here's an example. Try it with the Wren voice from Eleven Labs.

"scriptText": "<as:section name='intro' soundsegment='intro'> heute ist der 12.12.2012</as:section>"

Padding Parameters

We added two new padding parameters

timelineProperties: {padding : seconds} adds padding between each section
sectionProperties: {sectionName : {padding : seconds}} adds padding after a named section
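The two padding parameters above are nested dictionaries, which are easy to get wrong by hand. A small sketch of building them in plain Python follows. The helper name padding_options and the rule that padding must not be negative are assumptions, not documented behaviour.

```python
def padding_options(timeline_padding=None, section_padding=None):
    """Build the two padding dictionaries described above.

    timeline_padding: seconds of padding between each section.
    section_padding: mapping of section name -> seconds of padding after it.
    """
    options = {}
    if timeline_padding is not None:
        if timeline_padding < 0:  # assumed rule, not documented
            raise ValueError("padding must not be negative")
        options["timelineProperties"] = {"padding": timeline_padding}
    if section_padding:
        for name, seconds in section_padding.items():
            if seconds < 0:  # assumed rule, not documented
                raise ValueError(f"padding for section {name!r} must not be negative")
        options["sectionProperties"] = {
            name: {"padding": seconds} for name, seconds in section_padding.items()
        }
    return options


options = padding_options(timeline_padding=1.5, section_padding={"intro": 0.5})
print(options)
# {'timelineProperties': {'padding': 1.5}, 'sectionProperties': {'intro': {'padding': 0.5}}}
```

The resulting dictionary could then be unpacked into the mix request. Check the API reference for how sectionProperties is passed in your SDK version.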

Loudness presets

We added a range of loudness presets - here's a simple example.

# test list encoder and loudness presets
a = audiostack.Delivery.Encoder.list_presets()
print(a)

# test wav encoding with spotify loudness
encoded = audiostack.Delivery.Encoder.encode_mix(productionId='PRODUCTION_ID',
                                                 preset='wav', 
                                                 loudnessPreset="spotify")

DCO MVP

A few weeks ago we shipped the following dynamic creative optimisation (DCO) MVP. It features a fictional audio campaign for a fictional gym and tonic brand!

You can see in this campaign that it personalises the audio based on some specific parameters, so you can better connect with your customers.

about 2 years ago

16th June - Voice Permissions updated, Custom sound templates, Predicting the length of speech

by Peadar Coyle

Voice permissions updated

We've updated our voice permissions, so more of our premium and high-quality voices are now available on all plans. This is a much-requested feature. 💯

Improved script sections functionality

Create a single section of a text-to-speech resource.

https://docs.audiostack.ai/reference/postspeechsection

Predicting the length of speech (based on text)

One problem that a lot of our customers have noted is that it's hard to tell how long a particular voice will take to read the text provided, as speech rates vary. So we built a proprietary ML model based on our customer data to ship this feature.

Furthermore, this will keep getting better as usage of our voices grows.

import requests
url = "https://v2.api.audio/speech/predict"
# headers, voice and f (the text to measure) must be defined by the caller
r = requests.post(url=url, headers=headers, json={"voice": voice, "text": f})
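For a fuller picture of the request, here is a sketch that prepares the same POST using only the standard library, without sending it. The header name x-api-key is an assumption (the snippet above does not show what headers contains), build_predict_request is a hypothetical helper, and the response format is not covered here.

```python
import json
import urllib.request

PREDICT_URL = "https://v2.api.audio/speech/predict"


def build_predict_request(api_key: str, voice: str, text: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the predict endpoint."""
    body = json.dumps({"voice": voice, "text": text}).encode("utf-8")
    headers = {
        "x-api-key": api_key,  # assumed header name; check the API reference
        "Content-Type": "application/json",
    }
    return urllib.request.Request(PREDICT_URL, data=body, headers=headers, method="POST")


request = build_predict_request("APIKEY", voice="sara", text="hello world")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then inspect the response body.
```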

Uploading custom sound templates

https://docs.audiostack.ai/docs/custom-sound-design-templates

Many of our customers asked how to upload custom sound beds or custom sound templates. The guide linked above explains how.

Voice pipeline improvements

We've made some improvements to our voice pipeline so our voice cloning engine works 3x better. We're constantly working on these improvements to our infrastructure.

Mastering engine performance improvements

We improved the reliability of our mastering engine so you can produce more beautiful audio at scale.

about 2 years ago

9th June - Introducing Multilingual Models in Audiostack; SonicSell v3

by Maria Chatzi

Through our partnership with Eleven Labs, we bring you cutting-edge voices equipped with the groundbreaking feature of Multilingual Synthetic Models. With an extensive selection of languages such as English, Spanish, German, Italian, French, Portuguese, Polish, and Hindi, our MultiLingual voices ensure an immersive and localized experience. Now, effortlessly generate scripts in different languages or combine them using the following code example to leverage the power of Eleven Labs' voices within Audiostack.ai:

import audiostack
import os


audiostack.api_key = "APIKEY"  # fill in your API key



script = """
<as:section name="main" soundsegment="main">
Un homme armé d’un couteau a semé la terreur jeudi 8 juin au matin dans un parc sur les bords du lac d’Annecy, blessant grièvement plusieurs enfants, avant d’être interpellé. Le Monde fait le point sur ce que l’on sait.
Das Brandenburger Tor ist eine bekannte Sehenswürdigkeit in Berlin und in der Geschichte Deutschlands von großer Bedeutung. Das Tor wurde im 18. Jahrhundert erbaut und war damals das Eingangstor zur Stadt. Es wurde von dem preußischen König Friedrich Wilhelm II gebaut.
</as:section>
"""



names = ["aspen"]
presets = ["musicenhanced", "balanced", "voiceenhanced"]
templates = ["solution_zen_30"]



script = audiostack.Content.Script.create(scriptText=script, scriptName="multilingual_test", projectName="multilingual_test")        

for name in names:
    # Creates text to speech
    speech = audiostack.Speech.TTS.create(
            scriptItem=script,
            voice=name,
            speed=1
    )
    for template in templates:

        for preset in presets:

            mix = audiostack.Production.Mix.create(
                speechItem=speech,
                soundTemplate=template,
                masteringPreset=preset, public=True
            )
            print(mix)
  

            mix.download(fileName=f"french_{name}_{template}_{preset}")
            encoded = audiostack.Delivery.Encoder.encode_mix(productionId=mix.productionId, preset="mp3")
            print(encoded)
            encoded.download()
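The nested loops above produce one mix per voice, template and preset combination. To preview how many mixes (and which file names) a run will create before spending any credits, a sketch like the following enumerates the combinations with itertools.product. It only builds the names and makes no API calls.

```python
from itertools import product

names = ["aspen"]
templates = ["solution_zen_30"]
presets = ["musicenhanced", "balanced", "voiceenhanced"]

# One (voice, template, preset) tuple per mix the nested loops would create.
combinations = list(product(names, templates, presets))
file_names = [f"french_{name}_{template}_{preset}" for name, template, preset in combinations]

print(len(combinations))  # 3
print(file_names[0])  # french_aspen_solution_zen_30_musicenhanced
```

Note that the original loops create the speech once per voice, outside the inner loops. A flattened loop over these tuples would need to keep that behaviour to avoid regenerating speech for every preset.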

Immerse your audience in a truly global audio experience, breaking language barriers with Audiostack's Multilingual Models powered by Eleven Labs. They are available on all our paid plans. Try them on Audiostack.

SonicSell V3

We made a number of changes to SonicSell (it's still in beta, so you'll need to reach out to us to get access).

This v3 has the following updates:

German frontend (change flag in the top right corner) 🇩🇪

German voices
German prompting
Generative script creation
Ad database

700 voices now selectable
Length & character estimation

Coming next:

Advanced mode
Multi-version creation (on default)
Mood & tone input
More languages
Dynamic parameters / versioning
Time Out and other fixes (*if ad is not created in 30 seconds please refresh or start new tab)

about 2 years ago

8th of June - Voice Intelligence Layer

by Peadar Coyle

Breaking Change

We refactored and redesigned our Voice Intelligence Layer to use a single voiceIntelligence boolean, instead of exposing the inner workings of the dictionary or normaliser. We did this to improve the developer experience.

voiceIntelligence: bool = False

We deprecated:

"useDictionary": useDictionary,  
 "useTextNormalizer": useTextNormalizer

speech = audiostack.Speech.TTS.create(
    scriptItem=script,
    voice=name,
    speed=100,
    voiceIntelligence=True,  # a boolean, not the string "true"
)

Instead of the deprecated:

speech = audiostack.Speech.TTS.create(
    scriptItem=script,
    voice=name,
    speed=100,
    useTextNormalizer=True,
    useDictionary=True,
)
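If you build the TTS arguments as a dictionary, a small helper can migrate the old flags in one place. This is a sketch only: migrate_tts_kwargs is a hypothetical name, and turning voiceIntelligence on when either deprecated flag was on is an assumption about the intended mapping.

```python
def migrate_tts_kwargs(kwargs: dict) -> dict:
    """Replace the deprecated flags with a single voiceIntelligence boolean."""
    migrated = dict(kwargs)  # leave the caller's dictionary untouched
    use_dictionary = migrated.pop("useDictionary", False)
    use_normalizer = migrated.pop("useTextNormalizer", False)
    # Assumption: enable voiceIntelligence if either old flag was enabled.
    if "voiceIntelligence" not in migrated:
        migrated["voiceIntelligence"] = bool(use_dictionary or use_normalizer)
    return migrated


old = {"voice": "sara", "speed": 100, "useTextNormalizer": True, "useDictionary": True}
print(migrate_tts_kwargs(old))
# {'voice': 'sara', 'speed': 100, 'voiceIntelligence': True}
```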

about 2 years ago

2nd June - Julep Connector, SonicSell

by Peadar Coyle

SonicSell for all AudioStack paid users

Exciting news! Our innovative AI audio ad tool, SonicSell, is now available for all paid AudioStack users. SonicSell generates radio-ready ads in just 30 seconds, leveraging AI, synthetic voices, and generative music. Experience the future of efficient, high-quality audio ad creation today. We can’t wait to hear your feedback.

Julep Connector

https://docs.audiostack.ai/reference/postjulep

We implemented a much requested customer feature 💯

You can now send your audio content to the Julep podcast network. All you need is your API key.

Noise gate

We've been working hard on our audio intelligence and have just added a noise gate, which helps with noisy voices. This should make your audio experience even better.

Performance enhancements

We've been hard at work on the performance of our system. More on that soon but we'r

about 2 years ago

19th May - Voices update

by Peadar Coyle

Voice Update:

Effective June 5th, our partner WellSaid Labs will retire the voice Narration_roxy from their platform, and consequently it will be removed from the Audiostack Voice Library. We recommend you try Narration_fiona as the closest alternative.

We'll be adding some new voices in the near future.

about 2 years ago

16th May - DOCS DOCS DOCS

by Peadar Coyle

Doc updates

We made some excellent additions to docs. 💯

📔

Dynamic Creative Optimisation 📣

One of the many reasons for creating a script is so that it can be reused for different voices, and also for different personalisation parameters.

How to do dynamic creative optimisation https://docs.audiostack.ai/docs/dynamic-creative-optimisation-dco

Multispeaker support 🎉

Make your AI voices speak to each other!

https://docs.audiostack.ai/docs/multi-speaker-support

Enhanced mastering and timing parameters 🎓

An advanced feature

https://docs.audiostack.ai/docs/advance-timing-parameters

Other improvements

We've also been heavily involved in making our systems more stable and enhancing our security and control systems.

Validation

We have added a validation route in mastering:
https://docs.audiostack.ai/reference/validatemix
This allows you to validate your mastering request before sending it (and consuming credits), which is great for use cases involving user-defined start and end values.
