A year ago, Stability AI, the London-based startup behind the open source image-generating AI model Stable Diffusion, quietly released Dance Diffusion, a model that can generate songs and sound effects given a text description of them.
Dance Diffusion was Stability AI’s first foray into generative audio, and it signaled a meaningful investment — and acute interest, seemingly — from the company in the nascent field of AI music creation tools. But for nearly a year after Dance Diffusion was announced, all seemed quiet on the generative audio front — at least as far as it concerned Stability’s efforts.
The research organization Stability funded to create the model, Harmonai, stopped updating Dance Diffusion sometime last year. (Historically, Stability has provided resources and compute to outside groups rather than build models entirely in-house.) And Dance Diffusion never gained a more polished release; even today, installing it requires working directly with the source code, as there’s no user interface to speak of.
Now, under pressure from investors to translate over $100 million in capital into revenue-generating products, Stability is recommitting to audio in a big way.
Today marks the release of Stable Audio, a tool that Stability claims is the first capable of creating “high-quality,” 44.1 kHz music for commercial use via a technique called latent diffusion. Stability says that Stable Audio’s underlying, roughly 1.2-billion-parameter model, which was trained on audio metadata as well as audio files’ durations and start times, affords greater control over the content and length of synthesized audio than the generative music tools released before it.
“Stability AI is on a mission to unlock humanity’s potential by building foundational AI models across a number of content types or ‘modalities,’” Ed Newton-Rex, VP of audio for Stability AI, told TechCrunch in an email interview. “We started with Stable Diffusion and have grown to include languages, code and now music. We believe the future of generative AI is multimodality.”
Stable Audio wasn’t developed by Harmonai — or, rather, it wasn’t developed by Harmonai alone. Stability’s audio team, formalized in April, created a new model inspired by Dance Diffusion to underpin Stable Audio, which Harmonai then trained.
Harmonai now serves as Stability’s AI music research arm, Newton-Rex, who joined Stability last year after tenures at TikTok and Snap, tells me.
“Dance Diffusion generated short, random audio clips from a limited sound palette, and the user had to fine-tune the model themselves if they wanted any control. Stable Audio can generate longer audio, and the user can guide generation using a text prompt and by setting the desired duration,” Newton-Rex said. “Some prompts work fantastically, like EDM and more beat-driven music, as well as ambient music, and some generate audio that’s a bit more ‘out there,’ like more melodic music, classical and jazz.”
Stability turned down our repeated requests to try Stable Audio ahead of its launch. For now, and perhaps in perpetuity, Stable Audio can only be used through a web app, which wasn’t live until this morning. In a move that’s sure to irk supporters of its open research mission, Stability hasn’t announced plans to release the model behind Stable Audio as open source.
But Stability was amenable to sending samples showcasing what the model can accomplish across a range of genres, mainly EDM, given brief prompts.
While they very well could’ve been cherry-picked, the samples sound (at least to this reporter’s ears) more coherent, melodic and, for lack of a better word, musical than many of the “songs” from the audio generation models released so far. (See Meta’s AudioGen and MusicGen, Riffusion, OpenAI’s Jukebox, Google’s MusicLM and so on.) Are they perfect? Clearly not; they’re lacking in creativity, for one. But if I heard the ambient techno track below playing in a hotel lobby somewhere, I probably wouldn’t assume AI was the creator.
Kyle Wiggers was TechCrunch’s AI Editor until June 2025. His writing has appeared in VentureBeat and Digital Trends, as well as a range of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Manhattan with his partner, a music therapist.