Meta has released AudioCraft, a new open-source AI toolkit that lets users create music and sounds entirely through generative AI.
It consists of three AI models, each tackling a different area of sound generation. MusicGen takes text inputs to generate music. This model was trained on “20,000 hours of music owned by Meta or licensed specifically for this purpose.” AudioGen creates audio from written prompts, simulating sounds like barking dogs or footsteps, and was trained on public sound effects. An improved version of Meta’s EnCodec decoder lets users create sounds with fewer artifacts, the distortions that creep in when audio is manipulated too much.
The company let the media listen to some sample audio made with AudioCraft. The generated sounds of whistling, sirens, and humming came across as pretty natural. But while the guitar strings on the songs sounded real enough, the songs themselves still felt, well, artificial.
Meta is just the latest to tackle combining music and AI. Google came up with MusicLM, a large language model that generates minutes of sound from text prompts and is accessible only to researchers. Then, an “AI-generated” song featuring voice likenesses of Drake and The Weeknd went viral before it was taken down. More recently, some musicians, like Grimes, have encouraged people to use their voices in AI-made songs.
Of course, musicians have been experimenting with electronic audio for a very long time; EDM and festivals like Ultra didn’t appear out of nowhere. But computer-generated music typically starts from existing audio that is then manipulated. AudioCraft and other generative AI systems create those sounds from nothing more than text prompts and a vast library of sound data.
Right now, AudioCraft sounds like something that could be used for elevator music or stock songs that can be plugged in for some atmosphere rather than the next big pop hit. However, Meta believes its new model can usher in a new wave of songs in the same way that synthesizers changed music once they became popular.
“We think MusicGen can turn into a new type of instrument — just like synthesizers when they first appeared,” the company said in a blog post. Meta acknowledged the difficulty of building AI models capable of making music: a few minutes of audio can contain millions of data points the model must predict, while text models like Llama 2 work with documents of only thousands of tokens.
The company says open-sourcing AudioCraft is necessary to diversify the data used to train it.
“We recognize that the datasets used to train our models lack diversity. In particular, the music dataset used contains a larger portion of Western-style music and only contains audio-text pairs with text and metadata written in English,” Meta said. “By sharing the code for AudioCraft, we hope other researchers can more easily test new approaches to limit or eliminate potential bias in and misuse of generative models.”
Record labels and artists have already sounded the alarm on the dangers of AI, as many fear AI models ingest copyrighted material for training — and historically speaking, they are a litigious bunch. Sure, we all remember what happened to Napster. But more recently, Spotify faced a billion-dollar lawsuit based on a law that’s been around since the days of player pianos, and just this year, a court had to rule on whether Ed Sheeran copied Marvin Gaye for “Thinking Out Loud.”
But before Meta’s “synthesizer” goes on tour, someone will have to figure out a prompt that pulls in fans who want more machine-made songs and not just muzak.