Listen to Google’s latest AI create music from text prompts

MusicLM is Google's fancy new AI model, but isn't being released to the public

Jan 30, 2023

(Credit: Google)

A team of Google researchers and engineers has created a new AI tool capable of generating pieces of music from short text prompts, but hasn’t released the engine for public use.

The tool, called MusicLM, was revealed in a research paper and is described as “a model generating high-fidelity music from text descriptions such as ‘a calming violin melody backed by a distorted guitar riff’”.

Like other artificial intelligence models, MusicLM is trained on a large dataset of unlabeled music to generate its own “long and coherent music” out of “text descriptions of significant complexity”.

➡️ The Shortcut Skinny: Google’s music AI

🔮 Google researchers have created an impressive text-to-music AI
📜 MusicLM generates short and long musical pieces from text prompts
😮 It can cover a huge range of genres and styles
🎤 It sounds pretty good, so long as you don’t ask it to sing

Users can input prompts like “enchanting jazz song with a memorable saxophone solo and a solo singer” or “Berlin 90s techno with a low bass and strong kick” to generate snippets of corresponding music.

The researchers shared several examples of AI-generated music pieces created from short descriptions on a GitHub page. Some are short, 30-second snippets, while others are 5-minute tracks. They cover a range of styles and genres, from reggae to jazz to techno, and also include very short, pared-down instrumental clips based on mini phrases like “guitar solo” or “string quartet”.

On the whole, it’s all pretty impressive. The examples may not blow you away, but they certainly sound like the kind of music a human might create. I was especially impressed by the tool’s ability to smoothly blend genres and shift between sections as described by the text prompt. It struggles with human voices, which sound unnatural and weirdly angular, and with some percussion. Apple Books’ new AI narrators sound a lot more natural, although they’re only reading, not singing their robotic lungs out.

Other text-to-music AI engines have been developed in the past, but the researchers say “MusicLM outperforms previous systems both in audio quality and adherence to the text descriptions. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption.”

They hope the AI engine can “assist humans with creative music tasks”, although they recognize several risks posed by the model, including programming biases, risks of cultural appropriation and the “potential misappropriation of creative content” – which sounds like another way of saying plagiarism. As such, Google has “no plans to release [the] models at this point”.

The company’s certainly being more cautious than some. CNET was recently found to have been surreptitiously using an AI model to generate articles that were hidden behind an inconspicuous byline. The robotic content creator only made things worse for itself when it was found to have stolen content from other sites and struggled with basic math.