Magenta RealTime

Today, we’re happy to share a research preview of Magenta RealTime (Magenta RT), an open-weights live music model that allows you to interactively create, control and perform music in the moment.

Colab Demo | 📝 Paper | GitHub Code | Model Card

Magenta RT is the latest in a series of models and applications developed as part of the Magenta Project. It is the open-weights cousin of Lyria RealTime, the real-time generative music model powering Music FX DJ and the real-time music API in Google AI Studio, developed by Google DeepMind. Real-time music generation models open up unique opportunities for live music exploration and performance, and we’re excited to see what new tools, experiences, and art you create with them.

As an open-weights model, Magenta RT is targeted towards eventually running locally on consumer hardware (it currently runs on free-tier Colab TPUs). It is an 800 million parameter autoregressive transformer model trained on ~190k hours of stock music from multiple sources, mostly instrumental. The model code is available on GitHub and the weights are available on Google Cloud Storage and Hugging Face under permissive licenses with some additional bespoke terms. To see how to run inference with the model and try it yourself, check out our Colab Demo. You may also customize Magenta RT on your own audio or explore live audio input. Options for local, on-device inference are coming soon.

How it Works

Live generative music is particularly difficult because it requires real-time generation (i.e. a real-time factor greater than 1: generating X seconds of audio in less than X seconds), causal streaming (i.e. online generation), and low-latency controllability.

Magenta RT overcomes these challenges by adapting the MusicLM architecture to perform block autoregression. The model generates a continuous stream of music in sequential chunks: each new 2-second chunk (fine audio tokens) is conditioned on the previous 10 seconds of audio output (coarse audio tokens) and a style embedding. By manipulating the style embedding (a weighted average of text and/or audio prompt embeddings), players can shape and morph the music in real time, mixing together different styles, instruments, and musical attributes.
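To make the data flow concrete, here is a minimal sketch of that block-autoregressive loop. The `embed_styles` and `generate_chunk` functions are hypothetical stand-ins (random placeholders, not the real MusicCoCa or transformer), and the constants simply mirror the numbers above; see the Colab Demo and GitHub repository for the actual API.

```python
import numpy as np

# Constants mirror the description above; the functions below are
# hypothetical placeholders, not the real Magenta RT API.
CONTEXT_SECONDS = 10   # previously generated audio fed back in as coarse tokens
CHUNK_SECONDS = 2      # fine-token audio produced per generation step
SAMPLE_RATE = 48_000   # SpectroStream operates on 48 kHz stereo audio

def embed_styles(prompts, weights):
    """Stand-in for MusicCoCa: weighted average of prompt embeddings."""
    rng = np.random.default_rng(0)
    embeddings = np.stack([rng.normal(size=512) for _ in prompts])
    w = np.asarray(weights, dtype=np.float32)
    return (w[:, None] * embeddings).sum(axis=0) / w.sum()

def generate_chunk(context_chunks, style_embedding):
    """Stand-in for the block-autoregressive transformer decoder."""
    rng = np.random.default_rng(len(context_chunks))
    return rng.normal(size=(CHUNK_SECONDS * SAMPLE_RATE, 2)).astype(np.float32)

# Streaming loop: each 2 s chunk is conditioned on the previous ~10 s of
# generated audio plus the current style embedding.
style = embed_styles(["warm synth pads", "breakbeat drums"], weights=[0.7, 0.3])
context = []   # rolling window of previously generated chunks
stream = []
for step in range(8):                       # 16 seconds of audio
    chunk = generate_chunk(context, style)
    stream.append(chunk)
    context = (context + [chunk])[-(CONTEXT_SECONDS // CHUNK_SECONDS):]
    # `style` can be re-mixed between chunks for real-time control
```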

The latency of controls is set by the chunk size, which is at most two seconds of audio but can be reduced to increase reactivity. On a free-tier Colab TPU (v2-8), these two seconds of audio are generated in 1.25 seconds, giving a real-time factor of 1.6.
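As a quick sanity check on those numbers, using only the figures quoted above:

```python
# Real-time budget for one chunk on a free-tier Colab TPU (v2-8).
chunk_seconds = 2.0        # audio produced per generation step
generation_seconds = 1.25  # wall-clock time to generate that chunk

real_time_factor = chunk_seconds / generation_seconds   # 1.6
headroom = chunk_seconds - generation_seconds           # 0.75 s of slack per chunk

print(f"real-time factor: {real_time_factor:.2f}x, headroom: {headroom:.2f}s")
```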

Compared to the original MusicLM, we’ve upgraded our audio representation to SpectroStream, a successor to SoundStream (Zeghidour+ 21) that supports high-fidelity (48 kHz stereo) audio. We also trained a new joint music+text embedding model called MusicCoCa, influenced by both MuLan (Huang+ 22) and CoCa (Yu+ 22). Additional details are provided in the model card, and deeper technical descriptions are available in our paper.

Latent Space Exploration… In Real Time

Magenta’s earlier work in latent music models for MIDI clips (MusicVAE, GrooVAE) and instrumental timbre (NSynth) offered a wide range of possible interfaces.

With Magenta RT, it is now possible to traverse the space of multi-instrumental audio: explore the never-before-heard music between genres, unusual instrument combinations, or your own audio samples.

The ability to adjust prompt mixtures in real-time allows you to efficiently explore the sonic landscape and find novel textures and loops to use as part of a larger piece of music.
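As an illustration, a prompt-mixture sweep might look like the sketch below. The embeddings are random placeholders standing in for MusicCoCa outputs, and `mix` is the same weighted average described in the previous section; the real interface is shown in the Colab Demo.

```python
import numpy as np

# Random 512-d placeholders standing in for MusicCoCa prompt embeddings.
rng = np.random.default_rng(42)
jazz, techno, my_sample = (rng.normal(size=512) for _ in range(3))

def mix(embeddings, weights):
    """Weighted average of prompt embeddings -> one style embedding."""
    w = np.asarray(weights, dtype=np.float32)
    return (w[:, None] * np.stack(list(embeddings))).sum(axis=0) / w.sum()

# Crossfade from a jazz-heavy mixture toward techno over successive chunks,
# keeping a constant amount of your own audio sample in the blend.
for alpha in np.linspace(0.0, 1.0, num=9):
    style = mix([jazz, techno, my_sample], [1.0 - alpha, alpha, 0.5])
    # hand `style` to the generator for the next 2 s chunk (see the loop above)
```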

Real-time interactivity also makes this latent exploration its own type of musical performance: interpolating through the latent space while anchoring on the audio context produces a structure similar to a DJ set or an improvisation session. Beyond performance, it can also be used to provide interactive soundscapes for physical spaces like art installations or virtual spaces like video games.

This opens up a world of possibilities to build new tools and interfaces, and below you can see three example applications built on the Lyria RealTime API in AI Studio. Over time, Magenta RT will open up similar opportunities for on-device applications.

PromptDJ PromptDJ MIDI PromptDJ Pad

Why Magenta RealTime?

Enhancing human creativity (not replacing it) has always been at the core of Magenta’s mission. AI, however, can be a double-edged sword for creative agency: it offers new opportunities for accessibility and expression, but it can also produce a deluge of more passive creation and consumption than traditional methods. With this in mind, we have always strived to build tools that help close the skill gap and make creation more accessible, while also valuing existing musical practices and encouraging people to dig deeper in their own creative journeys. In this regard, real-time interactive music models offer several important advantages that have motivated our research over the years (Piano Genie, DDSP, NSynth, AI Duet, and more).

Live interaction demands more from the player but can offer more in return. The continuous perception-action loop between the human and the model provides access to a creative flow state, centering the experience on the joy of the process over the final product. The higher bandwidth channel of communication and control often results in outputs that are more unique and personal, as every action the player takes (or doesn’t) has an effect.

Finally, live models naturally avoid creating a deluge of passive content, because they intrinsically balance listening with generation in a 1:1 ratio. They create a unique moment in time, shared by the player, the model, and listeners.

While Lyria RealTime gives developers and users around the globe access to state-of-the-art live music generation, the Magenta Project remains committed to providing more direct access to code and models, enabling researchers, artists, and creative coders to build upon and adapt them to achieve their creative goals.

Known Limitations

Coverage of broad musical styles. Magenta RT’s training data primarily consists of Western instrumental music. As a consequence, Magenta RT has incomplete coverage of both vocal performance and the broader landscape of rich musical traditions worldwide. For real-time generation with broader style coverage, we refer users to our Lyria RealTime API.

Vocals. While the model is capable of generating non-lexical vocalizations and humming, it is not conditioned on lyrics and is unlikely to generate actual words. However, there remains some risk of generating explicit or culturally insensitive lyrical content.

Latency. Because the Magenta RT LLM operates on two-second chunks, user inputs for the style prompt may take two or more seconds to influence the musical output.

Limited context. Because the Magenta RT encoder has a maximum audio context window of ten seconds, the model is unable to directly reference music that has been output earlier than that. While the context is sufficient to enable the model to create melodies, rhythms, and chord progressions, the model is not capable of automatically creating longer-term song structures.

Future Work

Magenta RT and Lyria RealTime are pushing the boundaries of live generative music, and we are happy that Magenta RT marks a return to open releases from Magenta.

We are hard at work making Magenta RT run locally on your own device. Stay tuned for more info!

We are also working on the next generation of real-time models with higher quality, lower latency, and more interactivity, to create truly playable instruments and live accompaniment.

How to cite

Please cite our technical report:

BibTeX:

@article{gdmlyria2025live,
    title={Live Music Models},
    author={Caillon, Antoine and McWilliams, Brian and Tarakajian, Cassie and Simon, Ian and Manco, Ilaria and Engel, Jesse and Constant, Noah and Li, Pen and Denk, Timo I. and Lalama, Alberto and Agostinelli, Andrea and Huang, Anna and Manilow, Ethan and Brower, George and Erdogan, Hakan and Lei, Heidi and Rolnick, Itai and Grishchenko, Ivan and Orsini, Manu and Kastelic, Matej and Zuluaga, Mauricio and Verzetti, Mauro and Dooley, Michael and Skopek, Ondrej and Ferrer, Rafael and Borsos, Zal{\'a}n and van den Oord, {\"A}aron and Eck, Douglas and Collins, Eli and Baldridge, Jason and Hume, Tom and Donahue, Chris and Han, Kehang and Roberts, Adam},
    journal={arXiv:2508.04651},
    year={2025}
}