musicgen

Maintainer: aussielabs - Last updated 6/17/2024

Model overview

musicgen is a deployment of Meta's MusicGen model, a controllable text-to-music generation system. The deployment is maintained by aussielabs. musicgen can generate music from text prompts, or continue or mimic existing audio. MusicGen is part of the broader AudioCraft library, which also contains the AudioGen sound generation model and the EnCodec audio codec.

Model inputs and outputs

Inputs

  • Prompt: A description of the music you want to generate.
  • Input Audio: An optional audio file that will influence the generated music. The generated music can either continue the audio file or mimic its melody.
  • Duration: The desired duration of the generated audio in seconds.
  • Continuation Start/End: The start and end times of the audio file to use for continuation.
  • Model Version: The specific MusicGen model to use, such as the "melody" version.
  • Output Format: The desired format for the generated audio, such as WAV.
  • Normalization Strategy: The strategy for normalizing the output audio.
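  • Temperature: Controls the "conservativeness" of the sampling process. Higher values produce more diverse output.
  • Top K/P: Restricts sampling to the k most likely tokens (top-k), or to the most likely tokens whose cumulative probability reaches p (top-p).
  • Classifier Free Guidance: Increases the influence of the inputs on the output. Higher values produce output that follows the inputs more closely, at the cost of variety.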

Outputs

  • Output: The generated audio file in the specified format.
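The Temperature and Top K/P inputs listed above are standard token-sampling controls. The following is a minimal, standard-library sketch of how such controls are commonly applied to a vector of logits. It is not the deployment's actual code: the function names `filter_probs` and `sample` are hypothetical, and the rule that top-p takes precedence over top-k when it is greater than zero is an assumption made for the example.

```python
import math
import random


def filter_probs(logits, temperature=1.0, top_k=0, top_p=0.0):
    """Return {token_index: probability} after temperature and top-k/top-p filtering.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: values above 1 flatten the distribution (more diverse),
    # values below 1 sharpen it (more conservative).
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_p > 0.0:
        # Assumption: top-p wins when set. Keep the most likely tokens until
        # their cumulative probability reaches top_p.
        keep, cumulative = [], 0.0
        for i in order:
            keep.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    elif top_k > 0:
        # Keep only the k most likely tokens.
        keep = order[:top_k]
    else:
        keep = order

    # Renormalize over the surviving tokens.
    kept_total = sum(probs[i] for i in keep)
    return {i: probs[i] / kept_total for i in keep}


def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_probs(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

Calling `sample(logits, temperature=1.2, top_k=250)` would then draw one token index from the 250 most likely candidates; the specific values here are only examples.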

Capabilities

musicgen can generate diverse and high-quality musical compositions from text prompts. It can also continue and mimic existing audio, allowing for creative remixing and mashups. The model is highly controllable, with options to adjust the generated music's style, duration, and other parameters.

What can I use it for?

musicgen can be used for a variety of applications, such as:

  • Generating custom background music for videos, games, or podcasts
  • Creating unique musical compositions for personal or commercial projects
  • Experimenting with remixing and mashups by continuing or mimicking existing tracks
  • Exploring new musical ideas and styles through text-based prompts

Things to try

One interesting capability of musicgen is its ability to continue and mimic existing audio. Try providing an audio file as input and experiment with the "continuation" and "melody" options to see how the model can extend or transform the original music. You can also try adjusting the temperature and guidance settings to generate more diverse or controlled outputs.
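To build intuition for the guidance setting, here is a minimal sketch of the usual classifier-free guidance formulation, in which conditional and unconditional predictions are combined with a scale factor. The source does not describe how this deployment applies guidance internally, so treat the function below (a hypothetical `apply_guidance`) as an illustration of the general technique only.

```python
def apply_guidance(cond_logits, uncond_logits, scale):
    """Blend conditional and unconditional logits with a guidance scale.

    scale = 0 ignores the conditioning, scale = 1 uses the conditional
    logits unchanged, and scale > 1 pushes the result further toward
    what the conditioning (prompt or melody) favors.
    """
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# With scale 3, tokens favored by the conditioning are boosted and
# tokens it disfavors are suppressed, relative to the unconditional logits.
guided = apply_guidance([2.0, 0.0], [1.0, 1.0], scale=3.0)
```

Larger scales concentrate probability on tokens the inputs favor, which matches the trade-off described above: more controlled output, less variety.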



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Total Score: 525

Related Models

musicgen

meta

Total Score: 2.2K

musicgen is a simple and controllable model for music generation developed by Meta. Unlike existing methods like MusicLM, musicgen doesn't require a self-supervised semantic representation and generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show they can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. musicgen was trained on 20K hours of licensed music, including an internal dataset of 10K high-quality music tracks and music data from ShutterStock and Pond5.

Model inputs and outputs

musicgen takes in a text prompt or melody and generates corresponding music. The model's inputs include a description of the desired music, an optional input audio file to influence the generated output, and various parameters to control the generation process like temperature, top-k, and top-p sampling. The output is a generated audio file in WAV format.

Inputs

  • Prompt: A description of the music you want to generate.
  • Input Audio: An optional audio file that will influence the generated music. If "continuation" is set to true, the generated music will be a continuation of the input audio. Otherwise, it will mimic the input audio's melody.
  • Duration: The duration of the generated audio in seconds.
  • Continuation Start/End: The start and end times of the input audio to use for continuation.
  • Various generation parameters: Settings like temperature, top-k, top-p, etc. to control the diversity and quality of the generated output.

Outputs

  • Generated Audio: A WAV file containing the generated music.

Capabilities

musicgen can generate a wide variety of music styles and genres based on the provided text prompt. For example, you could ask it to generate "tense, staccato strings with plucked dissonant strings, like a scary movie soundtrack" and it would produce corresponding music. The model can also continue or mimic the melody of an input audio file, allowing for more coherent and controlled music generation.

What can I use it for?

musicgen could be used for a variety of applications, such as:

  • Background music generation: Automatically generating custom music for videos, games, or other multimedia projects.
  • Music composition assistance: Helping musicians and composers come up with new musical ideas or sketches to build upon.
  • Audio creation for content creators: Allowing YouTubers, podcasters, and other content creators to easily add custom music to their projects.

Things to try

One interesting aspect of musicgen is its ability to generate music in parallel by predicting the different codebook components separately. This allows for faster generation compared to previous autoregressive music models. You could try experimenting with different generation parameters to find the right balance between generation speed, diversity, and quality for your use case. Additionally, the model's ability to continue or mimic input audio opens up possibilities for interactive music creation workflows, where users could iterate on an initial seed melody or prompt to refine the generated output.
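The codebook delay mentioned in the meta musicgen description can be illustrated with a small sketch. It assumes that codebook k is shifted by k steps, which is one simple way to realize a "small delay between the codebooks"; the exact offsets are not stated in the text above. The names `apply_delay_pattern` and `undo_delay_pattern` are hypothetical.

```python
PAD = None  # placeholder for positions where a codebook has no token yet

FRAMES_PER_SECOND = 50  # from the description: 50 auto-regressive steps per second


def apply_delay_pattern(codes):
    """Shift codebook k right by k steps.

    codes: list of K lists, each holding T tokens (one list per codebook).
    Returns a list of T + K - 1 steps; each step holds one entry per codebook,
    so all K codebooks are predicted together at every step.
    """
    k_count = len(codes)
    t_count = len(codes[0])
    steps = []
    for s in range(t_count + k_count - 1):
        step = []
        for k in range(k_count):
            t = s - k  # frame that codebook k emits at step s
            step.append(codes[k][t] if 0 <= t < t_count else PAD)
        steps.append(step)
    return steps


def undo_delay_pattern(steps):
    """Recover the aligned K x T grid from the delayed steps."""
    k_count = len(steps[0])
    t_count = len(steps) - k_count + 1
    return [[steps[t + k][k] for t in range(t_count)] for k in range(k_count)]


# Example: 4 codebooks, 3 frames, tokens labeled by (codebook, frame).
codes = [[f"c{k}t{t}" for t in range(3)] for k in range(4)]
delayed = apply_delay_pattern(codes)
```

Under this assumption, 8 seconds of audio is 8 × 50 = 400 frames, which takes 400 + 4 - 1 = 403 decoding steps instead of 4 × 400 = 1600 steps if every codebook token were predicted one at a time.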

Updated 12/9/2024

Text-to-Audio
music-gen

pollinations

Total Score: 14

music-gen is a text-to-music generation model maintained by pollinations. It is built on the AudioCraft library, a PyTorch-based library for deep learning research on audio generation. music-gen is a controllable text-to-music model that can generate music from a given text prompt. It is similar to other music generation models like musicgen, audiogen, and musicgen-choral.

Model inputs and outputs

music-gen takes a text prompt and an optional duration as inputs, and generates an audio file as output. The text prompt can be used to specify the desired genre, mood, or other attributes of the generated music.

Inputs

  • Text: A text prompt that describes the desired music.
  • Duration: The duration of the generated music in seconds.

Outputs

  • Audio file: An audio file containing the generated music.

Capabilities

music-gen generates controllable music from text prompts. It uses a single-stage auto-regressive Transformer model trained on a large dataset of licensed music, which allows it to generate diverse and coherent musical compositions. Unlike some other music generation models, music-gen does not require a self-supervised semantic representation, and it generates all of its audio token streams (codebooks) in a single pass.

What can I use it for?

music-gen can be used for a variety of creative and practical applications, such as:

  • Generating background music for videos, games, or other multimedia projects
  • Composing music for specific moods or genres, such as relaxing ambient music or upbeat dance tracks
  • Experimenting with different musical styles and ideas by prompting the model with different text descriptions
  • Assisting composers and musicians in the creative process by providing inspiration or starting points for new compositions

Things to try

One interesting aspect of music-gen is its ability to generate music with a specified melody. By providing the model with a pre-existing melody, such as a fragment of a classical composition, you can prompt it to create new music that incorporates and builds upon that melody. This can be a powerful tool for exploring new musical ideas and variations on existing themes.

Updated 12/9/2024

Audio-to-Audio
musicgen-songstarter-v0.2

nateraw

Total Score: 3

musicgen-songstarter-v0.2 is a large, stereo MusicGen model fine-tuned by nateraw on a dataset of melody loops from their Splice sample library. It is intended to be a useful tool for music producers to generate song ideas. Compared to the previous version, musicgen-songstarter-v0.1, this model was trained on 3x more unique, manually-curated samples and is double the size, using a larger transformer language model. Similar models include the original musicgen from Meta, which can generate music from a prompt or melody, as well as other fine-tuned versions like musicgen-fine-tuner and musicgen-stereo-chord.

Model inputs and outputs

musicgen-songstarter-v0.2 takes a variety of inputs to control the generated music, including a text prompt, audio file, and various parameters to adjust the sampling and normalization. The model outputs stereo audio at 32kHz.

Inputs

  • Prompt: A description of the music you want to generate.
  • Input Audio: An audio file that will influence the generated music.
  • Continuation: Whether the generated music should continue from the provided audio file or mimic its melody.
  • Continuation Start/End: The start and end times of the audio file to use for continuation.
  • Duration: The duration of the generated audio in seconds.
  • Sampling Parameters: Controls like top_k, top_p, temperature, and classifier_free_guidance to adjust the diversity and influence of the inputs.

Outputs

  • Audio: Stereo audio samples in the requested format (e.g. WAV).

Capabilities

musicgen-songstarter-v0.2 can generate a variety of musical styles and genres based on the provided prompt, including hip hop, soul, jazz, and more. It can also continue or mimic the melody of an existing audio file, making it useful for music producers looking to build on existing ideas.

What can I use it for?

musicgen-songstarter-v0.2 is aimed at music producers looking to generate song ideas and sketches. By providing a textual prompt and/or an existing audio file, the model can produce new musical ideas that can be used as a starting point for further development. The model's ability to generate in stereo and mimic existing melodies makes it particularly useful for quickly prototyping new songs.

Things to try

One interesting capability of musicgen-songstarter-v0.2 is its ability to generate music that adheres closely to the provided inputs, thanks to the "classifier free guidance" parameter. By increasing this value, you can produce outputs that are less diverse but more closely aligned with the desired style and melody. This can be useful for quickly generating variations on a theme or refining a specific musical idea.
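As a rough illustration of how Continuation Start/End values given in seconds relate to audio samples, the sketch below converts a time window into sample indices. It assumes the input audio has been resampled to the 32 kHz rate mentioned above and that an end time past the clip is clamped to the clip length; neither detail is confirmed by the source, and `continuation_window` is a hypothetical helper.

```python
SAMPLE_RATE = 32_000  # assumed: input resampled to the model's 32 kHz output rate


def continuation_window(start_s, end_s, total_samples, sample_rate=SAMPLE_RATE):
    """Convert a start/end time in seconds into (start, end) sample indices.

    The end index is exclusive and is clamped to the clip length
    (an assumption made for this sketch).
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("expected 0 <= start_s < end_s")
    start = int(round(start_s * sample_rate))
    if start >= total_samples:
        raise ValueError("start time is past the end of the clip")
    end = min(int(round(end_s * sample_rate)), total_samples)
    return start, end


# A 10-second clip at 32 kHz holds 320,000 samples per channel.
clip_samples = 10 * SAMPLE_RATE
window = continuation_window(2.0, 5.0, clip_samples)
```

Slicing each channel with `samples[start:end]` would then yield the three-second excerpt that the continuation builds on.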

Updated 12/9/2024

Audio-to-Audio

stable-audio-prod

ardianfe

Total Score: 63

stable-audio-prod is an AI model created by Replicate user ardianfe that enables the generation of music using open-source technologies. It can be seen as a counterpart to similar models like music-gen-fn-200e, stable-audio-open-1.0, and musicgen-songstarter-v0.2 that focus on generating audio content from text prompts.

Model inputs and outputs

stable-audio-prod takes a variety of inputs to control the generation process, including a text prompt, audio file for continuation or mimicry, and various parameters to customize the output. The model then generates an audio file based on these inputs.

Inputs

  • Prompt: A description of the music you want to generate.
  • Input Audio: An audio file that will influence the generated music.
  • Continuation: Whether to continue the melody of the input audio or mimic it.
  • Duration: The duration of the generated audio in seconds.
  • Continuation Start/End: The start and end times of the audio file to use for continuation.
  • Multi-Band Diffusion: Whether to use multi-band diffusion for decoding.
  • Normalization Strategy: The strategy for normalizing the audio.
  • Temperature: Controls the "conservativeness" of the sampling process.
  • Classifier Free Guidance: Increases the influence of inputs on the output.

Outputs

  • The generated audio file in the specified format (e.g., MP3).

Capabilities

stable-audio-prod can generate a wide variety of musical styles and genres by adjusting the text prompt and other inputs. It can create short audio samples, sound effects, and production elements that can be useful for content creators, musicians, and audio professionals.

What can I use it for?

With stable-audio-prod, you can generate original music to use in your content, whether it's for a podcast, video, or any other project. You can also use it to create unique sound effects or production elements to enhance your audio projects. The model's ability to mimic or continue input audio can be especially useful for music producers looking to experiment with new ideas or build on existing melodies.

Things to try

One interesting aspect of stable-audio-prod is its ability to use multi-band diffusion for decoding, which can produce more diverse and high-quality audio outputs. You could try experimenting with this feature to see how it affects the generated music. Additionally, the classifier-free guidance parameter allows you to control the influence of the input prompt on the output, which could be useful for fine-tuning the generated audio to your specific needs.

Updated 12/9/2024

Audio-to-Audio