Fine-tune MusicGen small, medium, and melody models. Stereo variants are also available.

## Model overview

`musicgen-fine-tuner` is a Cog implementation of MusicGen that lets you fine-tune the model on your own data. Developed by Meta, [MusicGen](https://replicate.com/meta/musicgen) is a simple, controllable music generation model that produces diverse music without relying on the self-supervised semantic representations used by models such as [MusicLM](https://arxiv.org/abs/2301.11325). By training MusicGen on your own dataset with `musicgen-fine-tuner`, you can tailor the generated music to your specific needs.

## Model inputs and outputs

The `musicgen-fine-tuner` model takes several inputs: a prompt describing the desired music, an optional input audio file that influences the melody, and configuration parameters such as duration, temperature, and continuation options. It outputs the generated music as a WAV or MP3 file.

### Inputs
- **Prompt**: A description of the music you want to generate.
- **Input Audio**: An optional audio file that influences the generated music. Depending on the **Continuation** setting, the model either continues the input audio's melody or mimics its overall style.
- **Duration**: The duration of the generated audio in seconds.
- **Continuation**: Whether the generated music should continue the input audio's melody or mimic its overall style.
- **Continuation Start/End**: The start and end times of the input audio to use for continuation.
- **Multi-Band Diffusion**: Whether to use multi-band diffusion when decoding the EnCodec tokens (only works with non-stereo models).
- **Normalization Strategy**: The strategy for normalizing the output audio.
- **Temperature**: Controls the "conservativeness" of the sampling process, with higher values producing more diverse outputs.
- **Classifier-Free Guidance**: Increases the influence of the inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs.
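Temperature and classifier-free guidance both act on the model's next-token logits. The sketch below illustrates their textbook formulations in plain Python: guidance extrapolates from unconditional toward conditional (prompted) logits, and temperature rescales the logits before the softmax. It assumes the model follows these standard definitions and is not this model's code; the function names and the `cfg_coef=3.0` default are chosen for illustration only.

```python
import math
import random
from typing import List, Optional


def apply_cfg(cond: List[float], uncond: List[float], cfg_coef: float) -> List[float]:
    # Classifier-free guidance: start from the unconditional logits and move
    # cfg_coef times the distance toward the conditional (prompted) logits.
    # cfg_coef = 1 returns the conditional logits unchanged; larger values
    # follow the prompt more closely at the cost of variety.
    return [u + cfg_coef * (c - u) for c, u in zip(cond, uncond)]


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1) the
    # distribution, so higher temperatures give more diverse samples.
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(cond: List[float], uncond: List[float], cfg_coef: float = 3.0,
                 temperature: float = 1.0, rng: Optional[random.Random] = None) -> int:
    # Hypothetical helper: guide the logits, then sample one token id.
    rng = rng or random.Random()
    probs = softmax_with_temperature(apply_cfg(cond, uncond, cfg_coef), temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy two-token vocabulary: the prompt favors token 0.
cond, uncond = [2.0, 0.0], [1.0, 1.0]
print(apply_cfg(cond, uncond, 3.0))               # [4.0, -2.0]
print(softmax_with_temperature([1.0, 1.0], 0.5))  # [0.5, 0.5]
```

Raising `cfg_coef` widens the gap between token scores, and lowering the temperature does the same at the probability level, which is why both settings make outputs more predictable.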

### Outputs
- **Audio File**: A WAV or MP3 audio file containing the generated music.
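To make the input list concrete, here is a hedged Python sketch that gathers the inputs into one request object and checks the constraints stated above: continuation needs input audio and a sensible time window, multi-band diffusion only works with non-stereo models, and the output is WAV or MP3. Every field name, every default value, and the local `stereo` flag are hypothetical choices for illustration; consult the model's published API schema for the actual parameter names and defaults.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MusicGenRequest:
    # Hypothetical snake_case names for the inputs described above.
    prompt: str
    duration: float = 8.0                       # seconds of audio to generate
    input_audio: Optional[str] = None           # path or URL of conditioning audio
    continuation: bool = False                  # True: continue melody, False: mimic style
    continuation_start: float = 0.0             # seconds into input_audio
    continuation_end: Optional[float] = None    # None: use audio to its end (assumed)
    multi_band_diffusion: bool = False
    normalization_strategy: str = "loudness"    # assumed option name
    temperature: float = 1.0
    classifier_free_guidance: float = 3.0
    output_format: str = "wav"                  # "wav" or "mp3"
    stereo: bool = False                        # local flag: is the target model stereo?

    def problems(self) -> List[str]:
        # Sketch-level checks; the real model may validate differently.
        issues: List[str] = []
        if self.duration <= 0:
            issues.append("duration must be positive")
        if self.temperature <= 0:
            issues.append("temperature must be positive")
        if self.output_format not in ("wav", "mp3"):
            issues.append("output_format must be 'wav' or 'mp3'")
        if self.multi_band_diffusion and self.stereo:
            issues.append("multi-band diffusion only works with non-stereo models")
        if self.continuation and self.input_audio is None:
            issues.append("continuation requires input_audio")
        if self.continuation_start < 0:
            issues.append("continuation_start must be non-negative")
        if self.continuation_end is not None and self.continuation_end <= self.continuation_start:
            issues.append("continuation_end must come after continuation_start")
        return issues

    def to_payload(self) -> Dict[str, Any]:
        issues = self.problems()
        if issues:
            raise ValueError("; ".join(issues))
        # Drop unset optional fields and the local-only `stereo` flag.
        return {k: v for k, v in asdict(self).items() if v is not None and k != "stereo"}


request = MusicGenRequest(prompt="upbeat synthwave with driving drums", duration=10)
print(request.to_payload()["output_format"])  # wav
```

The resulting dictionary is the kind of `input=` mapping that the Replicate Python client's `replicate.run()` accepts. This document does not give the model identifier or version, so the call itself is left out.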

## Capabilities

The `musicgen-fine-tuner` model generates diverse, customizable music from text prompts and input audio. It can produce a wide range of styles and genres, from classical to electronic, and can be fine-tuned to specialize in particular styles or themes. Unlike multi-stage models such as [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen is a single-stage autoregressive Transformer: one model predicts all of the EnCodec codebook streams by interleaving their tokens, instead of cascading several models. This keeps generation simpler and more efficient.
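The single-stage design depends on how the parallel EnCodec codebook streams are ordered for one autoregressive Transformer. The MusicGen paper describes several interleaving patterns. The sketch below shows a simplified version of the "delay" pattern, which shifts codebook k right by k steps so that the model emits one token per codebook at each decoding step. It was written for this page and is not code from this model or from AudioCraft; the function names are hypothetical.

```python
from typing import List, Optional

Grid = List[List[Optional[int]]]


def delay_interleave(codes: List[List[int]]) -> Grid:
    # codes[k][t] is the token of codebook k at frame t (K codebooks, T frames).
    # Codebook k is shifted right by k steps and padded with None, giving
    # K rows of length T + K - 1. Column s holds the tokens predicted at step s.
    num_books = len(codes)
    if num_books == 0:
        return []
    num_frames = len(codes[0])
    if any(len(row) != num_frames for row in codes):
        raise ValueError("every codebook needs the same number of frames")
    return [
        [None] * k + list(row) + [None] * (num_books - 1 - k)
        for k, row in enumerate(codes)
    ]


def delay_deinterleave(grid: Grid, num_frames: int) -> Grid:
    # Undo the shift: row k starts k steps late.
    return [list(row[k:k + num_frames]) for k, row in enumerate(grid)]


codes = [[1, 2, 3], [4, 5, 6]]  # 2 codebooks, 3 frames
grid = delay_interleave(codes)
print(grid)  # [[1, 2, 3, None], [None, 4, 5, 6]]
```

With K codebooks and T frames, the delayed sequence takes T + K - 1 decoding steps. Flattening every codebook token into one sequence would take T × K steps. This difference is where the efficiency of the single-stage approach comes from.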

## What can I use it for?

The `musicgen-fine-tuner` model can be used for a variety of applications, such as:

- **Soundtrack and background music generation**: Generate custom music for videos, games, or other multimedia projects.
- **Music composition assistance**: Generate musical ideas and inspiration for composers and musicians.
- **Audio content creation**: Create custom audio content for podcasts, radio, or other audio-based platforms.
- **Music exploration and experimentation**: Fine-tune the model on your own musical datasets to explore new styles and genres.

## Things to try

To get the most out of `musicgen-fine-tuner`, experiment with different prompts, input audio, and configuration settings. Try generating music in a variety of styles and genres, and listen to how adjusting parameters such as temperature and classifier-free guidance changes the output. You can also fine-tune the model on your own datasets to see how it handles specific types of music or audio content.
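One systematic way to explore temperature and classifier-free guidance is to hold the prompt fixed and generate a small grid of settings. The helper below only builds the input dictionaries. The key names `temperature` and `classifier_free_guidance` are assumed snake_case versions of the documented inputs, and sending each dictionary to the model is left to whichever client you use.

```python
from itertools import product
from typing import Any, Dict, Iterable, List


def sweep(base: Dict[str, Any], temperatures: Iterable[float],
          cfg_values: Iterable[float]) -> List[Dict[str, Any]]:
    # Build one input dict per (temperature, guidance) pair. `base` is copied,
    # not mutated, so the prompt and other settings stay identical across runs.
    runs = []
    for temperature, cfg in product(temperatures, cfg_values):
        run = dict(base)
        run["temperature"] = temperature
        run["classifier_free_guidance"] = cfg
        runs.append(run)
    return runs


grid = sweep({"prompt": "lo-fi piano with vinyl crackle", "duration": 8},
             temperatures=[0.7, 1.0, 1.3], cfg_values=[1.0, 3.0, 6.0])
print(len(grid))  # 9
```

Listening to the results side by side, for example one row per temperature, makes it easier to hear how each parameter trades diversity against adherence to the prompt. If the model exposes a seed input, fixing it would further isolate the effect of each setting; this document does not say whether it does.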