## Model overview

`demucs` is a state-of-the-art music source separation model developed by researchers at Facebook AI Research. It separates a music track into drums, bass, vocals, and other accompaniment. The latest version, `Hybrid Transformer Demucs (v4)`, is a hybrid spectrogram and waveform model that uses a Transformer encoder in its innermost layers to achieve high-quality separation. It builds on the previous `Hybrid Demucs (v3)` model, which won the Sony MDX challenge. `demucs` addresses the same task as other source separation models such as [Wave-U-Net](https://github.com/f90/Wave-U-Net), [Open-Unmix](https://github.com/sigsep/open-unmix-pytorch), and [D3Net](https://arxiv.org/abs/2010.01733), and reports stronger results on standard benchmarks.

## Model inputs and outputs

`demucs` takes an audio file as input and accepts a variety of formats, including WAV, MP3, and FLAC. It outputs the separated stems for drums, bass, vocals, and other accompaniment as individual stereo WAV or MP3 files. Users can also request a single stem, such as vocals, instead of all four.

### Inputs
- **audio**: The input audio file to be separated
- **stem**: The specific stem to separate (e.g. vocals, drums, bass) or "no_stem" to separate all stems
- **model_name**: The pre-trained model to use for separation, such as `htdemucs`, `htdemucs_ft`, or `mdx_extra`
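- **shifts**: The number of random time shifts applied to the input; the predictions are averaged across shifts, which improves quality but increases inference time roughly in proportion to the number of shifts
- **overlap**: The amount of overlap between consecutive prediction windows, given as a fraction of the window length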
- **clip_mode**: The strategy for avoiding clipping in the output, either "rescale" or "clamp"
- **float32**: Whether to output the audio as 32-bit float instead of 16-bit integer
- **mp3_bitrate**: The bitrate to use when outputting the audio as MP3
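
For readers running the upstream `demucs` command-line tool rather than the hosted model, the inputs above map fairly directly onto CLI flags. The sketch below is illustrative only: the function name `build_demucs_command` and its default values are assumptions, not part of this model's interface, and it only builds the argument list without running anything.

```python
from typing import List, Optional

# The four stems named in this document.
STEMS = {"drums", "bass", "vocals", "other"}


def build_demucs_command(
    audio: str,
    stem: str = "no_stem",
    model_name: str = "htdemucs",
    shifts: int = 1,
    overlap: float = 0.25,
    clip_mode: str = "rescale",
    float32: bool = False,
    mp3_bitrate: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the `demucs` CLI (hypothetical helper).

    Defaults here are assumptions chosen for illustration.
    """
    if stem != "no_stem" and stem not in STEMS:
        raise ValueError(f"unknown stem: {stem!r}")
    if clip_mode not in {"rescale", "clamp"}:
        raise ValueError(f"unknown clip_mode: {clip_mode!r}")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be a fraction in [0, 1)")

    cmd = [
        "demucs",
        "-n", model_name,
        "--shifts", str(shifts),
        "--overlap", str(overlap),
        "--clip-mode", clip_mode,
    ]
    if stem != "no_stem":
        # Two-stems mode: the chosen stem plus everything else.
        cmd += ["--two-stems", stem]
    if float32:
        cmd.append("--float32")
    if mp3_bitrate is not None:
        # Requesting a bitrate implies MP3 output in this sketch.
        cmd += ["--mp3", "--mp3-bitrate", str(mp3_bitrate)]
    cmd.append(audio)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_demucs_command("song.mp3", stem="vocals", shifts=2)))
```

With `demucs` installed, the returned list could be passed to `subprocess.run(cmd, check=True)`.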

### Outputs
- **drums.wav**: The separated drums stem
- **bass.wav**: The separated bass stem
- **vocals.wav**: The separated vocals stem
- **other.wav**: The separated other/accompaniment stem

## Capabilities

`demucs` extracts individual instrument and vocal tracks from complex audio mixes with high accuracy. It outperforms many earlier models on standard benchmarks such as the MUSDB18 dataset. The latest `Hybrid Transformer Demucs (v4)` model achieves a signal-to-distortion ratio (SDR) of 9.0 dB on the MUSDB HQ test set, a significant improvement over earlier versions and other leading approaches.
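
To give a feel for what an SDR figure measures, here is a minimal sketch of the basic energy-ratio definition, SDR = 10 log10(‖s‖² / ‖s − ŝ‖²), where s is the reference stem and ŝ the estimate. This is a simplification: published benchmark scores are computed with the benchmark's own evaluation tooling and aggregated over tracks and sources, so this function (whose name `sdr_db` is an assumption) will not reproduce them exactly.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float],
           eps: float = 1e-10) -> float:
    """Simplified signal-to-distortion ratio in dB for one mono signal."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true stem.
    signal_energy = sum(r * r for r in reference)
    # Energy of the residual between the true stem and the estimate.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


if __name__ == "__main__":
    reference = [1.0, -1.0, 1.0, -1.0]
    estimate = [0.9, -0.9, 0.9, -0.9]  # 10% amplitude error
    print(f"{sdr_db(reference, estimate):.2f} dB")
```

Higher is better: an estimate whose residual carries one hundredth of the reference's energy scores about 20 dB.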

## What can I use it for?

`demucs` can be used for a variety of music production and audio engineering tasks. It enables users to isolate individual elements of a song, which is useful for tasks like:

- Karaoke or vocal isolation - Removing the vocals to create a karaoke backing track, or keeping only the vocals for an a cappella
- Remixing or mash-ups - Separating the drums, bass, and other elements to remix a song
- Audio post-production - Cleaning up or enhancing specific elements of a mix
- Music education - Isolating instrument tracks for practicing or study
- Music information retrieval - Analyzing the individual components of a song

The model's state-of-the-art performance and flexible interface make it a powerful tool for both professionals and hobbyists working with audio.

## Things to try

Some interesting things to try with `demucs` include:

- Experimenting with the different pre-trained models to find the best fit for your audio
- Trying the "two-stems" mode to split a track into one chosen stem, such as vocals, and everything else
- Increasing the "shifts" option to improve separation quality, especially for complex mixes, at the cost of longer inference time
- Applying the model to a diverse range of musical genres and styles to see how it performs
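
The idea behind the "shifts" option can be sketched in a few lines: shift the input by a random offset, run the separator, undo the shift, and average the results. The code below is a simplified illustration, not the library's implementation. `separate` is a hypothetical stand-in for a model that maps a mono signal to a single output of the same length, whereas the real model works on multi-channel tensors and returns several stems.

```python
import random
from typing import Callable, List

Signal = List[float]


def apply_with_shifts(
    separate: Callable[[Signal], Signal],
    mix: Signal,
    shifts: int = 4,
    max_shift: int = 8,
    seed: int = 0,
) -> Signal:
    """Average predictions over randomly time-shifted copies of `mix`."""
    if shifts < 1:
        raise ValueError("shifts must be at least 1")
    rng = random.Random(seed)
    n = len(mix)
    total = [0.0] * n
    for _ in range(shifts):
        offset = rng.randint(0, max_shift)
        # Delay the input by `offset` samples, zero-padding to a fixed length.
        padded = [0.0] * offset + list(mix) + [0.0] * (max_shift - offset)
        out = separate(padded)
        # Undo the delay so every prediction lines up with the original mix.
        aligned = out[offset:offset + n]
        for i, value in enumerate(aligned):
            total[i] += value
    # Averaging smooths out errors that depend on the exact time alignment.
    return [t / shifts for t in total]


if __name__ == "__main__":
    def identity(x: Signal) -> Signal:
        return list(x)

    print(apply_with_shifts(identity, [1.0, 2.0, 3.0]))
```

Because the separator runs once per shift, inference time grows roughly linearly with the number of shifts.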

The maintainer, [cjwbw](https://aimodels.fyi/creators/replicate/cjwbw), has also released several other audio and video models, such as [audiosep](https://aimodels.fyi/models/replicate/audiosep-cjwbw), [video-retalking](https://aimodels.fyi/models/replicate/video-retalking-cjwbw), and [voicecraft](https://aimodels.fyi/models/replicate/voicecraft-cjwbw), that may be worth exploring.