m-a-p

Rank:

Average Model Cost: $0.0000

Number of Runs: 20,792

Models by this creator

MERT-v1-330M


m-a-p

MERT-v1-330M is a music audio pre-training model trained with a new paradigm and dataset. It outperforms previous models and generalizes better to a variety of tasks. The model architecture is similar to the other models in the m-a-p family; the main difference is the pre-training paradigm used. The model size, transformer layer-dimension, feature rate, and sample rate are the key technical configurations to consider when using the model. Compared to the previous version (MERT-v0), MERT-v1 introduces several improvements: 8 codebooks for pseudo labels, MLM prediction with an in-batch noise mixture, training at a higher audio sample rate (24 kHz), training with more audio data, and more available model sizes (95M and 330M). More details about the model can be found in the upcoming paper.
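As a rough sketch of what loading and feature extraction could look like, assuming the checkpoint is hosted on Hugging Face as m-a-p/MERT-v1-330M and loads through the transformers library with trust_remote_code enabled (the exact loading path is not documented on this page, and the file name example.wav is just a placeholder):

```python
# Hedged sketch: extract per-layer features from MERT-v1-330M at 24 kHz.
import torch
import torchaudio
from transformers import AutoModel, Wav2Vec2FeatureExtractor

model = AutoModel.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained(
    "m-a-p/MERT-v1-330M", trust_remote_code=True
)

waveform, sr = torchaudio.load("example.wav")      # any local audio clip
waveform = waveform.mean(dim=0)                    # mix down to mono
if sr != processor.sampling_rate:                  # MERT-v1 expects 24 kHz input
    waveform = torchaudio.functional.resample(waveform, sr, processor.sampling_rate)

inputs = processor(waveform, sampling_rate=processor.sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One hidden-state tensor per transformer layer (plus the input embedding),
# each of shape [batch, time, dim]; which layer works best is task-dependent.
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)
```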


$-/run

9.5K

Huggingface

MERT-v1-95M


MERT-v1-95M is an advanced music understanding model developed as part of the Music Audio Pre-training (m-a-p) model family. It is trained with a new paradigm and dataset and outperforms previous models. Its pre-training differs from MERT-v0 in the pseudo labels, the MLM prediction, the audio sample rate, and the amount of audio data used for training. MERT-v1 is available in multiple model sizes (95M and 330M parameters); this 95M variant can be used for tasks related to music understanding and generation. More details can be found in the accompanying research paper.
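Because features from different transformer layers can suit different downstream tasks, a common probing setup is to time-average each layer's features and learn a weighting over layers. The sketch below is illustrative only: the layer count (12) and feature dimension (768) for the 95M model are assumptions, and the random hidden_states tuple merely stands in for the tuple returned by a real forward pass with output_hidden_states=True.

```python
# Hedged sketch: layer-weighted pooling of per-layer features for a probe.
import torch

# Stand-in for hidden states from a MERT-v1-95M forward pass:
# (num_layers + 1) tensors of shape [batch, time, dim] (12 layers and 768 dims assumed).
num_layers, batch, frames, dim = 13, 2, 75, 768
hidden_states = tuple(torch.randn(batch, frames, dim) for _ in range(num_layers))

stacked = torch.stack(hidden_states)       # [layers, batch, time, dim]
clip_embeddings = stacked.mean(dim=2)      # average over time -> [layers, batch, dim]

# Learnable softmax weighting over layers, feeding a downstream classifier/probe.
layer_weights = torch.nn.Parameter(torch.zeros(len(hidden_states)))
pooled = (torch.softmax(layer_weights, dim=0)[:, None, None] * clip_embeddings).sum(dim=0)
print(pooled.shape)                        # torch.Size([2, 768])
```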


$-/run

8.1K

Huggingface

music2vec-v1


Introduction to our series work

The development log of our Music Audio Pre-training (m-a-p) model family:

17/03/2023: we released two advanced music understanding models, MERT-v1-95M and MERT-v1-330M, trained with a new paradigm and dataset. They outperform the previous models and generalize better to more tasks.
14/03/2023: we retrained the MERT-v0 model with an open-source-only music dataset, MERT-v0-public.
29/12/2022: a music understanding model, MERT-v0, trained with the MLM paradigm, which performs better at downstream tasks.
29/10/2022: a pre-trained MIR model, music2vec, trained with the BYOL paradigm.

Here is a table for quick model pick-up:

Explanation

The m-a-p models share a similar model architecture; the most notable difference between them is the paradigm used in pre-training. Beyond that, there are a few technical configurations to know before using a model:

Model Size: the number of parameters loaded into memory. Please select a size appropriate for your hardware.
Transformer Layer-Dimension: the number of transformer layers and the corresponding feature dimensions the model can output. This is called out because features extracted from different layers can perform differently depending on the task.
Feature Rate: the number of feature frames the model outputs for 1 second of audio input.
Sample Rate: the audio sampling frequency the model was trained with.

Introduction to Music2Vec

Music2Vec was accepted as a 2-page abstract in the Late Breaking Demo (LBD) session at ISMIR 2022. It is a completely unsupervised model trained on 1,000 hours of music audio. We release the crop5s version of the base model as music2vec-v1. Our base model is SOTA-comparable on multiple MIR tasks even under probing settings, while remaining fine-tunable on a single 2080Ti. Larger models trained with more data are on the way. For a more recent pre-trained model with better performance, please refer to m-a-p/MERT-v0.

Model Architecture

Music2Vec framework: during pre-training, the student model reconstructs the masked music audio by taking the contextualized representations provided by the teacher model as prediction targets.

Performance Comparison

With 95M parameters and relatively little training data (1k hours), our base Music2Vec representation achieves performance comparable to the SOTA Jukebox-5B representation. Note that our base model is less than 2% of the size of Jukebox-5B.

Model Usage

Our model is based on the data2vec audio model; a loading sketch is given below.

Citation

The paper can be found at ISMIR.
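The following is a minimal, hedged sketch of that usage. It assumes music2vec-v1 loads through the transformers library's Data2VecAudioModel class, that a default Wav2Vec2FeatureExtractor is adequate for preprocessing, and that the input should be mono audio at 16 kHz; the file name clip.wav is a placeholder and none of these specifics are confirmed by this page.

```python
# Hedged sketch: frame-level feature extraction with music2vec-v1 via data2vec-audio.
import torch
import torchaudio
from transformers import Data2VecAudioModel, Wav2Vec2FeatureExtractor

model = Data2VecAudioModel.from_pretrained("m-a-p/music2vec-v1")
extractor = Wav2Vec2FeatureExtractor(sampling_rate=16_000)   # 16 kHz is an assumption

waveform, sr = torchaudio.load("clip.wav")                   # any local music clip
waveform = waveform.mean(dim=0)                              # mix down to mono
if sr != extractor.sampling_rate:
    waveform = torchaudio.functional.resample(waveform, sr, extractor.sampling_rate)

inputs = extractor(waveform, sampling_rate=extractor.sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Frame-level representations; per-layer features are in outputs.hidden_states.
print(outputs.last_hidden_state.shape)                       # [batch, time, dim]
```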


$-/run

1.5K

Huggingface
