Alefiury

Average Model Cost: $0.0000

Number of Runs: 6,121

Models by this creator

wav2vec2-large-xlsr-53-gender-recognition-librispeech

alefiury

The wav2vec2-large-xlsr-53-gender-recognition-librispeech model is a fine-tuned version of the wav2vec2-xls-r-300m model, trained for gender recognition on the Librispeech-clean-100 dataset, with 70% of the data used for training, 10% for validation, and 20% for testing. On the evaluation set it achieved a loss of 0.0061 and an F1 score of 0.9993.

Training used a learning rate of 3e-05, a training batch size of 4, an evaluation batch size of 4, a seed of 42, and 4 gradient accumulation steps, giving a total train batch size of 16. The optimizer was Adam with betas=(0.9, 0.999) and epsilon=1e-08. The learning rate scheduler was linear with a warm-up ratio of 0.1, and the model was trained for 1 epoch with native AMP mixed precision. It was implemented using Transformers 4.28.0, PyTorch 2.0.0+cu118, and Tokenizers 0.13.3.
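The hyperparameters above can be collected into a small configuration object to show how the total train batch size of 16 follows from the per-device batch size and the gradient accumulation steps, and what a linear schedule with a warm-up ratio of 0.1 looks like. This is a minimal sketch, not the author's training script: the names `TrainConfig` and `linear_schedule_lr`, and the step count of 1000, are hypothetical, and the warm-up rounding may differ from the library's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in the model description (names are hypothetical)."""
    learning_rate: float = 3e-5
    train_batch_size: int = 4
    eval_batch_size: int = 4
    seed: int = 42
    gradient_accumulation_steps: int = 4
    adam_betas: tuple = (0.9, 0.999)
    adam_epsilon: float = 1e-8
    warmup_ratio: float = 0.1
    num_epochs: int = 1

    @property
    def total_train_batch_size(self) -> int:
        # Gradients are accumulated over several small batches before each
        # optimizer step, so the effective batch size is the product.
        return self.train_batch_size * self.gradient_accumulation_steps


def linear_schedule_lr(step: int, total_steps: int, cfg: TrainConfig) -> float:
    """Linear warm-up from 0 to the peak rate, then linear decay back to 0."""
    # Assumption: warm-up length is the ratio times the total step count,
    # rounded down.
    warmup_steps = int(cfg.warmup_ratio * total_steps)
    if step < warmup_steps:
        return cfg.learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return cfg.learning_rate * remaining / max(1, total_steps - warmup_steps)


cfg = TrainConfig()
total_steps = 1000  # hypothetical; the real value depends on the dataset size
print("total train batch size:", cfg.total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps, cfg))
```

The learning rate peaks at the end of warm-up (step 100 of the hypothetical 1000) and reaches zero at the final step.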

$-/run

6.0K

Huggingface

wav2vec2-xls-r-300m-pt-br-spontaneous-speech-emotion-recognition

Wav2vec 2.0 XLS-R For Spontaneous Speech Emotion Recognition

This is the model that won first place in the SER track of the Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022) Workshop.

The following datasets were used in the training:

CORAA SER v1.0: a dataset composed of spontaneous Portuguese speech, with approximately 40 minutes of audio segments labeled in three classes: neutral, non-neutral female, and non-neutral male.

EMOVO Corpus: a database of emotional speech for the Italian language, built from the voices of up to 6 actors who played 14 sentences simulating 6 emotional states (disgust, fear, anger, joy, surprise, sadness) plus the neutral state.

RAVDESS: a dataset that provides 1440 recordings of actors performing 8 different emotions in English: angry, calm, disgust, fearful, happy, neutral, sad, and surprised.

BAVED: a collection of audio recordings of Arabic words spoken with varying degrees of emotion. The dataset contains seven words (like, unlike, this, file, good, neutral, and bad), spoken at three emotional levels: low emotion (tired or feeling down), neutral emotion (the way the speaker speaks daily), and high emotion (positive or negative emotions such as happiness, joy, sadness, anger).

The test set is a part of CORAA SER v1.0 that was set aside for this purpose. The model achieves the following results on the test set:

Accuracy: 0.9090
Macro Precision: 0.8171
Macro Recall: 0.8397
Macro F1-Score: 0.8187

Datasets Details

[Figure: overall distribution of the datasets]

[Figure: number of instances by label]

Repository

The repository that implements training and testing of the model is available here.
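To make the reported metrics concrete, the sketch below computes accuracy and macro-averaged precision, recall, and F1 for the three CORAA SER classes. It assumes the usual definition of macro averaging (an unweighted mean of per-class scores); the source does not state how its numbers were computed. The function name `macro_scores` and the toy labels are hypothetical and are not the model's actual predictions.

```python
def macro_scores(y_true, y_pred, labels):
    """Accuracy plus unweighted (macro) averages of per-class precision, recall, F1."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        # Macro F1 averages the per-class F1 scores; it is not the harmonic
        # mean of macro precision and macro recall.
        "macro_f1": sum(f1s) / n,
    }


labels = ["neutral", "non-neutral female", "non-neutral male"]
# Toy, made-up labels for illustration only.
y_true = ["neutral"] * 4 + ["non-neutral female"] * 2 + ["non-neutral male"] * 2
y_pred = ["neutral", "neutral", "neutral", "non-neutral female",
          "non-neutral female", "non-neutral female",
          "non-neutral male", "neutral"]

scores = macro_scores(y_true, y_pred, labels)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Because every class counts equally in a macro average, the small non-neutral classes weigh as much as the large neutral class, which is one way accuracy can sit well above the macro scores.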

$-/run

48

Huggingface
