SpeechBrain

Rank:

Average Model Cost: $0.0000

Number of Runs: 602,537

Models by this creator

spkrec-ecapa-voxceleb

The spkrec-ecapa-voxceleb model is a pretrained speaker verification model using ECAPA-TDNN embeddings, trained on the VoxCeleb dataset. It is trained on the VoxCeleb1 + VoxCeleb2 training data and can be used to extract speaker embeddings. The model is composed of an ECAPA-TDNN architecture with convolutional and residual blocks, and uses attentive statistical pooling to extract embeddings. Speaker verification is performed by comparing the cosine distance between speaker embeddings against a decision threshold. The model can be used for inference on both CPU and GPU, and it was trained using SpeechBrain. Please cite SpeechBrain if you use this model for your research or business.
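The verification step described above reduces to a cosine similarity between two embedding vectors, compared against a threshold. A minimal sketch of that scoring step (the toy 4-dimensional embeddings and the 0.25 threshold are illustrative placeholders, not values from the model; real ECAPA-TDNN embeddings are much larger):

```python
import math

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)

def same_speaker(emb_a, emb_b, threshold=0.25):
    """Decide 'same speaker' when the cosine score exceeds the threshold."""
    return cosine_score(emb_a, emb_b) >= threshold

# Toy embeddings standing in for ECAPA-TDNN outputs.
enroll = [0.9, 0.1, 0.3, 0.2]
test_same = [0.8, 0.2, 0.35, 0.15]
test_diff = [-0.5, 0.9, -0.2, 0.4]

print(same_speaker(enroll, test_same))   # high cosine score -> accept
print(same_speaker(enroll, test_diff))   # low cosine score -> reject
```

In practice the embeddings come from the pretrained model and the threshold is tuned on held-out trials; only the comparison logic is shown here.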

$-/run

549.5K

Hugging Face

emotion-recognition-wav2vec2-IEMOCAP

The emotion-recognition-wav2vec2-IEMOCAP model is a machine learning model trained to recognize emotions in audio data. It is trained on the IEMOCAP dataset, which contains recordings of acted emotional speech. The model uses the wav2vec2 architecture, which is based on self-supervised training on unlabeled audio data. It is fine-tuned on the emotion recognition task using labeled audio data. The model can be used to classify the emotions in audio recordings, such as happiness, sadness, anger, etc.
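The classification step at the end of such a model maps raw class scores to an emotion label via softmax and argmax. A minimal sketch of that final step (the label set and logits below are illustrative, not the model's actual output head):

```python
import math

# Illustrative label set; IEMOCAP covers acted emotions such as these.
LABELS = ["neutral", "angry", "happy", "sad"]

def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_emotion(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

label, prob = predict_emotion([0.2, 2.1, -0.5, 0.9])
print(label, round(prob, 3))  # "angry" is the argmax of these toy logits
```

The wav2vec2 encoder produces the logits from raw audio; everything before this step is learned, while the step itself is just the decision rule.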

$-/run

23.9K

Hugging Face

lang-id-voxlingua107-ecapa

VoxLingua107 ECAPA-TDNN Spoken Language Identification Model

Model description

This is a spoken language recognition model trained on the VoxLingua107 dataset using SpeechBrain. The model uses the ECAPA-TDNN architecture that has previously been used for speaker recognition; however, it uses more fully connected hidden layers after the embedding layer, and cross-entropy loss was used for training. We observed that this improved the performance of the extracted utterance embeddings for downstream tasks.

The system is trained with recordings sampled at 16 kHz (single channel). The code will automatically normalize your audio (i.e., resampling and mono channel selection) when calling classify_file, if needed.

The model classifies a speech utterance according to the language spoken. It covers 107 languages: Abkhazian, Afrikaans, Amharic, Arabic, Assamese, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Tibetan, Breton, Bosnian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Esperanto, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Galician, Guarani, Gujarati, Manx, Hausa, Hawaiian, Hindi, Croatian, Haitian, Hungarian, Armenian, Interlingua, Indonesian, Icelandic, Italian, Hebrew, Japanese, Javanese, Georgian, Kazakh, Central Khmer, Kannada, Korean, Latin, Luxembourgish, Lingala, Lao, Lithuanian, Latvian, Malagasy, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Burmese, Nepali, Dutch, Norwegian Nynorsk, Norwegian, Occitan, Panjabi, Polish, Pushto, Portuguese, Romanian, Russian, Sanskrit, Scots, Sindhi, Sinhala, Slovak, Slovenian, Shona, Somali, Albanian, Serbian, Sundanese, Swedish, Swahili, Tamil, Telugu, Tajik, Thai, Turkmen, Tagalog, Turkish, Tatar, Ukrainian, Urdu, Uzbek, Vietnamese, Waray, Yiddish, Yoruba, Mandarin Chinese.
Intended uses & limitations

The model has two uses:
- use it "as is" for spoken language recognition
- use it as an utterance-level feature (embedding) extractor, for creating a dedicated language ID model on your own data

The model is trained on automatically collected YouTube data. To perform inference on the GPU, add run_opts={"device": "cuda"} when calling the from_hparams method. Make sure your input tensor has the expected sampling rate if you use encode_batch or classify_batch.

Since the model is trained on VoxLingua107, it has many limitations and biases, including:
- its accuracy on smaller languages is probably quite limited
- it probably works worse on female speech than on male speech, because the YouTube data contains much more male speech
- based on subjective experiments, it does not work well on speech with a foreign accent
- it probably does not work well on children's speech or on speakers with speech disorders

Training data

The model is trained on VoxLingua107, a speech dataset for training spoken language identification models. The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according to the language of the video title and description, with some post-processing steps to filter out false positives. VoxLingua107 contains data for 107 languages. The total amount of speech in the training set is 6,628 hours; the average amount per language is 62 hours, though the real amount per language varies a lot. There is also a separate development set containing 1,609 speech segments from 33 languages, validated by at least two volunteers to really contain the stated language.

Training procedure

See the SpeechBrain recipe.
Evaluation results

Error rate: 6.7% on the VoxLingua107 development dataset.

About SpeechBrain

SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
Website: https://speechbrain.github.io/
GitHub: https://github.com/speechbrain/speechbrain
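The normalization that classify_file performs (mono channel selection plus resampling to 16 kHz) can be sketched as follows. This is a naive illustration only; SpeechBrain relies on proper resampling (e.g., via torchaudio) rather than the linear interpolation used here:

```python
def to_mono(channels):
    """Downmix multi-channel audio to mono by averaging the channels."""
    return [sum(frame) / len(frame) for frame in zip(*channels)]

def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (illustration only; real code
    uses a band-limited filter to avoid aliasing)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) / src_rate * dst_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Stereo 44.1 kHz input -> mono 16 kHz, as the model expects.
left = [0.0, 0.2, 0.4, 0.6] * 1000
right = [0.0, 0.0, 0.0, 0.0] * 1000
mono = to_mono([left, right])
mono_16k = resample_linear(mono, 44100, 16000)
print(len(mono), len(mono_16k))
```

If you call encode_batch or classify_batch directly, this normalization is your responsibility, which is why the card asks you to check the sampling rate of your input tensor.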

$-/run

4.1K

Hugging Face

tts-hifigan-ljspeech

Vocoder with HiFi-GAN trained on LJSpeech

This repository provides all the necessary tools for using a HiFi-GAN vocoder trained on LJSpeech. The pretrained model takes a spectrogram as input and produces a waveform as output. Typically, a vocoder is used after a TTS model that converts input text into a spectrogram. The sampling frequency is 22050 Hz. We encourage you to read the SpeechBrain tutorials to learn more about SpeechBrain.

Inference on GPU

To perform inference on the GPU, add run_opts={"device": "cuda"} when calling the from_hparams method.

Training

The model was trained with SpeechBrain. To train it from scratch: clone SpeechBrain, install it, and run the training recipe.
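The relationship between the TTS model's spectrogram and the vocoder's waveform is fixed by the sample rate and the spectrogram hop length: the vocoder upsamples each spectrogram frame to hop_length waveform samples. A small sketch of that bookkeeping (the 256-sample hop length is an assumption for illustration; check the model's actual hyperparameters):

```python
SAMPLE_RATE = 22050   # from the model card
HOP_LENGTH = 256      # assumed hop length, for illustration only

def waveform_length(n_mel_frames, hop_length=HOP_LENGTH):
    """The vocoder turns each spectrogram frame into hop_length samples."""
    return n_mel_frames * hop_length

def duration_seconds(n_mel_frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Audio duration implied by a spectrogram of n_mel_frames frames."""
    return waveform_length(n_mel_frames, hop_length) / sample_rate

frames = 430  # e.g., spectrogram frames produced by a TTS model for a short sentence
print(waveform_length(frames))            # total waveform samples
print(round(duration_seconds(frames), 2)) # roughly five seconds of audio
```

This is why a vocoder and TTS model must be trained with matching spectrogram settings; a mismatch changes the implied timing of the output audio.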

$-/run

3.7K

Hugging Face

sepformer-wsj02mix

SepFormer trained on WSJ0-2Mix

This repository provides all the necessary tools to perform audio source separation with a SepFormer model, implemented with SpeechBrain and pretrained on the WSJ0-2Mix dataset. For a better experience, we encourage you to learn more about SpeechBrain. The model's performance is 22.4 dB on the test set of WSJ0-2Mix.

Perform source separation on your own audio file

The system expects input recordings sampled at 8 kHz (single channel). If your signal has a different sample rate, resample it (e.g., using torchaudio or sox) before using the interface.

Inference on GPU

To perform inference on the GPU, add run_opts={"device": "cuda"} when calling the from_hparams method.

Training

The model was trained with SpeechBrain (commit fc2eabb7). To train it from scratch: clone SpeechBrain, install it, and run the training recipe.

Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

About SpeechBrain
Website: https://speechbrain.github.io/
Code: https://github.com/speechbrain/speechbrain/
HuggingFace: https://huggingface.co/speechbrain/
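Separation quality on WSJ0-2Mix is typically measured with the scale-invariant signal-to-noise ratio (SI-SNR), which is what dB figures like the one above usually refer to. A minimal sketch of that metric, using a generic textbook formulation rather than SpeechBrain's implementation:

```python
import math

def si_snr(estimate, target):
    """Scale-invariant signal-to-noise ratio, in dB."""
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    alpha = dot / target_energy                       # optimal scaling factor
    s_target = [alpha * t for t in target]            # projection onto the target
    e_noise = [e - s for e, s in zip(estimate, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise)
    return 10 * math.log10(num / den)

target = [1.0, 2.0, 3.0, 2.0, 1.0]
noisy = [t + n for t, n in zip(target, [0.01, -0.02, 0.01, 0.02, -0.01])]
print(round(si_snr(noisy, target), 1))  # high: estimate is close to the target

# Scale invariance: multiplying the estimate by a constant changes nothing,
# so the metric ignores overall gain differences between estimate and target.
scaled = [2.5 * x for x in noisy]
print(round(si_snr(scaled, target), 1))
```

Reported results are usually the SI-SNR *improvement* over the unprocessed mixture, averaged across test utterances.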

$-/run

2.1K

Hugging Face
