Nguyenvulebinh

Rank:

Average Model Cost: $0.0000

Number of Runs: 8,275

Models by this creator

๐Ÿ‹๏ธ

wav2vec2-base-vietnamese-250h

Facebook's Wav2Vec2. Our models are pre-trained on 13k hours of unlabeled Vietnamese YouTube audio and fine-tuned on 250 hours of labeled data from the VLSP ASR dataset, using 16kHz sampled speech audio. We use the wav2vec2 architecture for the pre-trained model, following the wav2vec2 paper. For the fine-tuning phase, wav2vec2 is fine-tuned with Connectionist Temporal Classification (CTC), an algorithm used to train neural networks for sequence-to-sequence problems, mainly in automatic speech recognition and handwriting recognition.

A conventional ASR system requires two components: an acoustic model and a language model. Here the CTC-fine-tuned wav2vec model serves as the acoustic model; for the language model, we provide a 4-gram model trained on 2GB of spoken text. For details of the training and fine-tuning process, see the fairseq GitHub repository and the Hugging Face blog.

When using the model, make sure that your speech input is sampled at 16kHz and that each audio clip is shorter than 10 seconds. Follow the Colab link below to use the combination of CTC-wav2vec and the 4-gram LM.

The ASR model parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

Please cite our repo when it is used to help produce published results or is incorporated into other software. Contact: nguyenvulebinh@gmail.com / binh@vietai.org
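
For reference, a minimal transcription sketch follows, assuming the checkpoint is published on the Hugging Face Hub as nguyenvulebinh/wav2vec2-base-vietnamese-250h and exposes the standard Wav2Vec2 CTC interface; it uses greedy CTC decoding only (without the 4-gram LM), and example.wav is a placeholder file name.

```python
# Minimal transcription sketch: greedy CTC decoding only, no 4-gram LM.
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

MODEL_ID = "nguyenvulebinh/wav2vec2-base-vietnamese-250h"  # assumed Hub ID
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Load a clip shorter than 10 s and resample to the required 16 kHz.
waveform, sample_rate = torchaudio.load("example.wav")  # placeholder file
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# CTC greedy decoding: most likely token per frame, then collapse repeats/blanks.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```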

$-/run

2.4K

Huggingface

🌀

vi-mrc-base

Model description:
Language model: XLM-RoBERTa
Fine-tuning: MRCQuestionAnswering
Languages: Vietnamese, English
Downstream task: Extractive QA
Datasets (combined English and Vietnamese): SQuAD 2.0, mailong25, UIT-ViQuAD, MultiLingual Question Answering

This model is intended for QA in Vietnamese, so the validation set is Vietnamese only (though English works fine). The evaluation result below uses 10% of the Vietnamese dataset.

MRCQuestionAnswering uses XLM-RoBERTa as the pre-trained language model. By default, XLM-RoBERTa splits words into sub-words; in my implementation, the sub-word representations (after encoding by the BERT layers) are re-combined into word representations using a sum strategy.

The pre-trained model can be used either in the Hugging Face pipeline style (which does NOT use the sum-features strategy) or with the more accurate inference process that does use the sum-features strategy.

About: built by Binh Nguyen. For more details, visit the project repository.
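
As an illustration of the Hugging Face pipeline style mentioned above (the path that does not use the sum-features strategy), here is a hedged sketch; the Hub ID nguyenvulebinh/vi-mrc-base and the Vietnamese question/context pair are assumptions for demonstration.

```python
# Extractive QA via the standard question-answering pipeline.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="nguyenvulebinh/vi-mrc-base",      # assumed Hub ID
    tokenizer="nguyenvulebinh/vi-mrc-base",
)

# Made-up example context and question in Vietnamese.
context = ("Bình là sinh viên trường Đại học Bách Khoa Hà Nội. "
           "Anh ấy đang nghiên cứu về xử lý ngôn ngữ tự nhiên.")
question = "Bình đang nghiên cứu về lĩnh vực gì?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])
```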

$-/run

2.0K

Huggingface

📈

wav2vec2-base-vi

Vietnamese self-supervised learning Wav2Vec2 model.

Model: we use the wav2vec2 architecture for self-supervised learning.

Data: our self-supervised model is pre-trained on a massive audio set of 13k hours of Vietnamese YouTube audio, which includes clean audio, noisy audio, conversation, and multiple genders and dialects.

Download: we have uploaded our pre-trained models to Hugging Face. The base model was trained for 35 epochs and the large model for 20 epochs, over about 30 days on a TPU v3-8. Base version: ~95M params. Large version: ~317M params.

Usage: since our model has the same architecture as the English wav2vec2 version, you can use this notebook for more information on how to fine-tune the model (see the sketch below). A fine-tuned version trained on the VLSP 2020 ASR dataset is available; benchmark WER results on the VLSP T1 test set are reported in the model card.

Acknowledgment: we would like to thank the Google TPU Research Cloud (TRC) program and Soonson Kwon (Google ML Ecosystem programs Lead) for their support. Special thanks to my colleagues at VietAI and VAIS for their advice.

Contact: nguyenvulebinh@gmail.com / binh@vietai.org
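
To make the fine-tuning path concrete, here is a hedged sketch of attaching a CTC head to the pre-trained base checkpoint, following the standard wav2vec2 fine-tuning recipe referenced above; the Hub ID, vocabulary size, and pad token id are illustrative assumptions.

```python
# Sketch: turn the self-supervised base checkpoint into a CTC acoustic model.
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC

MODEL_ID = "nguyenvulebinh/wav2vec2-base-vi"  # assumed Hub ID
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)

# The self-supervised checkpoint has no CTC head, so a new one is initialized
# and sized to your own character vocabulary (placeholder values below).
model = Wav2Vec2ForCTC.from_pretrained(
    MODEL_ID,
    vocab_size=110,            # placeholder: size of your Vietnamese vocab
    pad_token_id=0,            # placeholder: id of the padding token
    ctc_loss_reduction="mean",
)

# Common practice when fine-tuning wav2vec2: keep the CNN feature encoder frozen.
model.freeze_feature_encoder()
```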

$-/run

1.8K

Huggingface

🧪

vi-mrc-large

Model description:
Language model: XLM-RoBERTa
Fine-tuning: MRCQuestionAnswering
Languages: Vietnamese, English
Downstream task: Extractive QA
Datasets (combined English and Vietnamese): SQuAD 2.0, mailong25, VLSP MRC 2021, MultiLingual Question Answering

This model is intended for QA in Vietnamese, so the validation set is Vietnamese only (though English works fine). The evaluation result below uses the VLSP MRC 2021 test set; this experiment achieved TOP 1 on the leaderboard.

MRCQuestionAnswering uses XLM-RoBERTa as the pre-trained language model. By default, XLM-RoBERTa splits words into sub-words; in my implementation, the sub-word representations (after encoding by the BERT layers) are re-combined into word representations using a sum strategy.

The pre-trained model can be used either in the Hugging Face pipeline style (which does NOT use the sum-features strategy) or with the more accurate inference process that does use the sum-features strategy.

About: built by Binh Nguyen. For more details, visit the project repository.
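
For the lower-level inference path (again without the sum-features strategy), a hedged sketch of running the QA head directly and picking a span from the start/end logits follows; the Hub ID nguyenvulebinh/vi-mrc-large and the example text are assumptions.

```python
# Sketch: extractive QA by selecting the best start/end span from the logits.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

MODEL_ID = "nguyenvulebinh/vi-mrc-large"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_ID)

# Made-up question and context.
question = "Ai là tác giả của mô hình này?"
context = "Mô hình vi-mrc-large được xây dựng bởi Binh Nguyen tại VietAI."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Greedy span selection: highest start logit, then highest end logit after it.
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits[0, start:])) + start
answer_ids = inputs.input_ids[0, start : end + 1]
print(tokenizer.decode(answer_ids, skip_special_tokens=True))
```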

$-/run

574

Huggingface

🤯

lyric-alignment

No description available.

$-/run

486

Huggingface

✅

envibert

RoBERTa for Vietnamese and English (envibert). This RoBERTa version is trained on 100GB of text (50GB of Vietnamese and 50GB of English), hence the name envibert. The model architecture is customized for production, so it contains only 70M parameters.

Please cite our repo when it is used to help produce published results or is incorporated into other software.

Contact: nguyenvulebinh@gmail.com
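
A hedged sketch of using envibert as a bilingual sentence encoder follows; the Hub ID nguyenvulebinh/envibert is an assumption, and since the original repository ships its own tokenizer code, the plain AutoTokenizer call may need to be replaced by the tokenizer implementation distributed with the checkpoint.

```python
# Sketch: mean-pooled sentence embeddings from the envibert encoder.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "nguyenvulebinh/envibert"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

sentences = ["Hà Nội là thủ đô của Việt Nam.", "Hanoi is the capital of Vietnam."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch.attention_mask.unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)
```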

$-/run

462

Huggingface

↗️

wav2vec2-large-vi-vlsp2020

Model description: our models use the wav2vec2 architecture, pre-trained on 13k hours of unlabeled Vietnamese YouTube audio and fine-tuned on 250 hours of labeled data from the VLSP ASR dataset, using 16kHz sampled speech audio. A more detailed description is available here.

Benchmark: WER results on the VLSP T1 test set are reported in the model card.

License: the ASR model parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

Contact: nguyenvulebinh@gmail.com
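
The benchmark above is reported as word error rate (WER). As a sketch of how such a score can be computed for an evaluation set, the snippet below uses the third-party jiwer package; the reference and hypothesis transcripts are made up.

```python
# Sketch: computing WER over a small set of reference/hypothesis pairs.
import jiwer

references = ["hôm nay trời đẹp quá", "tôi đang học xử lý tiếng nói"]
hypotheses = ["hôm nay trời đẹp quá", "tôi đang học xử lí tiếng nói"]

# WER = (substitutions + deletions + insertions) / number of reference words.
print(f"WER: {jiwer.wer(references, hypotheses):.3f}")
```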

$-/run

411

Huggingface

๐Ÿ…

spoken-norm

Transformation of spoken text to written text. This model formats raw ASR output, converting spoken text to written text (e.g. dates, numbers, IDs, ...). It also supports formatting out-of-vocabulary terms by using an external vocabulary of bias phrases. You can play around with it in the Hugging Face Space. The repository shows how to initialize the tokenizer and model, run inference on a sample, and format input text with or without bias phrases.

Contact: nguyenvulebinh@gmail.com
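
As a toy illustration of the normalization task itself (not the model's own code), the sketch below maps spoken-form digits to written form and consults a caller-supplied bias vocabulary for out-of-vocabulary terms.

```python
# Toy spoken-to-written normalization: the real model learns this mapping,
# including dates, numbers, and IDs, instead of using a fixed lookup table.
SPOKEN_TO_WRITTEN = {"không": "0", "một": "1", "hai": "2", "ba": "3", "bốn": "4",
                     "năm": "5", "sáu": "6", "bảy": "7", "tám": "8", "chín": "9"}

def normalize(tokens, bias_vocab=None):
    """Replace spoken digits and bias phrases; leave other tokens unchanged."""
    bias_vocab = bias_vocab or {}
    return " ".join(bias_vocab.get(t, SPOKEN_TO_WRITTEN.get(t, t)) for t in tokens)

print(normalize("mã đơn hàng là một hai ba".split()))
print(normalize("địa chỉ gmail của tôi".split(), bias_vocab={"gmail": "Gmail"}))
```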

$-/run

67

Huggingface

🖼️

voice-filter

No description available.

$-/run

49

Huggingface

⛏️

deltalm-base

Platform did not provide a description for this model.

$-/run

40

Huggingface
