Asapp

Rank:

Average Model Cost: $0.0000

Number of Runs: 41,017

Models by this creator

sew-d-tiny-100k

SEW-D-tiny is a pre-trained model for speech feature extraction, specifically for tasks such as Automatic Speech Recognition, Speaker Identification, Intent Classification, Emotion Recognition, etc. It is based on the SEW (Squeezed and Efficient Wav2vec) architecture, which improves both performance and efficiency compared to the original wav2vec 2.0 model. SEW-D-tiny is pretrained on 16kHz sampled speech audio and should be fine-tuned on downstream tasks. It achieves a faster inference speed and a reduced word error rate compared to wav2vec 2.0. The original model can be found on GitHub.
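As a feature extractor, the pretrained checkpoint can be loaded through transformers. A minimal sketch, assuming the hub id asapp/sew-d-tiny-100k and the SEWDModel/AutoFeatureExtractor classes, with dummy audio standing in for a real 16 kHz recording:

```python
# Minimal feature-extraction sketch. The pretrained checkpoint ships no
# tokenizer, so only a feature extractor is loaded alongside the model.
import numpy as np
import torch
from transformers import AutoFeatureExtractor, SEWDModel

feature_extractor = AutoFeatureExtractor.from_pretrained("asapp/sew-d-tiny-100k")
model = SEWDModel.from_pretrained("asapp/sew-d-tiny-100k")

speech = np.random.randn(16_000).astype(np.float32)  # one second of dummy 16 kHz audio
inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    features = model(**inputs).last_hidden_state  # (batch, frames, hidden_size)
print(features.shape)
```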

$-/run

20.1K

Huggingface

sew-tiny-100k

SEW-tiny is the tiny variant of SEW (Squeezed and Efficient Wav2vec), a pre-trained model for speech tasks. It is pretrained on 16 kHz sampled speech audio and can be fine-tuned for downstream tasks such as automatic speech recognition (ASR), speaker identification, intent classification, and emotion recognition. The SEW architecture improves both performance and efficiency over the original wav2vec 2.0, achieving faster inference and reduced word error rates across different model sizes.

$-/run

20.1K

Huggingface

sew-d-tiny-100k-ft-ls100h

SEW-D-tiny (SEW-D by ASAPP Research), pretrained on 16 kHz sampled speech audio and fine-tuned for Automatic Speech Recognition on 100 hours of LibriSpeech. When using the model, make sure that your speech input is also sampled at 16 kHz. The underlying pretrained model can also be fine-tuned on other downstream tasks, such as Speaker Identification, Intent Classification, and Emotion Recognition.

Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

The original model checkpoints can be found at https://github.com/asappresearch/sew#model-checkpoints.

Usage: To transcribe audio files, the model can be used as a standalone acoustic model; see the first sketch below.

Evaluation: The second sketch below shows how to evaluate asapp/sew-d-tiny-100k-ft-ls100h on LibriSpeech's "clean" and "other" test data. Result (WER):
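A minimal transcription sketch, assuming the transformers CTC interface for SEW-D (SEWDForCTC paired with Wav2Vec2Processor), greedy decoding, and a small dummy LibriSpeech split used purely for illustration:

```python
# Minimal transcription sketch: SEW-D as a standalone acoustic model.
import torch
from datasets import load_dataset
from transformers import SEWDForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("asapp/sew-d-tiny-100k-ft-ls100h")
model = SEWDForCTC.from_pretrained("asapp/sew-d-tiny-100k-ft-ls100h")

# A small dummy LibriSpeech split, used here only for illustration.
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: most likely token per frame, then collapse repeats/blanks.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```

Evaluation can reuse the same decoding loop over the full test sets; jiwer's WER is an illustrative metric choice here, not necessarily the original card's snippet:

```python
# Hedged evaluation sketch on LibriSpeech test-clean; swap "clean" for "other"
# to get the second number. Assumes the librispeech_asr dataset and jiwer.
import torch
from datasets import load_dataset
from jiwer import wer
from transformers import SEWDForCTC, Wav2Vec2Processor

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = Wav2Vec2Processor.from_pretrained("asapp/sew-d-tiny-100k-ft-ls100h")
model = SEWDForCTC.from_pretrained("asapp/sew-d-tiny-100k-ft-ls100h").to(device)

ds = load_dataset("librispeech_asr", "clean", split="test")

def map_to_pred(batch):
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits
    batch["transcription"] = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
    return batch

result = ds.map(map_to_pred, remove_columns=["audio"])
print("WER:", wer(result["text"], result["transcription"]))
```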

$-/run

394

Huggingface

sew-tiny-100k-ft-ls100h

SEW-tiny (SEW by ASAPP Research), pretrained on 16 kHz sampled speech audio and fine-tuned for Automatic Speech Recognition on 100 hours of LibriSpeech. When using the model, make sure that your speech input is also sampled at 16 kHz. The underlying pretrained model can also be fine-tuned on other downstream tasks, such as Speaker Identification, Intent Classification, and Emotion Recognition.

Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

The original model checkpoints can be found at https://github.com/asappresearch/sew#model-checkpoints.

Usage: To transcribe audio files, the model can be used as a standalone acoustic model, following the same pattern as the SEW-D sketches above; see the note below.

Evaluation: asapp/sew-tiny-100k-ft-ls100h can be evaluated on LibriSpeech's "clean" and "other" test data in the same way. Result (WER):
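The transcription and evaluation sketches shown for sew-d-tiny-100k-ft-ls100h carry over; beyond the checkpoint id, the only change is that plain SEW checkpoints use the SEWForCTC class:

```python
# Plain SEW checkpoints pair Wav2Vec2Processor with SEWForCTC
# (the SEW counterpart of SEWDForCTC in transformers).
from transformers import SEWForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("asapp/sew-tiny-100k-ft-ls100h")
model = SEWForCTC.from_pretrained("asapp/sew-tiny-100k-ft-ls100h")
```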

$-/run

340

Huggingface

sew-d-mid-k127-400k-ft-ls100h

SEW-D-mid-k127 (SEW-D by ASAPP Research), pretrained on 16 kHz sampled speech audio and fine-tuned for Automatic Speech Recognition on 100 hours of LibriSpeech. When using the model, make sure that your speech input is also sampled at 16 kHz. The underlying pretrained model can also be fine-tuned on other downstream tasks, such as Speaker Identification, Intent Classification, and Emotion Recognition.

Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

The original model checkpoints can be found at https://github.com/asappresearch/sew#model-checkpoints.

Usage: To transcribe audio files, the model can be used as a standalone acoustic model; see the sketch below.

Evaluation: asapp/sew-d-mid-k127-400k-ft-ls100h can be evaluated on LibriSpeech's "clean" and "other" test data following the same pattern. Result (WER):
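As with the other fine-tuned SEW-D checkpoints, the transcription and evaluation sketches above apply with the checkpoint id swapped:

```python
# Same SEW-D CTC interface; only the checkpoint id changes.
from transformers import SEWDForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("asapp/sew-d-mid-k127-400k-ft-ls100h")
model = SEWDForCTC.from_pretrained("asapp/sew-d-mid-k127-400k-ft-ls100h")
```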

$-/run

26

Huggingface

sew-d-base-plus-400k-ft-ls100h

SEW-D-base+ (SEW-D by ASAPP Research), pretrained on 16 kHz sampled speech audio and fine-tuned for Automatic Speech Recognition on 100 hours of LibriSpeech. When using the model, make sure that your speech input is also sampled at 16 kHz. The underlying pretrained model can also be fine-tuned on other downstream tasks, such as Speaker Identification, Intent Classification, and Emotion Recognition.

Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

The original model checkpoints can be found at https://github.com/asappresearch/sew#model-checkpoints.

Usage: To transcribe audio files, the model can be used as a standalone acoustic model; see the sketch below.

Evaluation: asapp/sew-d-base-plus-400k-ft-ls100h can be evaluated on LibriSpeech's "clean" and "other" test data following the same pattern. Result (WER):
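Again, the earlier SEW-D transcription and evaluation sketches apply; only the checkpoint id differs:

```python
# Same SEW-D CTC interface with the base+ checkpoint.
from transformers import SEWDForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("asapp/sew-d-base-plus-400k-ft-ls100h")
model = SEWDForCTC.from_pretrained("asapp/sew-d-base-plus-400k-ft-ls100h")
```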

$-/run

23

Huggingface

e_branchformer_librispeech

ESPnet2 ASR model: asapp/e_branchformer_librispeech. This model was trained by Kwangyoun Kim using the librispeech recipe in ESPnet.

References:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition (SLT 2022)
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding (ICML 2022)

Demo, how to use in ESPnet2: follow the ESPnet installation instructions if you haven't done that already; a usage sketch is given below.

RESULTS

Environments:
date: Mon Jan 2 12:59:49 UTC 2023
python version: 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
espnet version: espnet 202211
pytorch version: pytorch 1.10.1
Git hash: 7a203d55543df02f0369d5608cd6f3033119a135
Commit date: Fri Dec 23 00:58:49 2022 +0000

Experiment: asr_train_asr_e_branchformer_raw_en_bpe5000_sp
WER / CER / TER tables and the ASR config: see the original model card.

Citing ESPnet or arXiv:
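A minimal inference sketch through the espnet2 Python API, assuming espnet and espnet_model_zoo are installed; "sample.wav" is a placeholder path for a 16 kHz recording:

```python
# Hedged ESPnet2 inference sketch; Speech2Text.from_pretrained fetches the
# checkpoint via espnet_model_zoo.
import soundfile
from espnet2.bin.asr_inference import Speech2Text

speech2text = Speech2Text.from_pretrained("asapp/e_branchformer_librispeech")

speech, rate = soundfile.read("sample.wav")  # expects 16 kHz audio
nbests = speech2text(speech)
text, tokens, token_ids, hyp = nbests[0]     # best hypothesis first
print(text)
```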

$-/run

22

Huggingface

sew-d-base-100k

SEW-D-base (SEW-D by ASAPP Research). The base model pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz. Note that this model should be fine-tuned on a downstream task, such as Automatic Speech Recognition, Speaker Identification, Intent Classification, or Emotion Recognition.

Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

The original model checkpoints can be found at https://github.com/asappresearch/sew#model-checkpoints.

Usage: See this blog for more information on how to fine-tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by SEWDForCTC; a sketch of the swap follows below.
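A minimal fine-tuning setup sketch, following the usual wav2vec 2.0 CTC recipe with SEWDForCTC swapped in for Wav2Vec2ForCTC; the pad_token_id and vocab_size values are assumptions that must match the tokenizer you build for your task:

```python
# Hedged fine-tuning setup: the wav2vec 2.0 CTC recipe with SEWDForCTC in
# place of Wav2Vec2ForCTC.
from transformers import SEWDForCTC

model = SEWDForCTC.from_pretrained(
    "asapp/sew-d-base-100k",
    ctc_loss_reduction="mean",
    pad_token_id=0,   # assumption: your tokenizer's pad token id
    vocab_size=32,    # assumption: size of your CTC vocabulary
)
# Freeze the convolutional feature encoder, as is standard practice when
# fine-tuning on limited labeled data.
model.freeze_feature_encoder()
```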

$-/run

21

Huggingface

sew-d-mid-400k-ft-ls100h

SEW-D-mid (SEW-D by ASAPP Research), pretrained on 16 kHz sampled speech audio and fine-tuned for Automatic Speech Recognition on 100 hours of LibriSpeech. When using the model, make sure that your speech input is also sampled at 16 kHz. The underlying pretrained model can also be fine-tuned on other downstream tasks, such as Speaker Identification, Intent Classification, and Emotion Recognition.

Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

The original model checkpoints can be found at https://github.com/asappresearch/sew#model-checkpoints.

Usage: To transcribe audio files, the model can be used as a standalone acoustic model; see the sketch below.

Evaluation: asapp/sew-d-mid-400k-ft-ls100h can be evaluated on LibriSpeech's "clean" and "other" test data following the same pattern. Result (WER):
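As before, the SEW-D transcription and evaluation sketches apply with the checkpoint id swapped:

```python
# Same SEW-D CTC interface with the mid checkpoint.
from transformers import SEWDForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("asapp/sew-d-mid-400k-ft-ls100h")
model = SEWDForCTC.from_pretrained("asapp/sew-d-mid-400k-ft-ls100h")
```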

$-/run

19

Huggingface

sew-small-100k

SEW-small (SEW by ASAPP Research). The small model pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz. Note that this model should be fine-tuned on a downstream task, such as Automatic Speech Recognition, Speaker Identification, Intent Classification, or Emotion Recognition.

Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

The original model checkpoints can be found at https://github.com/asappresearch/sew#model-checkpoints.

Usage: See this blog for more information on how to fine-tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by SEWForCTC; a sketch of the swap follows below.
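The same class swap sketched above for sew-d-base-100k applies to plain SEW checkpoints, with SEWForCTC instead of SEWDForCTC; pad_token_id and vocab_size remain assumptions tied to your tokenizer:

```python
# As in the SEW-D fine-tuning sketch, but for SEW: replace Wav2Vec2ForCTC
# with SEWForCTC.
from transformers import SEWForCTC

model = SEWForCTC.from_pretrained(
    "asapp/sew-small-100k",
    ctc_loss_reduction="mean",
    pad_token_id=0,   # assumption: your tokenizer's pad token id
    vocab_size=32,    # assumption: size of your CTC vocabulary
)
```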

$-/run

15

Huggingface
