whisperkit-coreml_01-30-24 is a set of optimized variants of the OpenAI Whisper model, created by the maintainer argmaxinc. These variants aim to provide improved performance and reduced model size compared to the original Whisper large-v3 model, while maintaining high quality of inference (QoI). The models were evaluated on the LibriSpeech dataset, with the best-performing variant achieving a word error rate (WER) of 2.41% and a QoI of 99.8%, while reducing the model size to 3100 MB.

Model inputs and outputs

Inputs: audio data in the form of log-mel spectrograms.
Outputs: transcribed text in the target language (English).

Capabilities

The whisperkit-coreml_01-30-24 models demonstrate improved robustness and performance compared to the original Whisper large-v3 model, particularly on the LibriSpeech dataset. The optimized variants offer significantly reduced model size and latency, making them more suitable for deployment on resource-constrained devices or in real-time applications.

What can I use it for?

The whisperkit-coreml_01-30-24 models can be used for a variety of speech recognition tasks, such as transcribing audio recordings, enabling voice-controlled interfaces, or improving accessibility for the hearing impaired. The reduced model size and latency also make these models suitable for integration into mobile apps, edge devices, or other applications where computational resources are limited.

Things to try

Developers can explore using the whisperkit-coreml_01-30-24 models in their speech recognition pipelines, either as a drop-in replacement for the original Whisper large-v3 model or as a component in more complex audio processing workflows. Additionally, researchers may be interested in further analyzing the tradeoffs between model size, latency, and QoI to inform the development of even more efficient speech recognition models.
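
For readers comparing variants, the word error rate (WER) quoted in the description is conventionally the word-level edit distance between a reference transcript and the model's transcript, divided by the number of reference words. The sketch below illustrates that standard definition only. The function name `word_error_rate` and the normalization used here (lowercasing and whitespace splitting) are assumptions for illustration; the source does not say how its WER figures were normalized or computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace. Real evaluations often apply heavier normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts one missing word out of six reference words, so the function returns 1/6.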


Updated 5/28/2024