whisper.cpp

Maintainer: ggerganov


whisper.cpp is a collection of OpenAI's Whisper models converted to the ggml format by the maintainer, ggerganov. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation, trained on 680k hours of labeled data, and it generalizes well to many datasets and domains without fine-tuning. Related Whisper checkpoints on the Hugging Face Hub include whisper-large-v3, whisper-tiny.en, whisper-large-v2, whisper-small, and whisper-large; these vary in size and capability, with larger models generally performing better at the cost of more compute.

Model Inputs and Outputs

The whisper.cpp models take audio as input and output text. The model is told which task to perform (transcription or translation) via "context tokens" passed to the decoder at the start of decoding.

Inputs

- Audio data

Outputs

- Text transcriptions or translations

Capabilities

The whisper.cpp models are more robust to accents, background noise, and technical language than many existing ASR systems. They can also perform zero-shot translation from multiple languages into English.

What Can I Use It For?

The whisper.cpp models suit a variety of audio-to-text applications, such as:

- Improving accessibility tools with speech-to-text capabilities
- Building near-real-time speech recognition and translation applications on top of the models
- Automating transcription and translation of large volumes of audio

While the models perform strongly, the maintainers caution against using them for high-risk applications or subjective classification tasks, as performance can vary across languages, accents, and demographics.

Things to Try

One interesting aspect of the whisper.cpp models is their ability to perform long-form transcription using a chunking algorithm.
This allows the models to transcribe audio of arbitrary length rather than being limited to short 30-second clips; you can experiment with this functionality using the Transformers pipeline in Python. Another area to explore is fine-tuning the pre-trained Whisper models on specific datasets or tasks. The maintainers provide a blog post with a step-by-step guide on fine-tuning Whisper with as little as 5 hours of labeled data, which can further improve the models' performance for your particular use case.
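The chunking idea can be sketched in a few lines of Python. This is a simplified illustration, not the actual algorithm used by whisper.cpp or the Transformers pipeline (both merge overlapping predictions more carefully), and the window and overlap sizes here are assumptions:

```python
def chunk_audio(n_samples, sample_rate=16000, chunk_s=30, overlap_s=5):
    """Yield (start, end) sample indices covering the audio in fixed-size
    windows with a small overlap, so per-window transcripts can later be
    stitched together on the overlapping regions.

    Simplified sketch; real pipelines align and merge overlapping tokens.
    """
    chunk = chunk_s * sample_rate                  # samples per window
    step = (chunk_s - overlap_s) * sample_rate     # stride between windows
    starts = range(0, max(n_samples - overlap_s * sample_rate, 1), step)
    return [(s, min(s + chunk, n_samples)) for s in starts]
```

For a 60-second clip at 16 kHz this yields three windows, 0-30 s, 25-55 s, and 50-60 s, each short enough for a model trained on 30-second inputs; the 5-second overlaps give the merge step shared context between neighboring transcripts.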


Updated 5/17/2024