Whisperx is an automatic speech recognition (ASR) model with word-level alignment. It is built on the Whisper medium model, which has 769M parameters. The model converts audio to text and supports several optional parameters: running in debug mode, returning only text, batching the audio for conversion, and aligning the output. The input is a JSON object containing an audio file URL and the optional debug, only_text, batch_size, and align_output parameters. The output is a JSON array of objects, each giving start and end times, the corresponding transcribed text, and a detailed per-word breakdown with a start time, end time, and score for each word.
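As a rough illustration of the input and output shapes described above, the following Python sketch builds an input object and flattens the per-word breakdown of a response. The key name "audio" for the audio URL, the output keys ("start", "end", "text", "words", "word", "score"), and the sample values are assumptions made for illustration only; the API spec is the authority on the real field names. Only the four optional parameter names come from the description above.

```python
# Optional parameters named in the model description.
ALLOWED_OPTIONS = {"debug", "only_text", "batch_size", "align_output"}


def build_input(audio_url, **options):
    """Assemble a JSON-serializable input object for the model.

    The "audio" key name is an assumption; the description only says the
    input contains an audio file URL.
    """
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"audio": audio_url, **options}


def word_timings(output):
    """Flatten the per-word breakdown into (word, start, end, score) tuples.

    Assumes each output object carries a "words" list whose entries have
    "word", "start", "end", and "score" keys (hypothetical names).
    """
    rows = []
    for item in output:
        for w in item.get("words", []):
            rows.append((w["word"], w["start"], w["end"], w.get("score")))
    return rows


# Hand-written sample in the assumed shape; not real model output.
SAMPLE_OUTPUT = [
    {
        "start": 0.0,
        "end": 0.9,
        "text": "hello world",
        "words": [
            {"word": "hello", "start": 0.0, "end": 0.4, "score": 0.98},
            {"word": "world", "start": 0.5, "end": 0.9, "score": 0.95},
        ],
    }
]

if __name__ == "__main__":
    print(build_input("https://example.com/audio.wav", batch_size=8))
    for row in word_timings(SAMPLE_OUTPUT):
        print(row)
```

Passing only the options that are explicitly set avoids guessing at the model's default values, which are not stated here.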

Use cases

Whisperx has a broad range of use cases. As an audio-to-text model, its primary use is transcription, which applies across many industries and professions. In a corporate environment, it can turn recordings of meetings, dictations, interviews, and conference calls into written documents, helping businesses keep accurate records of their activities. It can also be used to create closed captions for videos and films, improving accessibility for hard-of-hearing viewers and non-native speakers. Because the output is broken down into individual words with their timings, it can be useful in language learning apps where users look up the pronunciation and timing of individual words in a sentence. Other practical uses include voice-controlled assistants and platforms that run sentiment analysis on customer calls. It could also benefit people with disabilities, such as visual impairment, and people who are unable to write by hand.
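To make the closed-captioning use case concrete, here is a minimal Python sketch that turns timed transcript entries into SubRip (SRT) caption text. It assumes each entry is a dictionary with "start" and "end" times in seconds and a "text" string; those key names are hypothetical and would need to be matched to the model's actual output.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(entries):
    """Render timed entries as SRT caption blocks.

    Each entry is assumed to have "start", "end" (seconds) and "text" keys.
    """
    blocks = []
    for index, entry in enumerate(entries, start=1):
        time_line = f"{srt_timestamp(entry['start'])} --> {srt_timestamp(entry['end'])}"
        blocks.append(f"{index}\n{time_line}\n{entry['text'].strip()}\n")
    # A blank line separates consecutive caption blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hand-written sample entries; not real model output.
    sample = [
        {"start": 0.0, "end": 1.25, "text": " Hello world "},
        {"start": 1.5, "end": 3.0, "text": "This is a caption."},
    ]
    print(to_srt(sample))
```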



Creator Models

No other models by this creator

Try it!

You can use this area to play around with demo applications that incorporate the Whisperx model. These demos are maintained and hosted externally by third-party creators. If you see an error, message me on Twitter.

Currently, there are no demos available for this model.


Summary of this model and related resources.

Model Name: Whisperx
Description: ASR with word alignment based on whisperx using whisper medium (769M)
Model Link: View on Replicate
API Spec: View on Replicate
Github Link: View on Github
Paper Link: View on Arxiv


How popular is this model, by number of runs? How popular is the creator, by the sum of all their runs?

Model Rank
Creator Rank


How much does it cost to run this model? How long, on average, does it take to complete a run?

Cost per Run: $-
Prediction Hardware: -
Average Completion Time: -