Generate subtitles (.srt and .vtt) from audio files using OpenAI's Whisper models.

## Model overview

The `whisper-subtitles` model is a fork of the [m1guelpf/whisper-subtitles](https://replicate.com/m1guelpf/whisper-subtitles) model, which uses OpenAI's [Whisper](https://github.com/openai/whisper) speech recognition model to generate subtitles in `.srt` and `.vtt` formats from audio files. The fork adds voice activity detection (VAD) to filter out parts of the audio without speech, lets you select a language, and supports language-specific Whisper models. You can also download the generated subtitle files directly from the model output.

## Model inputs and outputs

The `whisper-subtitles` model takes an audio file, a Whisper model name, a language, and an option to enable VAD filtering as inputs. It outputs the generated subtitle files in both `.srt` and `.vtt` formats.

### Inputs
- **audio_path**: The path to the audio file to generate subtitles for.
- **model_name**: The name of the Whisper model to use. Defaults to "small".
- **language**: The language of the audio. Defaults to "en" (English).
- **vad_filter**: A boolean that enables or disables voice activity detection (VAD) filtering. Defaults to true.
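
As a rough illustration of how the four inputs listed above fit together, the sketch below assembles them into a payload and passes it to the Replicate Python client. The model reference `MODEL_REF` is a hypothetical placeholder, since the fork's owner and version are not given here, and the helper names `build_input` and `run_model` are invented for this example. Passing the audio as an open file handle is also an assumption about how this model expects `audio_path`.

```python
from typing import Any, Dict

# Hypothetical placeholder: replace with the fork's real owner and version hash.
MODEL_REF = "<owner>/whisper-subtitles:<version>"


def build_input(audio_path: str, model_name: str = "small",
                language: str = "en", vad_filter: bool = True) -> Dict[str, Any]:
    """Assemble the four documented inputs, using the documented defaults."""
    if not audio_path:
        raise ValueError("audio_path must not be empty")
    return {
        "audio_path": audio_path,
        "model_name": model_name,
        "language": language,
        "vad_filter": vad_filter,
    }


def run_model(audio_file: str, **options: Any) -> Any:
    """Send a local audio file to the model.

    Requires `pip install replicate` and the REPLICATE_API_TOKEN environment variable.
    """
    import replicate  # imported lazily so build_input works without the client

    with open(audio_file, "rb") as handle:
        payload = build_input(audio_file, **options)
        payload["audio_path"] = handle  # assumption: the file is sent as an open handle
        return replicate.run(MODEL_REF, input=payload)
```

Calling `run_model("episode.mp3", model_name="medium", vad_filter=False)` would override two of the defaults and leave `language` at "en".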

### Outputs
- **srt_file**: The generated subtitle file in the SubRip Subtitle (`.srt`) format.
- **vtt_file**: The generated subtitle file in the Web Video Text Tracks (`.vtt`) format.
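
The two output formats carry the same cues and differ mainly in syntax: `.srt` numbers each cue and uses a comma before the milliseconds, while `.vtt` starts with a `WEBVTT` header and uses a period. The sketch below shows that difference on a list of `(start, end, text)` segments. It is an illustration of the formats, not the model's own code, and the function names are invented for this example.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start in seconds, end in seconds, text)


def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Render seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: Iterable[Segment]) -> str:
    """WebVTT: 'WEBVTT' header, period before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for start, end, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(0.0, 1.5, "Hello"), (1.5, 4.25, "and welcome back.")]
    print(to_srt(demo))
    print(to_vtt(demo))
```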

## Capabilities

The `whisper-subtitles` model generates subtitles for audio in the languages that Whisper supports. Transcription is handled by Whisper, a general-purpose speech recognition model trained on a large, diverse, multilingual audio dataset. VAD filtering removes stretches without speech before transcription, and language-specific models are intended to improve accuracy for the selected language.

## What can I use it for?

The `whisper-subtitles` model can be useful for a variety of applications, such as:

- **Video captioning**: Add subtitles to your videos to make them more accessible and engaging for viewers.
- **Podcast transcription**: Generate transcripts of your podcast episodes to make them searchable and shareable.
- **Language learning**: Use the subtitles to improve your language skills by following along with audio content.
- **Accessibility**: Provide subtitles for audio and video content to make it more accessible for people with hearing impairments.

## Things to try

Experiment with the different Whisper model sizes and language-specific models. The "small" model is the default, but the larger models may provide better accuracy, especially for more complex or noisy audio. You can also toggle VAD filtering on and off to see how it affects the quality of the generated subtitles.
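
One way to organize such a comparison is to enumerate every combination of model name and VAD setting up front, then run each configuration on the same audio file. The sketch below only builds the list of inputs. The model names are the sizes published for OpenAI's Whisper; whether this fork accepts each of them is an assumption, and `comparison_runs` is a hypothetical helper.

```python
from itertools import product
from typing import Any, Dict, List

# Sizes published for OpenAI's Whisper; assumed (not confirmed) to be valid here.
MULTILINGUAL = ["tiny", "base", "small", "medium", "large"]
ENGLISH_ONLY = ["tiny.en", "base.en", "small.en", "medium.en"]


def comparison_runs(audio_path: str, language: str = "en") -> List[Dict[str, Any]]:
    """Build one input dict per (model name, VAD setting) combination."""
    names = list(MULTILINGUAL)
    if language == "en":
        names += ENGLISH_ONLY  # English-only variants only make sense for English audio
    return [
        {
            "audio_path": audio_path,
            "model_name": name,
            "language": language,
            "vad_filter": vad,
        }
        for name, vad in product(names, (True, False))
    ]


if __name__ == "__main__":
    for run in comparison_runs("interview.mp3"):
        print(run["model_name"], run["vad_filter"])
```

Comparing the resulting subtitle files side by side for the same clip makes it easier to see where a larger model or the VAD filter actually changes the output.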