cog-whisperx-withprompt is a fork of the WhisperX transcription model that exposes the initial_prompt parameter. This lets users supply an optional text prompt that guides transcription of the first audio window. The model is built on top of the original Whisper model and inherits its capabilities, adding only the ability to customize the initial prompt.

Model inputs and outputs

The cog-whisperx-withprompt model takes several inputs that customize the transcription process: the audio file to be transcribed, a debug flag that prints memory usage information, a batch size for parallelization, an option to align the output with word-level timestamps, and the initial prompt text.

Inputs

- **audio**: The audio file to be transcribed
- **debug**: A boolean flag to print memory usage information
- **batch_size**: An integer specifying the number of audio samples to process in parallel
- **align_output**: A boolean flag to enable word-level timestamp alignment in the output
- **initial_prompt**: An optional string to provide as a prompt for the first window of the audio

Outputs

- **Output**: The transcribed text from the input audio

Capabilities

The cog-whisperx-withprompt model inherits the speech recognition capabilities of the Whisper model, including transcription of audio in a wide range of languages. The initial_prompt parameter lets users steer the transcription, which may improve accuracy on unusual vocabulary or direct the model's output in specific ways, such as a preferred spelling or style.

What can I use it for?

The cog-whisperx-withprompt model can be used for a variety of speech-to-text applications, such as transcribing audio recordings, generating captions for videos, or automating the processing of voice-based data. The initial prompt is particularly useful when the audio content is domain-specific and the user wants to guide the model's understanding of the context.

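The inputs listed under "Model inputs and outputs" can be gathered into a single payload before calling the model. The following is a minimal sketch, not the model's own code: the helper name `build_input` is hypothetical, the default values for `batch_size`, `align_output`, and `debug` are assumptions rather than documented defaults, and how the payload is submitted depends on whichever client you use to run the model.

```python
from typing import Any, Dict, Optional


def build_input(
    audio: str,
    initial_prompt: Optional[str] = None,
    batch_size: int = 8,          # assumed value, not a documented default
    align_output: bool = False,   # assumed value, not a documented default
    debug: bool = False,          # assumed value, not a documented default
) -> Dict[str, Any]:
    """Assemble the five inputs described above into one dictionary.

    `audio` is treated here as a path or URL string; the real model may
    expect a file handle instead, depending on the client.
    """
    if not audio:
        raise ValueError("audio must be a non-empty path or URL")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")

    payload: Dict[str, Any] = {
        "audio": audio,
        "debug": debug,
        "batch_size": batch_size,
        "align_output": align_output,
    }
    # initial_prompt is optional, so it is only included when provided.
    if initial_prompt:
        payload["initial_prompt"] = initial_prompt
    return payload


# Example: a domain-specific prompt that introduces terminology up front.
payload = build_input(
    "meeting.wav",
    initial_prompt="Glossary: Kubernetes, etcd, kubectl.",
    align_output=True,
)
```

Leaving `initial_prompt` out of the payload when it is empty keeps the call equivalent to an unprompted transcription, which gives a clean baseline to compare against.
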
Things to try

One interesting thing to try with the cog-whisperx-withprompt model is to experiment with different initial prompts and observe how they affect the transcription output. Users could try prompts that provide background information, set the tone or mood, or introduce specific terminology and concepts relevant to the audio content. This can help uncover the model's sensitivity to contextual cues and its ability to adapt its transcription to the user's needs.
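
One way to observe the effect of a prompt is to transcribe the same audio with and without it and list only the words that changed. The sketch below uses Python's standard `difflib` module for a word-level comparison; the helper name `changed_words` is hypothetical, and the two transcript strings are made-up placeholders for illustration, not actual model output.

```python
import difflib
from typing import List, Tuple


def changed_words(baseline: str, candidate: str) -> List[Tuple[str, str]]:
    """Return (baseline_span, candidate_span) pairs where the transcripts differ.

    Both transcripts are split on whitespace and compared word by word.
    An empty string on one side means words were inserted or deleted.
    """
    a, b = baseline.split(), candidate.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Placeholder transcripts (invented for this example, not real output).
without_prompt = "we deployed the cube control update"
with_prompt = "we deployed the kubectl update"

for before, after in changed_words(without_prompt, with_prompt):
    print(f"{before!r} -> {after!r}")
```

Running the comparison across several prompts makes it easier to see which kinds of prompt (glossaries, background context, style hints) actually change the output for a given recording.
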

Updated 6/21/2024