### Summary

This model card describes a fine-tuned version of Whisper Large v3, the speech recognition model developed by OpenAI, that has been optimized for recognizing German speech.

### Applications

This model can be used in various application areas, including:

*   Transcription of spoken German language
*   Voice commands and voice control
*   Automatic subtitling for German videos
*   Voice-based search queries in German
*   Dictation functions in word processing programs

Model family
------------

| Model | Parameters | Link |
| --- | --- | --- |
| Whisper large v3 german | 1.54B | [link](https://huggingface.co/primeline/whisper-large-v3-german) |
| Distil-whisper large v3 german | 756M | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german) |
| tiny whisper | 37.8M | [link](https://huggingface.co/primeline/whisper-tiny-german) |

### Training data

The training data for this model includes a large amount of spoken German from various sources. The data was carefully selected and processed to optimize recognition performance.

### Training process

The model was trained with the following hyperparameters:

*   Batch size: 1024
*   Epochs: 2
*   Learning rate: 1e-5
*   Data augmentation: No
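
A batch size of 1024 is usually reached by combining several devices with gradient accumulation rather than by fitting 1024 samples on one GPU. The sketch below only illustrates that arithmetic. The per-device batch size, device count, accumulation steps, and dataset size are hypothetical values chosen for the example; the model card does not state how the batch was composed or how many samples were used.

```python
from dataclasses import dataclass


@dataclass
class BatchPlan:
    """Hypothetical decomposition of a global batch; not taken from the model card."""

    per_device_batch_size: int  # samples per device per forward pass (assumed)
    num_devices: int            # number of GPUs (assumed)
    grad_accum_steps: int       # gradient accumulation steps (assumed)

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to a single optimizer update.
        return self.per_device_batch_size * self.num_devices * self.grad_accum_steps


def optimizer_steps(num_samples: int, effective_batch_size: int, epochs: int) -> int:
    """Number of optimizer updates, rounding each epoch up to whole batches."""
    steps_per_epoch = -(-num_samples // effective_batch_size)  # ceiling division
    return steps_per_epoch * epochs


# One of many ways to reach the documented batch size of 1024.
plan = BatchPlan(per_device_batch_size=16, num_devices=8, grad_accum_steps=8)
print(plan.effective_batch_size)

# The dataset size of 1,000,000 samples is a placeholder; epochs=2 is from the card.
print(optimizer_steps(num_samples=1_000_000, effective_batch_size=1024, epochs=2))
```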

### How to use

    import torch
    from datasets import load_dataset
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    # Use the GPU with half precision when available, otherwise the CPU with float32.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model_id = "primeline/whisper-large-v3-german"

    # Load the fine-tuned model weights and move them to the selected device.
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)

    # The processor bundles the tokenizer and the audio feature extractor.
    processor = AutoProcessor.from_pretrained(model_id)

    # Long recordings are split into 30-second chunks and transcribed in batches.
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=30,
        batch_size=16,
        return_timestamps=True,
        torch_dtype=torch_dtype,
        device=device,
    )

    # Demo sample only: LibriSpeech contains English speech.
    # Replace it with German audio to get meaningful output from this model.
    dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
    sample = dataset[0]["audio"]

    result = pipe(sample)
    print(result["text"])

About us
--------

[![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)

Your partner for AI infrastructure in Germany  
Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.

Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)

## Model overview

The `whisper-large-v3-german` model is a speech recognition model published by primeline AI, an AI infrastructure provider in Germany. It is based on the Whisper Large v3 architecture, originally created by OpenAI, and has been fine-tuned specifically for German speech. It transcribes German speech to text, which makes it suitable for applications such as video subtitling, voice commands, and dictation. Besides the large version (1.54B parameters), primeline AI also offers a distilled model, `distil-whisper-large-v3-german` (756M parameters), and a smaller `whisper-tiny-german` model (37.8M parameters), providing options for different performance and resource requirements.

## Model inputs and outputs

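The `whisper-large-v3-german` model takes audio as input and returns the corresponding German text transcript. When it is run through the `transformers` pipeline shown in the usage example, passing `return_timestamps=True` also returns timestamps for the transcribed segments. The model is intended to handle a range of audio quality and background noise levels, although this model card reports no figures for specific conditions.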

### Inputs
- Audio data, such as WAV or MP3 files

### Outputs
- Text transcript of the input audio in German
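
Whisper's feature extractor works on 16 kHz audio, so it can help to inspect a recording before transcribing it. The following is a minimal sketch, using only the Python standard library, that reads a 16-bit PCM WAV file, averages its channels to mono, and reports whether the sample rate differs from 16 kHz. The function names `load_wav_mono` and `needs_resampling` are hypothetical helpers introduced for this example, and the sketch does not cover MP3 or other compressed formats.

```python
import struct
import wave

TARGET_SAMPLE_RATE = 16_000  # sample rate expected by Whisper's feature extractor


def load_wav_mono(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are floats in [-1.0, 1.0); multi-channel audio is averaged to mono.
    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Interleaved little-endian signed 16-bit integers: ch0, ch1, ch0, ch1, ...
    values = struct.unpack(f"<{len(raw) // 2}h", raw)
    mono = [
        sum(values[i:i + channels]) / channels / 32768.0
        for i in range(0, len(values), channels)
    ]
    return mono, sample_rate


def needs_resampling(sample_rate):
    """True when the audio is not already at the 16 kHz rate Whisper expects."""
    return sample_rate != TARGET_SAMPLE_RATE
```

When a file path is passed directly to the `transformers` pipeline, decoding and resampling are handled by the pipeline itself, so a helper like this is mainly useful for checking recordings or for feeding already decoded audio.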

## Capabilities

The `whisper-large-v3-german` model is intended to transcribe a wide range of German speech, including formal and informal speech, different accents, and speech with background noise. According to the training data description, it was trained on a large amount of spoken German from various sources, which should help it cope with a variety of real-world recordings.

## What can I use it for?

The `whisper-large-v3-german` model can be used in a variety of applications that require accurate German speech recognition. Some potential use cases include:

- Transcription of German audio recordings, such as interviews, lectures, or meeting recordings
- Automatic subtitling of German videos, improving accessibility for viewers
- Voice-controlled interfaces and virtual assistants for German-speaking users
- Dictation functions in German-language word processing applications
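
For the subtitling use case, the timestamps returned by the pipeline (with `return_timestamps=True`, as in the usage example) can be turned into a subtitle file. The sketch below assumes the pipeline result contains a `chunks` list whose entries have a `timestamp` pair in seconds and a `text` string, and it writes the SubRip (SRT) format. The helper names `format_srt_time` and `chunks_to_srt` and the example chunks are made up for illustration.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks) -> str:
    """Convert pipeline-style chunks into SRT text (hypothetical helper)."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: the final chunk may lack an end time; reuse the start.
            end = start
        text = chunk["text"].strip()
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Invented example data shaped like result["chunks"] from the pipeline.
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Guten Morgen."},
    {"timestamp": (2.5, None), "text": " Wie geht es Ihnen?"},
]
print(chunks_to_srt(example_chunks))
```

With a real transcription, `chunks_to_srt(result["chunks"])` would produce the subtitle text, which can then be written to a `.srt` file next to the video.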

## Things to try

One interesting aspect of the `whisper-large-v3-german` model is its ability to handle diverse audio inputs, including speech with background noise or non-native accents. Developers could experiment with using the model to transcribe audio recordings from different environments, such as noisy public spaces or formal presentations, to see how it performs. Additionally, the model could be integrated into various applications, such as video players or voice assistants, to provide seamless German speech recognition capabilities.
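
To compare recordings from different environments, it helps to have a number rather than an impression. A common choice for speech recognition is the word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The model card does not name an evaluation metric, so using WER here is an assumption, and the lowercase-and-split normalization in this sketch is deliberately simple.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one word is missing from the hypothesis.
print(word_error_rate("das ist ein test", "das ist test"))
```

Computing this per environment (for example, quiet office versus noisy street) against manually checked transcripts gives a rough view of where the model struggles.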