
bark

Maintainer: suno-ai

Total Score: 226

Last updated 5/10/2024

Model Link: View on Replicate
API Spec: View on Replicate
Github Link: View on Github
Paper Link: No paper link provided


Model overview

Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. Bark is similar to other advanced text-to-speech models like Vall-E and AudioLM, but it can generate a wider range of audio beyond just speech.

Model inputs and outputs

Bark takes in a text prompt and generates an audio waveform. The model uses a three-stage process to convert the text into audio - first mapping the text to semantic tokens, then to coarse audio tokens, and finally to fine-grained audio waveform tokens.

Inputs

  • Prompt: The text prompt to be converted to audio

Outputs

  • Audio waveform: The generated audio waveform corresponding to the input text prompt
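
For anyone running the model locally from the GitHub repository rather than through Replicate, the round trip from prompt to waveform looks roughly like the sketch below. The function names follow the bark Python package's documented API; the prompt text and output filename are placeholders.

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download and cache the text, coarse, and fine models on first use
preload_models()

# The only required input is a text prompt; the output is a NumPy audio array
prompt = "Hello, my name is Bark. I can read any text you give me out loud."
audio_array = generate_audio(prompt)

# Save the waveform at Bark's native sample rate (24 kHz)
write_wav("bark_output.wav", SAMPLE_RATE, audio_array)
```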

Capabilities

Bark can generate highly realistic and expressive speech in over a dozen languages, including English, German, Spanish, French, Hindi, and more. It can also produce non-speech sounds like music, laughter, sighs, and other sound effects. The model is capable of adjusting attributes like tone, emotion, and prosody to match the specified context.
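
As a rough illustration of steering these attributes, the bark package accepts bracketed cues in the prompt and a history_prompt speaker preset. The preset name and the [laughs] cue below come from the project's documentation; exact results vary between runs.

```python
from bark import SAMPLE_RATE, generate_audio
from scipy.io.wavfile import write as write_wav

# Bracketed cues such as [laughs] or [sighs] hint at nonverbal sounds;
# history_prompt selects one of the bundled speaker presets.
prompt = "Well, that did not go as planned... [laughs] Let's try again tomorrow."
audio_array = generate_audio(prompt, history_prompt="v2/en_speaker_6")
write_wav("bark_laugh.wav", SAMPLE_RATE, audio_array)
```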

What can I use it for?

Bark's text-to-audio capabilities can be useful for a variety of applications, such as:

  • Improving accessibility by generating audio narrations for content
  • Enhancing interactive experiences with natural-sounding voice interfaces
  • Automating the creation of audio content like podcasts, audiobooks, and voiceovers
  • Generating sound effects and background audio for multimedia projects

Things to try

Some interesting things to explore with Bark include:

  • Generating multilingual speech by mixing languages in the prompts
  • Experimenting with different ways to guide the model's output, such as using speaker prompts or adding musical notation
  • Trying to clone specific voices by providing audio samples as history prompts
  • Using Bark to generate audio for interactive stories, games, or other immersive experiences


This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

bark

Maintainer: suno

Total Score: 893

Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also produce nonverbal communications like laughing, sighing, and crying. Bark is similar to other text-to-speech models like whisper-tiny and parakeet-rnnt-1.1b, but is focused on generating a wider range of audio outputs beyond just speech.

Model inputs and outputs

The Bark model takes text as input and generates corresponding audio as output. It can produce speech in multiple languages, as well as non-verbal sounds and audio effects.

Inputs

  • Text: The text to be converted to audio. This can be in any language supported by the model.

Outputs

  • Audio: The generated audio corresponding to the input text. This can be speech, ambient sounds, music, or other audio effects.

Capabilities

Bark demonstrates the ability to generate highly realistic and expressive audio outputs. Beyond just speech synthesis, the model can create a diverse range of audio, including background noise, laughter, sighs, and even simple musical elements. This versatility allows Bark to be used for a variety of applications, from virtual assistants to audio production.

What can I use it for?

The Bark model could be used to create interactive voice experiences, such as virtual assistants or audio-based storytelling. Its ability to generate non-verbal sounds could also make it useful for enhancing the realism of video game characters or animating digital avatars. Additionally, Bark's text-to-speech capabilities could aid in accessibility by converting text to audio for the visually impaired.

Things to try

One interesting aspect of Bark is its ability to generate diverse non-speech audio. You could experiment with prompting the model to create different types of ambient sounds, like wind, rain, or nature noises, to enhance virtual environments. Additionally, you could try generating audio with emotional expressions, such as laughter or sighs, to bring more life and personality to digital characters.


bark-small

Maintainer: suno

Total Score: 124

bark-small is a transformer-based text-to-audio model created by Suno. It can generate highly realistic, multilingual speech as well as other audio including music, background noise, and simple sound effects. The model can also produce nonverbal communications like laughing, sighing, and crying. The bark-small checkpoint is one of two Bark model versions released by Suno, with the other being the larger bark model. Both models demonstrate impressive text-to-speech capabilities, though the bark-small version may have slightly lower fidelity compared to the larger model.

Model inputs and outputs

Inputs

  • Text: The model takes text prompts as input, which it then uses to generate the corresponding audio.
  • Description: Along with the text prompt, users can provide a description that gives the model additional information about how the speech should be generated (e.g. voice gender, speaking style, background noise).

Outputs

  • Audio: The primary output of the bark-small model is high-quality, natural-sounding audio that corresponds to the given text prompt and description.

Capabilities

The bark-small model can generate a wide range of audio content beyond just speech, including music, ambient sounds, and even nonverbal expressions like laughter and sighs. This versatility makes it a powerful tool for creating immersive audio experiences. The model is also multilingual, allowing users to generate speech in numerous languages.

What can I use it for?

The bark-small model's ability to generate high-quality, expressive audio from text makes it well-suited for a variety of applications. Potential use cases include:

  • Enhancing accessibility by generating audio versions of text content
  • Creating more engaging audio experiences for games, films, or podcasts
  • Prototyping voice interfaces or conversational AI assistants
  • Generating audio prompts for AI models like DALL-E or Imagen

While the model is not intended for real-time applications, its speed and quality suggest that developers could build applications on top of it that allow for near-real-time speech generation.

Things to try

One interesting feature of the bark-small model is its ability to generate nonverbal sounds like laughter, sighs, and vocal expressions. Experimenting with prompts that incorporate these elements can help uncover the model's expressive range and create more natural-sounding audio. Additionally, users can try providing detailed descriptions to guide the model's generation, such as specifying the speaker's gender, tone, background environment, and other attributes. Exploring how these descriptors influence the output can lead to more tailored and nuanced audio experiences.
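
For local experimentation, one plausible route is the suno/bark-small checkpoint published on Hugging Face through the transformers Bark integration. The sketch below assumes that setup; the voice_preset argument stands in for the speaker guidance described above, and the prompt is only an example.

```python
from transformers import AutoProcessor, BarkModel
from scipy.io.wavfile import write as write_wav

# Load the small checkpoint and its processor from the Hugging Face Hub
processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

# voice_preset selects a bundled speaker; bracketed cues hint at nonverbal sounds
inputs = processor(
    "Hello! [laughs] This is the smaller Bark checkpoint speaking.",
    voice_preset="v2/en_speaker_6",
)
audio_array = model.generate(**inputs).cpu().numpy().squeeze()

# The generation config carries Bark's native sample rate (24 kHz)
write_wav("bark_small_output.wav", model.generation_config.sample_rate, audio_array)
```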


bark

Maintainer: pollinations

Total Score: 1

Bark is a text-to-audio model created by Suno, a company specializing in advanced AI models. It can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also produce nonverbal communications like laughing, sighing, and crying. Bark is similar to other models like Vall-E, AudioLM, and music-gen in its ability to generate audio from text, but it stands out in its ability to handle a wider range of audio content beyond just speech.

Model inputs and outputs

The Bark model takes a text prompt as input and generates an audio waveform as output. The text prompt can include instructions for specific types of audio, such as music, sound effects, or nonverbal sounds, in addition to speech.

Inputs

  • Text Prompt: A text string containing the desired instructions for the audio generation.

Outputs

  • Audio Waveform: The generated audio waveform, which can be played or saved as a WAV file.

Capabilities

Bark is capable of generating a wide range of audio content, including speech, music, and sound effects, in multiple languages. The model can also produce nonverbal sounds like laughing, sighing, and crying, adding to the realism and expressiveness of the generated audio. It can handle code-switched text, automatically employing the appropriate accent for each language, and it can even generate audio based on a specified speaker profile.

What can I use it for?

Bark can be used for a variety of applications, such as text-to-speech, audio production, and content creation. It could be used to generate voiceovers, podcasts, or audiobooks, or to create sound effects and background music for videos, games, or other multimedia projects. The model's ability to handle multiple languages and produce non-speech audio also opens up possibilities for language learning tools, audio synthesis, and more.

Things to try

One interesting feature of Bark is its ability to generate music from text prompts. By including musical notation (e.g., ♪) in the text, you can prompt the model to produce audio that combines speech with song. Another fun experiment is to try prompting the model with code-switched text, which can result in audio with an interesting blend of accents and languages.
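
As a quick illustration of the musical-notation trick, the sketch below (assuming the bark Python package from the GitHub repository) wraps the lyrics in ♪ characters; output quality varies from run to run, so expect to regenerate a few times.

```python
from bark import SAMPLE_RATE, generate_audio
from scipy.io.wavfile import write as write_wav

# Surrounding the text with ♪ nudges Bark toward singing rather than speaking
prompt = "♪ In the jungle, the mighty jungle, the lion barks tonight ♪"
audio_array = generate_audio(prompt)
write_wav("bark_song.wav", SAMPLE_RATE, audio_array)
```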


stable-diffusion

Maintainer: stability-ai

Total Score: 107.9K

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. Developed by Stability AI, it is an impressive AI model that can create stunning visuals from simple text prompts. The model has several versions, with each newer version being trained for longer and producing higher-quality images than the previous ones. The main advantage of Stable Diffusion is its ability to generate highly detailed and realistic images from a wide range of textual descriptions. This makes it a powerful tool for creative applications, allowing users to visualize their ideas and concepts in a photorealistic way. The model has been trained on a large and diverse dataset, enabling it to handle a broad spectrum of subjects and styles.

Model inputs and outputs

Inputs

  • Prompt: The text prompt that describes the desired image. This can be a simple description or a more detailed, creative prompt.
  • Seed: An optional random seed value to control the randomness of the image generation process.
  • Width and Height: The desired dimensions of the generated image, which must be multiples of 64.
  • Scheduler: The algorithm used to generate the image, with options like DPMSolverMultistep.
  • Num Outputs: The number of images to generate (up to 4).
  • Guidance Scale: The scale for classifier-free guidance, which controls the trade-off between image quality and faithfulness to the input prompt.
  • Negative Prompt: Text that specifies things the model should avoid including in the generated image.
  • Num Inference Steps: The number of denoising steps to perform during the image generation process.

Outputs

  • Array of image URLs: The generated images are returned as an array of URLs pointing to the created images.

Capabilities

Stable Diffusion is capable of generating a wide variety of photorealistic images from text prompts. It can create images of people, animals, landscapes, architecture, and more, with a high level of detail and accuracy. The model is particularly skilled at rendering complex scenes and capturing the essence of the input prompt. One of the key strengths of Stable Diffusion is its ability to handle diverse prompts, from simple descriptions to more creative and imaginative ideas. The model can generate images of fantastical creatures, surreal landscapes, and even abstract concepts with impressive results.

What can I use it for?

Stable Diffusion can be used for a variety of creative applications, such as:

  • Visualizing ideas and concepts for art, design, or storytelling
  • Generating images for use in marketing, advertising, or social media
  • Aiding in the development of games, movies, or other visual media
  • Exploring and experimenting with new ideas and artistic styles

The model's versatility and high-quality output make it a valuable tool for anyone looking to bring their ideas to life through visual art. By combining the power of AI with human creativity, Stable Diffusion opens up new possibilities for visual expression and innovation.

Things to try

One interesting aspect of Stable Diffusion is its ability to generate images with a high level of detail and realism. Users can experiment with prompts that combine specific elements, such as "a steam-powered robot exploring a lush, alien jungle," to see how the model handles complex and imaginative scenes. Additionally, the model's support for different image sizes and resolutions allows users to explore the limits of its capabilities. By generating images at various scales, users can see how the model handles the level of detail and complexity required for different use cases, such as high-resolution artwork or smaller social media graphics. Overall, Stable Diffusion is a powerful and versatile AI model that offers endless possibilities for creative expression and exploration. By experimenting with different prompts, settings, and output formats, users can unlock the full potential of this cutting-edge text-to-image technology.
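
As a rough sketch of how the inputs listed above map onto an API call, the snippet below uses the Replicate Python client. The model identifier is shorthand and the parameter values are illustrative; check the model's API spec on Replicate for the exact version string and accepted keys.

```python
import replicate  # requires REPLICATE_API_TOKEN in the environment

# Keys mirror the inputs listed above; values here are only illustrative.
output = replicate.run(
    "stability-ai/stable-diffusion",  # append ":<version>" as shown on Replicate
    input={
        "prompt": "a steam-powered robot exploring a lush, alien jungle",
        "width": 768,                      # must be a multiple of 64
        "height": 512,                     # must be a multiple of 64
        "num_outputs": 1,                  # up to 4
        "guidance_scale": 7.5,
        "num_inference_steps": 50,
        "negative_prompt": "blurry, low quality",
        "scheduler": "DPMSolverMultistep",
    },
)
print(output)  # a list of URLs pointing at the generated images
```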
