

metavoice-1B-v0.1 is a 1.2B-parameter base model for text-to-speech (TTS), trained by metavoiceio on 100K hours of speech. It was built with a focus on emotional speech rhythm and tone in English, without hallucinations. The model supports voice cloning through fine-tuning with as little as 1 minute of training data for Indian speakers, and zero-shot cloning of American and British voices from just 30 seconds of reference audio. It also supports long-form synthesis. Similar models include metavoice by camenduru and WhisperSpeech by collabora, an open-source text-to-speech system built by inverting Whisper.

Model inputs and outputs

Inputs: text prompts for TTS generation and, for zero-shot voice cloning, a short reference audio clip.

Outputs: synthesized speech audio in a variety of voices and emotional tones.

Capabilities

metavoice-1B-v0.1 can generate emotional, expressive speech from text, and can clone American and British voices zero-shot from as little as 30 seconds of reference audio. Its support for long-form synthesis makes it suitable for generating speech for extended passages of text.

What can I use it for?

The metavoice-1B-v0.1 model can be used to build engaging, personalized TTS applications such as audiobook narration, podcast generation, or virtual assistant voices. Its voice cloning capabilities make it straightforward to customize the voice of the speech output. Developers could integrate the model into their applications to provide high-quality, emotionally expressive speech synthesis.

Things to try

Experiment with cloning different accents and voices from minimal reference audio. Generate long-form speech passages and check the consistency and expressiveness of the output. Test the model's robustness across different text inputs and genres, from formal to casual language.
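
When checking long-form consistency, one possible experiment is to compare a passage synthesized in one pass against the same passage synthesized sentence by sentence. The sketch below shows only the text-preparation side of that comparison, under stated assumptions: `chunk_text` is a hypothetical helper, not part of metavoice, and the sentence-splitting regex and 200-character default are illustrative choices, not documented limits of metavoice-1B-v0.1.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper (not part of metavoice): it breaks on sentence
    boundaries where possible and falls back to word boundaries, or a hard
    cut, for sentences longer than max_chars. The default limit is an
    arbitrary assumption, not a documented model limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # An over-long sentence is cut at the last space within the limit,
        # or at exactly max_chars if it contains no usable space.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            piece, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        if not sentence:
            continue
        # Greedily pack whole sentences into the current chunk.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    passage = "Hello there. This is a test! Is it working? Yes."
    for chunk in chunk_text(passage, max_chars=30):
        print(chunk)
```

Each chunk could then be synthesized separately and the results compared with single-pass output for prosody and voice consistency across boundaries. Packing whole sentences greedily keeps each chunk's intonation self-contained, at the cost of uneven chunk lengths.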


Updated 5/23/2024