cvssp (Centre for Vision, Speech and Signal Processing)

Rank:

Average Model Cost: $0.0000

Number of Runs: 25,977

Models by this creator

audioldm

cvssp

No description available.


$-/run

20.3K

Huggingface

audioldm-s-full-v2

AudioLDM

AudioLDM is a latent text-to-audio diffusion model capable of generating realistic audio samples given any text input. It is available in the 🧨 Diffusers library from v0.15.0 onwards.

Model Details

AudioLDM was proposed in the paper "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models" by Haohe Liu et al. Inspired by Stable Diffusion, AudioLDM is a text-to-audio latent diffusion model (LDM) that learns continuous audio representations from CLAP latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.

Checkpoint Details

This is the small v2 version of the AudioLDM model: the same size as the original AudioLDM small checkpoint, but trained for more steps. The four AudioLDM checkpoints are summarised in Table 1: Summary of the AudioLDM checkpoints.

Model Sources

Original Repository · 🧨 Diffusers Pipeline · Paper · Demo

Usage

First, install the required packages. For text-to-audio generation, the AudioLDMPipeline can be used to load pre-trained weights and generate text-conditional audio outputs. The resulting audio output can be saved as a .wav file, or displayed in a Jupyter Notebook / Google Colab.

Tips

Prompts: descriptive prompt inputs work best. Use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context-specific (e.g. "water stream in a forest" instead of "stream"). It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects that the model may not be familiar with.

Inference: the quality of the predicted audio sample can be controlled with the num_inference_steps argument: more steps give higher-quality audio at the expense of slower inference. The length of the predicted audio sample can be controlled with the audio_length_in_s argument.


$-/run

2.4K

Huggingface

audioldm-l-full

AudioLDM

AudioLDM is a latent text-to-audio diffusion model capable of generating realistic audio samples given any text input. It is available in the 🧨 Diffusers library from v0.15.0 onwards.

Model Details

AudioLDM was proposed in the paper "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models" by Haohe Liu et al. Inspired by Stable Diffusion, AudioLDM is a text-to-audio latent diffusion model (LDM) that learns continuous audio representations from CLAP latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.

Checkpoint Details

This is the large version of the AudioLDM model, with twice the number of UNet channels and head channels as the small checkpoints. The four AudioLDM checkpoints are summarised in Table 1: Summary of the AudioLDM checkpoints.

Model Sources

Original Repository · 🧨 Diffusers Pipeline · Paper · Demo

Usage

First, install the required packages. For text-to-audio generation, the AudioLDMPipeline can be used to load pre-trained weights and generate text-conditional audio outputs. The resulting audio output can be saved as a .wav file, or displayed in a Jupyter Notebook / Google Colab.

Tips

Prompts: descriptive prompt inputs work best. Use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context-specific (e.g. "water stream in a forest" instead of "stream"). It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects that the model may not be familiar with.

Inference: the quality of the predicted audio sample can be controlled with the num_inference_steps argument: more steps give higher-quality audio at the expense of slower inference. The length of the predicted audio sample can be controlled with the audio_length_in_s argument.


$-/run

863

Huggingface

Similar creators