tango
Maintainer: declare-lab
Tango is a latent diffusion model (LDM) for text-to-audio (TTA) generation, capable of generating realistic audio from textual prompts, including human sounds, animal sounds, natural and artificial sounds, and sound effects. It uses the frozen instruction-tuned language model Flan-T5 as the text encoder and trains a UNet-based diffusion model for audio generation. Tango performs comparably to current state-of-the-art TTA models across both objective and subjective metrics, despite being trained on a dataset 63 times smaller. The maintainer has released the model, training, and inference code for the research community.
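As a rough illustration of that architecture (a frozen Flan-T5 text encoder conditioning a UNet-based latent diffusion model), the sketch below wires together off-the-shelf Hugging Face components. The checkpoint choice, latent shape, and channel counts are illustrative assumptions, not Tango's actual configuration.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel
from diffusers import UNet2DConditionModel

# Frozen text encoder; the specific Flan-T5 size is an assumption.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
text_encoder = T5EncoderModel.from_pretrained("google/flan-t5-large")
text_encoder.requires_grad_(False)  # Flan-T5 stays frozen; only the UNet is trained

# UNet operating on latent mel-spectrograms; shapes and channels are placeholders.
unet = UNet2DConditionModel(
    sample_size=(256, 16),
    in_channels=8,
    out_channels=8,
    block_out_channels=(64, 128, 256, 256),  # kept small for the sketch
    cross_attention_dim=1024,                # matches Flan-T5-large hidden size
)

prompt = ["rolling thunder with lightning strikes"]
tokens = tokenizer(prompt, return_tensors="pt", padding=True)
text_emb = text_encoder(**tokens).last_hidden_state  # conditioning sequence

# One denoising step: predict the noise in a latent given the text condition.
noisy_latents = torch.randn(1, 8, 256, 16)
noise_pred = unet(noisy_latents, torch.tensor([10]), encoder_hidden_states=text_emb).sample
```

In the full model, this denoising step is repeated over a noise schedule, and a decoder plus vocoder turn the final latent into a waveform.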
Tango 2 is a follow-up to Tango, built on the same foundation but with an additional alignment stage: Direct Preference Optimization (DPO) training on Audio-alpaca, a pairwise text-to-audio preference dataset. This helps Tango 2 generate higher-quality, better-aligned audio outputs.
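For intuition on the alignment step, the snippet below shows the generic DPO preference loss over a (preferred, rejected) pair. Tango 2 adapts this idea to the diffusion objective on Audio-alpaca, so treat this as an illustration of DPO itself rather than Tango 2's exact training loss; the function and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Generic DPO loss for a batch of preference pairs.

    logp_win / logp_lose: (log-)likelihood terms of the preferred and rejected
    audio under the model being fine-tuned; ref_logp_* are the same terms under
    the frozen reference model (the original Tango). beta controls how far the
    tuned model may drift from the reference."""
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    return -F.logsigmoid(beta * margin).mean()

# Toy call with random values, just to show the expected tensor shapes.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```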
Model inputs and outputs
Inputs
- **Prompt**: A textual description of the desired audio to be generated.
- **Steps**: The number of steps to use for the diffusion-based audio generation process, with more steps typically producing higher-quality results at the cost of longer inference time.
- **Guidance**: The guidance scale, which controls the trade-off between sample quality and sample diversity during the audio generation process.
Outputs
- **Audio**: The generated audio clip corresponding to the input prompt, in WAV format; a complete prompt-to-WAV usage sketch follows below.
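Putting those pieces together, a typical call might look like the sketch below. It assumes the Python wrapper from the declare-lab tango repository (a `Tango` class with a `generate` method) and Hugging Face checkpoint names such as `declare-lab/tango2`; treat the exact keyword arguments as assumptions mirroring the inputs listed above rather than a confirmed API.

```python
import soundfile as sf
from tango import Tango  # wrapper from the declare-lab repository (assumed API)

tango = Tango("declare-lab/tango2")  # or "declare-lab/tango" for the original model

prompt = "An audience cheering and clapping"
# steps and guidance correspond to the inputs described above.
audio = tango.generate(prompt, steps=100, guidance=3)

# The result is a waveform array; save it as a WAV file.
sf.write("cheering.wav", audio, samplerate=16000)
```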
Capabilities
Tango and Tango 2 can generate a wide variety of realistic audio clips, including human sounds, animal sounds, natural and artificial sounds, and sound effects. For example, they can generate sounds of an audience cheering and clapping, rolling thunder with lightning strikes, or a car engine revving.
What can I use it for?
The Tango and Tango 2 models can be used for a variety of applications, such as:
- **Audio content creation**: Generating audio clips for videos, games, podcasts, and other multimedia projects.
- **Sound design**: Creating custom sound effects for various applications.
- **Music composition**: Generating musical elements or accompaniment for songwriting and composition.
- **Accessibility**: Generating audio descriptions for visually impaired users.
Things to try
You can try generating various types of audio clips by providing different prompts to the Tango and Tango 2 models, such as:
- Everyday sounds (e.g., a dog barking, water flowing, a car engine revving)
- Natural phenomena (e.g., thunderstorms, wind, rain)
- Musical instruments and soundscapes (e.g., a piano playing, a symphony orchestra)
- Human vocalizations (e.g., laughter, cheering, singing)
- Ambient and abstract sounds (e.g., a futuristic machine, alien landscapes)
Experiment with the number of steps and guidance scale to find the right balance between sample quality and generation time for your specific use case.
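One way to explore that trade-off is a small parameter sweep, reusing the assumed `Tango` wrapper and `generate(prompt, steps=..., guidance=...)` call from the earlier sketch; the specific values below are only starting points.

```python
import soundfile as sf
from tango import Tango  # assumed wrapper, as in the earlier sketch

tango = Tango("declare-lab/tango2")
prompt = "rolling thunder with lightning strikes"

# Fewer steps generate faster; higher guidance follows the prompt more closely
# at the cost of sample diversity.
for steps in (50, 100, 200):
    for guidance in (3, 7):
        audio = tango.generate(prompt, steps=steps, guidance=guidance)
        sf.write(f"thunder_s{steps}_g{guidance}.wav", audio, samplerate=16000)
```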