Model Card for FLAN-T5 base
===========================================================

![FLAN-T5 architecture](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan2_architecture.jpg)

Table of Contents
=======================================

0.  [TL;DR](#tldr)
1.  [Model Details](#model-details)
2.  [Usage](#usage)
3.  [Uses](#uses)
4.  [Bias, Risks, and Limitations](#bias-risks-and-limitations)
5.  [Training Details](#training-details)
6.  [Evaluation](#evaluation)
7.  [Environmental Impact](#environmental-impact)
8.  [Citation](#citation)
9.  [Model Card Authors](#model-card-authors)

TL;DR
==============

If you already know T5, FLAN-T5 is better at almost everything. For the same number of parameters, these models have been fine-tuned on more than 1,000 additional tasks that also cover more languages. As mentioned in the first few lines of the abstract:

> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.

**Disclaimer**: The content of **this** model card was written by the Hugging Face team, and parts of it were copied from the [T5 model card](https://huggingface.co/t5-large).

Model Details
===============================

Model Description
---------------------------------------

*   **Model type:** Language model (text-to-text, encoder-decoder Transformer)
*   **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
*   **License:** Apache 2.0
*   **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)
*   **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)
*   **Resources for more information:**
    *   [Research paper](https://arxiv.org/pdf/2210.11416.pdf)
    *   [GitHub Repo](https://github.com/google-research/t5x)
    *   [Hugging Face FLAN-T5 Docs (Similar to T5)](https://huggingface.co/docs/transformers/model_doc/t5)

Usage
===============

Below are some example scripts showing how to use the model with `transformers`:

Using the PyTorch model
---------------------------------------------------

### Running the model on a CPU

    
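```python
# pip install torch transformers sentencepiece
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and the model weights from the Hugging Face Hub
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

# FLAN-T5 is text-to-text: the task is described in the input itself
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
# skip_special_tokens drops the <pad> and </s> tokens from the decoded text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```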

### Running the model on a GPU

```python
# pip install accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
# device_map="auto" (requires accelerate) lets Accelerate place the weights,
# using the available GPU(s) first
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", device_map="auto")

input_text = "translate English to German: How old are you?"
# The input tensors must be on the same device as the model
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Running the model on a GPU using different precisions

#### FP16

```python
# pip install accelerate
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
# torch_dtype=torch.float16 loads the weights in half precision,
# roughly halving their memory footprint compared to float32
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-base", device_map="auto", torch_dtype=torch.float16
)

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### INT8

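```python
# pip install bitsandbytes accelerate
from transformers import BitsAndBytesConfig, T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
# Quantize the weights to 8-bit with bitsandbytes. Recent transformers releases
# expect this through quantization_config instead of a bare load_in_8bit=True.
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-base",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```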
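
The sketch below gives a feel for why the choice of precision matters. It estimates how much memory the weights alone occupy at each precision. It assumes that flan-t5-base has roughly 250 million parameters, which is an approximation and not a figure from this card. It also ignores activations and runtime overhead. With 8-bit loading, some layers may stay in higher precision, so real numbers will differ. The names `BYTES_PER_PARAM` and `weight_memory_gib` are illustrative.

```python
# Rough estimate of the memory needed just to hold the model weights.
# ASSUMPTION: ~250M parameters for flan-t5-base (approximate, not official);
# activations, generation caches, and framework overhead are ignored.

BYTES_PER_PARAM = {
    "fp32": 4,  # default torch.float32 weights
    "fp16": 2,  # torch_dtype=torch.float16
    "int8": 1,  # 8-bit weights via bitsandbytes (some layers may stay larger)
}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Return the approximate size of the weights in GiB for a precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


if __name__ == "__main__":
    approx_params = 250_000_000  # assumption, see lead-in
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gib(approx_params, precision)
        print(f"{precision}: ~{size:.2f} GiB")
```

Each halving of bytes per parameter halves the weight footprint. This is the main motivation for the FP16 and INT8 variants above.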

Uses
=============

Direct Use and Downstream Use
---------------------------------------------------------------

The authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that:

> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models

See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.

Out-of-Scope Use
-------------------------------------

More information needed.

Bias, Risks, and Limitations
===========================================================

The information below in this section is copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):

> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

Ethical Considerations and Risks
---------------------------------------------------------------------

> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.

Known Limitations
---------------------------------------

> Flan-T5 has not been tested in real world applications.

Sensitive Use
--------------------------------

> Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.

Training Details
=====================================

Training Data
-------------------------------

The model was trained on a mixture of tasks, including those described in the table below (from the original paper, Figure 2):

[![Fine-tuning task mixtures from the FLAN-T5 paper, Figure 2](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)

Training Procedure
-----------------------------------------

According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):

> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.

The model was trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).

Evaluation
=========================

Testing Data, Factors & Metrics
-----------------------------------------------------------------

The authors evaluated the model on various tasks covering several languages (1,836 tasks in total). See the table below for some quantitative evaluation results:

[![Quantitative evaluation results from the FLAN-T5 paper](https://s3.amazonaws.com/moonup/production/uploads/1668072995230-62441d1d9fdefb55a0b7d12c.png)](https://s3.amazonaws.com/moonup/production/uploads/1668072995230-62441d1d9fdefb55a0b7d12c.png)

For full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).

Results
-------------------

For full results for FLAN-T5-Base, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf), Table 3.

Environmental Impact
=============================================

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** Google Cloud TPU Pods (TPU v3 or TPU v4)
*   **Number of chips:** ≥ 4
*   **Hours used:** More information needed
*   **Cloud Provider:** GCP
*   **Compute Region:** More information needed
*   **Carbon Emitted:** More information needed
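
The sketch below is a simplified version of the kind of estimate such calculators perform. It approximates emissions as the energy used times the carbon intensity of the grid. Energy is power draw per chip × number of chips × hours, optionally scaled by the data center's PUE. The function name `estimate_emissions_kg` and every input value are hypothetical. The hours, power draw, and regional carbon intensity for this model are not reported here, so the sketch cannot fill in the missing fields above.

```python
def estimate_emissions_kg(
    power_kw_per_chip: float,
    num_chips: int,
    hours: float,
    kg_co2_per_kwh: float,
    pue: float = 1.0,
) -> float:
    """Approximate CO2-equivalent emissions (kg) of a compute job.

    energy (kWh) = power per chip (kW) * chips * hours * PUE
    emissions    = energy (kWh) * grid carbon intensity (kg CO2eq / kWh)
    """
    if min(power_kw_per_chip, num_chips, hours, kg_co2_per_kwh) < 0 or pue < 1.0:
        raise ValueError("inputs must be non-negative and PUE must be >= 1")
    energy_kwh = power_kw_per_chip * num_chips * hours * pue
    return energy_kwh * kg_co2_per_kwh


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; none are reported for this model.
    print(estimate_emissions_kg(power_kw_per_chip=0.25, num_chips=4, hours=10, kg_co2_per_kwh=0.4))
```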

Citation
=====================

**BibTeX:**

```bibtex
@misc{https://doi.org/10.48550/arxiv.2210.11416,
  doi = {10.48550/ARXIV.2210.11416},
  url = {https://arxiv.org/abs/2210.11416},
  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Scaling Instruction-Finetuned Language Models},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```

Model Recycling
-----------------------------------

[Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=9.16&mnli_lp=nan&20_newsgroup=3.34&ag_news=1.49&amazon_reviews_multi=0.21&anli=13.91&boolq=16.75&cb=23.12&cola=9.97&copa=34.50&dbpedia=6.90&esnli=5.37&financial_phrasebank=18.66&imdb=0.33&isear=1.37&mnli=11.74&mrpc=16.63&multirc=6.24&poem_sentiment=14.62&qnli=3.41&qqp=6.18&rotten_tomatoes=2.98&rte=24.26&sst2=0.67&sst_5bins=5.44&stsb=20.68&trec_coarse=3.95&trec_fine=10.73&tweet_ev_emoji=13.39&tweet_ev_emotion=4.62&tweet_ev_hate=3.46&tweet_ev_irony=9.04&tweet_ev_offensive=1.69&tweet_ev_sentiment=0.75&wic=14.22&wnli=9.44&wsc=5.53&yahoo_answers=4.14&model_name=google%2Fflan-t5-base&base_name=google%2Ft5-v1_1-base) using google/flan-t5-base as a base model yields an average score of 77.98, compared to 68.82 for google/t5-v1\_1-base.

The model ranks 1st among all tested models for the google/t5-v1\_1-base architecture as of 06/02/2023. Results:

| Dataset | Score |
|---|---|
| 20\_newsgroup | 86.2188 |
| ag\_news | 89.6667 |
| amazon\_reviews\_multi | 67.12 |
| anli | 51.9688 |
| boolq | 82.3242 |
| cb | 78.5714 |
| cola | 80.1534 |
| copa | 75 |
| dbpedia | 77.6667 |
| esnli | 90.9507 |
| financial\_phrasebank | 85.4 |
| imdb | 93.324 |
| isear | 72.425 |
| mnli | 87.2457 |
| mrpc | 89.4608 |
| multirc | 62.3762 |
| poem\_sentiment | 82.6923 |
| qnli | 92.7878 |
| qqp | 89.7724 |
| rotten\_tomatoes | 89.0244 |
| rte | 84.8375 |
| sst2 | 94.3807 |
| sst\_5bins | 57.2851 |
| stsb | 89.4759 |
| trec\_coarse | 97.2 |
| trec\_fine | 92.8 |
| tweet\_ev\_emoji | 46.848 |
| tweet\_ev\_emotion | 80.2252 |
| tweet\_ev\_hate | 54.9832 |
| tweet\_ev\_irony | 76.6582 |
| tweet\_ev\_offensive | 84.3023 |
| tweet\_ev\_sentiment | 70.6366 |
| wic | 70.0627 |
| wnli | 56.338 |
| wsc | 53.8462 |
| yahoo\_answers | 73.4 |
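
As a quick consistency check, the reported average can be recomputed from the 36 per-dataset scores listed above. The baseline average of 68.82 for google/t5-v1\_1-base is taken from the text rather than recomputed, because its per-dataset scores are not listed here.

```python
# Per-dataset scores for google/flan-t5-base, copied from the results above.
scores = {
    "20_newsgroup": 86.2188, "ag_news": 89.6667, "amazon_reviews_multi": 67.12,
    "anli": 51.9688, "boolq": 82.3242, "cb": 78.5714, "cola": 80.1534,
    "copa": 75.0, "dbpedia": 77.6667, "esnli": 90.9507,
    "financial_phrasebank": 85.4, "imdb": 93.324, "isear": 72.425,
    "mnli": 87.2457, "mrpc": 89.4608, "multirc": 62.3762,
    "poem_sentiment": 82.6923, "qnli": 92.7878, "qqp": 89.7724,
    "rotten_tomatoes": 89.0244, "rte": 84.8375, "sst2": 94.3807,
    "sst_5bins": 57.2851, "stsb": 89.4759, "trec_coarse": 97.2,
    "trec_fine": 92.8, "tweet_ev_emoji": 46.848, "tweet_ev_emotion": 80.2252,
    "tweet_ev_hate": 54.9832, "tweet_ev_irony": 76.6582,
    "tweet_ev_offensive": 84.3023, "tweet_ev_sentiment": 70.6366,
    "wic": 70.0627, "wnli": 56.338, "wsc": 53.8462, "yahoo_answers": 73.4,
}

finetuned_avg = sum(scores.values()) / len(scores)  # unweighted mean over datasets
baseline_avg = 68.82  # reported average for google/t5-v1_1-base (from the text)

print(f"datasets: {len(scores)}")
print(f"flan-t5-base average: {finetuned_avg:.2f}")
print(f"gain over t5-v1_1-base: {finetuned_avg - baseline_avg:.2f}")
```

The recomputed mean rounds to the stated 77.98. The difference from the baseline rounds to 9.16, which is the `avg` value encoded in the evaluation link.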

For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)

## Model overview

`flan-t5-base` is a language model developed by Google and part of the FLAN-T5 family. It is an improved version of the original T5 model, with additional fine-tuning on over 1,000 tasks covering a variety of languages. Compared to the original T5 model, FLAN-T5 models like `flan-t5-base` perform better on a wide range of tasks, including question answering, reasoning, and few-shot learning. The FLAN-T5 family comes in several sizes, from `flan-t5-small` up to the much larger `flan-t5-xxl`; `flan-t5-base` is the second smallest.

Similar FLAN-T5 models include [flan-t5-xxl](https://aimodels.fyi/models/huggingFace/flan-t5-xxl-google), which is a larger version of the model with better performance on some benchmarks. The Falcon series of models from TII, like [Falcon-40B](https://aimodels.fyi/models/huggingFace/falcon-40b-tiiuae) and [Falcon-180B](https://aimodels.fyi/models/huggingFace/falcon-180b-tiiuae), are also strong open-source language models that can be used for similar tasks.

## Model inputs and outputs

### Inputs
- **Text**: The `flan-t5-base` model takes text input, typically a task instruction or prompt. The input can be a single sentence, a paragraph, or a longer document, though very long inputs may need to be truncated or split into chunks.

### Outputs
- **Text**: The model generates text output, which can be used for a variety of tasks such as translation, summarization, question answering, and more.
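
Because the model is text-to-text, each task is expressed purely through the wording of the input string. The helper below is a minimal sketch of building such inputs. Only the translation template mirrors the examples in this card. The summarization and question-answering wordings, as well as the names `PROMPT_TEMPLATES` and `build_prompt`, are hypothetical and are not formats required by FLAN-T5.

```python
# Hypothetical instruction templates; only "translate" mirrors this card's examples.
PROMPT_TEMPLATES = {
    "translate": "translate English to {target}: {text}",
    "summarize": "Summarize the following text:\n{text}",
    "qa": "Answer the question using the context.\nContext: {context}\nQuestion: {question}",
}


def build_prompt(task: str, **fields: str) -> str:
    """Fill the template registered for `task` with the given fields."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPT_TEMPLATES)}")
    return PROMPT_TEMPLATES[task].format(**fields)


if __name__ == "__main__":
    print(build_prompt("translate", target="German", text="How old are you?"))
    print(build_prompt("qa", context="FLAN-T5 was released by Google.",
                       question="Who released FLAN-T5?"))
```

The returned string can be passed to the tokenizer in the same way as `input_text` in the Usage examples.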

## Capabilities

The `flan-t5-base` model is a text-to-text Transformer that can be used for a wide range of natural language processing tasks. It has shown strong performance on benchmarks such as MMLU, HellaSwag, and PIQA. The FLAN-T5 authors report that Flan-T5 checkpoints achieve strong few-shot performance even compared to much larger models such as PaLM 62B. Its versatility and few-shot learning capabilities make it a useful tool for researchers and developers working on a variety of NLP applications.

## What can I use it for?

The `flan-t5-base` model can be used for a variety of natural language processing tasks, including:

- **Content Creation and Communication**: The model can be used to generate creative text, power chatbots and virtual assistants, and produce text summaries.
- **Research and Education**: Researchers can use the model as a foundation for experimenting with NLP techniques, developing new algorithms, and contributing to the advancement of the field. Educators can also leverage the model to create interactive language learning experiences.

## Things to try

One interesting aspect of the `flan-t5-base` model is its strong few-shot learning capabilities. This means that the model can often perform well on new tasks with just a few examples, without requiring extensive fine-tuning. Developers and researchers can experiment with prompting the model with different task descriptions and a small number of examples to see how it performs on a variety of downstream applications.
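
A few-shot prompt concatenates an instruction, a handful of solved examples, and the new input into one string. The sketch below shows one common way to lay this out. The `Input:`/`Output:` labels, the blank-line separators, and the function name `build_few_shot_prompt` are assumptions, not a format prescribed by the FLAN-T5 authors.

```python
from typing import Iterable, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Iterable[Tuple[str, str]],
    query: str,
    input_label: str = "Input",
    output_label: str = "Output",
) -> str:
    """Join an instruction, solved examples, and a new query into one prompt.

    The labels and separators are illustrative choices, not a FLAN-T5 standard.
    """
    parts = [instruction.strip()]
    # Each solved example shows the model an input and its expected output
    for source, target in examples:
        parts.append(f"{input_label}: {source}\n{output_label}: {target}")
    # The query ends with an empty output slot for the model to complete
    parts.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Classify the sentiment of each review as positive or negative.",
        [("I loved every minute of it.", "positive"),
         ("The service was painfully slow.", "negative")],
        "The food was great and the staff were friendly.",
    )
    print(prompt)
```

Every example adds to the input length. Keeping the number of examples small, and comparing the result against a zero-shot prompt with the same instruction, is a reasonable way to start.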

Another area to explore is the model's multilingual capabilities. This model card lists 60 supported languages, which opens up opportunities for cross-lingual tasks like machine translation and multilingual question answering. The underlying T5 model was pretrained on the English-language C4 corpus, so quality in other languages may vary.