[](#model-card-for-t5-base)Model Card for T5 Base
=================================================

[![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)

[](#table-of-contents)Table of Contents
=======================================

1.  [Model Details](#model-details)
2.  [Uses](#uses)
3.  [Bias, Risks, and Limitations](#bias-risks-and-limitations)
4.  [Training Details](#training-details)
5.  [Evaluation](#evaluation)
6.  [Environmental Impact](#environmental-impact)
7.  [Citation](#citation)
8.  [Model Card Authors](#model-card-authors)
9.  [How To Get Started With the Model](#how-to-get-started-with-the-model)

[](#model-details)Model Details
===============================

[](#model-description)Model Description
---------------------------------------

The developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.

T5-Base is the checkpoint with 220 million parameters.
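
As a rough cross-check of the 220 million figure, the sketch below counts weights from the architecture hyperparameters. The values used (`d_model=768`, `d_ff=3072`, 12 layers per stack, 12 heads of size 64, a 32,128-entry vocabulary) are assumed from the `t5-base` configuration published with Hugging Face Transformers, and the helper name `t5_param_count` is illustrative. It also assumes tied embeddings, no bias terms, and one relative-position bias table per stack.

```python
def t5_param_count(vocab_size, d_model, d_ff, num_layers, num_heads,
                   d_kv=64, num_buckets=32):
    """Count weights of an original-style T5 encoder-decoder (assumed layout)."""
    inner = num_heads * d_kv
    attention = 4 * d_model * inner      # q, k, v and output projections
    feed_forward = 2 * d_model * d_ff    # input and output projections
    layer_norm = d_model                 # scale only, no bias
    rel_bias = num_buckets * num_heads   # one table per stack (first layer)

    encoder_layer = attention + layer_norm + feed_forward + layer_norm
    # A decoder layer has self-attention plus cross-attention.
    decoder_layer = 2 * (attention + layer_norm) + feed_forward + layer_norm

    encoder = num_layers * encoder_layer + rel_bias + layer_norm
    decoder = num_layers * decoder_layer + rel_bias + layer_norm
    embedding = vocab_size * d_model     # shared by encoder, decoder, output head
    return embedding + encoder + decoder


# Hyperparameters assumed from the published t5-base configuration.
base = t5_param_count(vocab_size=32128, d_model=768, d_ff=3072,
                      num_layers=12, num_heads=12)
print(f"{base:,}")
```

Under these assumptions the count comes to about 223 million, which the card rounds to 220 million.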

*   **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)
*   **Model type:** Language model
*   **Language(s) (NLP):** English, French, Romanian, German
*   **License:** Apache 2.0
*   **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)
*   **Resources for more information:**
    *   [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)
    *   [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
    *   [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)
    *   [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)

[](#uses)Uses
=============

[](#direct-use-and-downstream-use)Direct Use and Downstream Use
---------------------------------------------------------------

The developers write in a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.

See the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.
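
In practice, the text-to-text framing means each task is selected by a short prefix on the input string. The sketch below formats a few inputs using the prefixes shown in the research paper's examples; the function name and the dictionary keys are illustrative and not part of any library.

```python
# Task prefixes as they appear in the T5 paper's examples.
TEMPLATES = {
    "translate_en_de": "translate English to German: {text}",
    "summarize": "summarize: {text}",
    "cola": "cola sentence: {sentence}",
    "stsb": "stsb sentence1: {sentence1} sentence2: {sentence2}",
}


def to_text_to_text(task, **fields):
    """Render one example as the single input string the model consumes."""
    return TEMPLATES[task].format(**fields)


print(to_text_to_text("translate_en_de", text="That is good."))
print(to_text_to_text("stsb",
                      sentence1="The rhino grazed on the grass.",
                      sentence2="A rhino is grazing in a field."))
```

The resulting strings would then be tokenized and passed to the model like any other input; only the prefix changes between tasks.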

[](#out-of-scope-use)Out-of-Scope Use
-------------------------------------

More information needed.

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
===========================================================

More information needed.

[](#recommendations)Recommendations
-----------------------------------

More information needed.

[](#training-details)Training Details
=====================================

[](#training-data)Training Data
-------------------------------

The model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.

The model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**. The following datasets were used for (1.) and (2.):

1.  **Datasets used for Unsupervised denoising objective**:

*   [C4](https://huggingface.co/datasets/c4)
*   [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)

2.  **Datasets used for Supervised text-to-text language modeling objective**

*   Sentence acceptability judgment
    *   CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)
*   Sentiment analysis
    *   SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)
*   Paraphrasing/sentence similarity
    *   MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)
    *   STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)
    *   QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)
*   Natural language inference
    *   MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)
    *   QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)
    *   RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9)
    *   CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)
*   Sentence completion
    *   COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)
*   Word sense disambiguation
    *   WiC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)
*   Question answering
    *   MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)
    *   ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)
    *   BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)
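
The unsupervised denoising objective listed under (1.) replaces spans of the input with sentinel tokens and trains the model to emit the dropped spans. The following is a minimal word-level sketch using the example sentence from the research paper. It takes the spans as hand-picked arguments, whereas actual pre-training samples them at random over token IDs; the function name is illustrative, and the `<extra_id_N>` sentinel spelling follows the Hugging Face tokenizer.

```python
def corrupt_spans(words, spans):
    """Replace each (start, end) word span with a sentinel.

    Returns the corrupted input and the target that restores the spans.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(words[cursor:start])   # keep text before the span
        inputs.append(sentinel)              # mark where the span was
        targets.append(sentinel)
        targets.extend(words[start:end])     # the dropped words
        cursor = end
    inputs.extend(words[cursor:])            # keep the tail
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel ends the target
    return " ".join(inputs), " ".join(targets)


words = "Thank you for inviting me to your party last week.".split()
corrupted, target = corrupt_spans(words, [(2, 4), (8, 9)])
print(corrupted)  # Thank you <extra_id_0> me to your party <extra_id_1> week.
print(target)     # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```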

[](#training-procedure)Training Procedure
-----------------------------------------

In their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write:

> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.

The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.

[](#evaluation)Evaluation
=========================

[](#testing-data-factors--metrics)Testing Data, Factors & Metrics
-----------------------------------------------------------------

The developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.

[](#results)Results
-------------------

For full results for T5-Base, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.

[](#environmental-impact)Environmental Impact
=============================================

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** Google Cloud TPU Pods
*   **Hours used:** More information needed
*   **Cloud Provider:** GCP
*   **Compute Region:** More information needed
*   **Carbon Emitted:** More information needed

[](#citation)Citation
=====================

**BibTeX:**

    @article{2020t5,
      author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
      title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
      journal = {Journal of Machine Learning Research},
      year    = {2020},
      volume  = {21},
      number  = {140},
      pages   = {1-67},
      url     = {http://jmlr.org/papers/v21/20-074.html}
    }
    

**APA:**

*   Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.

[](#model-card-authors)Model Card Authors
=========================================

This model card was written by the team at Hugging Face.

[](#how-to-get-started-with-the-model)How to Get Started with the Model
=======================================================================

Use the code below to get started with the model.

    from transformers import T5Tokenizer, T5Model
    
    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = T5Model.from_pretrained("t5-base")
    
    input_ids = tokenizer(
        "Studies have been shown that owning a dog is good for you", return_tensors="pt"
    ).input_ids  # Batch size 1
    decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids  # Batch size 1
    
    # forward pass
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    last_hidden_states = outputs.last_hidden_state
    

See the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more examples.

## Model overview

The `t5-base` model is a language model developed by Google as part of the Text-To-Text Transfer Transformer (T5) series. It is an encoder-decoder Transformer with 220 million parameters, trained on a diverse set of natural language processing tasks in a unified text-to-text format. The T5 framework allows the same model, loss function, and hyperparameters to be used for a variety of NLP tasks. Related models include [FLAN-T5-base](https://aimodels.fyi/models/huggingFace/flan-t5-base-google) and [FLAN-T5-XXL](https://aimodels.fyi/models/huggingFace/flan-t5-xxl-google), which build on the original T5 model with further fine-tuning on a large collection of instruction-formatted tasks.

## Model inputs and outputs

### Inputs
- **Text strings**: The `t5-base` model takes text strings as input, which can be in the form of a single sentence, a paragraph, or a sequence of sentences.

### Outputs
- **Text strings**: The model generates text strings as output, which can be used for a variety of natural language processing tasks such as translation, summarization, question answering, and more.

## Capabilities

The `t5-base` model can be applied to a wide range of NLP tasks. The developers report results on benchmarks covering translation, text summarization, and question answering, among others. Because every task is cast as a text-to-text transformation, the same checkpoint can serve researchers and practitioners working on many different natural language processing problems.

## What can I use it for?

The `t5-base` model can be used for a variety of natural language processing tasks, including:

- **Text Generation**: With task-specific fine-tuning, the model can be used to generate free-form text, such as story continuations or dialogue.
- **Text Summarization**: The model can be used to summarize long-form text, such as articles or reports, into concise and informative summaries.
- **Translation**: The model can be used to translate text from one language to another, such as English to French or German.
- **Question Answering**: The model can be used to answer questions based on provided text, making it useful for building intelligent question-answering systems.

## Things to try

One interesting aspect of the `t5-base` model is its ability to handle a diverse range of NLP tasks using a single unified framework. This means that you can fine-tune the model on a specific task, such as language translation or text summarization, and then use the fine-tuned model to perform that task on new data. Additionally, the model's text-to-text format allows for creative experimentation, where you can try combining different tasks or prompting the model in novel ways to see how it responds.

[](#model-card-for-t5-small)Model Card for T5 Small
===================================================

[![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)

[](#table-of-contents)Table of Contents
=======================================

1.  [Model Details](#model-details)
2.  [Uses](#uses)
3.  [Bias, Risks, and Limitations](#bias-risks-and-limitations)
4.  [Training Details](#training-details)
5.  [Evaluation](#evaluation)
6.  [Environmental Impact](#environmental-impact)
7.  [Citation](#citation)
8.  [Model Card Authors](#model-card-authors)
9.  [How To Get Started With the Model](#how-to-get-started-with-the-model)

[](#model-details)Model Details
===============================

[](#model-description)Model Description
---------------------------------------

The developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.

T5-Small is the checkpoint with 60 million parameters.

*   **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)
*   **Model type:** Language model
*   **Language(s) (NLP):** English, French, Romanian, German
*   **License:** Apache 2.0
*   **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)
*   **Resources for more information:**
    *   [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)
    *   [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
    *   [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)
    *   [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)

[](#uses)Uses
=============

[](#direct-use-and-downstream-use)Direct Use and Downstream Use
---------------------------------------------------------------

The developers write in a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.

See the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.
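
The regression case mentioned in the quote can be illustrated with STS-B similarity scores. The research paper describes rounding each score to the nearest 0.2 and using its string form as the target; the sketch below follows that description. The function names and the fallback behaviour for unparseable output are assumptions made for illustration.

```python
def score_to_target(score):
    """Round a similarity score to the nearest 0.2 and render it as text."""
    return f"{round(score * 5) / 5:.1f}"


def target_to_score(text, default=None):
    """Parse a generated string back into a float.

    Returns `default` when the model produced something that is not a number.
    """
    try:
        return float(text)
    except ValueError:
        return default


print(score_to_target(2.57))          # 2.6
print(target_to_score("2.6"))         # 2.6
print(target_to_score("entailment"))  # None
```

Because the target is ordinary text, the same loss and decoding procedure used for translation or summarization applies unchanged.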

[](#out-of-scope-use)Out-of-Scope Use
-------------------------------------

More information needed.

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
===========================================================

More information needed.

[](#recommendations)Recommendations
-----------------------------------

More information needed.

[](#training-details)Training Details
=====================================

[](#training-data)Training Data
-------------------------------

The model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.

The model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**. The following datasets were used for (1.) and (2.):

1.  **Datasets used for Unsupervised denoising objective**:

*   [C4](https://huggingface.co/datasets/c4)
*   [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)

2.  **Datasets used for Supervised text-to-text language modeling objective**

*   Sentence acceptability judgment
    *   CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)
*   Sentiment analysis
    *   SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)
*   Paraphrasing/sentence similarity
    *   MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)
    *   STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)
    *   QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)
*   Natural language inference
    *   MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)
    *   QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)
    *   RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9)
    *   CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)
*   Sentence completion
    *   COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)
*   Word sense disambiguation
    *   WiC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)
*   Question answering
    *   MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)
    *   ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)
    *   BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)

[](#training-procedure)Training Procedure
-----------------------------------------

In their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write:

> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.

The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.

[](#evaluation)Evaluation
=========================

[](#testing-data-factors--metrics)Testing Data, Factors & Metrics
-----------------------------------------------------------------

The developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.

[](#results)Results
-------------------

For full results for T5-Small, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.

[](#environmental-impact)Environmental Impact
=============================================

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** Google Cloud TPU Pods
*   **Hours used:** More information needed
*   **Cloud Provider:** GCP
*   **Compute Region:** More information needed
*   **Carbon Emitted:** More information needed

[](#citation)Citation
=====================

**BibTeX:**

    @article{2020t5,
      author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
      title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
      journal = {Journal of Machine Learning Research},
      year    = {2020},
      volume  = {21},
      number  = {140},
      pages   = {1-67},
      url     = {http://jmlr.org/papers/v21/20-074.html}
    }
    

**APA:**

*   Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.

[](#model-card-authors)Model Card Authors
=========================================

This model card was written by the team at Hugging Face.

[](#how-to-get-started-with-the-model)How to Get Started with the Model
=======================================================================

Use the code below to get started with the model.

    from transformers import T5Tokenizer, T5Model
    
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5Model.from_pretrained("t5-small")
    
    input_ids = tokenizer(
        "Studies have been shown that owning a dog is good for you", return_tensors="pt"
    ).input_ids  # Batch size 1
    decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids  # Batch size 1
    
    # forward pass
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    last_hidden_states = outputs.last_hidden_state
    

See the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more examples.
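
The snippet above passes `decoder_input_ids` explicitly. When training a sequence-to-sequence model of this kind, the decoder input is usually the target sequence shifted one position to the right, with a start token in front. The following library-free sketch shows that shift, assuming T5's convention that the pad token (ID 0) doubles as the decoder start token and that `</s>` has ID 1. The other token IDs are made up for illustration.

```python
PAD_TOKEN_ID = 0  # assumed: T5 reuses the pad token as decoder start token
EOS_TOKEN_ID = 1  # assumed: ID of </s> in the T5 vocabulary


def shift_right(label_ids, start_id=PAD_TOKEN_ID):
    """Build decoder inputs: prepend the start token, drop the last label."""
    return [start_id] + list(label_ids[:-1])


# Hypothetical target token IDs ending in </s>.
labels = [571, 32, 7, EOS_TOKEN_ID]
decoder_inputs = shift_right(labels)
print(decoder_inputs)  # [0, 571, 32, 7]
```

At each position the decoder therefore sees the tokens before the one it is asked to predict.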

## Model Overview

`t5-small` is a language model developed by the Google T5 team. It is part of the Text-To-Text Transfer Transformer (T5) family of models that aim to unify natural language processing tasks into a text-to-text format. The `t5-small` checkpoint has 60 million parameters and is capable of performing a variety of NLP tasks such as machine translation, document summarization, question answering, and sentiment analysis.

Similar models in the T5 family include `t5-large` with 770 million parameters and `t5-11b` with 11 billion parameters. These larger models generally achieve stronger performance but at the cost of increased computational and memory requirements. The later [FLAN-T5 models](https://aimodels.fyi/models/huggingFace/flan-t5-small-google) build on the original T5 framework with further fine-tuning on a large set of instruction-formatted tasks, leading to improved few-shot and zero-shot capabilities.

## Model Inputs and Outputs

### Inputs
- Text strings that can be formatted for various NLP tasks, such as:
  - Source text for translation
  - Questions for question answering
  - Passages of text for summarization

### Outputs
- Text strings that provide the model's response, such as:
  - Translated text
  - Answers to questions
  - Summaries of input passages

## Capabilities

The `t5-small` model is a capable language model that can be applied to a wide range of text-based NLP tasks. It has demonstrated strong performance on benchmarks covering areas like natural language inference, sentiment analysis, and question answering. While the larger T5 models generally achieve better results, the `t5-small` checkpoint provides a more efficient option with good capabilities.

## What Can I Use It For?

The versatility of the T5 framework makes `t5-small` useful for many NLP applications. Some potential use cases include:

- **Machine Translation**: Translate English text into the other languages listed for the model: French, German, and Romanian.
- **Summarization**: Generate concise summaries of long-form text documents.
- **Question Answering**: Answer questions based on provided context.
- **Sentiment Analysis**: Classify the sentiment of input text (for example, positive or negative, as in SST-2).
- **Text Generation**: Use the model for open-ended text generation, with prompts to guide the output.

## Things to Try

Some interesting things to explore with `t5-small` include:

- Evaluating its few-shot or zero-shot performance on new tasks by providing limited training data or just a task description.
- Analyzing the model's outputs to better understand its strengths, weaknesses, and potential biases.
- Experimenting with different prompting strategies to steer the model's behavior and output.
- Comparing the performance and efficiency tradeoffs between `t5-small` and the larger T5 or FLAN-T5 models.

Overall, `t5-small` is a flexible and capable language model that can be a useful tool in a wide range of natural language processing applications.

[](#model-card-for-t5-large)Model Card for T5 Large
===================================================

[![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)

[](#table-of-contents)Table of Contents
=======================================

1.  [Model Details](#model-details)
2.  [Uses](#uses)
3.  [Bias, Risks, and Limitations](#bias-risks-and-limitations)
4.  [Training Details](#training-details)
5.  [Evaluation](#evaluation)
6.  [Environmental Impact](#environmental-impact)
7.  [Citation](#citation)
8.  [Model Card Authors](#model-card-authors)
9.  [How To Get Started With the Model](#how-to-get-started-with-the-model)

[](#model-details)Model Details
===============================

[](#model-description)Model Description
---------------------------------------

The developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.

T5-Large is the checkpoint with 770 million parameters.
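
A parameter count translates directly into a lower bound on the memory needed to hold the weights. The back-of-the-envelope sketch below assumes the rounded 770 million figure and standard data-type widths; it ignores activations, optimizer state, and framework overhead, and the helper name is illustrative.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="float32"):
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


T5_LARGE_PARAMS = 770_000_000  # rounded figure from this card

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(T5_LARGE_PARAMS, dtype):.2f} GiB")
```

Under these assumptions the weights occupy just under 2.9 GiB in 32-bit precision and about half that in a 16-bit format.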

*   **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)
*   **Model type:** Language model
*   **Language(s) (NLP):** English, French, Romanian, German
*   **License:** Apache 2.0
*   **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)
*   **Resources for more information:**
    *   [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)
    *   [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
    *   [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)
    *   [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)

[](#uses)Uses
=============

[](#direct-use-and-downstream-use)Direct Use and Downstream Use
---------------------------------------------------------------

The developers write in a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.

See the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.
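
For classification tasks, the model generates the label as a word, so its output has to be mapped back to a class. The research paper notes that an output matching none of the possible labels is simply counted as wrong. The sketch below applies that rule to the three MNLI labels; the function name and the use of `-1` for an invalid prediction are illustrative choices.

```python
MNLI_LABELS = ["entailment", "neutral", "contradiction"]


def decode_label(generated, labels):
    """Map generated text to a class index, or -1 if it matches no label."""
    text = generated.strip()
    return labels.index(text) if text in labels else -1


print(decode_label("entailment", MNLI_LABELS))  # 0
print(decode_label("hamburger", MNLI_LABELS))   # -1
```

An accuracy metric can then treat `-1` as an incorrect prediction regardless of the gold label.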

[](#out-of-scope-use)Out-of-Scope Use
-------------------------------------

More information needed.

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
===========================================================

More information needed.

[](#recommendations)Recommendations
-----------------------------------

More information needed.

[](#training-details)Training Details
=====================================

[](#training-data)Training Data
-------------------------------

The model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.

The model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**. The following datasets were used for (1.) and (2.):

1.  **Datasets used for Unsupervised denoising objective**:

*   [C4](https://huggingface.co/datasets/c4)
*   [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)

2.  **Datasets used for Supervised text-to-text language modeling objective**

*   Sentence acceptability judgment
    *   CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)
*   Sentiment analysis
    *   SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)
*   Paraphrasing/sentence similarity
    *   MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)
    *   STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)
    *   QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)
*   Natural language inference
    *   MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)
    *   QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)
    *   RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9)
    *   CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)
*   Sentence completion
    *   COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)
*   Word sense disambiguation
    *   WiC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)
*   Question answering
    *   MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)
    *   ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)
    *   BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)

[](#training-procedure)Training Procedure
-----------------------------------------

In their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write:

> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.

The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.

[](#evaluation)Evaluation
=========================

[](#testing-data-factors--metrics)Testing Data, Factors & Metrics
-----------------------------------------------------------------

The developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.

[](#results)Results
-------------------

For full results for T5-Large, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.

[](#environmental-impact)Environmental Impact
=============================================

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** Google Cloud TPU Pods
*   **Hours used:** More information needed
*   **Cloud Provider:** GCP
*   **Compute Region:** More information needed
*   **Carbon Emitted:** More information needed

[](#citation)Citation
=====================

**BibTeX:**

    @article{2020t5,
      author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
      title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
      journal = {Journal of Machine Learning Research},
      year    = {2020},
      volume  = {21},
      number  = {140},
      pages   = {1-67},
      url     = {http://jmlr.org/papers/v21/20-074.html}
    }
    

**APA:**

*   Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.

[](#model-card-authors)Model Card Authors
=========================================

This model card was written by the team at Hugging Face.

[](#how-to-get-started-with-the-model)How to Get Started with the Model
=======================================================================

Use the code below to get started with the model.

    from transformers import T5Tokenizer, T5Model
    
    tokenizer = T5Tokenizer.from_pretrained("t5-large")
    model = T5Model.from_pretrained("t5-large")
    
    input_ids = tokenizer(
        "Studies have been shown that owning a dog is good for you", return_tensors="pt"
    ).input_ids  # Batch size 1
    decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids  # Batch size 1
    
    # forward pass
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    last_hidden_states = outputs.last_hidden_state
    

See the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more examples.

## Model overview

The `t5-large` model is a large language model developed by the Google T5 team. It is part of the Text-to-Text Transfer Transformer (T5) series, which reframes NLP tasks into a unified text-to-text format. The [T5 model](https://aimodels.fyi/models/huggingFace/t5-base-google-t5) and its larger variant `t5-large` are trained on a massive corpus of text data and can be applied to a wide range of NLP tasks, from translation to summarization to question answering.

Compared to the smaller [T5-Base](https://aimodels.fyi/models/huggingFace/t5-base-google-t5) model (220 million parameters), `t5-large` has 770 million parameters. It supports English, French, Romanian, and German.

## Model inputs and outputs

### Inputs
- **Text strings**: The `t5-large` model takes text as input, which can be a sentence, paragraph, or longer passage.

### Outputs
- **Text strings**: The model generates text as output, which can be a translation, summary, answer to a question, or completion of a given prompt.
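
As a rough illustration of the text-in, text-out interface, the sketch below builds an input string by prepending a task prefix. The prefix strings are the ones described in the T5 paper; the `TASK_PREFIXES` dictionary and the `build_input` helper are hypothetical names introduced only for this example.

```python
# Task prefixes as described in the T5 paper; the keys are illustrative.
TASK_PREFIXES = {
    "en_to_de": "translate English to German: ",
    "en_to_fr": "translate English to French: ",
    "en_to_ro": "translate English to Romanian: ",
    "summarize": "summarize: ",
}


def build_input(task: str, text: str) -> str:
    """Prepend the task prefix so the model can tell which task is requested."""
    return TASK_PREFIXES[task] + text.strip()


prompt = build_input("en_to_de", "That is good.")
print(prompt)
```

The resulting string would then be tokenized and passed to a sequence-to-sequence head such as `T5ForConditionalGeneration`, whose generated tokens decode back into the output text.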

## Capabilities

The `t5-large` model excels at a wide variety of NLP tasks due to its text-to-text format and large parameter size. It can be used for translation between supported languages, document summarization, question answering, text generation, and more. The model's capabilities make it a versatile tool for applications that require natural language processing.

## What can I use it for?

The `t5-large` model can be utilized in many real-world applications that involve text-based tasks. For example, it could be used to build a multilingual chatbot that can translate between languages, answer questions, and engage in open-ended conversations. It could also be leveraged to automatically summarize long documents or generate high-quality content for marketing and creative purposes.

Additionally, the model's text-to-text format allows it to be fine-tuned on specific datasets or tasks, unlocking even more potential use cases. Researchers and developers can explore using `t5-large` as a foundation for various NLP projects and applications.

## Things to try

One interesting aspect of the `t5-large` model is its ability to handle different NLP tasks using the same architecture and training process. This allows for efficient transfer learning, where the model can be fine-tuned on specific tasks without the need to train from scratch.

Developers could experiment with fine-tuning `t5-large` on domain-specific datasets, such as legal documents or scientific papers, to see how the model's performance and capabilities change. Additionally, exploring the model's few-shot and zero-shot learning abilities could yield interesting insights and applications, as the model may be able to adapt to new tasks with limited training data.

[](#model-card-for-t5-11b)Model Card for T5 11B
===============================================

[![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)

[](#table-of-contents)Table of Contents
=======================================

1.  [Model Details](#model-details)
2.  [Uses](#uses)
3.  [Bias, Risks, and Limitations](#bias-risks-and-limitations)
4.  [Training Details](#training-details)
5.  [Evaluation](#evaluation)
6.  [Environmental Impact](#environmental-impact)
7.  [Citation](#citation)
8.  [Model Card Authors](#model-card-authors)
9.  [How To Get Started With the Model](#how-to-get-started-with-the-model)

[](#model-details)Model Details
===============================

[](#model-description)Model Description
---------------------------------------

The developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.

T5-11B is the checkpoint with 11 billion parameters.

*   **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)
*   **Model type:** Language model
*   **Language(s) (NLP):** English, French, Romanian, German
*   **License:** Apache 2.0
*   **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)
*   **Resources for more information:**
    *   [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)
    *   [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
    *   [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)
    *   [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)

[](#uses)Uses
=============

[](#direct-use-and-downstream-use)Direct Use and Downstream Use
---------------------------------------------------------------

The developers write in a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.

See the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.
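
The regression case in the quote can be sketched as follows. The research paper describes rounding STS-B similarity scores to the nearest 0.2 and emitting the result as a literal string; the helper names and the 0 to 5 validity range below are assumptions made for illustration, not the developers' code.

```python
from typing import Optional


def score_to_target(score: float) -> str:
    """Round to the nearest 0.2 and render as text, so 2.57 becomes "2.6"."""
    return f"{round(score * 5) / 5:.1f}"


def target_to_score(text: str, low: float = 0.0, high: float = 5.0) -> Optional[float]:
    """Parse generated text back into a score; return None if it is not a valid score."""
    try:
        value = float(text.strip())
    except ValueError:
        return None
    # Reject values outside the assumed score range (this also rejects NaN).
    return value if low <= value <= high else None


print(score_to_target(2.57))
print(target_to_score("hamburger"))
```

Treating the number as a short string keeps the loss function identical to every other task, at the cost of limiting predictions to the discrete values seen in training.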

[](#out-of-scope-use)Out-of-Scope Use
-------------------------------------

More information needed.

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
===========================================================

More information needed.

[](#recommendations)Recommendations
-----------------------------------

More information needed.

[](#training-details)Training Details
=====================================

[](#training-data)Training Data
-------------------------------

The model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.

The model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**. The following datasets were used for (1.) and (2.):

1.  **Datasets used for Unsupervised denoising objective**:

*   [C4](https://huggingface.co/datasets/c4)
*   [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)

2.  **Datasets used for Supervised text-to-text language modeling objective**

*   Sentence acceptability judgment
    *   CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)
*   Sentiment analysis
    *   SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)
*   Paraphrasing/sentence similarity
    *   MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)
    *   STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)
    *   QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)
*   Natural language inference
    *   MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)
    *   QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)
    *   RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9)
    *   CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)
*   Sentence completion
    *   COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)
*   Word sense disambiguation
    *   WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)
*   Question answering
    *   MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)
    *   ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)
    *   BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)
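
The unsupervised denoising objective listed under (1.) can be illustrated with a small span-corruption sketch. It reproduces the "Thank you for inviting me to your party last week" example from the research paper, using the `<extra_id_N>` sentinel naming of the Hugging Face T5 tokenizer. Whitespace tokenization, the hand-picked span positions, and the `corrupt_spans` helper are simplifying assumptions; pre-training operates on SentencePiece tokens and samples spans randomly.

```python
from typing import List, Tuple


def corrupt_spans(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Replace each [start, end) token span with a sentinel.

    Returns (corrupted_input, target). Spans must be sorted and non-overlapping.
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the text before the span
        inputs.append(sentinel)              # one sentinel stands in for the whole span
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # the dropped tokens move to the target
        cursor = end
    inputs.extend(tokens[cursor:])           # keep the text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # a final sentinel closes the target
    return " ".join(inputs), " ".join(targets)


tokens = "Thank you for inviting me to your party last week .".split()
corrupted, target = corrupt_spans(tokens, [(2, 4), (8, 9)])
print(corrupted)
print(target)
```

The model sees the corrupted string as input and is trained to produce the target, so it only has to predict the dropped spans rather than reconstruct the full sentence.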

[](#training-procedure)Training Procedure
-----------------------------------------

In their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write:

> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.

The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.

[](#evaluation)Evaluation
=========================

[](#testing-data-factors--metrics)Testing Data, Factors & Metrics
-----------------------------------------------------------------

The developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.

[](#results)Results
-------------------

For full results for T5-11B, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.

[](#environmental-impact)Environmental Impact
=============================================

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** Google Cloud TPU Pods
*   **Hours used:** More information needed
*   **Cloud Provider:** GCP
*   **Compute Region:** More information needed
*   **Carbon Emitted:** More information needed

[](#citation)Citation
=====================

**BibTeX:**

    @article{2020t5,
      author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
      title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
      journal = {Journal of Machine Learning Research},
      year    = {2020},
      volume  = {21},
      number  = {140},
      pages   = {1-67},
      url     = {http://jmlr.org/papers/v21/20-074.html}
    }
    

**APA:**

*   Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.

[](#model-card-authors)Model Card Authors
=========================================

This model card was written by the team at Hugging Face.

[](#how-to-get-started-with-the-model)How to Get Started with the Model
=======================================================================

[](#disclaimer)Disclaimer
-------------------------

**Before `transformers` v3.5.0**, due to its immense size, `t5-11b` required some special treatment. If you're using `transformers <= v3.4.0`, `t5-11b` should be loaded with the `use_cdn` flag set to `False` as follows:

    t5 = transformers.T5ForConditionalGeneration.from_pretrained('t5-11b', use_cdn = False)
    

In addition, a single GPU will most likely not have enough memory to even load the model, as the weights alone amount to over 40 GB.

*   Model parallelism is one way to overcome this problem, as explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
*   DeepSpeed's ZeRO-Offload is another approach, as explained in this [post](https://github.com/huggingface/transformers/issues/9996).
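
The "over 40 GB" figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes a rounded count of 11 billion parameters stored as 32-bit floats (4 bytes each) and counts only the weights, not activations or optimizer state; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 11e9  # assumption: parameter count rounded to 11 billion

fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # 32-bit floats
fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # 16-bit floats, for comparison

print(f"float32: {fp32_gb:.0f} GB, float16: {fp16_gb:.0f} GB")
```

Under these assumptions the float32 weights come to roughly 44 GB, which is why the weights have to be split across devices or offloaded to CPU memory.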

See the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more context.

## Model overview

`t5-11b` is a large language model developed by the Google AI team as part of their Text-to-Text Transfer Transformer (T5) framework. The T5 framework aims to unify different NLP tasks into a common text-to-text format, allowing the same model to be used for a variety of applications like machine translation, summarization, and question answering. `t5-11b` is the largest checkpoint in the T5 model series, with 11 billion parameters.

The `t5-base` and `t5-large` models are smaller variants of `t5-11b`, with 220 million and 770 million parameters respectively. All T5 models are trained on a diverse set of supervised and unsupervised NLP tasks, allowing them to develop strong general language understanding capabilities.

## Model inputs and outputs

### Inputs
- **Text strings**: T5 models accept text as input, allowing them to be used for a wide variety of NLP tasks.

### Outputs
- **Text strings**: The output of T5 models is also in text form, enabling them to generate natural language as well as classify or extract information from input text.
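
Since classification results also arrive as text, the generated string has to be mapped back to a class. A minimal sketch, assuming exact matching after trimming and lowercasing (the normalization and the `text_to_label` helper are illustrative choices); the research paper counts any output that matches no label as an incorrect prediction, which the `None` return value models here.

```python
from typing import Optional, Sequence


def text_to_label(generated: str, label_names: Sequence[str]) -> Optional[int]:
    """Map generated text to a label index, or None if it matches no label."""
    cleaned = generated.strip().lower()
    for index, name in enumerate(label_names):
        if cleaned == name:
            return index
    return None  # unrecognized output; treat as a wrong prediction


SENTIMENT_LABELS = ["negative", "positive"]
print(text_to_label(" Positive ", SENTIMENT_LABELS))
print(text_to_label("hamburger", SENTIMENT_LABELS))
```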

## Capabilities

The T5 framework allows the same model to be applied to many different NLP tasks, including machine translation, document summarization, question answering, and text classification. For example, the model can be used to translate text from one language to another, summarize long documents into a few key points, answer questions based on given information, or determine the sentiment of a piece of text.

## What can I use it for?

The versatility of `t5-11b` makes it a powerful tool for a wide range of NLP applications. Researchers and developers can fine-tune the model on domain-specific data to create custom language understanding and generation systems. Potential use cases include:

- **Content creation**: Generating news articles, product descriptions, or creative writing with the model's text generation capabilities.
- **Dialogue and chatbots**: Building conversational agents that can engage in natural discussions by leveraging the model's text understanding and generation.
- **Question answering**: Creating systems that can answer questions by extracting relevant information from text.
- **Summarization**: Automatically summarizing long documents or articles into concise overviews.

## Things to try

While `t5-11b` is a powerful model, it's important to carefully evaluate its outputs and monitor for potential biases or inappropriate content generation. The model should be used responsibly, with appropriate safeguards and oversight, especially for high-stakes applications. Experimenting with the model on a variety of tasks and carefully evaluating its performance can help uncover its strengths and limitations.