mt5-base

Maintainer: google

Total Score: 161

Last updated 5/21/2024


  • Model Link: View on HuggingFace
  • API Spec: View on HuggingFace
  • Github Link: No Github link provided
  • Paper Link: No paper link provided


Model overview

mT5 is a multilingual variant of the Text-to-Text Transfer Transformer (T5) model, developed by Google. It was pre-trained on the mC4 dataset, which covers 101 languages, making it a versatile model for multilingual natural language processing tasks.

The mT5 model shares the same architecture as the original T5 model, but was trained on a much broader set of languages. Like T5, mT5 uses a unified text-to-text format, allowing it to be applied to a wide variety of NLP tasks such as translation, summarization, and question answering. However, mT5 was only pre-trained on the unsupervised mC4 dataset, and requires fine-tuning before it can be used on specific downstream tasks.

Compared to the monolingual T5 models, the multilingual mT5 model offers the advantage of supporting a large number of languages out of the box. This can be particularly useful for applications that need to handle content in multiple languages. The t5-base and t5-large models, on the other hand, are optimized for English-language tasks.
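
As a concrete starting point, the pre-trained checkpoint can be loaded with the Hugging Face transformers library. This is a minimal sketch: the example input is illustrative, and since the checkpoint has only seen unsupervised mC4 pre-training, raw generation is of limited use until you fine-tune it on a downstream task.

```python
# Minimal sketch: loading the pre-trained mT5-base checkpoint with
# Hugging Face transformers. Fine-tune before expecting useful output.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

def load_mt5(checkpoint: str = "google/mt5-base"):
    """Download (or load from cache) the tokenizer and model weights."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = MT5ForConditionalGeneration.from_pretrained(checkpoint)
    return tokenizer, model

# Usage (downloads the model weights on first call):
# tokenizer, model = load_mt5()
# inputs = tokenizer("Wie geht es dir?", return_tensors="pt")
# output_ids = model.generate(**inputs, max_new_tokens=20)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same loading pattern works for the other checkpoint sizes (mt5-small, mt5-large, and so on) by swapping the checkpoint name.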

Model inputs and outputs

Inputs

  • Text: mT5 takes text as input, which can be in any of the 101 supported languages.

Outputs

  • Text: mT5 generates text as output, which can be in any of the supported languages. The output can be used for a variety of tasks, such as:
    • Machine translation
    • Text summarization
    • Question answering
    • Text generation
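
In practice, the unified text-to-text format means each of these tasks is expressed as a plain input string mapped to a plain output string. A small sketch of that framing follows; the task prefixes here are T5-style conventions you would choose when fine-tuning, not behaviors built into the pre-trained mT5 checkpoint.

```python
# Sketch: framing different NLP tasks in the unified text-to-text
# format. The prefixes are illustrative fine-tuning conventions.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "question": "question: ",
}

def to_text_to_text(task: str, text: str) -> str:
    """Turn a raw task input into a single input string for the model."""
    return TASK_PREFIXES[task] + text

print(to_text_to_text("summarize", "mT5 was pre-trained on mC4 ..."))
```

Because every task is reduced to string-in, string-out, the same model, loss, and decoding loop serve all of the tasks above; only the training data changes.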

Capabilities

mT5 is a powerful multilingual model that can be applied to a wide range of natural language processing tasks. Its key strength lies in its ability to handle content in 101 different languages, making it a valuable tool for applications that need to process multilingual data.

For example, the mT5 model could be used to translate text between any of the supported languages, or to generate summaries of documents in multiple languages. It could also be fine-tuned for tasks such as multilingual question answering or text generation, where the model's ability to understand and produce text in a variety of languages would be a significant advantage.

What can I use it for?

The mT5 model's multilingual capabilities make it a versatile tool for a variety of applications. Some potential use cases include:

  • Machine translation: Fine-tune mT5 on parallel text data to create a multilingual translation system that can translate between any of the 101 supported languages.

  • Multilingual text summarization: Use mT5 to generate concise summaries of documents in multiple languages, helping users quickly understand the key points of content in a variety of languages.

  • Multilingual question answering: Fine-tune mT5 on multilingual question-answering datasets to create a system that can answer questions in any of the supported languages.

  • Multilingual content generation: Leverage mT5's text generation capabilities to produce high-quality content in multiple languages, such as news articles, product descriptions, or creative writing.
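
For fine-tuning on a task such as machine translation, the text-to-text framing reduces data preparation to producing (input, target) string pairs. A minimal sketch, where the prefix wording and helper name are illustrative choices rather than anything mandated by the model:

```python
# Sketch: preparing (input, target) pairs for fine-tuning mT5 on
# translation. Each training example is just a source string and a
# target string; the task prefix is a convention you choose.
def make_translation_pair(src: str, tgt: str,
                          src_lang: str, tgt_lang: str) -> tuple[str, str]:
    prompt = f"translate {src_lang} to {tgt_lang}: {src}"
    return prompt, tgt

pair = make_translation_pair("Good morning", "Guten Morgen",
                             "English", "German")
# pair[0] is the model input, pair[1] the target for teacher forcing
```

Summarization and question-answering data can be prepared the same way, with the document (or question plus context) as the input string and the summary (or answer) as the target.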

Things to try

One interesting aspect of the mT5 model is its ability to handle code-switching, where content contains a mix of multiple languages. This can be a common occurrence in multilingual settings, such as social media or online forums.

To explore mT5's code-switching capabilities, you could try providing the model with input text that contains a mix of languages, and observe how it handles the translation or generation of the output. This could involve creating test cases with varying degrees of language mixing, and evaluating the model's performance on preserving the original meaning and tone across the different languages.
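
One way to build such test cases is to generate inputs with a controllable mixing ratio. The sketch below is a toy generator: the word lists and mixing scheme are illustrative, and a real evaluation would use aligned bilingual data rather than random word substitution.

```python
# Sketch: building code-switched test inputs with a varying degree of
# language mixing. Word lists here are toy examples.
import random

def mix_sentences(words_a: list[str], words_b: list[str],
                  mix_ratio: float, seed: int = 0) -> str:
    """Replace roughly `mix_ratio` of language-A words with language-B words."""
    rng = random.Random(seed)
    mixed = [
        rng.choice(words_b) if rng.random() < mix_ratio else w
        for w in words_a
    ]
    return " ".join(mixed)

english = ["I", "really", "like", "this", "movie"]
spanish = ["yo", "realmente", "me", "gusta", "película"]
print(mix_sentences(english, spanish, mix_ratio=0.4))
```

Sweeping `mix_ratio` from 0.0 to 1.0 gives a simple axis along which to measure how gracefully the model's output quality degrades as the input becomes more mixed.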

Additionally, you could investigate how mT5 performs on low-resource languages within the 101 language set. Since the model was pre-trained on a diverse corpus, it may be able to generate reasonably high-quality outputs for languages with limited training data, which could be valuable for certain applications.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


mt5-xxl

google

Total Score: 56

mT5 is a massively multilingual variant of Google's Text-to-Text Transfer Transformer (T5) model. It was pre-trained on the mC4 dataset, which covers 101 languages. Unlike the original T5 model, mT5 is designed to handle a wide variety of languages, allowing it to be used for multilingual natural language processing tasks. The mT5-xxl, mT5-large, mT5-base, and mT5-small checkpoints are similar models that vary in size and number of parameters. The larger models generally perform better but require more compute resources. These models can be further fine-tuned on specific tasks and datasets to achieve state-of-the-art results on multilingual benchmarks.

Model inputs and outputs

Inputs

  • Text: mT5 models accept text as input, allowing them to be used for a wide variety of natural language processing tasks like translation, summarization, and question answering.

Outputs

  • Text: The model outputs text, making it a flexible tool for text generation and other text-to-text tasks.

Capabilities

mT5 models have shown strong performance on a variety of multilingual benchmarks, demonstrating their ability to handle a diverse range of languages. They can be applied to tasks like machine translation, document summarization, and text generation, among others.

What can I use it for?

The broad capabilities of mT5 make it a versatile model that can be used for a wide range of multilingual natural language processing applications. Some potential use cases include:

  • Machine translation: Translate text between any of the 101 languages covered by the model.

  • Multilingual summarization: Summarize text in any of the supported languages.

  • Multilingual question answering: Answer questions posed in different languages.

  • Multilingual text generation: Generate coherent text in multiple languages.

Things to try

One interesting aspect of mT5 is its ability to handle low-resource languages. By pre-training on a diverse set of languages, the model can leverage cross-lingual knowledge to perform well even on languages with limited training data. Experimenting with fine-tuning mT5 on tasks involving low-resource languages could yield interesting results. Another area to explore is the model's ability to handle code-switching, where multiple languages are used within a single text. The broad linguistic coverage of mT5 may allow it to better understand and generate this type of mixed-language content.



mt5-large

google

Total Score: 72

Google's mT5 is a massively multilingual variant of the Text-To-Text Transfer Transformer (T5) model. It was pre-trained on the mC4 dataset, which covers 101 languages. Unlike T5, which was trained only on English data, mT5 can handle a wide range of languages, making it a powerful tool for multilingual natural language processing tasks. The mT5 model comes in several sizes, including mt5-small, mt5-base, and mt5-large. These models differ in the number of parameters, with the larger models generally performing better on more complex tasks. Unlike the original T5 models, mT5 was not fine-tuned on any supervised tasks during pre-training, so it must be fine-tuned on a specific task before it can be used.

Model inputs and outputs

The mT5 model follows the text-to-text format, where both the input and output are text strings. This allows the model to be used for a wide variety of NLP tasks, including machine translation, text summarization, question answering, and more.

Inputs

  • Text in any of the 101 supported languages.

Outputs

  • Text in the target language, generated based on the input.

Capabilities

mT5 is a powerful multilingual model that can be used for a wide range of NLP tasks. It has demonstrated state-of-the-art performance on many multilingual benchmarks, thanks to its large-scale pre-training on a diverse corpus of web data.

What can I use it for?

mT5 can be a valuable tool for anyone working on multilingual NLP projects. Some potential use cases include:

  • Machine translation: Translate text between any of the 101 supported languages.

  • Text summarization: Generate concise summaries of longer text in multiple languages.

  • Question answering: Answer questions in any of the supported languages.

  • Cross-lingual information retrieval: Search for and retrieve relevant content in multiple languages.

Things to try

One interesting thing to try with mT5 is zero-shot cross-lingual transfer, where the model is asked to perform a task in a language it was not explicitly fine-tuned on. For example, you could fine-tune mT5 on a question-answering task in English, and then use the fine-tuned model to answer questions in a different language, without any additional training. This showcases the model's impressive transfer learning capabilities. Another idea is to explore the model's multilingual capabilities in depth by evaluating its performance across a range of languages and tasks. This could help identify strengths, weaknesses, and potential areas for improvement in the model.
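
A simple way to quantify this kind of zero-shot transfer is to score predictions per language. The sketch below shows one such harness; `predict` is a hypothetical stand-in for an inference call against your fine-tuned checkpoint, and the example data is toy.

```python
# Sketch: per-language exact-match accuracy, for probing zero-shot
# cross-lingual transfer. `predict` is a placeholder for model inference.
from collections import defaultdict

def accuracy_by_language(examples, predict):
    """examples: iterable of (language, question, gold_answer) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, question, gold in examples:
        total[lang] += 1
        if predict(question).strip() == gold.strip():
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

data = [("de", "Hauptstadt von Frankreich?", "Paris"),
        ("sw", "Mji mkuu wa Ufaransa?", "Paris")]
print(accuracy_by_language(data, predict=lambda q: "Paris"))
```

Comparing the per-language scores against the fine-tuning language's score gives a rough measure of how much capability transfers without additional training.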



mt5-small

google

Total Score: 81

mt5-small is a smaller variant of Google's multilingual Text-to-Text Transfer Transformer (mT5) model. mT5 is a massively multilingual pre-trained text-to-text transformer that was pre-trained on the mC4 dataset, which covers 101 languages. Unlike other multilingual models, mT5 was pre-trained without any supervised fine-tuning, allowing it to be further fine-tuned on a wide range of downstream tasks. The mt5-small model is smaller than the base mT5 model, making it more efficient and potentially more accessible for certain use cases.

Model inputs and outputs

The mt5-small model is a text-to-text transformer, meaning it takes text as input and generates text as output. It can be used for a variety of natural language processing tasks, such as translation, summarization, and question answering, by framing the task as a text-to-text problem.

Inputs

  • Text in any of the 101 languages covered by the mC4 dataset

Outputs

  • Text in any of the 101 languages covered by the mC4 dataset

Capabilities

mt5-small can be used for a wide range of multilingual natural language processing tasks, such as translation, summarization, and question answering. Due to its extensive pre-training on the mC4 dataset, it has strong multilingual capabilities and can handle text in 101 different languages.

What can I use it for?

The mt5-small model can be used for a variety of multilingual NLP tasks, such as:

  • Machine translation: Fine-tune the model on parallel text data to create a multilingual translation system that can translate between any of the 101 supported languages.

  • Text summarization: Fine-tune the model on summarization datasets to generate concise summaries of text in any of the supported languages.

  • Question answering: Fine-tune the model on question-answering datasets to create a multilingual system that can answer questions based on provided text.

You can find other similar mT5 models on the AIModels.FYI website, which may be useful for your specific use case.

Things to try

One interesting aspect of the mt5-small model is its ability to handle a wide range of languages without any supervised fine-tuning. This makes it a versatile starting point for building multilingual NLP applications. You could try:

  • Fine-tuning the model on a dataset in a specific language to see how it performs compared to a monolingual model.

  • Exploring the model's zero-shot capabilities by trying it on tasks in languages it wasn't explicitly fine-tuned on.

  • Combining mt5-small with other multilingual models, such as mBART, to create a more powerful multilingual system.

The possibilities are endless, and the mt5-small model provides a great starting point for building impressive multilingual NLP applications.



t5-base

google-t5

Total Score: 467

The t5-base model is a language model developed by Google as part of the Text-To-Text Transfer Transformer (T5) series. It is a large transformer-based model with 220 million parameters, trained on a diverse set of natural language processing tasks in a unified text-to-text format. The T5 framework allows the same model, loss function, and hyperparameters to be used for a variety of NLP tasks. Similar models in the T5 series include FLAN-T5-base and FLAN-T5-XXL, which build upon the original T5 model by further fine-tuning on a large number of instructional tasks.

Model inputs and outputs

Inputs

  • Text strings: The t5-base model takes text strings as input, which can be in the form of a single sentence, a paragraph, or a sequence of sentences.

Outputs

  • Text strings: The model generates text strings as output, which can be used for a variety of natural language processing tasks such as translation, summarization, question answering, and more.

Capabilities

The t5-base model is a powerful language model that can be applied to a wide range of NLP tasks. It has been shown to perform well on tasks like language translation, text summarization, and question answering. The model's ability to handle text-to-text transformations in a unified framework makes it a versatile tool for researchers and practitioners working on various natural language processing problems.

What can I use it for?

The t5-base model can be used for a variety of natural language processing tasks, including:

  • Text generation: Generate human-like text, such as creative writing, story continuation, or dialogue.

  • Text summarization: Summarize long-form text, such as articles or reports, into concise and informative summaries.

  • Translation: Translate text from one language to another, such as English to French or German.

  • Question answering: Answer questions based on provided text, making it useful for building intelligent question-answering systems.

Things to try

One interesting aspect of the t5-base model is its ability to handle a diverse range of NLP tasks using a single unified framework. This means that you can fine-tune the model on a specific task, such as language translation or text summarization, and then use the fine-tuned model to perform that task on new data. Additionally, the model's text-to-text format allows for creative experimentation, where you can try combining different tasks or prompting the model in novel ways to see how it responds.
