Model Card for DistilRoBERTa base
=======================================================================

Table of Contents
=======================================

1.  [Model Details](#model-details)
2.  [Uses](#uses)
3.  [Bias, Risks, and Limitations](#bias-risks-and-limitations)
4.  [Training Details](#training-details)
5.  [Evaluation](#evaluation)
6.  [Environmental Impact](#environmental-impact)
7.  [Citation](#citation)
8.  [How To Get Started With the Model](#how-to-get-started-with-the-model)

Model Details
===============================

Model Description
---------------------------------------

This model is a distilled version of the [RoBERTa-base model](https://huggingface.co/roberta-base). It follows the same training procedure as [DistilBERT](https://huggingface.co/distilbert-base-uncased). The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/distillation). This model is case-sensitive: it distinguishes between `english` and `English`.

The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base). On average, DistilRoBERTa is twice as fast as RoBERTa-base.
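
The parameter counts quoted above can be reproduced with a little arithmetic. The sketch below assumes the standard RoBERTa-base configuration values that this card does not state (a vocabulary of 50,265 tokens, 514 position embeddings, a feed-forward size of 3,072, one token-type embedding, and a pooling layer). Treat the function, whose name is illustrative, as a rough accounting rather than an official one.

```python
def roberta_param_count(num_layers, hidden=768, ffn=3072,
                        vocab=50265, max_pos=514, type_vocab=1):
    """Approximate parameter count of a RoBERTa-style encoder.

    All defaults except `hidden` are assumptions taken from the usual
    roberta-base configuration, not from this model card.
    """
    layer_norm = 2 * hidden  # scale and bias vectors

    # Word, position and token-type embeddings plus their LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm

    # Query, key, value and output projections (weights + biases) plus LayerNorm.
    # The number of heads only splits `hidden`, so it does not affect the count.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers of the feed-forward block plus LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    # Dense pooling layer applied to the first token.
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * (attention + feed_forward) + pooler


distil = roberta_param_count(num_layers=6)
teacher = roberta_param_count(num_layers=12)
print(f"6 layers:  {distil / 1e6:.1f}M parameters")
print(f"12 layers: {teacher / 1e6:.1f}M parameters")
```

Under these assumptions the embeddings account for roughly 39M parameters in both models, which is why halving the number of layers removes only about a third of the total.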

We encourage users of this model card to check out the [RoBERTa-base model card](https://huggingface.co/roberta-base) to learn more about usage, limitations and potential biases.

*   **Developed by:** Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face)
*   **Model type:** Transformer-based language model
*   **Language(s) (NLP):** English
*   **License:** Apache 2.0
*   **Related Models:** [RoBERTa-base model card](https://huggingface.co/roberta-base)
*   **Resources for more information:**
    *   [GitHub Repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md)
    *   [Associated Paper](https://arxiv.org/abs/1910.01108)

Uses
=============

Direct Use and Downstream Use
---------------------------------------------------------------

You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=roberta) to look for fine-tuned versions on a task that interests you.

Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at a model like GPT-2.

Out of Scope Use
-------------------------------------

The model should not be used to intentionally create hostile or alienating environments for people. It was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope.

Bias, Risks, and Limitations
===========================================================

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

    >>> from transformers import pipeline
    >>> unmasker = pipeline('fill-mask', model='distilroberta-base')
    >>> unmasker("The man worked as a <mask>.")
    [{'score': 0.1237526461482048,
      'sequence': 'The man worked as a waiter.',
      'token': 38233,
      'token_str': ' waiter'},
     {'score': 0.08968018740415573,
      'sequence': 'The man worked as a waitress.',
      'token': 35698,
      'token_str': ' waitress'},
     {'score': 0.08387645334005356,
      'sequence': 'The man worked as a bartender.',
      'token': 33080,
      'token_str': ' bartender'},
     {'score': 0.061059024184942245,
      'sequence': 'The man worked as a mechanic.',
      'token': 25682,
      'token_str': ' mechanic'},
     {'score': 0.03804653510451317,
      'sequence': 'The man worked as a courier.',
      'token': 37171,
      'token_str': ' courier'}]
      
    >>> unmasker("The woman worked as a <mask>.")
    [{'score': 0.23149248957633972,
      'sequence': 'The woman worked as a waitress.',
      'token': 35698,
      'token_str': ' waitress'},
     {'score': 0.07563332468271255,
      'sequence': 'The woman worked as a waiter.',
      'token': 38233,
      'token_str': ' waiter'},
     {'score': 0.06983394920825958,
      'sequence': 'The woman worked as a bartender.',
      'token': 33080,
      'token_str': ' bartender'},
     {'score': 0.05411609262228012,
      'sequence': 'The woman worked as a nurse.',
      'token': 9008,
      'token_str': ' nurse'},
     {'score': 0.04995106905698776,
      'sequence': 'The woman worked as a maid.',
      'token': 29754,
      'token_str': ' maid'}]
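
The difference between the two prediction lists can be made explicit with a few lines of Python. The sketch below copies the top-5 tokens and their scores (rounded to four decimals) from the outputs above into plain dictionaries, so it runs without downloading the model. The variable and function names are illustrative.

```python
# Top-5 predictions copied from the pipeline outputs above (scores rounded).
man = {"waiter": 0.1238, "waitress": 0.0897, "bartender": 0.0839,
       "mechanic": 0.0611, "courier": 0.0380}
woman = {"waitress": 0.2315, "waiter": 0.0756, "bartender": 0.0698,
         "nurse": 0.0541, "maid": 0.0500}


def compare_predictions(a, b):
    """Return tokens shared by both lists and tokens unique to each."""
    shared = sorted(set(a) & set(b))
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    return shared, only_a, only_b


shared, only_man, only_woman = compare_predictions(man, woman)
print("shared:    ", shared)
print("man only:  ", only_man)
print("woman only:", only_woman)

# How much more probable is "waitress" when the subject is "woman"?
waitress_ratio = woman["waitress"] / man["waitress"]
print(f"waitress ratio: {waitress_ratio:.1f}x")
```

A top-5 comparison like this is only a spot check. A systematic audit would need many prompts and a larger set of candidate tokens.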
    

Recommendations
-----------------------------------

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Training Details
=====================================

DistilRoBERTa was pre-trained on [OpenWebTextCorpus](https://skylion007.github.io/OpenWebTextCorpus/), a reproduction of OpenAI's WebText dataset. This is roughly four times less training data than was used for the teacher, RoBERTa. See the [roberta-base model card](https://huggingface.co/roberta-base/blob/main/README.md) for further details on training.

Evaluation
=========================

When fine-tuned on downstream tasks, this model achieves the following results (see [GitHub Repo](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md)):

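GLUE test results:

| Task | MNLI | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
|      | 84.0 | 89.4 | 90.8 | 92.5  | 59.3 | 88.3  | 86.6 | 67.9 |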
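
For a single summary figure, the eight scores above can be averaged. The snippet below is a simple unweighted mean over the reported numbers. The card does not say which metric each task reports, so the average mixes whatever metrics were used and should be read as a rough indicator only.

```python
# GLUE scores as reported in the results above.
glue_scores = {
    "MNLI": 84.0, "QQP": 89.4, "QNLI": 90.8, "SST-2": 92.5,
    "CoLA": 59.3, "STS-B": 88.3, "MRPC": 86.6, "RTE": 67.9,
}

# Unweighted mean across the eight tasks.
average = sum(glue_scores.values()) / len(glue_scores)

# Tasks with the lowest and highest reported score.
weakest = min(glue_scores, key=glue_scores.get)
strongest = max(glue_scores, key=glue_scores.get)

print(f"average: {average:.2f}")
print(f"weakest: {weakest}, strongest: {strongest}")
```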

Environmental Impact
=============================================

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

*   **Hardware Type:** More information needed
*   **Hours used:** More information needed
*   **Cloud Provider:** More information needed
*   **Compute Region:** More information needed
*   **Carbon Emitted:** More information needed

Citation
=====================

    @article{Sanh2019DistilBERTAD,
      title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
      author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
      journal={ArXiv},
      year={2019},
      volume={abs/1910.01108}
    }
    

APA

*   Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

How to Get Started With the Model
=======================================================================

You can use the model directly with a pipeline for masked language modeling:

    >>> from transformers import pipeline
    >>> unmasker = pipeline('fill-mask', model='distilroberta-base')
    >>> unmasker("Hello I'm a <mask> model.")
    [{'score': 0.04673689603805542,
      'sequence': "Hello I'm a business model.",
      'token': 265,
      'token_str': ' business'},
     {'score': 0.03846118599176407,
      'sequence': "Hello I'm a freelance model.",
      'token': 18150,
      'token_str': ' freelance'},
     {'score': 0.03308931365609169,
      'sequence': "Hello I'm a fashion model.",
      'token': 2734,
      'token_str': ' fashion'},
     {'score': 0.03018997237086296,
      'sequence': "Hello I'm a role model.",
      'token': 774,
      'token_str': ' role'},
     {'score': 0.02111748233437538,
      'sequence': "Hello I'm a Playboy model.",
      'token': 24526,
      'token_str': ' Playboy'}]
    


## Model overview

The `distilroberta-base` model is a distilled version of the RoBERTa-base model, developed by the Hugging Face team. It follows the same training procedure as the DistilBERT model, using a knowledge distillation approach to create a smaller and faster model while preserving over 95% of RoBERTa-base's performance. The model has 6 layers, 768 dimensions, and 12 heads, totaling 82 million parameters compared to 125 million for the full RoBERTa-base model.

## Model inputs and outputs

The `distilroberta-base` model is a transformer-based language model that can be used for a variety of natural language processing tasks. It takes text as input and can be used for tasks like masked language modeling, where the model predicts missing words in a sentence, or for downstream tasks like sequence classification, token classification, or question answering.

### Inputs
- **Text**: The model takes text as input, which can be a single sentence or a paragraph. Like RoBERTa-base, it processes at most 512 tokens at a time, so longer documents must be truncated or split into chunks.
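
Documents longer than the model's input window need to be split before encoding. RoBERTa-style checkpoints typically accept 512 positions, two of which go to special tokens. That figure comes from the standard roberta-base configuration and is an assumption here, since this card does not state it. The hypothetical helper below splits a list of token ids into overlapping windows so that text near a boundary appears in two chunks.

```python
def chunk_token_ids(token_ids, max_len=510, overlap=128):
    """Split token ids into windows of at most `max_len` ids.

    Consecutive windows share `overlap` ids. The default of 510 assumes a
    512-position model with two slots reserved for special tokens.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    if not token_ids:
        return []

    step = max_len - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the input.
        if start + max_len >= len(token_ids):
            break
        start += step
    return chunks


# Example: 1,000 dummy ids become three overlapping windows.
chunks = chunk_token_ids(list(range(1000)))
print([len(c) for c in chunks])
```

Each chunk can then be encoded separately and the per-chunk predictions combined, for example by averaging classification scores.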

### Outputs
- **Predicted tokens**: For masked language modeling, the model outputs a probability distribution over the vocabulary for each masked token in the input.
- **Classification labels**: When fine-tuned on a downstream task like sequence classification, the model outputs a label for the entire input sequence.
- **Answer spans**: When fine-tuned on a question-answering task, the model outputs the start and end indices of the answer span within the input context.
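
To make the first and last output types concrete, here is a minimal pure-Python sketch of the post-processing usually applied to raw model scores (logits). It is not taken from this card: the function names are illustrative, and scoring a span by the sum of its start and end logits is a common heuristic assumed here.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_tokens(logits, vocab, k=5):
    """Return the k most probable (token, probability) pairs for one mask."""
    ranked = sorted(zip(vocab, softmax(logits)), key=lambda p: p[1], reverse=True)
    return ranked[:k]


def best_answer_span(start_logits, end_logits, max_answer_len=30):
    """Pick the (start, end) pair with the highest combined score.

    Only spans with start <= end and at most `max_answer_len` tokens count.
    """
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


print(top_k_tokens([2.0, 1.0, 0.1], ["a", "b", "c"], k=2))
print(best_answer_span([0.1, 2.0, 0.3], [0.2, 0.1, 3.0]))
```

For sequence classification the same `softmax` applies to the logits over labels, and the predicted label is the one with the highest probability.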

## Capabilities

Once fine-tuned, `distilroberta-base` handles tasks such as sentiment analysis, natural language inference, and question answering. In the GLUE results reported above it scores 92.5 on SST-2 (sentiment) and 84.0 on MNLI (inference), often close to the full RoBERTa-base model while being cheaper to run.

## What can I use it for?

The `distilroberta-base` model is primarily intended to be fine-tuned on downstream tasks, as it is smaller and faster than the full RoBERTa-base model while maintaining similar performance. You can use it for tasks like:

- [Sequence classification](https://huggingface.co/models?filter=roberta): Fine-tune the model on a dataset like [GLUE](https://huggingface.co/tasks/glue) to perform tasks like sentiment analysis or natural language inference.
- [Token classification](https://huggingface.co/models?filter=roberta): Fine-tune the model on a dataset like [CoNLL-2003](https://huggingface.co/datasets/conll2003) to perform named entity recognition.
- [Question answering](https://huggingface.co/models?filter=roberta): Fine-tune the model on a dataset like [SQuAD](https://huggingface.co/datasets/squad) to answer questions based on a given context.
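
As a starting point for the first item, the sketch below loads the checkpoint with a sequence-classification head using the `transformers` Auto classes. It assumes `transformers` and PyTorch are installed and that the model id `distilroberta-base` used elsewhere in this card resolves on the Hub. The helper names are illustrative, and the classification head is randomly initialized, so predictions are meaningless until the model has been fine-tuned on labeled data.

```python
MODEL_NAME = "distilroberta-base"  # model id as used in this card


def build_classifier(num_labels=2):
    """Load the tokenizer and the encoder with a fresh classification head."""
    # Imported here so the heavy dependencies load only when actually needed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=num_labels
    )
    return tokenizer, model


def classify(texts, tokenizer, model):
    """Return the index of the highest-scoring label for each text."""
    import torch

    # Pad to the longest text and truncate anything beyond the model's limit.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()
```

A typical call would be `tokenizer, model = build_classifier(num_labels=2)` followed by `classify(["A great film.", "A dull film."], tokenizer, model)`. For token classification or question answering, `AutoModelForTokenClassification` and `AutoModelForQuestionAnswering` play the same role.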

## Things to try

Try comparing `distilroberta-base` against the full RoBERTa-base model on a range of tasks to see how much accuracy you trade for speed. Because it is smaller and faster, it may suit resource-constrained deployments or latency-sensitive applications. You can also probe its biases by examining its predictions on prompts that may elicit harmful stereotypes, as noted in the [DistilBERT model card](https://aimodels.fyi/creators/huggingFace/distilbert).
