tinyroberta-squad2

Maintainer: deepset

Total Score

83

Last updated 5/28/2024

  • Model Link: View on HuggingFace
  • API Spec: View on HuggingFace
  • Github Link: No Github link provided
  • Paper Link: No paper link provided


Model overview

The tinyroberta-squad2 model is a distilled version of the deepset/roberta-base-squad2 model, which was fine-tuned on the SQuAD 2.0 dataset. The distilled model has comparable prediction quality to the base model but runs at twice the speed. It was developed using knowledge distillation, a technique in which a smaller "student" model is trained to match the performance of a larger "teacher" model.

The distillation process involved two steps. First, an intermediate layer distillation was performed using roberta-base as the teacher, resulting in the deepset/tinyroberta-6l-768d model. Then, a task-specific distillation was done using deepset/roberta-base-squad2 and deepset/roberta-large-squad2 as the teachers for further intermediate layer and prediction layer distillation, respectively.

Compared to similar models, tinyroberta-squad2 stands out for its efficiency: it matches the prediction quality of deepset/roberta-base-squad2 while running twice as fast. Another related model is distilbert-base-cased-distilled-squad, a version of DistilBERT (itself a distilled BERT base model) fine-tuned on SQuAD.

Model inputs and outputs

Inputs

  • Question: A natural language question
  • Context: The passage of text that contains the answer to the question

Outputs

  • Answer: The span of text from the context that answers the question
  • Score: A confidence score for the predicted answer
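To make these outputs concrete, here is a pure-Python sketch (with made-up toy scores, not the model's actual code) of how an extractive QA head combines per-token start and end scores into an answer span:

```python
# Illustrative sketch: an extractive QA model scores every token as a
# potential answer start and answer end; the predicted answer is the
# highest-scoring valid (start, end) pair. Scores below are toy values.

def best_span(start_scores, end_scores, max_answer_len=15):
    """Pick the (start, end) pair with the highest combined score,
    requiring start <= end and a bounded span length."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best

tokens = ["Paris", "is", "the", "capital", "of", "France"]
start_scores = [4.1, -1.0, -2.0, 0.5, -1.5, 1.0]   # toy start logits
end_scores   = [3.8, -0.5, -1.2, 0.9, -1.0, 2.0]   # toy end logits

s, e, score = best_span(start_scores, end_scores)
print(" ".join(tokens[s:e + 1]))  # → Paris
```

A real model also applies a softmax over these scores to produce the confidence value reported alongside the answer.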

Capabilities

The tinyroberta-squad2 model is capable of performing extractive question answering, where it can identify the span of text from a given passage that answers a given question. For example, given the question "What is the capital of France?" and the context "Paris is the capital of France", the model would correctly predict "Paris" as the answer.

What can I use it for?

The tinyroberta-squad2 model can be useful for building question answering systems, such as chatbots or virtual assistants, that can provide answers to users' questions by searching through a database of documents. The model's small size and fast inference speed make it particularly well-suited for deployment in resource-constrained environments or on mobile devices.

To use the tinyroberta-squad2 model in your own projects, you can load it using the Haystack framework, as shown in the example pipeline on the Haystack website. Alternatively, you can use the model directly with the Transformers library, as demonstrated in the Transformers documentation.
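A minimal sketch of the Transformers route looks like the following (the model weights are downloaded from the Hugging Face Hub on first use):

```python
from transformers import pipeline

# Standard Transformers question-answering pipeline loading the
# tinyroberta-squad2 checkpoint from the Hugging Face Hub.
qa = pipeline(
    "question-answering",
    model="deepset/tinyroberta-squad2",
    tokenizer="deepset/tinyroberta-squad2",
)

result = qa(
    question="What is the capital of France?",
    context="Paris is the capital of France.",
)
print(result["answer"], result["score"])
```

The pipeline returns a dict with the extracted answer text, its character offsets in the context, and a confidence score between 0 and 1.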

Things to try

One interesting aspect of the tinyroberta-squad2 model is its distillation process, where a smaller, more efficient model was created by learning from a larger, more powerful teacher model. This technique can be applied to other types of models and tasks, and it would be interesting to explore how the performance and characteristics of the distilled model compare to the teacher model, as well as to other distilled models.
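As a rough illustration of the idea, a prediction-layer distillation loss can be sketched in a few lines. This is a generic sketch with toy logits, not deepset's actual training setup:

```python
import math

# Generic knowledge-distillation sketch: the student is trained to match
# the teacher's temperature-softened output distribution via cross-entropy.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

teacher = [3.0, 1.0, 0.2]   # toy prediction-layer logits
student = [2.5, 1.2, 0.3]

loss = distillation_loss(teacher, student)
```

Minimizing this loss pushes the student's distribution toward the teacher's; the temperature controls how much of the teacher's "dark knowledge" about non-top classes is exposed.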

Another area to explore is the model's performance on different types of questions and contexts, such as those involving specialized terminology, complex reasoning, or multi-sentence answers. Understanding the model's strengths and weaknesses can help guide the development of more robust and versatile question answering systems.




Related Models

roberta-base-squad2

deepset

Total Score

649

The roberta-base-squad2 model is a variant of the roberta-base language model that has been fine-tuned on the SQuAD 2.0 dataset for question answering. Developed by deepset, it is a Transformer-based model trained on English text that can extract answers from a given context in response to a question. Similar models include the distilbert-base-cased-distilled-squad model, a distilled version of the BERT base model fine-tuned on SQuAD, and the bert-base-uncased model, the original BERT base model trained on a large corpus of English text.

Model inputs and outputs

Inputs

  • Question: A natural language question about a given context
  • Context: The text passage that contains the answer to the question

Outputs

  • Answer: The text span extracted from the context that answers the given question

Capabilities

The roberta-base-squad2 model excels at extractive question answering: given a question and a relevant context, it can identify the exact span of text that answers the question. It has been trained on a large dataset of question-answer pairs, including unanswerable questions, and has shown strong performance on the SQuAD 2.0 benchmark.

What can I use it for?

The roberta-base-squad2 model can be used to build question answering systems that let users get direct answers to their questions by querying a large corpus of text. This could be useful in applications like customer service, technical support, or research assistance, where users need to find information quickly without reading through lengthy documents. To use the model, you can integrate it into a Haystack pipeline for scalable question answering, or use it directly with the Transformers library in Python. The model is also available through the Hugging Face Model Hub, making it easy to access and use in your projects.

Things to try

One interesting thing to try with the roberta-base-squad2 model is to explore its performance on different types of questions and contexts. You could try prompting the model with questions that require deeper reasoning, or test its ability to handle ambiguity or conflicting information in the context. Additionally, you could experiment with different techniques for fine-tuning or adapting the model to specific domains or use cases.


deberta-v3-large-squad2

deepset

Total Score

51

The deberta-v3-large-squad2 model is a natural language processing (NLP) model developed by deepset, the company behind the open-source NLP framework Haystack. It is based on the DeBERTa V3 architecture, which improves upon the original DeBERTa model using ELECTRA-style pre-training with gradient-disentangled embedding sharing. The deberta-v3-large-squad2 model is the large variant of DeBERTa V3, with 24 layers and a hidden size of 1024. It has been fine-tuned on the SQuAD 2.0 dataset, a popular question answering benchmark, and demonstrates strong performance on extractive question answering tasks. Compared to similar models like roberta-base-squad2 and tinyroberta-squad2, the deberta-v3-large-squad2 model has a larger backbone and has been fine-tuned more extensively on the SQuAD 2.0 dataset, resulting in superior performance.

Model inputs and outputs

Inputs

  • Question: A natural language question to be answered
  • Context: The text that contains the answer to the question

Outputs

  • Answer: The extracted answer span from the provided context
  • Start/end positions: The start and end indices of the answer span within the context
  • Confidence score: The model's confidence in the predicted answer

Capabilities

The deberta-v3-large-squad2 model excels at extractive question answering tasks, where the goal is to find the answer to a given question within a provided context. It can handle a wide range of question types and complex queries, and is especially adept at identifying when a question is unanswerable based on the given context.

What can I use it for?

You can use the deberta-v3-large-squad2 model to build various question answering applications, such as:

  • Chatbots and virtual assistants: Integrate the model into a conversational AI system to provide users with accurate and contextual answers to their questions.
  • Document search and retrieval: Combine the model with a search engine or knowledge base to let users find relevant information by asking natural language questions.
  • Automated question answering systems: Develop a fully automated Q&A system that can process large volumes of text and accurately answer questions about the content.

Things to try

One interesting aspect of the deberta-v3-large-squad2 model is its ability to handle unanswerable questions. You can experiment with providing the model with questions that cannot be answered from the given context and observe how it responds. This can be useful for building robust question answering systems that distinguish between answerable and unanswerable questions. Additionally, you can explore using the deberta-v3-large-squad2 model in combination with other NLP techniques, such as information retrieval or multi-document summarization, to create more comprehensive question answering pipelines that handle a wider range of user queries and use cases.
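SQuAD 2.0-style models typically flag unanswerable questions by comparing the best answer-span score against a "null" (no-answer) score, usually taken from the [CLS] position. A toy sketch of that decision rule (scores and threshold are made-up values, not the model's actual internals):

```python
# Illustrative decision rule for SQuAD 2.0-style no-answer handling:
# answer only when the best span outscores the null prediction by more
# than a tunable threshold; otherwise abstain.

def predict_or_abstain(best_span_score, null_score, threshold=0.0):
    """Return True if the model should answer, False if it should abstain."""
    return best_span_score - null_score > threshold

# Toy scores: a confident span beats the null score...
answerable = predict_or_abstain(best_span_score=7.9, null_score=1.2)
# ...while a weak span loses to it, so the model abstains.
unanswerable = predict_or_abstain(best_span_score=0.4, null_score=3.1)
```

Raising the threshold makes the system more conservative, trading recall on answerable questions for fewer wrong answers on unanswerable ones.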


distilbert-base-cased-distilled-squad

distilbert

Total Score

173

The distilbert-base-cased-distilled-squad model is a smaller and faster version of the BERT base model that has been fine-tuned on the SQuAD question answering dataset. Developed by the Hugging Face team, it is based on the DistilBERT architecture, which has 40% fewer parameters than the original BERT base model and runs 60% faster while preserving over 95% of BERT's performance on language understanding benchmarks. The model is similar to the distilbert-base-uncased-distilled-squad model, a distilled version of the DistilBERT base uncased model fine-tuned on SQuAD. Both models are designed for question answering tasks, where the goal is to extract an answer from a given context text in response to a question.

Model inputs and outputs

Inputs

  • Question: A natural language question that the model should answer
  • Context: The text containing the information needed to answer the question

Outputs

  • Answer: The text span from the provided context that answers the question
  • Start and end indices: The starting and ending character indices of the answer text within the context
  • Confidence score: A value between 0 and 1 indicating the model's confidence in the predicted answer

Capabilities

The distilbert-base-cased-distilled-squad model can be used to perform question answering on English text. It is capable of understanding the context and extracting the most relevant answer to a given question. The model has been fine-tuned on the SQuAD dataset, which covers a wide range of question types and topics, making it useful for a variety of question answering applications.

What can I use it for?

This model can be used for any application that requires extracting answers from text in response to natural language questions, such as:

  • Building conversational AI assistants that can answer questions about a given topic or document
  • Enhancing search engines to provide direct answers to user queries
  • Automating the process of finding relevant information in large text corpora, such as legal documents or technical manuals

Things to try

Some interesting things to try with the distilbert-base-cased-distilled-squad model include:

  • Evaluating its performance on a specific domain or dataset to see how it generalizes beyond the SQuAD dataset
  • Experimenting with different question types or phrasings to understand the model's strengths and limitations
  • Comparing the model's performance to other question answering models or human experts on the same task
  • Exploring ways to further fine-tune or adapt the model for your specific use case, such as by incorporating domain-specific knowledge or training on additional data

Remember to carefully evaluate the model's outputs and consider potential biases or limitations before deploying it in a real-world application.
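One concrete way to run such an evaluation is the token-level F1 metric used in SQuAD scoring. A simplified sketch (the official evaluation script additionally normalizes articles and punctuation):

```python
from collections import Counter

# Token-level F1 in the style of SQuAD evaluation: harmonic mean of
# precision and recall over tokens shared between the predicted answer
# and the gold answer.

def f1_score(prediction, ground_truth):
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the Eiffel Tower", "Eiffel Tower"))  # → 0.8
```

Partial-credit metrics like this reward near-miss spans, which exact-match scoring would count as wrong.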


distilgpt2

distilbert

Total Score

370

DistilGPT2 is a smaller, faster, and lighter version of the GPT-2 language model, developed using knowledge distillation from the larger GPT-2 model. Like GPT-2, DistilGPT2 can be used to generate text, but it has 82 million parameters, compared to the 124 million parameters of the smallest version of GPT-2. The DistilBERT model is another Hugging Face model that was developed using a similar distillation approach to compress the BERT base model; DistilBERT retains over 95% of BERT's performance while being 40% smaller and 60% faster.

Model inputs and outputs

Inputs

  • Text: DistilGPT2 takes in text input, which can be a single sentence or a sequence of sentences

Outputs

  • Generated text: DistilGPT2 outputs a sequence of text, continuing the input sequence in a coherent and fluent manner

Capabilities

DistilGPT2 can be used for a variety of language generation tasks, such as:

  • Story generation: Given a prompt, DistilGPT2 can continue the story, generating additional relevant text.
  • Dialogue generation: DistilGPT2 can be used to generate responses in a conversational setting.
  • Summarization: DistilGPT2 can be fine-tuned to generate concise summaries of longer text.

However, like its parent model GPT-2, DistilGPT2 may also produce biased or harmful content, as it reflects the biases present in its training data.

What can I use it for?

DistilGPT2 can be a useful tool for businesses and developers looking to incorporate language generation capabilities into their applications without the computational cost of running the full GPT-2 model. Some potential use cases include:

  • Chatbots and virtual assistants: DistilGPT2 can be fine-tuned to engage in more natural and coherent conversations.
  • Content generation: DistilGPT2 can be used to generate product descriptions, social media posts, or other types of text content.
  • Language learning: DistilGPT2 can be used to generate sample sentences or dialogues to help language learners practice.

However, users should be cautious about the potential for biased or inappropriate outputs, and should carefully evaluate the model's performance for their specific use case.

Things to try

One interesting aspect of DistilGPT2 is its ability to generate text that is both coherent and concise, thanks to the knowledge distillation process. You could try prompting the model with open-ended questions or topics and see how it responds, comparing the output to what a larger language model like GPT-2 might generate. Additionally, you could experiment with different decoding strategies, such as adjusting the temperature or top-k/top-p sampling, to control the creativity and diversity of the generated text.
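Those decoding knobs can be sketched in a few lines of pure Python. This is a toy vocabulary with made-up logits; a real decoder applies the same idea to the model's full vocabulary at every step:

```python
import math
import random

# Sketch of two common decoding controls: temperature rescales logits
# before the softmax (lower = more deterministic), and top-k restricts
# sampling to the k highest-scoring tokens.

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]          # keep only the k best candidates
    exps = [math.exp(v / temperature) for _, v in items]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng.random(), 0.0
    for (token, _), p in zip(items, probs):
        acc += p
        if r <= acc:
            return token
    return items[-1][0]

logits = {"Paris": 5.0, "London": 2.0, "Berlin": 1.0, "banana": -3.0}
rng = random.Random(0)  # fixed seed for reproducibility
tokens = [sample_next_token(logits, temperature=0.7, top_k=3, rng=rng)
          for _ in range(20)]
```

With top_k=3, the lowest-scoring token can never be sampled, and the low temperature concentrates nearly all probability on the top candidate.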
