Timpal0l

mdeberta-v3-base-squad2

The mdeberta-v3-base-squad2 model is a multilingual version of the DeBERTa model, fine-tuned on the SQuAD 2.0 dataset for extractive question answering. DeBERTa, introduced in the DeBERTa paper, improves upon the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder, and achieves stronger performance than those earlier models on a majority of natural language understanding tasks. The DeBERTa V3 paper further improves the efficiency of DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. This model builds on mdeberta-v3-base, the multilingual version of the DeBERTa V3 base model, which has 12 layers, a hidden size of 768, and 86M backbone parameters. Unlike the monolingual deberta-v3-base model, mdeberta-v3-base was trained on the 2.5TB CC100 multilingual dataset, giving it the ability to understand text in many languages. Like the monolingual version, this multilingual model demonstrates strong performance on a variety of natural language understanding benchmarks.

Model inputs and outputs

Inputs

- **Question**: A natural language question to be answered
- **Context**: The text passage that contains the answer to the question

Outputs

- **Answer**: The text span from the context that answers the question
- **Score**: The model's confidence in the predicted answer, between 0 and 1
- **Start**: The starting index of the answer span in the context
- **End**: The ending index of the answer span in the context

Capabilities

The mdeberta-v3-base-squad2 model extracts the most relevant answer to a given question from a provided text passage. It was fine-tuned on the SQuAD 2.0 dataset, which tests exactly this task of extractive question answering. On the SQuAD 2.0 dev set, the model achieves an F1 score of 84.01 and an exact match score of 80.88, demonstrating strong performance on this benchmark.

What can I use it for?
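The inputs and outputs above map directly onto the Hugging Face `transformers` question-answering pipeline. Assuming the model is available on the Hub under the id `timpal0l/mdeberta-v3-base-squad2`, a minimal sketch might look like:

```python
# Minimal sketch of extractive QA with the transformers pipeline.
# Assumes `transformers` and a PyTorch backend are installed, and that
# the model id below resolves on the Hugging Face Hub.
from transformers import pipeline

qa = pipeline("question-answering", model="timpal0l/mdeberta-v3-base-squad2")

context = "Amsterdam is the capital and most populous city of the Netherlands."
result = qa(question="What is the capital of the Netherlands?", context=context)

# result is a dict with the four outputs described above:
# {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
print(result)
```

Note that `start` and `end` are character offsets into the context, so `context[result["start"]:result["end"]]` recovers the answer span.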
The mdeberta-v3-base-squad2 model can be used for a variety of question answering applications, such as:

- Building chatbots or virtual assistants that can engage in natural conversations and answer users' questions
- Developing educational or academic applications that help students find answers to their questions within provided text
- Enhancing search engines to better understand user queries and retrieve the most relevant information

By leveraging the multilingual capabilities of this model, these applications can be made accessible to users across a wide range of languages.

Things to try

One interesting aspect of the mdeberta-v3-base-squad2 model is its strong performance on the SQuAD 2.0 dataset, which includes both answerable and unanswerable questions. This means the model has learned not only to extract relevant answers from a given context, but also to identify when the context does not contain enough information to answer a question. You could experiment with this capability by providing the model with a variety of questions, some with clear answers in the context and others that are open-ended or lack sufficient information. Observe how the model's outputs and confidence scores differ between these two cases, and consider how this could be leveraged in your applications.

Another direction to explore would be fine-tuning the mdeberta-v3-base model on datasets or tasks beyond SQuAD 2.0. The strong performance of the DeBERTa architecture on a wide range of natural language understanding benchmarks suggests that this multilingual version could be effectively adapted to other question answering, reading comprehension, or general language understanding tasks.
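The unanswerable-question behavior described above can be probed directly: the `transformers` question-answering pipeline accepts a `handle_impossible_answer` flag that lets the model return an empty span instead of forcing a guess when the context lacks the answer. A sketch, again assuming the Hub model id:

```python
# Sketch: comparing an answerable and an unanswerable question.
# Assumes `transformers` and a PyTorch backend are installed.
from transformers import pipeline

qa = pipeline("question-answering", model="timpal0l/mdeberta-v3-base-squad2")

context = "The model was fine-tuned on the SQuAD 2.0 dataset."

# A question the context can answer.
ans = qa(question="What dataset was the model fine-tuned on?", context=context)

# A question the context cannot answer; handle_impossible_answer=True lets
# the pipeline return an empty answer rather than a forced span.
no_ans = qa(
    question="Who is the president of France?",
    context=context,
    handle_impossible_answer=True,
)

print(ans["answer"], ans["score"])
print(repr(no_ans["answer"]), no_ans["score"])
```

Comparing the confidence scores between the two cases is a simple way to calibrate a rejection threshold for your own application.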

Updated 5/28/2024