roberta-base for QA

Note: this is a clone of roberta-base-squad2 for internal testing.

This is the roberta-base model, fine-tuned on the SQuAD 2.0 dataset. It was trained on question-answer pairs, including unanswerable questions, for extractive question answering.

Overview

Language model: roberta-base
Language: English
Downstream-task: Extractive QA
Training data: SQuAD 2.0
Eval data: SQuAD 2.0
Code: See an example QA pipeline on Haystack
Infrastructure: 4x Tesla V100

Hyperparameters

batch_size = 96
n_epochs = 2
base_LM_model = "roberta-base"
max_seq_len = 386
learning_rate = 3e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
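
To make the interaction between max_seq_len, doc_stride, and max_query_length concrete, here is a minimal sketch of how a long context could be split into overlapping windows. It assumes the convention of the original BERT SQuAD preprocessing, where doc_stride is the step between window starts (Hugging Face tokenizers instead use `stride` for the overlap), and it assumes the 4 special tokens RoBERTa adds to a question-context pair. The function name `context_windows` is hypothetical, and the training code behind this model may differ in detail.

```python
from typing import List, Tuple


def context_windows(
    n_context_tokens: int,
    n_query_tokens: int,
    max_seq_len: int = 386,
    doc_stride: int = 128,
    max_query_length: int = 64,
    n_special_tokens: int = 4,  # assumption: <s> question </s></s> context </s>
) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets of each context window (end exclusive)."""
    # The question is truncated to max_query_length before windowing.
    n_query = min(n_query_tokens, max_query_length)
    # Tokens left for the context once the question and special tokens are placed.
    budget = max_seq_len - n_query - n_special_tokens
    if budget <= 0:
        raise ValueError("max_seq_len is too small for the question")

    spans = []
    start = 0
    while start < n_context_tokens:
        end = min(start + budget, n_context_tokens)
        spans.append((start, end))
        if end == n_context_tokens:
            break
        # Assumed convention: each new window starts doc_stride tokens later.
        start += min(budget, doc_stride)
    return spans


# Hypothetical sizes: a 1000-token context and a 20-token question.
windows = context_windows(1000, 20)
print(len(windows), windows)
```

Under these assumptions each window holds at most 386 - 20 - 4 = 362 context tokens, so consecutive windows overlap by 234 tokens and an answer near a window boundary is still fully contained in at least one window.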

Using a distilled model instead

We have also released a distilled version of this model, deepset/tinyroberta-squad2. The distilled model offers comparable prediction quality and runs at twice the speed of the base model.

Usage

In Haystack

Haystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). To load the model in Haystack:

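# Assumes Haystack 1.x, where both reader classes are exposed in haystack.nodes
from haystack.nodes import FARMReader, TransformersReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
# or
reader = TransformersReader(model_name_or_path="deepset/roberta-base-squad2", tokenizer="deepset/roberta-base-squad2")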

For a complete example of roberta-base-squad2 being used for question answering, see the tutorials in the Haystack documentation.

In Transformers

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and lets people easily switch between frameworks.'
}
res = nlp(QA_input)

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
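
When the model and tokenizer are loaded directly as in b), the model returns start and end logits per token rather than an answer string. The following is a simplified sketch of how an answer span could be chosen from those logits, including the SQuAD 2.0 "no answer" case. It assumes the common convention that position 0 (the `<s>` token) scores the null answer. The names `best_span`, `max_answer_len`, and `null_threshold` are hypothetical, and the sketch does not restrict candidates to context tokens or handle multiple windows, as the pipeline in a) does.

```python
from typing import Optional, Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
    null_threshold: float = 0.0,
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) token indices of the best span, or None for "no answer"."""
    # Assumed convention: the first token (<s>) carries the null-answer score.
    null_score = start_logits[0] + end_logits[0]

    best_score = float("-inf")
    best = None
    n = len(start_logits)
    for start in range(1, n):
        # Only consider spans that end at or after the start and stay short.
        for end in range(start, min(start + max_answer_len, n)):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)

    # Predict "no answer" when the null score beats the best span by the threshold.
    if best is None or null_score - best_score > null_threshold:
        return None
    return best


# Toy logits (made up for illustration, not model output).
answerable = best_span([0.1, 0.2, 5.0, 0.3, 0.1], [0.1, 0.1, 0.2, 4.0, 0.3])
unanswerable = best_span([6.0, 0.1, 0.2], [6.0, 0.1, 0.3])
print(answerable, unanswerable)
```

With real inputs, the logits would typically come from something like `outputs = model(**tokenizer(question, context, return_tensors="pt"))`, using `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`, and the selected token indices would then be decoded back to text with the tokenizer.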

Performance

Evaluated on the SQuAD 2.0 dev set with the official eval script.

"exact": 79.87029394424324,
"f1": 82.91251169582613,

"total": 11873,
"HasAns_exact": 77.93522267206478,
"HasAns_f1": 84.02838248389763,
"HasAns_total": 5928,
"NoAns_exact": 81.79983179142137,
"NoAns_f1": 81.79983179142137,
"NoAns_total": 5945

Using the official question answering notebook from transformers yields:

{'HasAns_exact': 77.93522267206478,
 'HasAns_f1': 83.93715663402219,
 'HasAns_total': 5928,
 'NoAns_exact': 81.90075693860386,
 'NoAns_f1': 81.90075693860386,
 'NoAns_total': 5945,
 'best_exact': 79.92082877116145,
 'best_exact_thresh': 0.0,
 'best_f1': 82.91749890730902,
 'best_f1_thresh': 0.0,
 'exact': 79.92082877116145,
 'f1': 82.91749890730917,
 'total': 11873}

which is consistent with the officially reported results. Using the question answering Evaluator from evaluate gives:

{'HasAns_exact': 77.91835357624831,
 'HasAns_f1': 84.07820736158186,
 'HasAns_total': 5928,
 'NoAns_exact': 81.91757779646763,
 'NoAns_f1': 81.91757779646763,
 'NoAns_total': 5945,
 'best_exact': 79.92082877116145,
 'best_exact_thresh': 0.996823787689209,
 'best_f1': 82.99634576260925,
 'best_f1_thresh': 0.996823787689209,
 'exact': 79.92082877116145,
 'f1': 82.9963457626089,
 'latency_in_seconds': 0.016523243643392558,
 'samples_per_second': 60.52080460605492,
 'total': 11873,
 'total_time_in_seconds': 196.18047177799986}

which is also consistent with the officially reported results.
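
For readers unfamiliar with the two headline metrics, the sketch below shows how per-question exact match and token-level F1 are typically computed. It is modeled on the answer normalization and token-overlap logic of the official SQuAD 2.0 evaluation script, but it is a simplified illustration rather than a replacement for that script (for example, it scores against a single gold answer instead of taking the maximum over several). The function names `exact_match` and `token_f1` are hypothetical.

```python
import collections
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, pred: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


def token_f1(gold: str, pred: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    # For unanswerable questions (empty gold), F1 is 1 only if the prediction is empty too.
    if not gold_toks or not pred_toks:
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


# Made-up examples for illustration.
print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(token_f1("the Eiffel Tower", "Eiffel Tower in Paris"))
print(token_f1("", ""), token_f1("", "Paris"))
```

Because an unanswerable question scores either 0 or 1 on both metrics, NoAns_exact and NoAns_f1 are identical in every result block above. The overall scores are the size-weighted means of the two subsets; for the first block, (77.935 × 5928 + 81.800 × 5945) / 11873 ≈ 79.870, which matches the reported "exact" value.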

Authors

Branden Chan
Timo Möller
Malte Pietsch
Tanay Soni

About us

deepset is the company behind the open-source NLP framework Haystack, which is designed to help you build production-ready NLP systems for question answering, summarization, ranking, and more.

Get in touch and join the Haystack community

For more info on Haystack, visit our GitHub repo and Documentation.

We also have a Slack community open to everyone!

Twitter | LinkedIn | Slack | GitHub Discussions | Website

By the way: we're hiring!
