metadata

language: english
license: mit
datasets:
  - race
  - ai2_arc
  - openbookqa
metrics:
  - accuracy

RoBERTa Large Fine-Tuned on RACE

Model description

This model follows the Allen AI team's implementation of the Aristo RoBERTa V7 model, as presented for the ARC Challenge.

How to use


import logging

import datasets
import torch
from transformers import RobertaForMultipleChoice, RobertaTokenizer

# Matches max_length in the training hyperparameters.
MAX_SEQ_LENGTH = 256

tokenizer = RobertaTokenizer.from_pretrained("LIAMF-USP/aristo-roberta")
model = RobertaForMultipleChoice.from_pretrained("LIAMF-USP/aristo-roberta")

# "arc" must resolve to a dataset that provides the fields read below
# (example_id, question, answer, options); the Hub dataset ai2_arc uses
# different field names.
dataset = datasets.load_dataset(
    "arc",
    split=["train", "validation", "test"],
)
training_examples = dataset[0]
evaluation_examples = dataset[1]
test_examples = dataset[2]

example = training_examples[0]
example_id = example["example_id"]
question = example["question"]
label_example = example["answer"]
options = example["options"]

# Answer keys are either letters or digits; map them to a class index.
if label_example in ["A", "B", "C", "D", "E"]:
    label_map = {label: i for i, label in enumerate(["A", "B", "C", "D", "E"])}
elif label_example in ["1", "2", "3", "4", "5"]:
    label_map = {label: i for i, label in enumerate(["1", "2", "3", "4", "5"])}
else:
    raise ValueError(f"{label_example} not found")

# Pad every question to five options with empty placeholders.
while len(options) < 5:
    options.append({"option_context": "", "option_text": ""})

choices_inputs = []
for ending_idx, option in enumerate(options):
    ending = option["option_text"]
    context = option["option_context"]
    if question.find("_") != -1:
        # fill-in-the-blank questions
        question_option = question.replace("_", ending)
    else:
        question_option = question + " " + ending

    inputs = tokenizer(
        context,
        question_option,
        add_special_tokens=True,
        max_length=MAX_SEQ_LENGTH,
        padding="max_length",
        truncation=True,
        return_overflowing_tokens=False,
    )

    if "num_truncated_tokens" in inputs and inputs["num_truncated_tokens"] > 0:
        logging.warning(f"Question: {example_id} with option {ending_idx} was truncated")
    choices_inputs.append(inputs)

label = label_map[label_example]
input_ids = [x["input_ids"] for x in choices_inputs]
attention_mask = (
    [x["attention_mask"] for x in choices_inputs]
    # as the sentences follow the same structure, checking one of them
    # is enough
    if "attention_mask" in choices_inputs[0]
    else None
)

# RobertaForMultipleChoice expects tensors shaped
# (batch_size, num_choices, seq_length). RoBERTa does not use
# token_type_ids, and example_id is not a model input, so both are left out.
example_encoded = {
    "input_ids": torch.tensor([input_ids]),
    "attention_mask": (
        torch.tensor([attention_mask]) if attention_mask is not None else None
    ),
    "labels": torch.tensor([label]),
}
output = model(**example_encoded)
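
The string-level part of the preprocessing above (label mapping, padding to five options, and building the text for each option) can be separated from tokenization. The following is a minimal sketch rather than the authors' code: the function names `label_to_index` and `build_option_texts` are hypothetical, and the sample question is made up for illustration.

```python
from typing import Dict, List, Tuple

LETTERS = ["A", "B", "C", "D", "E"]
DIGITS = ["1", "2", "3", "4", "5"]
NUM_OPTIONS = 5


def label_to_index(answer: str) -> int:
    """Map an answer key ("A".."E" or "1".."5") to a class index 0..4."""
    for keys in (LETTERS, DIGITS):
        if answer in keys:
            return keys.index(answer)
    raise ValueError(f"{answer} not found")


def build_option_texts(
    question: str, options: List[Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Return one (context, question_option) pair per option, padded to five."""
    padded = list(options)  # copy so the caller's list is not modified
    while len(padded) < NUM_OPTIONS:
        padded.append({"option_context": "", "option_text": ""})

    pairs = []
    for option in padded:
        ending = option["option_text"]
        if "_" in question:
            # Fill-in-the-blank question: substitute the option into the blank.
            question_option = question.replace("_", ending)
        else:
            question_option = question + " " + ending
        pairs.append((option["option_context"], question_option))
    return pairs


# Made-up example, only to show the shapes involved.
sample_options = [
    {"option_context": "Plants use sunlight.", "option_text": "sunlight"},
    {"option_context": "Rocks are minerals.", "option_text": "rocks"},
]
pairs = build_option_texts("Plants get energy from _", sample_options)
```

Each resulting pair can then be passed to the tokenizer as `tokenizer(context, question_option, ...)`, with the same arguments used in the How to use snippet.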

Training data

The training data was the same as proposed here

The only difference was the hyperparameters of the RACE fine-tuned model, which were reported here

Training procedure

The data must be preprocessed with the method shown for a single instance in the How to use section. The following hyperparameters were used:

Hyperparameter	Value
adam_beta1	0.9
adam_beta2	0.98
adam_epsilon	1.000e-8
eval_batch_size	16
train_batch_size	4
fp16	True
gradient_accumulation_steps	4
learning_rate	0.00001
warmup_steps	0.06
max_length	256
epochs	4

All other parameters were the defaults from Trainer and TrainingArguments.
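
As a rough sketch, the table could map onto `TrainingArguments` keyword arguments as shown below. The mapping is an assumption, not the authors' configuration: `train_batch_size` and `eval_batch_size` are taken to be per-device sizes, `epochs` is mapped to `num_train_epochs`, and the `warmup_steps` value of 0.06 is read as a warmup ratio because it is smaller than one. `max_length` belongs to the tokenizer call rather than to the trainer, so it is left out.

```python
# Hypothetical mapping of the hyperparameter table onto keyword arguments for
# transformers.TrainingArguments; the name choices below are assumptions.
training_kwargs = {
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-8,
    "per_device_eval_batch_size": 16,  # assumed meaning of eval_batch_size
    "per_device_train_batch_size": 4,  # assumed meaning of train_batch_size
    "fp16": True,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-5,
    "warmup_ratio": 0.06,  # assumption: 0.06 is a ratio, not a step count
    "num_train_epochs": 4,
}

# Examples consumed per optimizer update, assuming a single GPU.
effective_train_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_train_batch_size)  # 16

# With transformers installed, these could be passed as
# TrainingArguments(output_dir="out", **training_kwargs),
# where "out" is a placeholder directory name.
```

Under these assumptions, gradient accumulation brings the effective training batch to 16 examples per update even though only 4 are held in GPU memory at a time.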

Eval results:

Dataset	Accuracy
ARC Challenge Test	65.358
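
Assuming the reported figure is standard multiple-choice accuracy in percent, it is the share of questions for which the highest-scoring option matches the gold answer. A minimal sketch of that computation follows; the helper name `accuracy_percent` and the toy scores are illustrative only and are not the model's outputs.

```python
from typing import List, Sequence


def accuracy_percent(scores: Sequence[Sequence[float]], gold: List[int]) -> float:
    """Percentage of examples whose arg-max option equals the gold index."""
    if not gold:
        raise ValueError("gold must not be empty")
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    correct = 0
    for option_scores, gold_index in zip(scores, gold):
        # Index of the option with the highest score.
        predicted = max(range(len(option_scores)), key=option_scores.__getitem__)
        if predicted == gold_index:
            correct += 1
    return 100.0 * correct / len(gold)


# Toy per-option scores for four questions (not real model outputs).
toy_scores = [
    [0.1, 2.3, -0.5, 0.0, -1.0],
    [1.5, 0.2, 0.3, -0.7, -2.0],
    [-0.4, 0.0, 0.9, 0.1, -1.1],
    [0.3, 0.1, 0.2, 1.8, -0.9],
]
toy_gold = [1, 0, 3, 3]
result = accuracy_percent(toy_scores, toy_gold)
```

In practice the per-option scores would come from the `logits` field of the model output, one row per question.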

The model was trained on a TITAN RTX GPU.