---
license: apache-2.0
base_model: google/mt5-base
tags:
  - Question Answering
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: mT5-base-turkish-qa
    results: []
language:
  - tr
pipeline_tag: text2text-generation
widget:
  - text: >-
      Soru: Nazım Hikmet ne zaman doğmuştur?

      Metin: Nâzım Hikmet, Mehmed Nâzım adıyla 15 Ocak 1902 tarihinde Selanik'te
      doğdu. O sırada Hariciye Nezareti memuru olarak Selanik'te çalışan Hikmet
      Bey, Nâzım'ın çocukluğunda memuriyetten ayrıldı ve ailesiyle birlikte,
      Halep'te bulunan babasının yanına gitti. Burada bulundukları sırada
      Hikmet-Celile çiftinin biri Ali İbrahim, diğeri Samiye adında iki çocuğu
      oldu, fakat Ali İbrahim dizanteriye yakalanıp öldü.
datasets:
  - ucsahin/TR-Extractive-QA-82K
---

# mT5-base-turkish-qa

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on the [ucsahin/TR-Extractive-QA-82K](https://huggingface.co/datasets/ucsahin/TR-Extractive-QA-82K) dataset. It achieves the following results on the evaluation set:

- Loss: 0.5109
- Rouge1: 79.3283
- Rouge2: 68.0845
- Rougel: 79.3474
- Rougelsum: 79.2937

## Model description

The mT5-base model was fine-tuned on a manually curated Turkish dataset of 65K training samples, each consisting of a ("question", "answer", "context") triplet.

## Intended uses & limitations

The intended use of the model is extractive question answering.

To use the inference widget, enter your input in the following format:

```
Soru: question_text
Metin: context_text
```

The model then generates a response in the format:

```
Cevap: answer_text
```

Use with Transformers:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from datasets import load_dataset

# Load the dataset
qa_tr_datasets = load_dataset("ucsahin/TR-Extractive-QA-82K")

# Load model and tokenizer
model_checkpoint = "ucsahin/mT5-base-turkish-qa"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

# Run inference on the first ten test samples
inference_dataset = qa_tr_datasets["test"].select(range(10))

for sample in inference_dataset:
    input_question = "Soru: " + sample["question"]
    input_context = "Metin: " + sample["context"]

    # Tokenize question and context as a text pair, truncating to the 512-token limit
    tokenized_inputs = tokenizer(input_question, input_context, max_length=512, truncation=True, return_tensors="pt")
    outputs = model.generate(input_ids=tokenized_inputs["input_ids"], max_new_tokens=32)
    output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    print(f"Reference answer: {sample['answer']}, Model answer: {output_text}")
```

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
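
For reference, these settings correspond roughly to the following `Seq2SeqTrainingArguments`. The original training script is not published, so this is an illustrative sketch; `output_dir` and `predict_with_generate` are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative reconstruction of the listed hyperparameters, not the original script
training_args = Seq2SeqTrainingArguments(
    output_dir="mT5-base-turkish-qa",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    predict_with_generate=True,  # assumed, needed to compute ROUGE during evaluation
)
```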

## Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|
| 2.0454        | 0.13  | 500  | 0.6771          | 73.1040 | 59.8915 | 73.1819 | 73.0558   |
| 0.8012        | 0.26  | 1000 | 0.6012          | 76.3357 | 64.1967 | 76.3796 | 76.2688   |
| 0.7703        | 0.39  | 1500 | 0.5844          | 76.8932 | 65.2509 | 76.9932 | 76.9418   |
| 0.6783        | 0.51  | 2000 | 0.5587          | 76.7284 | 64.8453 | 76.7416 | 76.6720   |
| 0.6546        | 0.64  | 2500 | 0.5362          | 78.2261 | 66.5893 | 78.2515 | 78.2142   |
| 0.6289        | 0.77  | 3000 | 0.5133          | 78.6917 | 67.1534 | 78.6852 | 78.6319   |
| 0.6292        | 0.9   | 3500 | 0.5109          | 79.3283 | 68.0845 | 79.3474 | 79.2937   |
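
ROUGE scores like those in the columns above can be computed with the `evaluate` library. Below is a minimal sketch of a `compute_metrics` function for `Seq2SeqTrainer`; the exact metric configuration used during training is an assumption:

```python
import numpy as np
import evaluate
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ucsahin/mT5-base-turkish-qa")
rouge = evaluate.load("rouge")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    # Labels are padded with -100; replace before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    scores = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    # evaluate returns scores in [0, 1]; scale to match the table above
    return {k: round(v * 100, 4) for k, v in scores.items()}
```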

## Framework versions

- Transformers 4.36.2
- Pytorch 2.1.0+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0