---
license: cc-by-4.0
metrics:
- bleu4
- meteor
- rouge-l
- bertscore
- moverscore
language: ko
datasets:
- lmqg/qg_koquad
pipeline_tag: text2text-generation
tags:
- question generation
widget:
- text: "1990년 영화 《 <hl> 남부군 <hl> 》에서 단역으로 영화배우 첫 데뷔에 이어 같은 해 KBS 드라마 《지구인》에서 단역으로 출연하였고 이듬해 MBC 《여명의 눈동자》를 통해 단역으로 출연하였다."
  example_title: "Question Generation Example 1" 
- text: "백신이 없기때문에 예방책은 <hl> 살충제 <hl> 를 사용하면서 서식 장소(찻찬 받침, 배수로, 고인 물의 열린 저장소, 버려진 타이어 등)의 수를 줄임으로써 매개체를 통제할 수 있다."
  example_title: "Question Generation Example 2" 
- text: "<hl> 원테이크 촬영 <hl> 이기 때문에 한 사람이 실수를 하면 처음부터 다시 찍어야 하는 상황이 발생한다."
  example_title: "Question Generation Example 3" 
model-index:
- name: lmqg/mt5-small-koquad
  results:
  - task:
      name: Text2text Generation
      type: text2text-generation
    dataset:
      name: lmqg/qg_koquad
      type: default
      args: default
    metrics:
    - name: BLEU4
      type: bleu4
      value: 0.10570915349557093
    - name: ROUGE-L
      type: rouge-l
      value: 0.2564353531078813
    - name: METEOR
      type: meteor
      value: 0.2752329744142515
    - name: BERTScore
      type: bertscore
      value: 0.8288608218241639
    - name: MoverScore
      type: moverscore
      value: 0.8249013345139385
    - name: QAAlignedF1Score (BERTScore)
      type: qa_aligned_f1_score_bertscore
      value: 0.8752356040623699
    - name: QAAlignedRecall (BERTScore)
      type: qa_aligned_recall_bertscore
      value: 0.8748548506840597
    - name: QAAlignedPrecision (BERTScore)
      type: qa_aligned_precision_bertscore
      value: 0.8756506176198297
    - name: QAAlignedF1Score (MoverScore)
      type: qa_aligned_f1_score_moverscore
      value: 0.8515086132060411
    - name: QAAlignedRecall (MoverScore)
      type: qa_aligned_recall_moverscore
      value: 0.8508602850530972
    - name: QAAlignedPrecision (MoverScore)
      type: qa_aligned_precision_moverscore
      value: 0.8523246242242705
---

# Model Card of `lmqg/mt5-small-koquad`
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co./google/mt5-small) for the question generation task on the 
[lmqg/qg_koquad](https://huggingface.co./datasets/lmqg/qg_koquad) dataset (dataset_name: default), trained via [`lmqg`](https://github.com/asahi417/lm-question-generation).


Please cite our paper if you use the model ([https://arxiv.org/abs/2210.03992](https://arxiv.org/abs/2210.03992)).

```
@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration",
    author = "Ushio, Asahi  and
        Alva-Manchego, Fernando  and
        Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}
```

### Overview
- **Language model:** [google/mt5-small](https://huggingface.co./google/mt5-small)   
- **Language:** ko  
- **Training data:** [lmqg/qg_koquad](https://huggingface.co./datasets/lmqg/qg_koquad) (default)
- **Online Demo:** [https://autoqg.net/](https://autoqg.net/)
- **Repository:** [https://github.com/asahi417/lm-question-generation](https://github.com/asahi417/lm-question-generation)
- **Paper:** [https://arxiv.org/abs/2210.03992](https://arxiv.org/abs/2210.03992)

### Usage
- With [`lmqg`](https://github.com/asahi417/lm-question-generation#lmqg-language-model-for-question-generation-)
```python
from lmqg import TransformersQG

# initialize model
model = TransformersQG(language='ko', model='lmqg/mt5-small-koquad')

# model prediction: one context paragraph paired with the answer to ask about
context = "1990년 영화 《 남부군 》에서 단역으로 영화배우 첫 데뷔에 이어 같은 해 KBS 드라마 《지구인》에서 단역으로 출연하였고 이듬해 MBC 《여명의 눈동자》를 통해 단역으로 출연하였다."
question = model.generate_q(list_context=[context], list_answer=["남부군"])
```

- With `transformers`
```python
from transformers import pipeline

# initialize model
pipe = pipeline("text2text-generation", 'lmqg/mt5-small-koquad')

# question generation: the answer span is marked with <hl> tokens in the input
question = pipe('1990년 영화 《 <hl> 남부군 <hl> 》에서 단역으로 영화배우 첫 데뷔에 이어 같은 해 KBS 드라마 《지구인》에서 단역으로 출연하였고 이듬해 MBC 《여명의 눈동자》를 통해 단역으로 출연하였다.')
```
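
The `transformers` input above marks the answer span with `<hl>` tokens. The following is a minimal sketch of building such an input from a raw context and an answer. It assumes the first occurrence of the answer is the intended span and that a single space separates each `<hl>` token from the answer, as in the examples on this card. `highlight_answer` is a hypothetical helper written for illustration; it is not part of `lmqg` or `transformers`.

```python
HL_TOKEN = "<hl>"  # highlight token as it appears in the examples on this card


def highlight_answer(context: str, answer: str) -> str:
    """Wrap the first occurrence of `answer` in `context` with <hl> tokens.

    Hypothetical helper: the spacing around the tokens is an assumption
    based on the example inputs shown in this model card.
    """
    if not answer:
        raise ValueError("answer must be a non-empty string")
    start = context.find(answer)
    if start == -1:
        raise ValueError("answer not found in context")
    end = start + len(answer)
    return f"{context[:start]}{HL_TOKEN} {answer} {HL_TOKEN}{context[end:]}"


context = "1990년 영화 《 남부군 》에서 단역으로 영화배우 첫 데뷔에 이어 같은 해 KBS 드라마 《지구인》에서 단역으로 출연하였다."
model_input = highlight_answer(context, "남부군")
print(model_input)
```

The resulting string can then be passed to `pipe(...)` in place of a hand-written highlighted input.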

## Evaluation Metrics


### Metrics

| Dataset | Type | BLEU4 | ROUGE-L | METEOR | BERTScore | MoverScore | Link |
|:--------|:-----|------:|--------:|-------:|----------:|-----------:|-----:|
| [lmqg/qg_koquad](https://huggingface.co./datasets/lmqg/qg_koquad) | default | 0.106 | 0.256 | 0.275 | 0.829 | 0.825 | [link](https://huggingface.co./lmqg/mt5-small-koquad/raw/main/eval/metric.first.sentence.paragraph_answer.question.lmqg_qg_koquad.default.json) | 


### Metrics (QAG)

| Dataset | Type | QA Aligned F1 Score (BERTScore) | QA Aligned F1 Score (MoverScore) | Link |
|:--------|:-----|--------------------------------:|---------------------------------:|-----:|
| [lmqg/qg_koquad](https://huggingface.co./datasets/lmqg/qg_koquad) | default | 0.875 | 0.852 | [link](https://huggingface.co./lmqg/mt5-small-koquad/raw/main/eval/metric.first.answer.paragraph.questions_answers.lmqg_qg_koquad.default.json) | 
    



## Training hyperparameters

The following hyperparameters were used during fine-tuning:
 - dataset_path: lmqg/qg_koquad
 - dataset_name: default
 - input_types: ['paragraph_answer']
 - output_types: ['question']
 - prefix_types: None
 - model: google/mt5-small
 - max_length: 512
 - max_length_output: 32
 - epoch: 7
 - batch: 16
 - lr: 0.001
 - fp16: False
 - random_seed: 1
 - gradient_accumulation_steps: 4
 - label_smoothing: 0.15

The full configuration can be found at [fine-tuning config file](https://huggingface.co./lmqg/mt5-small-koquad/raw/main/trainer_config.json).
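
As a quick check on how `batch` and `gradient_accumulation_steps` interact, the sketch below computes the number of examples contributing to each optimizer update. It assumes that `batch` is the per-step batch size and that training ran on a single device; neither is stated on this card, and the variable names are illustrative only.

```python
# Values copied from the hyperparameter list on this card.
batch = 16
gradient_accumulation_steps = 4

# Gradients from several forward/backward passes are accumulated before
# each optimizer update, so the effective batch size is their product
# (assuming single-device training).
effective_batch_size = batch * gradient_accumulation_steps
print(effective_batch_size)
```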
