Update README.md
README.md CHANGED
@@ -34,3 +34,22 @@ A 12-layer, 768-hidden-size transformer-based language model.

# Training

The model was trained on the Vietnamese OSCAR dataset (32 GB) to optimize a traditional causal language-modelling objective on a v3-8 TPU for around 6 days. It reaches a perplexity of around 13.4 on a validation set held out from OSCAR.
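
Since the training objective is standard causal language modelling, the reported perplexity is simply the exponential of the mean cross-entropy loss. The snippet below is a minimal sketch of that computation using the published checkpoint; the example sentence is an arbitrary stand-in (not the OSCAR validation split behind the 13.4 figure), and it assumes PyTorch weights are available for the model.

```python
# Minimal sketch: perplexity = exp(mean causal-LM cross-entropy).
# The sentence below is an arbitrary example, not the OSCAR validation split
# used for the reported ~13.4 perplexity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NlpHUST/gpt2-vietnamese")
model = AutoModelForCausalLM.from_pretrained("NlpHUST/gpt2-vietnamese")
model.eval()

text = "Việt Nam là một quốc gia nằm ở Đông Nam Á."  # example sentence only
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the shifted cross-entropy loss.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity ~ {torch.exp(loss).item():.2f}")
```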

### GPT-2 Fine-tuning

The following example fine-tunes GPT-2 on WikiText-2. We're using the raw WikiText-2 (no tokens were replaced before the tokenization). The loss here is that of causal language modeling.

The fine-tuning script can be found [here](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py).

```bash
python run_clm.py \
    --model_name_or_path NlpHUST/gpt2-vietnamese \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm
```
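
After the run finishes, the fine-tuned model is written to the `--output_dir` passed above. The following minimal sketch loads it and samples a continuation; it assumes the run completed and saved both model and tokenizer to `/tmp/test-clm`, and the prompt and sampling settings are illustrative only.

```python
# Minimal sketch: load the checkpoint written by run_clm.py above and generate text.
# Assumes the run saved both model and tokenizer to /tmp/test-clm (the --output_dir used above).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/tmp/test-clm")
model = AutoModelForCausalLM.from_pretrained("/tmp/test-clm")

prompt = "The history of language modeling"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```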