Update README.md
README.md CHANGED
@@ -35,7 +35,7 @@ A 12-layer, 768-hidden-size transformer-based language model.
 # Training
 The model was trained on the Vietnamese Oscar dataset (32 GB) to optimize a traditional language modelling objective on a v3-8 TPU for around 6 days. It reaches around 13.4 perplexity on a chosen validation set from Oscar.
 
-### GPT-2
+### GPT-2 Finetuning
 
 The following example fine-tunes GPT-2 on WikiText-2. We're using the raw WikiText-2 (no tokens were replaced before
 the tokenization). The loss here is that of causal language modeling.
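
The example the renamed heading points to (fine-tuning GPT-2 on raw WikiText-2 with a causal language-modelling loss) is not included in this hunk. As a rough, illustrative sketch only, the block below sets up that kind of run with the Hugging Face `Trainer`; the block size, batch size, epoch count, and output path are assumptions rather than values from the model card, and the last line just shows that perplexity (the ~13.4 figure quoted above) is the exponential of the evaluation loss.

```python
import math

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Load GPT-2 and the raw WikiText-2 split (no token replacement, as in the README).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")
raw = load_dataset("wikitext", "wikitext-2-raw-v1")

block_size = 512  # assumed context length for grouping; GPT-2 accepts up to 1024


def tokenize(batch):
    return tokenizer(batch["text"])


def group_texts(examples):
    # Concatenate everything, then cut into fixed-size blocks so each block is a
    # complete training example for the causal LM objective.
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_len = (len(concatenated["input_ids"]) // block_size) * block_size
    return {
        k: [v[i : i + block_size] for i in range(0, total_len, block_size)]
        for k, v in concatenated.items()
    }


tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
lm_datasets = tokenized.map(group_texts, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-wikitext2",    # assumed output path
        per_device_train_batch_size=8,  # assumed; adjust to your hardware
        num_train_epochs=3,             # assumed
    ),
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
    # mlm=False gives the causal language-modelling loss described above.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

trainer.train()
eval_metrics = trainer.evaluate()
# Perplexity is the exponential of the evaluation loss.
print(f"perplexity = {math.exp(eval_metrics['eval_loss']):.2f}")
```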