maintain README.md
README.md
CHANGED

metrics:
- perplexity
library_name: transformers
pipeline_tag: text-generation
---
# NepaliGPT: Nepali Language Generative Pretrained Transformer Model

This is an experiment in developing a language generation model for the Nepali language: a causal language model that predicts the next possible tokens given a context in the Nepali language.
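Since the card sets `library_name: transformers` and `pipeline_tag: text-generation`, the model should be usable through the standard `transformers` text-generation pipeline. A minimal sketch, assuming a placeholder Hub id `<username>/NepaliGPT` (substitute the actual repository id):

```python
from transformers import pipeline

# Placeholder Hub id; replace with the actual repository id of this model.
model_id = "<username>/NepaliGPT"

# Text-generation pipeline, matching the card's pipeline_tag.
generator = pipeline("text-generation", model=model_id)

# Continue a Nepali prompt with the causal language model.
prompt = "नेपाल एक सुन्दर देश हो"
outputs = generator(prompt, max_new_tokens=50, do_sample=True, top_p=0.95)
print(outputs[0]["generated_text"])
```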
# Dataset Used

A large corpus of 9.3 GB has been collected from different sources on the internet. The sources include:

- Nepali books found online.
- Nepali news articles from Nepali news portals.
- Nepali text collected from different open-source Nepali NLP datasets.
# Hyperparameters Used

Learning rate -> 2e-5 \
Weight Decay -> 0.01 \
Number of training epochs -> 5 \
bf16 -> True \
Base Model Architecture -> GPT-2
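For readers reproducing the run, these values map onto `transformers` `TrainingArguments` roughly as below. This is a sketch under the assumption that training used the standard `Trainer` API; `output_dir` and the batch size are illustrative and not stated on the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nepaligpt",          # illustrative; not stated on the card
    learning_rate=2e-5,              # Learning rate -> 2e-5
    weight_decay=0.01,               # Weight Decay -> 0.01
    num_train_epochs=5,              # Number of training epochs -> 5
    bf16=True,                       # bf16 -> True
    per_device_train_batch_size=8,   # illustrative; not stated on the card
)
```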
## Training Results

It achieves the following results on the evaluation set:

| Training Loss | Validation Loss | Perplexity |
|:-------------:|:---------------:|:----------:|
|    3.3968     |     3.2705      |  26.3245   |
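The reported perplexity is consistent with exponentiating the validation (cross-entropy) loss, the usual convention for causal language models: exp(3.2705) ≈ 26.32. A quick check:

```python
import math

validation_loss = 3.2705
perplexity = math.exp(validation_loss)  # perplexity = exp(cross-entropy loss)
print(round(perplexity, 2))             # ~26.32, in line with the reported 26.3245
```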