---
language:
- dv
---

# GPT 2 DV base

This is a GPT-2 model fine-tuned on Dhivehi-language text. It was trained on a curated dataset of Dhivehi Wikipedia articles and can be used for text generation in Dhivehi.

## Model Description

- **Model Type:** GPT-2
- **Language:** Dhivehi (ދިވެހި)
- **Training Data:** Dhivehi Wikipedia articles
- **Last Updated:** 2024-11-25
- **License:** MIT

## Performance Metrics

Evaluation metrics on the test set (a reproduction sketch follows the list):
- Average Perplexity: 3.80
- Perplexity Std: 2.23
- Best Perplexity: 2.72
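
The exact evaluation script is not included in this repository. As a hedged sketch, per-document perplexity for a causal language model can be obtained by exponentiating the mean token-level cross-entropy loss, and the average above can be read as the mean of those per-document scores (an assumption). The model id and `test_texts` below are placeholders, not the actual test set.

```python
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Placeholder model id (same as in the usage example below).
model = GPT2LMHeadModel.from_pretrained("your-username/dhivehi-gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("your-username/dhivehi-gpt2")
model.eval()

def perplexity(text: str) -> float:
    # The LM loss returned by the model is the mean per-token cross-entropy,
    # so exp(loss) is the perplexity of this document.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

# Placeholder held-out documents; substitute your own test split.
test_texts = ["ދިވެހިރާއްޖެއަކީ"]
scores = [perplexity(t) for t in test_texts]
print(f"Average perplexity: {sum(scores) / len(scores):.2f}")
```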

## Usage Example

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("your-username/dhivehi-gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("your-username/dhivehi-gpt2")

# Prepare your prompt
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
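
With `do_sample=True`, `temperature` and `top_p` control how adventurous the sampling is; lower values give more conservative output. Note that `max_length` counts the prompt tokens as well, so `max_new_tokens` can be used instead when a fixed amount of newly generated text is wanted.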

## Training Details

The model was trained using the following configuration (a `Trainer`-based sketch follows the hyperparameter list):
- Base model: GPT-2
- Training type: Full fine-tuning
- Hardware: NVIDIA A40 GPU
- Mixed precision: FP16
- Gradient checkpointing: Enabled

### Hyperparameters
- Learning rate: 5e-5
- Batch size: 32
- Gradient accumulation steps: 2
- Epochs: 3
- Weight decay: 0.01
- Warmup steps: 1000
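
The full training script is not part of this card. The sketch below is one plausible way to express the configuration and hyperparameters listed above with the Hugging Face `Trainer`; the dataset source, tokenization length, and output path are assumptions, not the actual setup.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

# Base GPT-2 checkpoint and tokenizer; GPT-2 has no pad token, so reuse EOS.
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model.gradient_checkpointing_enable()  # "Gradient checkpointing: Enabled"

# Assumed data source: a public Dhivehi Wikipedia dump, standing in for the
# curated dataset actually used for this model.
raw = load_dataset("wikimedia/wikipedia", "20231101.dv", split="train")

def tokenize(batch):
    # Assumed context length of 512 tokens; the real preprocessing may differ.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="dhivehi-gpt2",          # placeholder output path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    weight_decay=0.01,
    warmup_steps=1000,
    fp16=True,                          # mixed precision, as listed above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

With a per-device batch size of 32 and gradient accumulation of 2, the effective batch size on a single GPU is 64 sequences per optimizer step.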

## Limitations

- Primary training data is from Wikipedia, which may not cover all Dhivehi language contexts
- May not perform well on specialized or technical content
- Could reflect biases present in the training data
- Not recommended for production use without thorough evaluation

## Intended Uses

This model is suitable for:
- Dhivehi text generation
- Research on Dhivehi NLP
- Educational purposes
- Experimental applications

Not intended for:
- Critical or production systems
- Decision-making applications
- Tasks requiring factual accuracy

## Citation

```bibtex
@misc{dhivehi-gpt2,
  title     = {Dhivehi GPT-2: A Language Model for Dhivehi Text Generation},
  year      = {2024},
  publisher = {Hugging Face},
}
```