alakxender committed
Commit ec1f017 · verified · 1 Parent(s): b1e2ac9

Create README.md

Files changed (1): README.md (+99, -0)
---
language:
- dv
---

# GPT-2 DV Base

This is a GPT-2 model fine-tuned on Dhivehi language texts. The model was trained on a curated dataset of Dhivehi Wikipedia articles and can be used for text generation in the Dhivehi language.

## Model Description

- **Model Type:** GPT-2
- **Language:** Dhivehi (ދިވެހި)
- **Training Data:** Dhivehi Wikipedia articles
- **Last Updated:** 2024-11-25
- **License:** MIT

## Performance Metrics

Evaluation metrics on the test set:

- Average Perplexity: 3.80
- Perplexity Std: 2.23
- Best Perplexity: 2.72

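The evaluation script is not included in this card, but per-document perplexity can be estimated from the model's language-modeling loss. The snippet below is a minimal sketch of that calculation, assuming a hypothetical list of held-out Dhivehi texts (`test_texts` is a placeholder); it is not the script used to produce the figures above.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("your-username/dhivehi-gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("your-username/dhivehi-gpt2")
model.eval()

# Placeholder held-out documents; substitute the real Dhivehi test set.
test_texts = ["ދިވެހިރާއްޖެއަކީ"]

perplexities = []
for text in test_texts:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # With labels provided, the model returns the mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    perplexities.append(torch.exp(loss).item())

print(f"Average perplexity: {sum(perplexities) / len(perplexities):.2f}")
print(f"Best perplexity: {min(perplexities):.2f}")
```
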
## Usage Example

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("your-username/dhivehi-gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("your-username/dhivehi-gpt2")

# Prepare your prompt
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

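For quick experiments, the same generation can also be run through the `transformers` text-generation pipeline. This is just an alternative sketch using the same placeholder repository id as above, not a separately documented interface.

```python
from transformers import pipeline

# Same placeholder model id as above; substitute the actual repository name.
generator = pipeline("text-generation", model="your-username/dhivehi-gpt2")

result = generator(
    "ދިވެހިރާއްޖެއަކީ",
    max_length=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```
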
## Training Details

The model was trained using the following configuration (a rough training sketch follows the hyperparameter list below):

- Base model: GPT-2
- Training type: Full fine-tuning
- Hardware: NVIDIA A40 GPU
- Mixed precision: FP16
- Gradient checkpointing: Enabled

### Hyperparameters

- Learning rate: 5e-5
- Batch size: 32
- Gradient accumulation steps: 2
- Epochs: 3
- Weight decay: 0.01
- Warmup steps: 1000

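The training script itself is not published with this card. As an illustration only, the sketch below shows how the listed hyperparameters could map onto Hugging Face `TrainingArguments` and `Trainer`; the file name `dv_wiki.txt`, the tokenizer setup, and the preprocessing are assumptions, not the actual pipeline.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

# Assumed setup: the base GPT-2 tokenizer with EOS used for padding; the card
# does not describe the actual tokenizer/vocabulary preparation for Dhivehi.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical plain-text dump of Dhivehi Wikipedia articles (one per line).
raw = load_dataset("text", data_files={"train": "dv_wiki.txt"})["train"]
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.gradient_checkpointing_enable()  # gradient checkpointing, as listed above

training_args = TrainingArguments(
    output_dir="dhivehi-gpt2",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    weight_decay=0.01,
    warmup_steps=1000,
    fp16=True,  # mixed precision, as listed above
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    # Causal LM collator (mlm=False) pads batches and sets next-token labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
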
## Limitations

- Primary training data is from Wikipedia, which may not cover all Dhivehi language contexts
- May not perform well on specialized or technical content
- Could reflect biases present in the training data
- Not recommended for production use without thorough evaluation

## Intended Uses

This model is suitable for:

- Dhivehi text generation
- Research on Dhivehi NLP
- Educational purposes
- Experimental applications

Not intended for:

- Critical or production systems
- Decision-making applications
- Tasks requiring factual accuracy

## Citation

```bibtex
@misc{dhivehi-gpt2,
  title     = {Dhivehi GPT-2: A Language Model for Dhivehi Text Generation},
  year      = {2024},
  publisher = {Hugging Face},
}
```