---
language:
- dv
---

# GPT 2 DV base

This is a GPT-2 model fine-tuned on Dhivehi-language text. It was trained on a curated dataset of Dhivehi Wikipedia articles and can be used for text generation in Dhivehi.

## Model Description

- **Model Type:** GPT-2
- **Language:** Dhivehi (ދިވެހި)
- **Training Data:** Dhivehi Wikipedia articles
- **Last Updated:** 2024-11-25
- **License:** MIT

## Performance Metrics

Evaluation metrics on the test set (a reproduction sketch follows the list):
- Average Perplexity: 3.80
- Perplexity Std: 2.23
- Best Perplexity: 2.72
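
The exact evaluation script is not included in this repository. As a hedged sketch, per-document perplexity for a causal language model can be obtained by exponentiating the mean token-level cross-entropy loss, and the average above can be read as the mean of those per-document scores (an assumption). The model id and `test_texts` below are placeholders, not the actual test set.

```python
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Placeholder model id (same as in the usage example below).
model = GPT2LMHeadModel.from_pretrained("your-username/dhivehi-gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("your-username/dhivehi-gpt2")
model.eval()

def perplexity(text: str) -> float:
    # The LM loss returned by the model is the mean per-token cross-entropy,
    # so exp(loss) is the perplexity of this document.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

# Placeholder held-out documents; substitute your own test split.
test_texts = ["ދިވެހިރާއްޖެއަކީ"]
scores = [perplexity(t) for t in test_texts]
print(f"Average perplexity: {sum(scores) / len(scores):.2f}")
```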

## Usage Example

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("your-username/dhivehi-gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("your-username/dhivehi-gpt2")

# Prepare your prompt
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
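
With `do_sample=True`, `temperature` and `top_p` control how adventurous the sampling is; lower values give more conservative output. Note that `max_length` counts the prompt tokens as well, so `max_new_tokens` can be used instead when a fixed amount of newly generated text is wanted.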

## Training Details

The model was trained using the following configuration (a `Trainer`-based sketch follows the hyperparameter list):
- Base model: GPT-2
- Training type: Full fine-tuning
- Hardware: NVIDIA A40 GPU
- Mixed precision: FP16
- Gradient checkpointing: Enabled

### Hyperparameters
- Learning rate: 5e-5
- Batch size: 32
- Gradient accumulation steps: 2
- Epochs: 3
- Weight decay: 0.01
- Warmup steps: 1000
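
The full training script is not part of this card. The sketch below is one plausible way to express the configuration and hyperparameters listed above with the Hugging Face `Trainer`; the dataset source, tokenization length, and output path are assumptions, not the actual setup.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

# Base GPT-2 checkpoint and tokenizer; GPT-2 has no pad token, so reuse EOS.
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model.gradient_checkpointing_enable()  # "Gradient checkpointing: Enabled"

# Assumed data source: a public Dhivehi Wikipedia dump, standing in for the
# curated dataset actually used for this model.
raw = load_dataset("wikimedia/wikipedia", "20231101.dv", split="train")

def tokenize(batch):
    # Assumed context length of 512 tokens; the real preprocessing may differ.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="dhivehi-gpt2",          # placeholder output path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    weight_decay=0.01,
    warmup_steps=1000,
    fp16=True,                          # mixed precision, as listed above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

With a per-device batch size of 32 and gradient accumulation of 2, the effective batch size on a single GPU is 64 sequences per optimizer step.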

## Limitations

- Primary training data is from Wikipedia, which may not cover all Dhivehi language contexts
- May not perform well on specialized or technical content
- Could reflect biases present in the training data
- Not recommended for production use without thorough evaluation

## Intended Uses

This model is suitable for:
- Dhivehi text generation
- Research on Dhivehi NLP
- Educational purposes
- Experimental applications

Not intended for:
- Critical or production systems
- Decision-making applications
- Tasks requiring factual accuracy

## Citation

```bibtex
@misc{dhivehi-gpt2,
  title     = {Dhivehi GPT-2: A Language Model for Dhivehi Text Generation},
  year      = {2024},
  publisher = {Hugging Face},
}
```