---
language:
- dv
base_model:
- openai-community/gpt2
---
# GPT 2 DV base
This is a GPT-2 model fine-tuned on Dhivehi text. It was trained on a curated dataset of Dhivehi Wikipedia articles and can be used for Dhivehi text generation.
## Model Description
- **Model Type:** GPT-2
- **Language:** Dhivehi (ދިވެހި)
- **Training Data:** Dhivehi Wikipedia articles
- **Last Updated:** 2024-11-25
- **License:** MIT
## Performance Metrics
Evaluation metrics on the test set:
- Average Perplexity: 3.80
- Perplexity Std: 2.23
- Best Perplexity: 2.72
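This card does not state how the figures above were computed. As a minimal sketch, assuming perplexity is the exponential of the mean per-token negative log-likelihood for each test text, and that the summary is taken across texts, the calculation could look like this. The function names `perplexity` and `summarize` are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
import statistics


def perplexity(token_nlls):
    """Perplexity of one text: exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def summarize(perplexities):
    """Aggregate per-text perplexities into average, std, and best (lowest) values."""
    return {
        "average": statistics.mean(perplexities),
        # Assumption: population standard deviation; the card does not say which one was used.
        "std": statistics.pstdev(perplexities),
        "best": min(perplexities),
    }


if __name__ == "__main__":
    # Illustrative values only, not taken from the model's evaluation.
    per_text_nlls = [[1.2, 0.9, 1.1], [1.6, 1.4, 1.5]]
    print(summarize([perplexity(nlls) for nlls in per_text_nlls]))
```

Lower perplexity is better, so the "best" value is the minimum across texts.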
## Usage Example
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("your-username/dhivehi-gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("your-username/dhivehi-gpt2")

# Prepare your prompt
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1,
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
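To make the `temperature` and `top_p` arguments in the example above more concrete, here is a small pure-Python illustration of temperature scaling followed by nucleus (top-p) filtering. It is a simplified sketch of the general technique, not the `transformers` implementation; the helper names `softmax` and `top_p_filter` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose total probability
    reaches top_p, then renormalize. Returns {token_index: probability}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


if __name__ == "__main__":
    # Toy logits for a four-token vocabulary (illustrative values only).
    probs = softmax([2.0, 1.0, 0.5, -1.0], temperature=0.7)
    print(top_p_filter(probs, top_p=0.9))
```

With `do_sample=True`, the next token is drawn from the filtered distribution, so lowering either `temperature` or `top_p` makes the output more conservative.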
## Training Details
The model was trained using the following configuration:
- Base model: GPT-2
- Training type: Full fine-tuning
- Hardware: NVIDIA A40 GPU
- Mixed precision: FP16
- Gradient checkpointing: Enabled
### Hyperparameters
- Learning rate: 5e-5
- Batch size: 32
- Gradient accumulation steps: 2
- Epochs: 3
- Weight decay: 0.01
- Warmup steps: 1000
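The settings above can be collected in a plain dictionary keyed by the matching `transformers.TrainingArguments` parameter names. This is a sketch under assumptions, not the original training script: the card does not say whether the `Trainer` API was used, whether the batch size is per device, or which schedule follows warmup. The helpers `effective_batch_size` and `warmup_lr` are hypothetical.

```python
# Values from this card, keyed by transformers.TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumption: "Batch size" is per device
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "warmup_steps": 1000,
    "fp16": True,
    "gradient_checkpointing": True,
}


def effective_batch_size(kwargs, num_devices=1):
    """Sequences contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def warmup_lr(step, kwargs):
    """Learning rate during linear warmup. Holding it constant afterwards is
    an assumption; the card does not state the post-warmup schedule."""
    fraction = min(1.0, step / kwargs["warmup_steps"])
    return kwargs["learning_rate"] * fraction


if __name__ == "__main__":
    print(effective_batch_size(training_kwargs))
    print(warmup_lr(500, training_kwargs))
```

If the `Trainer` API is used, the dictionary could be passed as `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left for the reader to choose. On the single A40 GPU listed above, gradient accumulation gives an effective batch of 64 sequences per optimizer step under the per-device assumption.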
## Limitations
- Primary training data is from Wikipedia, which may not cover all Dhivehi language contexts
- May not perform well on specialized or technical content
- Could reflect biases present in the training data
- Not recommended for production use without thorough evaluation
## Intended Uses
This model is suitable for:
- Dhivehi text generation
- Research on Dhivehi NLP
- Educational purposes
- Experimental applications
Not intended for:
- Critical or production systems
- Decision-making applications
- Tasks requiring factual accuracy
## Citation
```bibtex
@misc{dhivehi-gpt2,
  title = {Dhivehi GPT-2: A Language Model for Dhivehi Text Generation},
  year = {2024},
  publisher = {Hugging Face},
}
```