	---
	language:
	- dv
	base_model:
	- openai-community/gpt2
	datasets:
	- wikimedia/wikipedia
	---

	# GPT 2 DV base

	This is a GPT-2 model (`openai-community/gpt2`) fine-tuned on Dhivehi-language text. It was trained on a curated set of Dhivehi Wikipedia articles and is intended for text generation in Dhivehi.

	## Model Description

	- Model Type: GPT-2
	- Language: Dhivehi (ދިވެހި)
	- Training Data: Dhivehi Wikipedia articles
	- Last Updated: 2024-11-25

	## Performance Metrics


	Evaluation metrics on the test set:
	- Average perplexity: 3.80
	- Perplexity standard deviation: 2.23
	- Best (lowest) perplexity: 2.72
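
The card does not say how these figures were computed. As a rough sketch, assuming each test document's perplexity is the exponential of its mean per-token negative log-likelihood, and assuming the three numbers are the mean, standard deviation, and minimum over documents, the aggregation could look like this. The helper names (`perplexity`, `summarize`) and the toy losses are hypothetical and are not taken from the model's evaluation.

```python
import math
import statistics


def perplexity(token_nlls):
    """Perplexity of one document: exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def summarize(perplexities):
    """Mean, sample standard deviation, and minimum over per-document perplexities."""
    return {
        "average": statistics.mean(perplexities),
        "std": statistics.stdev(perplexities),  # assumption: sample (n - 1) standard deviation
        "best": min(perplexities),  # lower perplexity is better
    }


# Made-up per-token losses for three hypothetical documents (not real model output).
toy_losses = [
    [1.0, 1.2, 0.8],
    [1.5, 1.5],
    [0.9, 1.1, 1.0, 1.0],
]
toy_perplexities = [perplexity(doc) for doc in toy_losses]
summary = summarize(toy_perplexities)
print(summary)
```

In a real evaluation the per-token losses would come from the model's cross-entropy loss on held-out text rather than from hand-written lists.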

	## Usage Example

	```python
	from transformers import GPT2LMHeadModel, GPT2TokenizerFast

	# Load model and tokenizer
	model = GPT2LMHeadModel.from_pretrained("alakxender/dhivehi-gpt2-base")
	tokenizer = GPT2TokenizerFast.from_pretrained("alakxender/dhivehi-gpt2-base")

	# Prepare your prompt
	prompt = "ދިވެހިރާއްޖެއަކީ"
	inputs = tokenizer(prompt, return_tensors="pt")

	# Generate text
	outputs = model.generate(
	    **inputs,
	    max_length=200,
	    temperature=0.7,
	    top_p=0.9,
	    do_sample=True,
	    num_return_sequences=1,
	)

	# Decode the generated text
	generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(generated_text)
	```
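
To give some intuition for the `temperature=0.7` and `top_p=0.9` settings used above, here is a simplified, pure-Python illustration of temperature scaling followed by nucleus (top-p) filtering on a toy four-token vocabulary. It is only a sketch of the idea, not the `transformers` implementation; the function names and the toy logits are hypothetical.

```python
import math


def temperature_softmax(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
probs = temperature_softmax(toy_logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.9)
print(nucleus)  # token index -> renormalized probability
```

A temperature below 1 sharpens the distribution toward the most likely tokens, and the top-p step then discards the low-probability tail before a token is sampled.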

	## Training Details

	The model was trained using the following configuration:
	- Base model: GPT-2 (`openai-community/gpt2`)
	- Training type: Full fine-tuning
	- Mixed precision: FP16
	- Gradient checkpointing: Enabled

	### Hyperparameters
	- Learning rate: 5e-5
	- Batch size: 32
	- Gradient accumulation steps: 2
	- Epochs: 3
	- Weight decay: 0.01
	- Warmup steps: 1000
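
The training script itself is not included in this card. As a hedged sketch, the settings listed above can be collected under the corresponding `transformers.TrainingArguments` parameter names, which also makes the effective batch size explicit. Three things here are assumptions rather than facts from the card: that the batch size of 32 is per device, that training used a single device, and that warmup is linear. The helper functions are hypothetical.

```python
# Values from the card; keys are `transformers.TrainingArguments` parameter names.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumption: the card's batch size is per device
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "warmup_steps": 1000,
    "fp16": True,
    "gradient_checkpointing": True,
}

NUM_DEVICES = 1  # assumption: the card does not state how many GPUs were used


def effective_batch_size(kwargs, num_devices=NUM_DEVICES):
    """Sequences contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def warmup_lr(step, kwargs):
    """Learning rate under linear warmup; held at the peak afterwards (decay not modeled)."""
    peak = kwargs["learning_rate"]
    warmup = kwargs["warmup_steps"]
    return peak * min(step, warmup) / warmup


print(effective_batch_size(training_kwargs))
print(warmup_lr(500, training_kwargs))
```

Under these assumptions each optimizer step sees 32 × 2 = 64 sequences, and the learning rate reaches its 5e-5 peak after the first 1000 steps. The dictionary could be passed as `TrainingArguments(output_dir=..., **training_kwargs)` on a machine with a CUDA device, since `fp16=True` requires one.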

	## Limitations

	- Primary training data is from Wikipedia, which may not cover all Dhivehi language contexts
	- May not perform well on specialized or technical content
	- Could reflect biases present in the training data
	- Not recommended for production use without thorough evaluation

	## Intended Uses

	This model is suitable for:
	- Dhivehi text generation
	- Research on Dhivehi NLP
	- Educational purposes
	- Experimental applications

	Not intended for:
	- Critical or production systems
	- Decision-making applications
	- Tasks requiring factual accuracy