DeepSeek-R1-Distill-Llama-8B-ENK-Aligned
Overview
DeepSeek-R1-Distill-Llama-8B-ENK-Aligned is a safety-aligned version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It has been aligned using the Enkrypt AI Safety Alignment dataset, which was generated with the SAGE process:
SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi (2024)
[arXiv:2408.11851]
This alignment significantly reduces toxicity, harmfulness, and jailbreak vulnerability across a range of safety topics while maintaining general model performance as measured by MMLU-Pro.
Red Team Results
Performance Results
| Model | MMLU-Pro Score |
|---|---|
| DeepSeek-R1-Distill-Llama-8B (Base) | 44.71 |
| DeepSeek-R1-Distill-Llama-8B-ENK-Aligned | 46.43 |
Training Configuration
The model was trained using the SimPO (Simple Preference Optimization) approach with the following hyperparameters:
cpo_config:
  loss_type: 'simpo'
  max_prompt_length: 1800
  max_length: 3600
  per_device_train_batch_size: 8
  gradient_accumulation_steps: 1
  learning_rate: 1.8e-6
  optim: 'adamw_torch'
  lr_scheduler_type: 'cosine'
  gradient_checkpointing: True
  beta: 5
  num_train_epochs: 1
  bf16: False
  simpo_gamma: 0.8
  warmup_ratio: 0.1
  cpo_alpha: 0.0
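To make the roles of `beta` and `simpo_gamma` concrete, the following is a minimal sketch of the per-example SimPO objective as described in the SimPO paper: the length-normalized log-probability gap between the chosen and rejected responses is scaled by `beta`, and the target margin `simpo_gamma` is subtracted before the log-sigmoid. The function names `avg_logprob` and `simpo_loss` and the toy token log-probabilities are hypothetical illustrations, not the training code used for this model. With `cpo_alpha: 0.0`, the assumption here is that no additional likelihood term is added to the preference loss.

```python
import math


def avg_logprob(token_logprobs):
    """Length-normalized log-probability: mean over the response's tokens."""
    return sum(token_logprobs) / len(token_logprobs)


def simpo_loss(chosen_token_logprobs, rejected_token_logprobs, beta=5.0, gamma=0.8):
    """Per-example SimPO loss: -log(sigmoid(beta * (avg_chosen - avg_rejected) - gamma)).

    Defaults mirror `beta: 5` and `simpo_gamma: 0.8` from the config above.
    Hypothetical helper for illustration only.
    """
    margin = beta * (
        avg_logprob(chosen_token_logprobs) - avg_logprob(rejected_token_logprobs)
    ) - gamma
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy per-token log-probabilities under the policy (made-up values).
chosen = [-0.2, -0.4, -0.3]
rejected = [-1.1, -0.9, -1.3, -1.0]
print(simpo_loss(chosen, rejected))
```

In this formulation the loss keeps pushing until the scaled gap exceeds the margin: an example contributes `log(2)` when `beta` times the average log-probability gap exactly equals `simpo_gamma`, and less as the chosen response becomes more clearly preferred.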
Key Improvements
- Enhanced Safety: Significant reduction in harmful or toxic outputs.
- Improved Robustness: Stronger resistance to adversarial jailbreak prompts.
- Minimal Performance Tradeoff: MMLU-Pro rises slightly (44.71 to 46.43) despite the additional alignment constraints.
Use Cases
This model is ideal for applications requiring safe, aligned, and high-performance language generation, including:
- Conversational AI: Ensuring responsible and aligned assistant behavior.
- Content Moderation: Filtering harmful content while maintaining contextual understanding.
- Education & Research: Deploying AI in sensitive environments with reduced risks.
For questions or contributions, reach out to the Enkrypt AI team!
Model tree for enkryptai/DeepSeek-R1-Distill-Llama-8B-Enkrypt-Aligned
Base model
deepseek-ai/DeepSeek-R1-Distill-Llama-8B