DeepSeek-R1-Distill-Llama-8B-ENK-Aligned

Overview

DeepSeek-R1-Distill-Llama-8B-ENK-Aligned is a safety-aligned version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It has been aligned using the Enkrypt AI Safety Alignment dataset, which was generated with the SAGE process:

SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi (2024)
[arXiv:2408.11851]

This alignment significantly reduces toxicity, harmfulness, and jailbreak vulnerabilities across various safety topics while maintaining model performance.
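A jailbreak-vulnerability claim like this is typically quantified as an attack success rate over a fixed set of red-team prompts. As a minimal illustration of the metric (the function and the "jailbroken" field below are hypothetical, not Enkrypt AI's actual evaluation harness):

```python
def attack_success_rate(attempts):
    """Fraction of red-team prompts that elicited a harmful completion.

    `attempts` is a list of dicts with a boolean "jailbroken" field
    (a made-up schema; real red-team harnesses track far more detail).
    """
    if not attempts:
        return 0.0
    return sum(1 for a in attempts if a["jailbroken"]) / len(attempts)

# Lower is safer: alignment should flip many True results to False.
base_run = [{"jailbroken": True}, {"jailbroken": True},
            {"jailbroken": False}, {"jailbroken": True}]
aligned_run = [{"jailbroken": False}, {"jailbroken": True},
               {"jailbroken": False}, {"jailbroken": False}]
```

Comparing the rate on the same prompt set before and after alignment is what the red-team results below summarize.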

Red Team Results

Safety Comparison

Performance Results

Model                                       MMLU-Pro Score
DeepSeek-R1-Distill-Llama-8B (Base)         44.71
DeepSeek-R1-Distill-Llama-8B-ENK-Aligned    46.43

Training Configuration

The model was trained using the SimPO (Simple Preference Optimization) approach with the following hyperparameters:

```yaml
cpo_config:
  loss_type: 'simpo'
  max_prompt_length: 1800
  max_length: 3600
  per_device_train_batch_size: 8
  gradient_accumulation_steps: 1
  learning_rate: 1.8e-6
  optim: 'adamw_torch'
  lr_scheduler_type: 'cosine'
  gradient_checkpointing: True
  beta: 5
  num_train_epochs: 1
  bf16: False
  simpo_gamma: 0.8
  warmup_ratio: 0.1
  cpo_alpha: 0.0
```
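The SimPO objective behind these settings drops DPO's reference model and instead scores each response by its length-normalized log-probability, requiring the chosen response to beat the rejected one by a target margin. A minimal sketch of the per-pair loss using the `beta` and `simpo_gamma` values above (the sequence log-probabilities and lengths are made-up inputs; `cpo_alpha: 0.0` means no auxiliary NLL term is mixed in):

```python
import math

def simpo_loss(chosen_logp, rejected_logp, chosen_len, rejected_len,
               beta=5.0, gamma=0.8):
    """Per-pair SimPO loss: -log sigmoid(r_chosen - r_rejected - gamma),
    where the implicit reward is r = beta * (sequence log-prob) / length."""
    r_chosen = beta * chosen_logp / chosen_len
    r_rejected = beta * rejected_logp / rejected_len
    margin = r_chosen - r_rejected - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A pair where the chosen response is far more likely yields a near-zero loss:
loss = simpo_loss(chosen_logp=-10.0, rejected_logp=-40.0,
                  chosen_len=20, rejected_len=20)
```

The keys in the config above correspond to fields of TRL's `CPOConfig`, where `loss_type: 'simpo'` selects this objective and `cpo_alpha` weights an optional SFT loss term.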

Key Improvements

  • Enhanced Safety: Significant reduction in harmful or toxic outputs.
  • Improved Robustness: Stronger resistance to adversarial jailbreak prompts.
  • Minimal Performance Tradeoff: MMLU-Pro actually improved slightly (44.71 → 46.43) despite the additional alignment constraints.

Use Cases

This model is ideal for applications requiring safe, aligned, and high-performance language generation, including:

  • Conversational AI: Ensuring responsible and aligned assistant behavior.
  • Content Moderation: Filtering harmful content while maintaining contextual understanding.
  • Education & Research: Deploying AI in sensitive environments with reduced risks.

For questions or contributions, reach out to the Enkrypt AI team!

