ToxicHermes
OpenHermes-2.5 model + toxic-dpo dataset = ToxicHermes,
fine-tuned with Direct Preference Optimization (DPO).
- Base Model: teknium/OpenHermes-2.5-Mistral-7B
- Dataset: unalignment/toxic-dpo-v0.1
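To illustrate the recipe, here is a minimal sketch of loading the base model and the preference dataset. The column names ("prompt", "chosen", "rejected") follow the usual DPO dataset convention and are an assumption here, not something this card confirms.

```python
# Minimal sketch: load the base model and the DPO preference dataset.
# Column names are assumed to follow the usual DPO convention.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)  # ~7B params; needs a large GPU

# Each row pairs a prompt with a preferred ("chosen") and a dispreferred
# ("rejected") completion; DPO trains the model to widen the gap between them.
dataset = load_dataset("unalignment/toxic-dpo-v0.1", split="train")
print(dataset.column_names)
```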
Usage
You can run this model with the following code:
```python
import transformers
from transformers import AutoTokenizer

model = "joey00072/ToxicHermes-2.5-Mistral-7B"

# Format prompt using the model's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]["generated_text"])
```
Training hyperparameters
The model was trained with a LoRA adapter via DPO, using the settings listed below (a sketch of how they fit together follows the lists).
LoRA:
- r=16
- lora_alpha=16
- lora_dropout=0.05
- bias="none"
- task_type="CAUSAL_LM"
- target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
Training arguments:
- per_device_train_batch_size=4
- gradient_accumulation_steps=4
- gradient_checkpointing=True
- learning_rate=5e-5
- lr_scheduler_type="cosine"
- max_steps=200
- optim="paged_adamw_32bit"
- warmup_steps=100
DPOTrainer:
- beta=0.1
- max_prompt_length=1024
- max_length=1536
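Below is a hedged sketch of how these hyperparameters fit together, assuming the common peft + trl recipe for DPO fine-tuning. The exact keyword placement varies across trl versions (newer releases move beta/max_length into a DPOConfig); this follows the older DPOTrainer signature implied by the list above, and output_dir is an assumed placeholder.

```python
# Sketch only: wires the listed hyperparameters into peft + trl.
# Assumes `model`, `tokenizer`, and `dataset` were loaded as in the earlier sketch.
from peft import LoraConfig
from transformers import TrainingArguments
from trl import DPOTrainer

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "gate_proj", "v_proj", "up_proj",
                    "q_proj", "o_proj", "down_proj"],
)

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    output_dir="./toxichermes-dpo",  # assumed; not stated in the card
)

trainer = DPOTrainer(
    model,
    ref_model=None,             # with a peft_config, trl derives the frozen reference model
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```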
Model lineage
- Base model: mistralai/Mistral-7B-v0.1
- Fine-tuned from: teknium/OpenHermes-2.5-Mistral-7B