README.md · delayedkarma/NeuralHermes-2.5-Mistral-7B at 44177ada255b456e97fd9ab246c7dda3869950c3

	---
	language:
	- en
	license: apache-2.0
	tags:
	- mistral
	- instruct
	- finetune
	- chatml
	- gpt4
	- synthetic data
	- distillation
	- dpo
	- rlhf
	datasets:
	- Intel/orca_dpo_pairs
	base_model: teknium/OpenHermes-2.5-Mistral-7B
	---
	### Credits: Maxime Labonne https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac

	(With minor alterations, including the training arguments marked "Changed from" under Training hyperparameters.)

	# NeuralHermes 2.5 - Mistral 7B

	NeuralHermes is based on the [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model, further fine-tuned with Direct Preference Optimization (DPO) on the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) dataset.


	## Usage

	You can run this model using the following code:

	```python
	import transformers
	from transformers import AutoTokenizer

	# Hub ID of this repository (swap in a local path if you have the weights on disk)
	new_model = "delayedkarma/NeuralHermes-2.5-Mistral-7B"

	# Format the prompt with the tokenizer's chat template
	message = [
	    {"role": "system", "content": "You are a helpful assistant chatbot."},
	    {"role": "user", "content": "What is a Large Language Model?"},
	]
	tokenizer = AutoTokenizer.from_pretrained(new_model)
	prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

	# Create the text-generation pipeline
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=new_model,
	    tokenizer=tokenizer,
	)

	# Generate text (max_length counts prompt tokens plus generated tokens)
	sequences = pipeline(
	    prompt,
	    do_sample=True,
	    temperature=0.7,
	    top_p=0.9,
	    num_return_sequences=1,
	    max_length=200,
	)
	print(sequences[0]["generated_text"])
	```
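
For reference, the string that `apply_chat_template` produces can be approximated by hand. This sketch assumes the tokenizer ships the standard ChatML template (the card is tagged `chatml`); `to_chatml` is a hypothetical helper, not part of `transformers`, and the tokenizer's own template remains the source of truth.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render chat messages as a ChatML string (assumed layout, not read from the tokenizer)."""
    parts = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
prompt = to_chatml(message)
print(prompt)
```

Printing the hand-built string next to the tokenizer's output is a quick way to check that the template on your installed version matches this layout.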

	## Training hyperparameters

	LoRA:
	* r=16
	* lora_alpha=16
	* lora_dropout=0.05
	* bias="none"
	* task_type="CAUSAL_LM"
	* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
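
To get a feel for what these LoRA settings imply, the sketch below estimates the adapter size. The layer shapes (hidden size 4096, MLP size 14336, 1024-wide key/value projections, 32 layers) are assumptions based on the published Mistral 7B architecture, not values stated in this card, and `lora_param_count` is a hypothetical helper. Each adapted linear layer of shape `in x out` gains `r * (in + out)` trainable weights, and the low-rank update is scaled by `lora_alpha / r`.

```python
# LoRA settings from the list above
r, lora_alpha = 16, 16

# Assumed Mistral 7B dimensions (not stated in this card)
HIDDEN, MLP, KV, N_LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) for every module named in target_modules
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(shapes, r, n_layers):
    """Count trainable weights: matrices A (r x in) and B (out x r) per module, per layer."""
    per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * n_layers


trainable = lora_param_count(shapes, r, N_LAYERS)
scaling = lora_alpha / r

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumed shapes the adapters come to roughly 42 million trainable parameters, a small fraction of the 7B base model, and the scaling factor is 1.0 because `lora_alpha` equals `r`.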

	Training arguments:
	* per_device_train_batch_size=2 # Changed from 4
	* gradient_accumulation_steps=4
	* gradient_checkpointing=True
	* learning_rate=2e-5 # Changed from 5e-5
	* lr_scheduler_type="cosine"
	* max_steps=250 # Changed from 200
	* optim="paged_adamw_32bit"
	* warmup_steps=100
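
These arguments imply an effective batch of 2 x 4 = 8 preference pairs per optimizer step per device, so 250 steps cover about 2,000 pairs on one device. The sketch below reproduces the shape of a linear-warmup, cosine-decay schedule with the values above. It assumes a single half-cosine that decays to zero at `max_steps`, which is my understanding of the default `"cosine"` schedule in `transformers`; `lr_at` is a hypothetical helper and does not call the library.

```python
import math

# Values from the training arguments above
LEARNING_RATE = 2e-5
WARMUP_STEPS = 100
MAX_STEPS = 250
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # pairs per optimizer step, per device
pairs_seen = effective_batch * MAX_STEPS         # pairs consumed over the whole run

print(f"effective batch size per device: {effective_batch}")
print(f"preference pairs seen per device: {pairs_seen}")
for step in (0, 50, 100, 175, 250):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

Note that warmup takes up 100 of the 250 steps, so the learning rate spends 40% of the run ramping up and peaks at 2e-5 only at step 100.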

	DPOTrainer:
	* beta=0.1
	* max_prompt_length=1024
	* max_length=1536
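
The role of `beta` is easiest to see in the DPO objective itself. For one prompt with a chosen and a rejected answer, the per-pair loss is `-log(sigmoid(beta * margin))`, where `margin` is how much more the policy prefers the chosen answer over the rejected one than the frozen reference model does. The sketch below is a plain-Python illustration of that formula, not the `DPOTrainer` implementation; `dpo_loss` is a hypothetical helper and the log-probabilities are made-up numbers.

```python
import math

BETA = 0.1  # value from the DPOTrainer settings above


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Per-pair DPO loss from sequence log-probabilities."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(z)) written as log(1 + exp(-z))
    return math.log1p(math.exp(-beta * margin))


# Illustrative log-probabilities only (not measured from any model)
loss_start = dpo_loss(-50.0, -60.0, -50.0, -60.0)   # policy identical to reference
loss_better = dpo_loss(-45.0, -65.0, -50.0, -60.0)  # policy favors the chosen answer more
loss_worse = dpo_loss(-55.0, -55.0, -50.0, -60.0)   # policy drifts toward the rejected answer

print(f"start:  {loss_start:.4f}")
print(f"better: {loss_better:.4f}")
print(f"worse:  {loss_worse:.4f}")
```

When the policy equals the reference the loss is `log(2)`, and it falls as the margin grows. A smaller `beta` flattens this curve, so a given margin moves the loss less. As I understand the TRL settings, `max_prompt_length=1024` and `max_length=1536` cap the prompt and the prompt plus answer in tokens, which leaves at least 512 tokens for each answer.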