---
library_name: peft
license: apache-2.0
tags:
- meta-llama
- code
- instruct
- databricks-dolly-15k
- Llama-2-70b-hf
datasets:
- databricks/databricks-dolly-15k
base_model: meta-llama/Llama-2-70b-hf
---

For our finetuning process, we utilized the meta-llama/Llama-2-70b-hf model and the databricks-dolly-15k dataset.

This dataset, a compilation of more than 15,000 records, is the result of the dedicated work of thousands of Databricks professionals and was specifically designed to improve the interactive capabilities of ChatGPT-like systems.

The contributors crafted prompt/response pairs across eight distinct instruction categories: the seven described in the InstructGPT paper plus an open-ended, free-form category. To keep the content genuine and original, they refrained from sourcing information online, except for certain instruction categories where Wikipedia served as the reference text, and the use of generative AI for crafting instructions or responses was strictly prohibited.

Contributors could answer questions posed by their peers. Rephrasing the original question was encouraged, and they were asked to answer only those questions they were confident about.

In some categories, the data includes reference texts sourced from Wikipedia, so bracketed Wikipedia citation numbers (such as [42]) may appear in the context field of the dataset. For smoother downstream applications, it is advisable to strip these out, for example as sketched below.
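A minimal cleanup sketch, assuming the Hugging Face `datasets` library and the standard databricks-dolly-15k column names; the regex and helper name are illustrative and not part of the original finetuning pipeline:

```python
import re

from datasets import load_dataset

# Matches bracketed Wikipedia-style citation markers such as "[42]".
CITATION_RE = re.compile(r"\[\d+\]")

def strip_citations(example):
    """Remove bracketed citation numbers from the `context` field of a record."""
    example["context"] = CITATION_RE.sub("", example.get("context", "")).strip()
    return example

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
dataset = dataset.map(strip_citations)
```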
Our finetuning leveraged [MonsterAPI](https://monsterapi.ai)'s intuitive, no-code [LLM finetuner](https://docs.monsterapi.ai/fine-tune-a-large-language-model-llm).

The run was cost-effective: it completed in just 17.5 hours for 3 epochs on a single A100 80GB GPU. Each epoch took roughly 5.8 hours and cost `$19.25`, bringing the total for all 3 epochs to `$57.75`.
#### Hyperparameters & Run details:

- Epochs: 3
- Cost per epoch: $19.25
- Total finetuning cost: $57.75
- Model path: meta-llama/Llama-2-70b-hf
- Dataset: databricks/databricks-dolly-15k
- Learning rate: 0.0002
- Data split: training 90% / validation 10%
- Gradient accumulation steps: 4
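The run itself used MonsterAPI's no-code finetuner, so the exact job configuration is not reproduced here. As a rough equivalent, the sketch below maps the hyperparameters above onto the open-source `peft`/`transformers` stack; the LoRA settings, 4-bit loading, batch size, and output path are assumptions rather than details of the original run.

```python
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)

model_name = "meta-llama/Llama-2-70b-hf"

# 4-bit loading is an assumption: a 70B model does not fit on a single
# A100 80GB in half precision, so some form of quantization is implied.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA settings (r, alpha, dropout, target modules) are illustrative defaults,
# not values reported for the original run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hyperparameters taken from the list above; batch size is an assumption.
training_args = TrainingArguments(
    output_dir="llama2-70b-dolly-15k",  # hypothetical output path
    num_train_epochs=3,
    learning_rate=2e-4,
    gradient_accumulation_steps=4,
    per_device_train_batch_size=1,
)
```

These objects would then be handed to a `Trainer` (or TRL's `SFTTrainer`) together with the prompt-formatted dataset and the 90/10 train/validation split.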

###### Prompt Used:
```
### INSTRUCTION:
[instruction]

[context]

### RESPONSE:
[response]
```
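To illustrate how a databricks-dolly-15k record maps onto this template, here is a small helper sketch; the function name is hypothetical, and omitting the context block when `context` is empty is an assumption:

```python
def build_prompt(example: dict) -> str:
    """Format one databricks-dolly-15k record with the template shown above."""
    context = (example.get("context") or "").strip()
    prompt = f"### INSTRUCTION:\n{example['instruction']}"
    if context:
        prompt += f"\n\n{context}"
    prompt += f"\n\n### RESPONSE:\n{example['response']}"
    return prompt

# Example record following the databricks-dolly-15k schema.
record = {
    "instruction": "Summarize what the databricks-dolly-15k dataset contains.",
    "context": "",
    "response": "It contains over 15,000 human-written instruction/response records.",
}
print(build_prompt(record))
```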

Loss metrics

Training loss (blue) and validation loss (orange):

![training loss](train-loss.png "Training loss")