	---
	license: mit
	datasets:
	- Intel/orca_dpo_pairs
	model-index:
	- name: SuperAligned-Jawade
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 71.59
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=bhavinjawade/SuperAligned-Jawade
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 90.58
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=bhavinjawade/SuperAligned-Jawade
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 60.81
	      name: accuracy
	    source:
	      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=bhavinjawade/SuperAligned-Jawade
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 69.17
	    source:
	      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=bhavinjawade/SuperAligned-Jawade
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 83.82
	      name: accuracy
	    source:
	      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=bhavinjawade/SuperAligned-Jawade
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 49.2
	      name: accuracy
	    source:
	      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=bhavinjawade/SuperAligned-Jawade
	      name: Open LLM Leaderboard
	---

	## SOLAR-10B-OrcaDPO-Jawade

	### Overview
	This model is an instruction-finetuned version of `upstage/SOLAR-10.7B-Instruct-v1.0`, trained on the Intel Orca DPO pairs dataset (`Intel/orca_dpo_pairs`) using LoRA. Note that the SOLAR-10.7B paper states that the
	original model was already aligned using the Intel Orca DPO pairs. Retraining with DPO and LoRA shows a slight (<1%) improvement on Open LLM Leaderboard benchmarks over `SOLAR 10.7B-Instruct` and a significant improvement over `SOLAR 10.7B`.
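
For readers unfamiliar with DPO (Direct Preference Optimization), the sketch below computes the standard DPO loss for a single preference pair in plain Python. It is an illustration of the published objective, not the training code used for this model. The function name `dpo_loss`, the `beta` value, and the example log-probabilities are all assumptions chosen for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative value, not a setting taken from this card.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy has shifted probability toward the chosen response: loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.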

	![model_card_image](SOLAR_ORCA.png)

	## How to Use This Model

	To use the model `bhavinjawade/SOLAR-10B-OrcaDPO-Jawade`, follow these steps:

	1. Import and Load the Model and Tokenizer
	Begin by importing the model and tokenizer. Load them using the `from_pretrained` method.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	model = AutoModelForCausalLM.from_pretrained("bhavinjawade/SOLAR-10B-OrcaDPO-Jawade")
	tokenizer = AutoTokenizer.from_pretrained("bhavinjawade/SOLAR-10B-OrcaDPO-Jawade")
	```

	2. Format the Prompt
	Format the chat input as a list of messages, each with a role ('system' or 'user') and content.

	```python
	message = [
	    {"role": "system", "content": "You are a helpful assistant chatbot."},
	    {"role": "user", "content": "Is the universe real, or is it a simulation? What's your opinion?"}
	]
	prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)
	```

	3. Create a Pipeline
	Set up a pipeline for text generation with the loaded model and tokenizer.

	```python
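	import transformers

	# Build a text-generation pipeline around the already-loaded model and tokenizer.
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model,
	    tokenizer=tokenizer,
	)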
	```

	4. Generate Text
	Use the pipeline to generate a sequence of text based on the prompt. You can adjust parameters like temperature and top_p for different styles of responses.

	```python
	sequences = pipeline(
	    prompt,
	    do_sample=True,
	    temperature=0.7,
	    top_p=0.9,
	    num_return_sequences=1,
	    max_length=200,
	)
	print(sequences[0]['generated_text'])
	```

	This setup lets you generate responses to chat inputs with the `bhavinjawade/SOLAR-10B-OrcaDPO-Jawade` model.

	### License
	- Type: MIT License
	- Details: This license permits reuse, modification, and distribution for both private and commercial purposes under the terms of the MIT License.

	### Model Details
	- Base Model: SOLAR-10.7B-Instruct-v1.0
	- Base Model Organization: Upstage
	- Training Dataset: Intel/orca_dpo_pairs
	- Technique Used: LoRA (Low-Rank Adaptation)
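
To give a sense of why LoRA makes this kind of retraining cheap, the sketch below compares the parameter count of one full weight matrix with that of its two low-rank LoRA factors. The layer dimensions and rank are hypothetical round numbers picked for illustration. They are not the configuration used to train this model, which this card does not state.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer shape and rank, for illustration only.
d_out, d_in, rank = 4096, 4096, 16

full = d_out * d_in                              # frozen base weight W
lora = lora_trainable_params(d_out, d_in, rank)  # trainable update B @ A

print(f"full matrix:  {full} parameters")
print(f"LoRA factors: {lora} parameters ({100 * lora / full:.2f}% of full)")
```

Under these assumed sizes, the trainable update is under 1% of the matrix it adapts. The base weights stay frozen during training.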

	### Contact Information
	- https://bhavinjawade.github.io

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_bhavinjawade__SuperAligned-Jawade).

	| Metric                            | Value |
	|-----------------------------------|------:|
	| Avg.                              | 70.86 |
	| AI2 Reasoning Challenge (25-Shot) | 71.59 |
	| HellaSwag (10-Shot)               | 90.58 |
	| MMLU (5-Shot)                     | 60.81 |
	| TruthfulQA (0-shot)               | 69.17 |
	| Winogrande (5-shot)               | 83.82 |
	| GSM8k (5-shot)                    | 49.20 |
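
The "Avg." row is the unweighted mean of the six benchmark scores, which the short check below reproduces. The dictionary simply restates the values from the table, and the variable names are illustrative.

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 71.59,
    "HellaSwag (10-Shot)": 90.58,
    "MMLU (5-Shot)": 60.81,
    "TruthfulQA (0-shot)": 69.17,
    "Winogrande (5-shot)": 83.82,
    "GSM8k (5-shot)": 49.20,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # prints "Avg. = 70.86"
```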