open-aditi-hi-v2 / README.md

Adding Evaluation Results (#1)

e55fc6e verified 8 months ago

6.09 kB

	---
	language:
	- hi
	- en
	license: apache-2.0
	base_model: teknium/OpenHermes-2.5
	model-index:
	- name: open-aditi-hi-v2
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: AI2 Reasoning Challenge (25-Shot)
	type: ai2_arc
	config: ARC-Challenge
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: acc_norm
	value: 59.39
	name: normalized accuracy
	source:
	url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: HellaSwag (10-Shot)
	type: hellaswag
	split: validation
	args:
	num_few_shot: 10
	metrics:
	- type: acc_norm
	value: 82.01
	name: normalized accuracy
	source:
	url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU (5-Shot)
	type: cais/mmlu
	config: all
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 61.41
	name: accuracy
	source:
	url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: TruthfulQA (0-shot)
	type: truthful_qa
	config: multiple_choice
	split: validation
	args:
	num_few_shot: 0
	metrics:
	- type: mc2
	value: 45.84
	source:
	url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Winogrande (5-shot)
	type: winogrande
	config: winogrande_xl
	split: validation
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 77.19
	name: accuracy
	source:
	url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GSM8k (5-shot)
	type: gsm8k
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 30.02
	name: accuracy
	source:
	url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
	name: Open LLM Leaderboard
	---


	Model trained on Hindi and English data.

	Try it out: https://colab.research.google.com/drive/1A_hbsq1vrCeAh3dEMvtwxxNxcNZ1BUyW?usp=sharing

	For sample responose on different prompts checkout: https://github.com/manishiitg/hi-llm-eval


	#### Language Hi

	\| Model \| implicit_hate \| flores \| indicwikibio \| hellaswag-indic \| truthfulqa-hi \| boolq-hi \| indicheadline \| indic-arc-easy \| indicqa \| indic-arc-challenge \| indicsentiment \| xlsum-hi \| indicxparaphrase \| mmlu_hi \|
	\| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \|
	\| open-aditi-hi-v2 \| 11.5021 \| 43.6822 \| 0.4846 \| 0.2404 \| 0.6934 \| 0.8541 \| 0.4565 \| 0.4979 \| 0.0795 \| 0.4462 \| 0.9729 \| 0.4213 \| 0.6838 \| 0.3253 \|
	\| OpenHermes-2.5-Mistral-7B \| 0.2068 \| 30.3465 \| 0.3332 \| 0.2485 \| 0.3234 \| 0.5979 \| 0.1996 \| 0.3523 \| 0.2721 \| 0.3396 \| 0.9048 \| 0.1774 \| 0.8766 \| 0.2769 \|
	\| open-aditi-hi-v1 \| 8.6105 \| 40.2376 \| 0.4104 \| 0.0848 \| 0.4230 \| 0.3758 \| 0.4248 \| 0.3889 \| 0.1306 \| 0.3558 \| 0.8798 \| 0.4212 \| 0.5939 \| 0.1398 \|
	\| Airavata \| 0.0663 \| 58.0555 \| 0.0637 \| 0.0254 \| 0.2122 \| 0.0373 \| 0.4346 \| 0.1128 \| 0.1008 \| 0.0836 \| 0.8437 \| 0.4650 \| 0.3277 \| 0.1336 \|

	#### Language En

	\| Model \| boolq \| hellaswag \| mmlu \| truthfulqa \| xlsum \| arc-easy-exact \| arc-challenge \|
	\| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \|
	\| OpenHermes-2.5-Mistral-7B \| 0.4061 \| 0.7999 \| 0.5991 \| 0.2081 \| 0.4328 \| 0.8687 \| 0.7790 \|
	\| open-aditi-hi-v2 \| 0.3982 \| 0.4738 \| 0.5544 \| 0.2999 \| 0.4349 \| 0.8388 \| 0.7235 \|
	\| open-aditi-hi-v1 \| 0.0434 \| 0.3509 \| 0.2597 \| 0.3317 \| 0.4288 \| 0.7588 \| 0.6271 \|
	\| Airavata \| 0.0437 \| 0.0277 \| 0.1165 \| 0.3586 \| 0.4393 \| 0.2534 \| 0.1630 \|

	Task: flores Metric: chrf

	Task: implicit_hate Metric: chrf

	Task: indicsentiment Metric: accuracy

	Task: indicxparaphrase Metric: accuracy

	Task: boolq-hi Metric: accuracy

	Task: truthfulqa-hi Metric: accuracy

	Task: indic-arc-easy Metric: accuracy

	Task: indicwikibio Metric: bleurt

	Task: xlsum-hi Metric: bleurt

	Task: indicheadline Metric: bleurt

	Task: indic-arc-challenge Metric: accuracy

	Task: mmlu_hi Metric: average_acc

	Task: indicqa Metric: accuracy

	Task: hellaswag-indic Metric: accuracy

	Task: arc-easy-exact Metric: accuracy

	Task: hellaswag Metric: accuracy

	Task: arc-challenge Metric: accuracy

	Task: mmlu Metric: average_acc

	Task: xlsum Metric: bleurt

	Task: boolq Metric: accuracy

	Task: truthfulqa Metric: accuracy




	Model evaluation on OpenLLM LeaderBoard

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/5dfae476da6d0311fd3d5432/ENzZwV2Z98uNlpyUz3Blp.png)

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/5dfae476da6d0311fd3d5432/SpSiu5lzA6JKJx8ICX_zd.png)




	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_manishiitg__open-aditi-hi-v2)

	\| Metric \|Value\|
	\|---------------------------------\|----:\|
	\|Avg. \|59.31\|
	\|AI2 Reasoning Challenge (25-Shot)\|59.39\|
	\|HellaSwag (10-Shot) \|82.01\|
	\|MMLU (5-Shot) \|61.41\|
	\|TruthfulQA (0-shot) \|45.84\|
	\|Winogrande (5-shot) \|77.19\|
	\|GSM8k (5-shot) \|30.02\|