---
language:
- hi
- en
license: apache-2.0
base_model: teknium/OpenHermes-2.5
model-index:
- name: open-aditi-hi-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 59.39
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 82.01
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 61.41
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 45.84
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.19
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 30.02
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v2
      name: Open LLM Leaderboard
---

Model trained on Hindi and English data.

Try it out: https://colab.research.google.com/drive/1A_hbsq1vrCeAh3dEMvtwxxNxcNZ1BUyW?usp=sharing

For sample responses to different prompts, check out: https://github.com/manishiitg/hi-llm-eval

#### Language Hi

| Model | implicit_hate | flores | indicwikibio | hellaswag-indic | truthfulqa-hi | boolq-hi | indicheadline | indic-arc-easy | indicqa | indic-arc-challenge | indicsentiment | xlsum-hi | indicxparaphrase | mmlu_hi |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| open-aditi-hi-v2 | 11.5021 | 43.6822 | 0.4846 | 0.2404 | 0.6934 | 0.8541 | 0.4565 | 0.4979 | 0.0795 | 0.4462 | 0.9729 | 0.4213 | 0.6838 | 0.3253 |
| OpenHermes-2.5-Mistral-7B | 0.2068 | 30.3465 | 0.3332 | 0.2485 | 0.3234 | 0.5979 | 0.1996 | 0.3523 | 0.2721 | 0.3396 | 0.9048 | 0.1774 | 0.8766 | 0.2769 |
| open-aditi-hi-v1 | 8.6105 | 40.2376 | 0.4104 | 0.0848 | 0.4230 | 0.3758 | 0.4248 | 0.3889 | 0.1306 | 0.3558 | 0.8798 | 0.4212 | 0.5939 | 0.1398 |
| Airavata | 0.0663 | 58.0555 | 0.0637 | 0.0254 | 0.2122 | 0.0373 | 0.4346 | 0.1128 | 0.1008 | 0.0836 | 0.8437 | 0.4650 | 0.3277 | 0.1336 |
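
One way to read the Hindi results is as a per-task difference between open-aditi-hi-v2 and the OpenHermes-2.5-Mistral-7B row. The sketch below copies both rows from the table and lists the tasks where v2 scores lower. The helper names `deltas` and `regressions` are illustrative only, and the comparison assumes that a higher value is better for every metric listed.

```python
# Scores copied from the Hindi table: (open-aditi-hi-v2, OpenHermes-2.5-Mistral-7B).
hindi_scores = {
    "implicit_hate": (11.5021, 0.2068),
    "flores": (43.6822, 30.3465),
    "indicwikibio": (0.4846, 0.3332),
    "hellaswag-indic": (0.2404, 0.2485),
    "truthfulqa-hi": (0.6934, 0.3234),
    "boolq-hi": (0.8541, 0.5979),
    "indicheadline": (0.4565, 0.1996),
    "indic-arc-easy": (0.4979, 0.3523),
    "indicqa": (0.0795, 0.2721),
    "indic-arc-challenge": (0.4462, 0.3396),
    "indicsentiment": (0.9729, 0.9048),
    "xlsum-hi": (0.4213, 0.1774),
    "indicxparaphrase": (0.6838, 0.8766),
    "mmlu_hi": (0.3253, 0.2769),
}


def deltas(scores):
    """Return v2 minus base for each task, rounded to four decimals."""
    return {task: round(v2 - base, 4) for task, (v2, base) in scores.items()}


def regressions(scores):
    """Return the tasks (sorted by name) where v2 scores below the base row.

    Assumption: higher is better for every metric in the table.
    """
    return sorted(task for task, diff in deltas(scores).items() if diff < 0)


if __name__ == "__main__":
    for task, diff in deltas(hindi_scores).items():
        print(f"{task:22s} {diff:+.4f}")
    print("below base row:", regressions(hindi_scores))
```

The differences are only comparable within a task: the columns mix chrf, bleurt, and accuracy, which sit on different scales, so averaging them across tasks would not be meaningful.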

#### Language En

| Model | boolq | hellaswag | mmlu | truthfulqa | xlsum | arc-easy-exact | arc-challenge |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenHermes-2.5-Mistral-7B | 0.4061 | 0.7999 | 0.5991 | 0.2081 | 0.4328 | 0.8687 | 0.7790 |
| open-aditi-hi-v2 | 0.3982 | 0.4738 | 0.5544 | 0.2999 | 0.4349 | 0.8388 | 0.7235 |
| open-aditi-hi-v1 | 0.0434 | 0.3509 | 0.2597 | 0.3317 | 0.4288 | 0.7588 | 0.6271 |
| Airavata | 0.0437 | 0.0277 | 0.1165 | 0.3586 | 0.4393 | 0.2534 | 0.1630 |

Metric reported for each task:

| Task | Metric |
| --- | --- |
| flores | chrf |
| implicit_hate | chrf |
| indicsentiment | accuracy |
| indicxparaphrase | accuracy |
| boolq-hi | accuracy |
| truthfulqa-hi | accuracy |
| indic-arc-easy | accuracy |
| indicwikibio | bleurt |
| xlsum-hi | bleurt |
| indicheadline | bleurt |
| indic-arc-challenge | accuracy |
| mmlu_hi | average_acc |
| indicqa | accuracy |
| hellaswag-indic | accuracy |
| arc-easy-exact | accuracy |
| hellaswag | accuracy |
| arc-challenge | accuracy |
| mmlu | average_acc |
| xlsum | bleurt |
| boolq | accuracy |
| truthfulqa | accuracy |

Model evaluation on the Open LLM Leaderboard:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5dfae476da6d0311fd3d5432/ENzZwV2Z98uNlpyUz3Blp.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5dfae476da6d0311fd3d5432/SpSiu5lzA6JKJx8ICX_zd.png)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_manishiitg__open-aditi-hi-v2)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |59.31|
|AI2 Reasoning Challenge (25-Shot)|59.39|
|HellaSwag (10-Shot)              |82.01|
|MMLU (5-Shot)                    |61.41|
|TruthfulQA (0-shot)              |45.84|
|Winogrande (5-shot)              |77.19|
|GSM8k (5-shot)                   |30.02|
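
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores. A quick check, assuming that definition (the card does not state how the average is computed, and the `average` helper is illustrative):

```python
# Benchmark scores copied from the leaderboard table.
leaderboard = {
    "AI2 Reasoning Challenge (25-Shot)": 59.39,
    "HellaSwag (10-Shot)": 82.01,
    "MMLU (5-Shot)": 61.41,
    "TruthfulQA (0-shot)": 45.84,
    "Winogrande (5-shot)": 77.19,
    "GSM8k (5-shot)": 30.02,
}


def average(scores):
    """Unweighted mean of the scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


if __name__ == "__main__":
    print(f"Avg. = {average(leaderboard)}")
```

The six scores sum to 355.86, and dividing by 6 gives 59.31, which matches the reported value.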