---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- openchat/openchat-3.5-0106
- teknium/OpenHermes-2.5-Mistral-7B
base_model:
- openchat/openchat-3.5-0106
- teknium/OpenHermes-2.5-Mistral-7B
model-index:
- name: chatty-djinn-14B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 70.39
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 86.45
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.4
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 67.57
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 83.11
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 60.58
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
---
|
![thumbnail](djinn-14b.webp) |
|
# djinn |
|
|
|
|
|
djinn is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing): |
|
* [openchat/openchat-3.5-0106](https://huggingface.co./openchat/openchat-3.5-0106) |
|
* [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co./teknium/OpenHermes-2.5-Mistral-7B) |
|
* bardsai/jaskier-7b-dpo-v6.1 |
|
* senseable/WestLake-7B-v2 |
|
* NousResearch/Nous-Hermes-2-Mistral-7B-DPO |
|
* paulml/OGNO-7B |
|
* paulml/DPOB-INMTOB-7B |
|
* mlabonne/AlphaMonarch-7B |
|
|
|
# 🏆 Benchmarks |
|
Benchmarks from the Nous suite; find more [details here](https://gist.github.com/majacinka/3f2a797c8872ca9bfdaa2bbf3369edb5).
|
|
|
| Model                                                               |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|---------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[chatty-djinn-14B](https://huggingface.co./mayacinka/chatty-djinn-14B)|  38.43|  76.29|     68.02|    47.6|  57.59|
|
|
|
### AGIEval |
|
| Task                          |Version| Metric |Value|   |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |23.62|±  |  2.67|
|                              |       |acc_norm|21.65|±  |  2.59|
|agieval_logiqa_en             |      0|acc     |32.26|±  |  1.83|
|                              |       |acc_norm|33.79|±  |  1.86|
|agieval_lsat_ar               |      0|acc     |23.04|±  |  2.78|
|                              |       |acc_norm|23.04|±  |  2.78|
|agieval_lsat_lr               |      0|acc     |38.82|±  |  2.16|
|                              |       |acc_norm|39.22|±  |  2.16|
|agieval_lsat_rc               |      0|acc     |59.48|±  |  3.00|
|                              |       |acc_norm|54.65|±  |  3.04|
|agieval_sat_en                |      0|acc     |75.73|±  |  2.99|
|                              |       |acc_norm|74.27|±  |  3.05|
|agieval_sat_en_without_passage|      0|acc     |35.92|±  |  3.35|
|                              |       |acc_norm|34.47|±  |  3.32|
|agieval_sat_math              |      0|acc     |31.36|±  |  3.14|
|                              |       |acc_norm|26.36|±  |  2.98|
|
|
|
Average: 38.43% |
|
|
|
### GPT4All |
|
| Task        |Version| Metric |Value|   |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge|      0|acc     |62.12|±  |  1.42|
|             |       |acc_norm|65.44|±  |  1.39|
|arc_easy     |      0|acc     |83.88|±  |  0.75|
|             |       |acc_norm|78.58|±  |  0.84|
|boolq        |      1|acc     |88.07|±  |  0.57|
|hellaswag    |      0|acc     |65.18|±  |  0.48|
|             |       |acc_norm|86.45|±  |  0.34|
|openbookqa   |      0|acc     |39.60|±  |  2.19|
|             |       |acc_norm|48.60|±  |  2.24|
|piqa         |      0|acc     |82.26|±  |  0.89|
|             |       |acc_norm|83.62|±  |  0.86|
|winogrande   |      0|acc     |83.27|±  |  1.05|
|
|
|
Average: 76.29% |
|
|
|
### TruthfulQA |
|
| Task        |Version|Metric|Value|   |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc|      1|mc1   |50.55|±  |  1.75|
|             |       |mc2   |68.02|±  |  1.52|
|
|
|
Average: 68.02% |
|
|
|
### Bigbench |
|
| Task                                            |Version| Metric              |Value|   |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|57.89|±  |  3.59|
|bigbench_date_understanding                     |      0|multiple_choice_grade|64.50|±  |  2.49|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|32.56|±  |  2.92|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|26.18|±  |  2.32|
|                                                |       |exact_str_match      | 1.11|±  |  0.55|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|30.80|±  |  2.07|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|22.86|±  |  1.59|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|57.67|±  |  2.86|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|62.00|±  |  2.17|
|bigbench_navigate                               |      0|multiple_choice_grade|56.20|±  |  1.57|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|65.65|±  |  1.06|
|bigbench_ruin_names                             |      0|multiple_choice_grade|64.73|±  |  2.26|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|17.33|±  |  1.20|
|bigbench_snarks                                 |      0|multiple_choice_grade|76.24|±  |  3.17|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|75.15|±  |  1.38|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|48.90|±  |  1.58|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|22.32|±  |  1.18|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|18.17|±  |  0.92|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|57.67|±  |  2.86|
|
|
|
Average: 47.6% |
|
|
|
Average score: 57.59% |
|
## 🧩 Configuration |
|
Inspired by [TheProfessor's config](https://huggingface.co./abacusai/TheProfessor-155b).
|
|
|
```yaml
merge_method: linear # use linear so we can include multiple models, albeit at a zero weight
parameters:
  weight: 1.0 # weight everything as 1 unless specified otherwise - linear with one model weighted at 1 is a no-op like passthrough
slices:
- sources:
  - model: openchat/openchat-3.5-0106
    layer_range: [0, 1]
  - model: teknium/OpenHermes-2.5-Mistral-7B
    layer_range: [0, 1]
    parameters:
      weight: 0
- sources:
  - model: bardsai/jaskier-7b-dpo-v6.1
    layer_range: [1, 10]
- sources:
  - model: senseable/WestLake-7B-v2
    layer_range: [10, 20]
- sources:
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    layer_range: [20, 30]
- sources:
  - model: paulml/OGNO-7B
    layer_range: [15, 25]
- sources:
  - model: paulml/DPOB-INMTOB-7B
    layer_range: [22, 32]
- sources:
  - model: mlabonne/AlphaMonarch-7B
    layer_range: [5, 15]
- sources:
  - model: openchat/openchat-3.5-0106
    layer_range: [31, 32]
  - model: teknium/OpenHermes-2.5-Mistral-7B
    layer_range: [31, 32]
    parameters:
      weight: 0
dtype: float16
tokenizer_source: model:openchat/openchat-3.5-0106
```
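For intuition about the "14B" in the name: the slices above stack 61 decoder layers, nearly double the 32 layers of a single Mistral-7B, so the merged model lands at roughly twice the parameter count. A quick back-of-the-envelope check (the tuples mirror the `layer_range` entries, which are half-open):

```python
# Tally the layer_range entries from the config above; ranges are half-open, [start, end).
slices = [(0, 1), (1, 10), (10, 20), (20, 30), (15, 25), (22, 32), (5, 15), (31, 32)]

total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 61 layers vs. 32 in one Mistral-7B -> roughly 14B parameters
```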
|
|
|
## An example |
|
|
|
*User:* Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? Think step by step and the result MUST BE A NUMBER
|
|
|
*djinn:* |
|
In this family structure, we know there are 4 aka '1 + 3-brothers' (1 being initially for Sally) existing individuals who are also the siblings of the 3-brother-group. As each of the 3-brothers has 2-sisters each, that sums up to 6 total sister-roles. Since we already accounted for one of the sisters (assuming to be the oldest or in alphabetical order, as names were not given) to be part of the 3-brother group, we subtract one from the 6-sisters total. So, considering the provided information, Sally would have 6 - 1 = 5 remaining younger or alphabetically-wise-placed, biological-sisters. However, note that this problem statement might not perfectly align with real-life complexities, such as same-sex twins, adoption, etc., but within the given context, the solution is 5.

Remember, in more intricate or unclear family tree scenarios, it'd be best to directly ask for clarifications or consider additional data points. In short, with the given constraints, Sally has 5 known, direct, biological, and likely-in-age-order, younger/alphabetically-placed, sister-siblings. (To reiterate the core result in a concise manner: 5).
|
|
|
(Please excuse the verbosity, aiming for thoroughness and clarity.) |
|
|
|
This response maintains the initial request's numerical focus while providing a detailed explanation to ensure understanding. |
|
|
|
## 💻 Usage |
|
|
|
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mayacinka/chatty-djinn-14B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the conversation with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
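One thing worth checking before generating: `tokenizer_source` in the config points at openchat/openchat-3.5-0106, so `apply_chat_template` here presumably renders OpenChat's prompt format rather than ChatML. Continuing the snippet above:

```python
# Inspect the rendered prompt before generating.
# Assumption: with the OpenChat tokenizer it should resemble
# "GPT4 Correct User: What is a large language model?<|end_of_turn|>GPT4 Correct Assistant:"
print(prompt)
```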
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_mayacinka__chatty-djinn-14B) |
|
|
|
| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |72.08|
|AI2 Reasoning Challenge (25-Shot)|70.39|
|HellaSwag (10-Shot)              |86.45|
|MMLU (5-Shot)                    |64.40|
|TruthfulQA (0-shot)              |67.57|
|Winogrande (5-shot)              |83.11|
|GSM8k (5-shot)                   |60.58|
|
|
|
|