File size: 7,977 Bytes
e40fd1d bf7a0ab e40fd1d b99ffed f1bfe18 b99ffed 8210bbb b99ffed bf7a0ab |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 |
---
library_name: transformers
tags:
- mergekit
- merge
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct
model-index:
- name: LlamaZero-3.1-8B-Experimental-1208
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 60.51
name: strict accuracy
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/LlamaZero-3.1-8B-Experimental-1208
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 28.61
name: normalized accuracy
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/LlamaZero-3.1-8B-Experimental-1208
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 9.67
name: exact match
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/LlamaZero-3.1-8B-Experimental-1208
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 2.46
name: acc_norm
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/LlamaZero-3.1-8B-Experimental-1208
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 7.15
name: acc_norm
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/LlamaZero-3.1-8B-Experimental-1208
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 22.22
name: accuracy
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/LlamaZero-3.1-8B-Experimental-1208
name: Open LLM Leaderboard
---
# LlamaZero 3.1-8B
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
```python
question = '''
Framework to Reason:
Restate the most critical elements of the scenario concisely to ensure focus on relevant details.
Group evidence by relevance to means, opportunity, and actions:
Focus on the most plausible hypothesis based on the evidence.
Evaluate Evidence for Each Hypothesis:
Draw a Conclusion
Optional: Address Uncertainty or Gaps:
Tips:
Avoid repeating the same point unnecessarily.
Focus on the strongest evidence for each hypothesis.
Briefly mention any uncertainties but do not overcomplicate the analysis.
Prompt:
You are a detective investigating a robbery at a museum. The robbery occurred last night between 10 PM and 11 PM. Here are the key facts:
Security footage shows no signs of entry through the main door.
The back door was locked, but the key is missing from the security guard's desk.
A museum employee, Alex, was seen near the guard's desk at 9:45 PM.
Another employee, Jordan, reported hearing a loud noise from the exhibit hall at 10:15 PM but did not investigate.
The stolen artifact is a small, lightweight sculpture that could fit in a bag.
The alarm system was disabled from the control room, but the exact time of deactivation is unknown.
Task:
Analyze the situation and determine who is most likely responsible for the robbery. Include your reasoning based on the evidence provided.
'''
response = assistant.generate_response(question, max_new_tokens=2048)
print(response)
```
```text
To determine the most likely responsible person for the robbery, I will analyze the evidence provided.
**Critical Elements:**
- The robbery occurred between 10 PM and 11 PM.
- No signs of entry through the main door.
- The back door was locked, but the key is missing from the security guard's desk.
- Alex was seen near the guard's desk at 9:45 PM.
- Jordan reported hearing a loud noise from the exhibit hall at 10:15 PM but did not investigate.
- The stolen artifact is small and lightweight.
- The alarm system was disabled from the control room, but the exact time of deactivation is unknown.
**Grouping Evidence:**
- **Means:** Access to the back door key or knowledge of an alternate entry point.
- **Opportunity:** Being in the vicinity of the back door or the control room during the time of the robbery.
- **Actions:** Disabling the alarm system, entering the exhibit hall, and taking the sculpture.
**Most Plausible Hypothesis:**
Based on the evidence, the most likely responsible person is Alex. Here's why:
- Alex was seen near the guard's desk at 9:45 PM, which is around the time the robbery occurred. This suggests Alex had access to the back door key or was involved in disabling the alarm.
- The back door key was missing from the security guard's desk, indicating that someone with access to it was involved. Alex's presence near the desk at 9:45 PM suggests they may have taken the key.
- Jordan heard a noise at 10:15 PM, which is around the time the robbery occurred. However, Jordan did not investigate, which suggests they may not have been the one to enter or they may have been distracted.
- The alarm system was disabled, which could have been done by Alex or someone else. However, since Alex was near the guard's desk, it's likely they disabled the alarm and then entered through the back door.
- The sculpture was stolen, and Alex had the opportunity to take it after disabling the alarm and entering through the back door.
**Evaluation of Evidence:**
The evidence strongly suggests Alex was involved. Alex's presence near the guard's desk at 9:45 PM and the missing key indicate they had the means to enter the back door. The timing of their presence is also consistent with the time of the robbery. Additionally, Alex would have had the opportunity to take the sculpture and disable the alarm.
**Conclusion:**
Based on the evidence, Alex is the most likely responsible for the robbery.
**Uncertainty or Gaps:**
- The exact time the alarm was disabled is unknown, but since Alex was seen near the desk at 9:45 PM, it's likely they disabled the alarm shortly before the robbery. This is a minor uncertainty but does not affect the conclusion.
- Jordan's lack of investigation could be due to being asleep or distracted, but this does not necessarily rule out Alex as the perpetrator.
- There is no direct evidence linking Alex to the actual theft, but the timing and access to the key make it highly probable.
Therefore, the most likely responsible person is Alex.
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/sethuiyer__LlamaZero-3.1-8B-Experimental-1208-details)
| Metric |Value|
|-------------------|----:|
|Avg. |21.77|
|IFEval (0-Shot) |60.51|
|BBH (3-Shot) |28.61|
|MATH Lvl 5 (4-Shot)| 9.67|
|GPQA (0-shot) | 2.46|
|MuSR (0-shot) | 7.15|
|MMLU-PRO (5-shot) |22.22|
|