---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- StreamlinedMemory
- Code
- Math
- Qwen
- text-generation-inference
model-index:
- name: Sombrero-Opus-14B-Sm2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 42.72
      name: averaged accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FSombrero-Opus-14B-Sm2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 51.25
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FSombrero-Opus-14B-Sm2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 48.64
      name: exact match
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FSombrero-Opus-14B-Sm2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 18.46
      name: acc_norm
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FSombrero-Opus-14B-Sm2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 24.53
      name: acc_norm
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FSombrero-Opus-14B-Sm2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 48.28
      name: accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FSombrero-Opus-14B-Sm2
      name: Open LLM Leaderboard
---

# **Sombrero-Opus-14B-Sm2**
> Sombrero-Opus-14B-Sm2 is based on the Qwen 2.5 14B architecture and is designed to improve coding efficiency and computational reasoning. The model is optimized for streamlined memory usage, avoids generating unwanted text tokens, and targets coding, explanatory reasoning, mathematical problem-solving, and technical tasks. It has been fine-tuned on specialized datasets to improve code generation, structured programming logic, and problem-solving.
## **Key Improvements**
1. **Optimized for Coding**: The model specializes in generating high-quality, structured code with minimal redundant tokens, ensuring efficient execution.
2. **Enhanced Memory Utilization**: Implements streamlined memory optimization to reduce computational overhead and improve performance.
3. **Superior Reasoning Capabilities**: Excels in solving complex mathematical and algorithmic problems with logical and structured explanations.
4. **Long-Context Support**: Supports up to 128K tokens for input context and can generate up to 8K tokens in a single output, making it ideal for detailed coding responses.
5. **Reduced Unwanted Textual Tokens**: Ensures a more focused output for coding tasks by minimizing excessive textual responses.
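
The long-context limits listed above can be turned into a quick pre-flight check before calling the model. The sketch below is illustrative only: the helper name `fits_budget` is hypothetical, and reading "128K" and "8K" as multiples of 1024 is an assumption, not something the model card states.

```python
# Limits taken from the list above; treating "K" as 1024 is an assumption.
MAX_INPUT_TOKENS = 128 * 1024
MAX_OUTPUT_TOKENS = 8 * 1024


def fits_budget(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt and the requested output both fit the stated limits."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens <= MAX_INPUT_TOKENS and max_new_tokens <= MAX_OUTPUT_TOKENS


# A long prompt with a short completion, then the same prompt with an oversized request.
print(fits_budget(prompt_tokens=100_000, max_new_tokens=512))
print(fits_budget(prompt_tokens=100_000, max_new_tokens=10_000))
```

In practice `prompt_tokens` would come from the length of the tokenized prompt, so the check can run before any GPU work starts.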
## **Quickstart with transformers**
The snippet below loads the tokenizer and model, builds a chat prompt with `apply_chat_template`, and generates a response:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Sombrero-Opus-14B-Sm2"

# Load the model and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a Python function to find the Fibonacci sequence."
messages = [
    {"role": "system", "content": "You are an advanced coding assistant."},
    {"role": "user", "content": prompt},
]

# Render the chat messages into a single prompt string.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Drop the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
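
The slicing step near the end of the quickstart is easy to misread, so here is a toy version that runs without downloading the model. Plain Python lists stand in for the tensors, and the token ids are made up for illustration; only the slicing expression mirrors the quickstart.

```python
# Toy token ids standing in for real tensors; the values are invented for illustration.
input_ids_batch = [[101, 7, 8, 9]]               # prompt tokens
generated_batch = [[101, 7, 8, 9, 42, 43, 44]]   # prompt tokens followed by new tokens

# Same slicing as the quickstart: skip as many tokens as the prompt contained.
new_tokens = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(input_ids_batch, generated_batch)
]
print(new_tokens)  # [[42, 43, 44]]
```

Because `generate` returns the prompt followed by the continuation, skipping `len(input_ids)` positions leaves only the continuation for decoding.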
## **Intended Use**
1. **Code Generation & Optimization**:
Designed for developers, assisting in writing, refactoring, and optimizing code across multiple programming languages.
2. **Algorithm & Mathematical Problem Solving**:
Provides precise explanations and solutions for computational and mathematical problems.
3. **Technical Explanations & Documentation**:
Generates clear and structured explanations for coding concepts, libraries, and APIs.
4. **Debugging Assistance**:
Helps analyze code snippets, detect errors, and suggest corrections.
5. **Educational Use**:
Assists students and learners by breaking down complex programming topics into easily understandable sections.
6. **Structured Data Processing**:
Capable of analyzing and generating structured outputs, such as JSON, XML, and tables, making it ideal for data science applications.
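
When the model is asked for structured output such as JSON, the response may still include surrounding text. The following is a minimal sketch of one way to pull the first JSON object out of a response string using only the standard library. The helper name `extract_json` and the sample response are hypothetical and written by hand; they are not actual model output.

```python
import json


def extract_json(response: str):
    """Parse the first JSON object found in a response string, or return None."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index.
            obj, _ = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next opening brace.
            start = response.find("{", start + 1)
    return None


# Hypothetical response text, written by hand for this example.
sample = 'Here is the result:\n{"name": "fib", "args": [10]}\nDone.'
print(extract_json(sample))
```

Returning `None` instead of raising keeps the caller in control of how to handle a response that contains no parseable object.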
## **Limitations**
1. **Hardware Requirements**:
Requires high-memory GPUs or TPUs due to its large parameter size and long-context support.
2. **Potential Bias in Responses**:
While designed to be neutral, outputs may still reflect biases present in training data.
3. **Inconsistent Outputs in Creative Tasks**:
May produce variable results in storytelling and non-technical topics.
4. **Limited Real-World Awareness**:
Does not have access to real-time events beyond its training cutoff.
5. **Error Propagation in Extended Outputs**:
Minor errors in early responses may affect overall coherence in long-form code outputs.
6. **Prompt Sensitivity**:
The effectiveness of responses may depend on how well the input prompt is structured.
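
To make the hardware requirement above more concrete, the weight memory can be estimated from the parameter count and the numeric precision. This is a back-of-the-envelope sketch: the figure of about 14 billion parameters is an assumption inferred from the "14B" in the model name, and the estimate covers weights only, not activations or the KV cache needed for long contexts.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: about 14e9 parameters, inferred from the "14B" in the model name.
NUM_PARAMS = 14e9

for label, nbytes in [("float32", 4), ("bfloat16/float16", 2), ("int8", 1)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

Actual usage will be higher than these figures once activations and long-context caches are included.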
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/prithivMLmods__Sombrero-Opus-14B-Sm2-details)!
Summarized results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FSombrero-Opus-14B-Sm2&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
| Metric |Value (%)|
|-------------------|--------:|
|**Average** | 38.98|
|IFEval (0-Shot) | 42.72|
|BBH (3-Shot) | 51.25|
|MATH Lvl 5 (4-Shot)| 48.64|
|GPQA (0-shot) | 18.46|
|MuSR (0-shot) | 24.53|
|MMLU-PRO (5-shot) | 48.28|
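
The **Average** row is consistent with an unweighted mean of the six benchmark scores. The short check below reproduces it from the values in the table; the dictionary layout is just for illustration.

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 42.72,
    "BBH (3-Shot)": 51.25,
    "MATH Lvl 5 (4-Shot)": 48.64,
    "GPQA (0-shot)": 18.46,
    "MuSR (0-shot)": 24.53,
    "MMLU-PRO (5-shot)": 48.28,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # Average: 38.98
```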