license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- trl
- vlm
- sft
- code
- math
model-index:
- name: Gauss-Opus-14B-R999
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 39.07
      name: averaged accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 44.94
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 57.55
      name: exact match
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 18.9
      name: acc_norm
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 27.83
      name: acc_norm
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 44.53
      name: accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
      name: Open LLM Leaderboard
Gauss-Opus-14B-R999
Gauss-Opus-14B-R999 is built on the Qwen 2.5 14B architecture and is designed to strengthen mathematical and constructive reasoning. It targets advanced problem-solving, logical structuring, and mathematical comprehension, with a focus on numerical reasoning, theorem proving, and multi-step calculations. It was fine-tuned on specialized datasets in mathematics, physics, and formal logic to produce structured outputs that emphasize precision and clarity.
Key Improvements
- Enhanced Mathematical Reasoning: Optimized for algebra, calculus, number theory, and logical deduction, providing precise and structured solutions.
- Improved Instruction Following: Capable of interpreting and following complex mathematical proofs, equations, and problem-solving instructions with high accuracy.
- Versatile Adaptability: Handles diverse reasoning tasks, including step-by-step solutions, mathematical proofs, and constructive problem-solving.
- Long-Context Support: Supports up to 128K tokens for input context and can generate up to 8K tokens in a single output, making it ideal for detailed mathematical derivations.
- Multilingual Proficiency: Supports over 29 languages, including English, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more, ensuring broad accessibility.
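As a rough illustration of the long-context figures above, the sketch below checks whether a request fits within the stated limits. It assumes that "128K" means 131,072 tokens and "8K" means 8,192 tokens, and that the prompt and the generated tokens share one context window. `fits_context` is a hypothetical helper, not part of transformers.

```python
# Assumed limits, inferred from the "128K" / "8K" figures in the list above.
CONTEXT_LIMIT = 128 * 1024      # 131,072 tokens (assumption)
MAX_OUTPUT_TOKENS = 8 * 1024    # 8,192 tokens (assumption)


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the request stays within both assumed limits.

    Assumes the prompt and the generated tokens share one context window.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if max_new_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + max_new_tokens <= CONTEXT_LIMIT
```

The prompt token count can be taken from the tokenizer output, for example the length of the `input_ids` sequence.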
Quickstart with transformers
Here is a code snippet with apply_chat_template to show you how to load the tokenizer and model and generate content:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Gauss-Opus-14B-R999"

# Load the weights in their stored dtype and place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Raw string so that "\int" is not treated as an escape sequence.
prompt = r"Solve the integral \int x^2 dx and explain the steps."
messages = [
    {"role": "system", "content": "You are a mathematical assistant specialized in problem-solving and theorem proving."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into a single prompt string.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Keep only the newly generated tokens by dropping the prompt tokens.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
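The list comprehension near the end of the snippet removes the prompt tokens from each generated sequence, because for decoder-only models `generate` returns the prompt followed by the new tokens. Here is a minimal sketch of that slicing with plain Python lists. The token IDs are made up for illustration, and `strip_prompt` is a hypothetical helper.

```python
def strip_prompt(input_batch, output_batch):
    """Drop the echoed prompt tokens from each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_batch, output_batch)
    ]


# Made-up token IDs: each output starts with a copy of its prompt.
prompts = [[101, 7, 8], [101, 9]]
outputs = [[101, 7, 8, 42, 43], [101, 9, 55]]

new_tokens = strip_prompt(prompts, outputs)
print(new_tokens)  # [[42, 43], [55]]
```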
Intended Use
- Mathematical Problem-Solving: Designed for high-precision mathematical reasoning, step-by-step calculations, and structured solutions.
- Theorem Proving and Logical Reasoning: Useful for verifying mathematical proofs, formal logic derivations, and theorem-based reasoning.
- STEM Education and Research: Ideal for educators, researchers, and students requiring assistance in complex problem-solving and mathematical modeling.
- Algorithm Development and Optimization: Supports structured reasoning in algorithmic problem-solving, coding optimizations, and computational logic.
- Long-Form Explanatory Content: Can generate detailed mathematical articles, research summaries, and explanatory guides with structured step-by-step reasoning.
- Multilingual Mathematical Assistance: Supports global accessibility for mathematical discussions, translations, and problem explanations across multiple languages.
Limitations
- Hardware Requirements: Requires high-memory GPUs or TPUs due to its large parameter size and long-context support.
- Potential Bias in Training Data: While optimized for accuracy, the model may inherit biases from training data in certain problem-solving approaches.
- Complexity in Abstract Theories: May struggle with highly abstract or unsolved mathematical problems that require intuitive leaps beyond computational logic.
- Error Propagation in Extended Proofs: Small errors in early steps may compound in multi-step proofs and long-form mathematical derivations.
- Prompt Sensitivity: The quality of responses depends on how well the problem is structured and framed within the input prompt.
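To make the hardware requirement more concrete, the sketch below estimates the memory needed for the weights alone at a few precisions. It assumes a nominal 14 billion parameters (taken from the model name) and ignores activations, the KV cache, and framework overhead, so real usage will be higher. `weight_memory_gib` is a hypothetical helper.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NOMINAL_PARAMS = 14e9  # assumption: "14B" taken from the model name

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(NOMINAL_PARAMS, dtype):6.1f} GiB")
```

Under these assumptions, 16-bit weights come to roughly 26 GiB, which is why a single consumer GPU is usually not enough without quantization or offloading.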
Open LLM Leaderboard Evaluation Results
Detailed and summarized results are available on the Open LLM Leaderboard.
Metric | Value (%) |
---|---|
Average | 38.80 |
IFEval (0-Shot) | 39.07 |
BBH (3-Shot) | 44.94 |
MATH Lvl 5 (4-Shot) | 57.55 |
GPQA (0-shot) | 18.90 |
MuSR (0-shot) | 27.83 |
MMLU-PRO (5-shot) | 44.53 |
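The Average row appears to be the unweighted mean of the six benchmark scores. A quick check, using the values from the table:

```python
from statistics import mean

# Scores copied from the table above, in percent.
scores = {
    "IFEval (0-Shot)": 39.07,
    "BBH (3-Shot)": 44.94,
    "MATH Lvl 5 (4-Shot)": 57.55,
    "GPQA (0-shot)": 18.90,
    "MuSR (0-shot)": 27.83,
    "MMLU-PRO (5-shot)": 44.53,
}

average = round(mean(scores.values()), 2)
print(average)  # 38.8
```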