metadata
license: apache-2.0
language:
  - en
  - zh
base_model:
  - Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
  - text-generation-inference
  - trl
  - vlm
  - sft
  - code
  - math
model-index:
  - name: Gauss-Opus-14B-R999
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: wis-k/instruction-following-eval
          split: train
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 39.07
            name: averaged accuracy
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: SaylorTwift/bbh
          split: test
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 44.94
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: lighteval/MATH-Hard
          split: test
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 57.55
            name: exact match
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          split: train
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 18.9
            name: acc_norm
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 27.83
            name: acc_norm
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 44.53
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FGauss-Opus-14B-R999
          name: Open LLM Leaderboard


Gauss-Opus-14B-R999

Gauss-Opus-14B-R999 is built on the Qwen 2.5 14B architecture and is designed to strengthen mathematical and constructive reasoning. It is optimized for advanced problem-solving, logical structuring, and mathematical comprehension, with particular focus on numerical reasoning, theorem proving, and multi-step calculations. Fine-tuned on specialized datasets in mathematics, physics, and formal logic, it aims to deliver structured, high-accuracy outputs with an emphasis on precision and clarity.

Key Improvements

  1. Enhanced Mathematical Reasoning: Optimized for algebra, calculus, number theory, and logical deduction, providing precise and structured solutions.
  2. Improved Instruction Following: Capable of interpreting and following complex mathematical proofs, equations, and problem-solving instructions with high accuracy.
  3. Versatile Adaptability: Handles diverse reasoning tasks, including step-by-step solutions, mathematical proofs, and constructive problem-solving.
  4. Long-Context Support: Supports up to 128K tokens of input context and can generate up to 8K tokens in a single output, which suits it to detailed mathematical derivations.
  5. Multilingual Proficiency: Supports over 29 languages, including English, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more, ensuring broad accessibility.
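When combining long prompts with long answers, it helps to budget output tokens explicitly. The sketch below assumes the 128K figure above is a total window shared by prompt and output (131,072 tokens) and that a single call may produce at most 8,192 new tokens; both constants are assumptions taken from this card, so verify them against the checkpoint's configuration. Qwen's documentation for the base Qwen2.5 instruct models also notes that the shipped config covers 32,768 tokens and that longer inputs rely on enabling YaRN rope scaling; check whether the same applies to this checkpoint. `generation_budget` is a hypothetical helper, not part of transformers.

```python
# Assumed limits, taken from the figures stated in this card; verify them
# against the checkpoint's config.json and generation_config.json.
CONTEXT_WINDOW = 131_072    # "128K tokens", assumed shared by prompt and output
MAX_OUTPUT_TOKENS = 8_192   # "8K tokens" per generation call


def generation_budget(prompt_tokens: int, requested_new_tokens: int) -> int:
    """Return how many new tokens can safely be requested for a prompt.

    The result is capped by the caller's request, the assumed per-call output
    limit, and the room left in the context window after the prompt.
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the assumed context window")
    return min(requested_new_tokens, MAX_OUTPUT_TOKENS, remaining)


if __name__ == "__main__":
    print(generation_budget(1_000, 512))      # request fits as-is: 512
    print(generation_budget(1_000, 20_000))   # capped by the output limit: 8192
    print(generation_budget(130_000, 8_192))  # capped by remaining context: 1072
```

With the quickstart code, `prompt_tokens` would be `model_inputs.input_ids.shape[1]`, and the result could be passed as `max_new_tokens`.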

Quickstart with transformers

The following snippet shows how to load the tokenizer and model, format a conversation with apply_chat_template, and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Gauss-Opus-14B-R999"

# Load weights in the precision recorded in the checkpoint and place them on
# the available devices (device_map="auto" requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Raw string: without the r prefix, "\i" in "\int" is an invalid escape sequence.
prompt = r"Solve the integral \int x^2 dx and explain the steps."
messages = [
    {"role": "system", "content": "You are a mathematical assistant specialized in problem-solving and theorem proving."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens by dropping each sequence's prompt prefix.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
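As a reference point for judging the quickstart output, the integral in the example prompt has a standard closed form by the power rule for antiderivatives. This is textbook calculus, not a transcript of the model's answer:

```latex
\int x^{2}\,dx = \frac{x^{2+1}}{2+1} + C = \frac{x^{3}}{3} + C,
\qquad \text{check: } \frac{d}{dx}\left(\frac{x^{3}}{3} + C\right) = x^{2}.
```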

Intended Use

  1. Mathematical Problem-Solving:
    Designed for high-precision mathematical reasoning, step-by-step calculations, and structured solutions.

  2. Theorem Proving and Logical Reasoning:
    Useful for verifying mathematical proofs, formal logic derivations, and theorem-based reasoning.

  3. STEM Education and Research:
    Suited to educators, researchers, and students who need assistance with complex problem-solving and mathematical modeling.

  4. Algorithm Development and Optimization:
    Supports structured reasoning in algorithmic problem-solving, coding optimizations, and computational logic.

  5. Long-Form Explanatory Content:
    Can generate detailed mathematical articles, research summaries, and explanatory guides with structured step-by-step reasoning.

  6. Multilingual Mathematical Assistance:
    Supports global accessibility for mathematical discussions, translations, and problem explanations across multiple languages.
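For problem-solving and education workflows, it is often useful to check final answers automatically. A common convention with math-tuned models is to ask for the final answer inside `\boxed{...}`; this card does not state that the model was trained to follow it, so treat that as an assumption and request it explicitly in the prompt. The hypothetical helper below returns the contents of the last `\boxed{...}`, matching nested braces so that answers such as `\frac{x^{3}}{3} + C` survive intact.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, or None.

    Braces are matched by depth, so nested groups like \frac{a}{b} are kept.
    """
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer present
    begin = start + len(marker)
    depth = 1
    for pos in range(begin, len(text)):
        ch = text[pos]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:pos]
    return None  # unbalanced braces, e.g. the output was truncated


if __name__ == "__main__":
    sample = r"Therefore the result is \boxed{\frac{x^{3}}{3} + C}."
    print(extract_boxed_answer(sample))
```

With the quickstart code, one could append an instruction such as "Put the final answer in \boxed{}." to the prompt and then call `extract_boxed_answer(response)`.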

Limitations

  1. Hardware Requirements:
    Requires high-memory GPUs or TPUs due to its large parameter size and long-context support.

  2. Potential Bias in Training Data:
    While optimized for accuracy, the model may inherit biases from training data in certain problem-solving approaches.

  3. Complexity in Abstract Theories:
    May struggle with highly abstract or unsolved mathematical problems that require intuitive leaps beyond computational logic.

  4. Error Propagation in Extended Proofs:
    Small errors in early steps may compound in multi-step proofs and long-form mathematical derivations.

  5. Prompt Sensitivity:
    The quality of responses depends on how well the problem is structured and framed within the input prompt.
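To put the hardware requirement in numbers, the sketch below estimates memory for the weights and for the attention key/value cache. The architecture values are assumptions based on the published Qwen2.5-14B configuration (about 14.7B parameters, 48 layers, 8 key/value heads with grouped-query attention, head dimension 128); read the real values from this checkpoint's `config.json` before relying on them. The estimate ignores activations, framework overhead, and quantization schemes beyond a simple bytes-per-parameter figure. Under these assumptions, bf16 weights come to roughly 27 GiB, and a single 128K-token sequence adds about 24 GiB of KV cache.

```python
# Rough memory estimate for serving the model. All architecture numbers are
# assumptions based on the published Qwen2.5-14B configuration.
PARAMS = 14.7e9      # assumed total parameter count (~14.7B)
NUM_LAYERS = 48      # assumed num_hidden_layers
NUM_KV_HEADS = 8     # assumed num_key_value_heads (grouped-query attention)
HEAD_DIM = 128       # assumed hidden_size / num_attention_heads = 5120 / 40
GIB = 1024 ** 3


def weight_bytes(bytes_per_param: float) -> float:
    """Memory for the weights alone (2 bytes/param for bf16 or fp16)."""
    return PARAMS * bytes_per_param


def kv_cache_bytes(tokens: int, bytes_per_value: int = 2, batch: int = 1) -> int:
    """KV cache size: one key and one value vector per layer, KV head and token."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value * tokens * batch


if __name__ == "__main__":
    print(f"bf16 weights:          {weight_bytes(2) / GIB:.1f} GiB")
    print(f"KV cache, 32K tokens:  {kv_cache_bytes(32_768) / GIB:.1f} GiB")
    print(f"KV cache, 128K tokens: {kv_cache_bytes(131_072) / GIB:.1f} GiB")
```

The KV cache grows linearly with both sequence length and batch size, which is why long-context use raises memory needs well beyond the weights themselves.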

Open LLM Leaderboard Evaluation Results

Detailed and summarized results are available on the Open LLM Leaderboard.

| Metric              | Value (%) |
|---------------------|-----------|
| Average             | 38.80     |
| IFEval (0-Shot)     | 39.07     |
| BBH (3-Shot)        | 44.94     |
| MATH Lvl 5 (4-Shot) | 57.55     |
| GPQA (0-shot)       | 18.90     |
| MuSR (0-shot)       | 27.83     |
| MMLU-PRO (5-shot)   | 44.53     |
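The reported Average is consistent with an unweighted mean of the six benchmark scores. That is an assumption about how the leaderboard aggregates, but it reproduces the 38.80 figure:

```python
# Benchmark scores (percent) as reported in the results above.
scores = {
    "IFEval (0-Shot)": 39.07,
    "BBH (3-Shot)": 44.94,
    "MATH Lvl 5 (4-Shot)": 57.55,
    "GPQA (0-shot)": 18.90,
    "MuSR (0-shot)": 27.83,
    "MMLU-PRO (5-shot)": 44.53,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 38.80
```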