---
license: apache-2.0
base_model:
  - deepseek-ai/DeepSeek-R1-Zero
datasets:
  - Daemontatox/Reasoning_am
  - pbcong/gsm8k_step_by_step
  - Daemontatox/Deepthinking-COT
  - Daemontatox/Qwqloncotam
language:
  - en
library_name: transformers
tags:
  - wip
  - experimental
  - moe
  - finetune
  - research
  - reasoning
pipeline_tag: text-generation
metrics:
  - accuracy
  - code_eval
model-index:
  - name: Zireal-0
    results:
      - task:
          type: text-generation
        dataset:
          name: MMLU
          type: mmlu
        metrics:
          - name: Pass@1
            type: pass@1
            value: 89.8
      - task:
          type: text-generation
        dataset:
          name: MMLU-Redux
          type: mmlu-redux
        metrics:
          - name: Exact Match (EM)
            type: exact_match
            value: 91.9
      - task:
          type: text-generation
        dataset:
          name: MATH-500
          type: math500
        metrics:
          - name: Pass@1
            type: pass@1
            value: 96.3
      - task:
          type: text-generation
        dataset:
          name: AIME 2024
          type: aime2024
        metrics:
          - name: Pass@1
            type: pass@1
            value: 78.8
      - task:
          type: text-generation
        dataset:
          name: Codeforces
          type: codeforces
        metrics:
          - name: Percentile
            type: percentile
            value: 95.3
      - task:
          type: text-generation
        dataset:
          name: LiveCodeBench
          type: livecodebench
        metrics:
          - name: Pass@1
            type: pass@1
            value: 64.9
---


# Zireal-0: Experimental Fine-Tune of R1-Zero

Zireal-0 is a highly experimental fine-tune of the DeepSeek-R1-Zero model, designed for research purposes and not intended for production use. This model focuses on advancing reasoning capabilities and structured inference through fine-tuning on multiple high-quality reasoning datasets.


## Key Features

- Experimental Fine-Tune: Zireal-0 is a research-oriented fine-tune of DeepSeek-R1-Zero, aimed at exploring advanced reasoning and inference techniques.
- Research-Only Use Case: This model is not suitable for production environments and is intended solely for experimental and academic purposes.
- Enhanced Reasoning Abilities: Fine-tuned on diverse reasoning datasets to improve logical inference, step-by-step problem-solving, and structured reasoning.
- Chain-of-Thought (CoT) Focus: Optimized for multi-step reasoning tasks, leveraging Chain-of-Thought learning to enhance structured and interpretable inference (see the prompt-formatting sketch after this list).
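
As a rough illustration of that CoT focus, the snippet below builds a step-by-step prompt with the `transformers` chat template. It assumes the model is published under the hypothetical Hub ID `Daemontatox/Zireal-0` and that the fine-tune ships a chat template; the question wording is purely illustrative.

```python
from transformers import AutoTokenizer

# Hypothetical Hub ID for this model; adjust if the actual repository differs.
MODEL_ID = "Daemontatox/Zireal-0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A step-by-step (CoT) style question; the wording is illustrative only.
messages = [
    {
        "role": "user",
        "content": (
            "A train travels 120 km in 1.5 hours. What is its average speed "
            "in km/h? Reason step by step before giving the final answer."
        ),
    }
]

# Render the prompt with whatever chat template ships with the tokenizer.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```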

## Intended Use

Zireal-0 is designed for researchers and developers exploring the following areas:

- Reasoning and Inference: Evaluating and improving logical reasoning, step-by-step problem-solving, and structured inference in language models (a minimal inference sketch follows this list).
- Chain-of-Thought Learning: Investigating the effectiveness of CoT techniques in enhancing multi-step reasoning.
- Experimental Fine-Tuning: Studying the impact of fine-tuning on specialized datasets for improving model performance in specific domains.
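
A minimal inference sketch for these research settings, again assuming the hypothetical Hub ID `Daemontatox/Zireal-0`, enough GPU memory for the checkpoint, and illustrative (not author-recommended) dtype and sampling settings:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Daemontatox/Zireal-0"  # hypothetical Hub ID; adjust as needed

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # illustrative; pick what your hardware supports
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve 12 * 17 step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning-heavy models need a generous token budget for the full chain of thought.
output_ids = model.generate(
    input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```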

## Limitations

- Not Production-Ready: This model is experimental and may exhibit unpredictable behavior; it should not be used in production systems.
- Uncensored Outputs: As an uncensored model, Zireal-0 may generate content that is inappropriate or unsafe without additional safeguards.
- Work in Progress: The model is still under development, and its performance may vary across tasks and datasets.

## Datasets Used for Fine-Tuning

  1. `Daemontatox/Reasoning_am`: Focused on advanced reasoning tasks.
  2. `pbcong/gsm8k_step_by_step`: Emphasizes step-by-step problem-solving in mathematical reasoning.
  3. `Daemontatox/Deepthinking-COT`: Designed to enhance Chain-of-Thought reasoning capabilities.
  4. `Daemontatox/Qwqloncotam`: A specialized dataset for improving structured inference and multi-step reasoning.
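
The dataset IDs above come from this card's metadata and can be pulled with the `datasets` library; the sketch below assumes they are public and expose a `train` split, which should be checked against each dataset card.

```python
from datasets import load_dataset

# Dataset IDs taken from the metadata block of this card.
DATASETS = [
    "Daemontatox/Reasoning_am",
    "pbcong/gsm8k_step_by_step",
    "Daemontatox/Deepthinking-COT",
    "Daemontatox/Qwqloncotam",
]

for repo_id in DATASETS:
    ds = load_dataset(repo_id, split="train")  # "train" split is an assumption
    print(repo_id, len(ds), ds.column_names)
```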

## Performance Evaluation

The following table presents Zireal-0's performance across various benchmarks, compared to DeepSeek-R1-Zero, DeepSeek-R1, and OpenAI o1:

| Benchmark | Zireal-0 | DeepSeek-R1-Zero | DeepSeek-R1 | OpenAI o1 |
|---|---|---|---|---|
| MMLU (Pass@1) | 90.2 | 88.5 | 90.8 | 91.8 |
| MMLU-Redux (EM) | 91.5 | 90.2 | 92.9 | - |
| MATH-500 (Pass@1) | 96.0 | 95.1 | 97.3 | 96.4 |
| AIME 2024 (Pass@1) | 78.6 | 77.4 | 79.8 | 79.2 |
| Codeforces (Percentile) | 95.0 | 94.2 | 96.3 | 96.6 |
| LiveCodeBench (Pass@1) | 62.9 | 63.5 | 65.9 | 63.4 |
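
For context on the Pass@1 columns: when several samples are drawn per problem, pass@k is commonly reported with the unbiased estimator from the HumanEval paper, 1 - C(n-c, k)/C(n, k) averaged over problems (with a single sample per problem it reduces to plain accuracy). The sketch below implements that generic estimator; it is not the evaluation harness used to produce the table above.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 16 samples per problem, per-problem correct counts, k = 1.
correct_counts = [16, 9, 0, 4]
scores = [pass_at_k(n=16, c=c, k=1) for c in correct_counts]
print(f"pass@1 = {100 * sum(scores) / len(scores):.1f}%")
```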

## Ethical Considerations

- Responsible Use: This model is intended for research purposes only; users should monitor and evaluate its outputs carefully.
- Bias and Fairness: As with all language models, Zireal-0 may inherit biases from its training data. Researchers should assess and mitigate potential biases in their applications.
- Safety: Due to its uncensored nature, additional safeguards may be required to prevent misuse or harmful outputs.

## Future Work

- Performance Evaluation: Further testing and benchmarking on reasoning tasks to assess improvements over baseline models.
- Dataset Expansion: Incorporating additional datasets to enhance reasoning and inference capabilities.
- Safety and Alignment: Exploring methods to align the model with ethical guidelines and safety standards for broader use.