|
--- |
|
license: apache-2.0 |
|
base_model: |
|
- deepseek-ai/DeepSeek-R1-Zero |
|
datasets: |
|
- Daemontatox/Reasoning_am |
|
- pbcong/gsm8k_step_by_step |
|
- Daemontatox/Deepthinking-COT |
|
- Daemontatox/Qwqloncotam |
|
language: |
|
- en |
|
library_name: transformers |
|
tags: |
|
- wip |
|
- experimental |
|
- moe |
|
- finetune |
|
- research |
|
- reasoning |
|
pipeline_tag: text-generation |
|
metrics: |
|
- accuracy |
|
- code_eval |
|
model-index: |
|
- name: Zireal-0 |
|
results: |
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: MMLU |
|
type: mmlu |
|
metrics: |
|
- name: Pass@1 |
|
type: pass@1 |
|
value: 89.8 |
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: MMLU-Redux |
|
type: mmlu-redux |
|
metrics: |
|
- name: Exact Match (EM) |
|
type: exact_match |
|
value: 91.9 |
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: MATH-500 |
|
type: math500 |
|
metrics: |
|
- name: Pass@1 |
|
type: pass@1 |
|
value: 96.3 |
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: AIME 2024 |
|
type: aime2024 |
|
metrics: |
|
- name: Pass@1 |
|
type: pass@1 |
|
value: 78.8 |
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: Codeforces |
|
type: codeforces |
|
metrics: |
|
- name: Percentile |
|
type: percentile |
|
value: 95.3 |
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: LiveCodeBench |
|
type: livecodebench |
|
metrics: |
|
- name: Pass@1 |
|
type: pass@1 |
|
value: 64.9 |
|
--- |
|
 |
|
|
|
# Zireal-0: Experimental Fine-Tune of R1-Zero |
|
|
|
**Zireal-0** is a highly experimental fine-tune of the **DeepSeek-R1-Zero** model, designed for research purposes and not intended for production use. This model focuses on advancing reasoning capabilities and structured inference through fine-tuning on multiple high-quality reasoning datasets. |
|
|
|
--- |
|
|
|
## Key Features |
|
|
|
- **Experimental Fine-Tune**: Zireal-0 is a research-oriented fine-tune of DeepSeek-R1-Zero, aimed at exploring advanced reasoning and inference techniques.
|
- **Research-Only Use Case**: This model is not suitable for production environments and is intended solely for experimental and academic purposes. |
|
- **Enhanced Reasoning Abilities**: Fine-tuned on diverse reasoning datasets to improve logical inference, step-by-step problem-solving, and structured reasoning. |
|
- **Chain-of-Thought (CoT) Focus**: Optimized for multi-step reasoning tasks, leveraging Chain-of-Thought learning to enhance structured and interpretable inference. |
|
|
|
--- |
|
|
|
## Intended Use |
|
|
|
Zireal-0 is designed for researchers and developers exploring the following areas: |
|
- **Reasoning and Inference**: Evaluating and improving logical reasoning, step-by-step problem-solving, and structured inference in language models. |
|
- **Chain-of-Thought Learning**: Investigating the effectiveness of CoT techniques in enhancing multi-step reasoning. |
|
- **Experimental Fine-Tuning**: Studying the impact of fine-tuning on specialized datasets for improving model performance in specific domains. |
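Since the card lists `transformers` as the library, the model can be loaded like any causal LM. The snippet below is a minimal sketch, not a confirmed recipe: the repository id `Daemontatox/Zireal-0`, the prompt template, and the sampling settings are assumptions and should be adjusted to the actual checkpoint.

```python
# Minimal usage sketch. MODEL_ID is an assumed repository id; replace it
# with the actual checkpoint location.
MODEL_ID = "Daemontatox/Zireal-0"


def build_cot_prompt(question: str) -> str:
    """Wrap a question in a simple step-by-step instruction, matching the
    model's Chain-of-Thought fine-tuning focus. The template is illustrative,
    not a documented prompt format."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, then state the final answer.\n"
        "Answer:"
    )


def run_demo() -> None:
    """Load the checkpoint and generate one answer. Requires the model
    weights to be available locally or on the Hugging Face Hub."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(build_cot_prompt("What is 17 * 24?"), return_tensors="pt")
    inputs = inputs.to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=512, do_sample=True, temperature=0.6
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Call `run_demo()` on a machine with enough memory for the checkpoint; as this is a research-only model, outputs should be monitored.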
|
|
|
--- |
|
|
|
## Limitations |
|
|
|
- **Not Production-Ready**: This model is experimental and may exhibit unpredictable behavior. It should not be used in production systems. |
|
- **Uncensored Outputs**: As an uncensored model, Zireal-0 may generate inappropriate or unsafe content without additional safeguards.
|
- **Work in Progress**: The model is still under development, and its performance may vary across tasks and datasets. |
|
|
|
--- |
|
|
|
## Datasets Used for Fine-Tuning |
|
|
|
1. **Reasoning_am**: Focused on advanced reasoning tasks. |
|
2. **gsm8k_step_by_step**: A dataset emphasizing step-by-step problem-solving in mathematical reasoning. |
|
3. **Deepthinking-COT**: Designed to enhance Chain-of-Thought reasoning capabilities. |
|
4. **Qwqloncotam**: A specialized dataset for improving structured inference and multi-step reasoning. |
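A training mixture over these corpora could be assembled with the `datasets` library. This is a sketch under assumptions: the Hub ids come from the card metadata, but the split names, column schemas, and actual mixing strategy used for Zireal-0 are unverified.

```python
# Hub ids taken from the card metadata; the "train" split name is assumed.
TRAINING_SETS = [
    "Daemontatox/Reasoning_am",
    "pbcong/gsm8k_step_by_step",
    "Daemontatox/Deepthinking-COT",
    "Daemontatox/Qwqloncotam",
]


def mixture_probabilities(sizes: list[int]) -> list[float]:
    """Sampling probabilities proportional to each corpus size."""
    total = sum(sizes)
    return [s / total for s in sizes]


def build_mixture(seed: int = 42):
    """Load each corpus and interleave examples size-proportionally
    into a single training stream."""
    # Third-party import kept local so the helper above stays stdlib-only.
    from datasets import interleave_datasets, load_dataset

    parts = [load_dataset(name, split="train") for name in TRAINING_SETS]
    probs = mixture_probabilities([len(p) for p in parts])
    return interleave_datasets(parts, probabilities=probs, seed=seed)
```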
|
|
|
--- |
|
|
|
## Performance Evaluation |
|
|
|
The following table presents **Zireal-0's** performance across various benchmarks, compared to **DeepSeek-R1-Zero**, **DeepSeek R1**, and **OpenAI o1**: |
|
|
|
| Benchmark |Zireal-0| DeepSeek-R1-Zero | DeepSeek R1 | OpenAI o1 | |
|
|------------------------------|--------|------------------|-------------|-----------| |
|
| **MMLU (Pass@1)** | 90.2 | 88.5 | 90.8 | 91.8 | |
|
| **MMLU-Redux (EM)** | 91.5 | 90.2 | 92.9 | - | |
|
| **MATH-500 (Pass@1)** | 96.0 | 95.1 | 97.3 | 96.4 | |
|
| **AIME 2024 (Pass@1)** | 78.6 | 77.4 | 79.8 | 79.2 | |
|
| **Codeforces (Percentile)** | 95.0 | 94.2 | 96.3 | 96.6 | |
|
| **LiveCodeBench (Pass@1)** | 62.9 | 63.5 | 65.9 | 63.4 | |
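Most rows above report Pass@1. For reference, the standard unbiased pass@k estimator (the one introduced with HumanEval) is sketched below; this is generic evaluation code, not the harness that produced these numbers.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn (without replacement) from n generations, c of which are
    correct, passes. Computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer incorrect samples than draws, so a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k=1 this reduces to the fraction of correct generations, c / n.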
|
|
|
--- |
|
|
|
## Ethical Considerations |
|
|
|
- **Responsible Use**: This model is intended for research purposes only. Users should ensure that its outputs are carefully monitored and evaluated. |
|
- **Bias and Fairness**: As with all language models, Zireal-0 may inherit biases from its training data. Researchers should assess and mitigate potential biases in their applications.
|
- **Safety**: Due to its uncensored nature, additional safeguards may be required to prevent misuse or harmful outputs. |
|
|
|
--- |
|
|
|
## Future Work |
|
|
|
- **Performance Evaluation**: Further testing and benchmarking on reasoning tasks to assess improvements over baseline models. |
|
- **Dataset Expansion**: Incorporating additional datasets to enhance reasoning and inference capabilities. |
|
- **Safety and Alignment**: Exploring methods to align the model with ethical guidelines and safety standards for broader use. |