Model Card: OLAIR/ko-r1-7b-v2.0.3
This model card describes the OLAIR/ko-r1-7b-v2.0.3 model, covering its training data, benchmark performance, and known limitations.
1. Overview
Model Name: OLAIR/ko-r1-7b-v2.0.3
Model Type: Large Language Model (LLM) for Korean language understanding and reasoning
Version: 2.0.3
This model is designed to provide Korean language capabilities with a focus on reasoning tasks. It is the second major version in its series, building on previous iterations with improvements to the training data and fine-tuning methodology.
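A minimal inference sketch using the Hugging Face transformers library is shown below. The chat-template usage, prompt, and generation settings are illustrative assumptions (and `device_map="auto"` assumes accelerate is installed); they are not settings documented by OLAIR.

```python
# Minimal inference sketch; the prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OLAIR/ko-r1-7b-v2.0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Korean prompt: "Find the number of subsets of A = {1, 2, 3} and explain your reasoning."
messages = [{"role": "user", "content": "집합 A = {1, 2, 3}의 부분집합의 개수를 구하고 풀이 과정을 설명하세요."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```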
2. Training Data
The model was fine-tuned on OLAIR's Open-R1-Ko-SFT-v2.0 dataset, a curated collection of Korean-language data optimized for supervised fine-tuning (SFT) to enhance reasoning and natural language understanding in Korean.
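The dataset can presumably be inspected with the Hugging Face datasets library. The repository id "OLAIR/Open-R1-Ko-SFT-v2.0" below is inferred from the dataset name above and may differ from the actual repository; check the dataset card for the exact id and schema.

```python
# Sketch of loading the SFT dataset; the repo id is an assumption based on the dataset name.
from datasets import load_dataset

dataset = load_dataset("OLAIR/Open-R1-Ko-SFT-v2.0", split="train")
print(dataset)      # number of rows and column names
print(dataset[0])   # one SFT example
```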
3. Benchmark Performance
The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across various domains. Below are the benchmark results for several models, including OLAIR/ko-r1-7b-v2.0.3:
| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---|---|---|---|---|---|---|
| o1-2024-12-17 | 42.9 | 74.5 | 77.8 | 70.0 | 30.8 | 59.2 |
| o3-mini-high | 35.7 | 72.7 | 70.4 | 70.0 | 23.1 | 54.4 |
| o3-mini-2025-01-31 | 35.7 | 74.5 | 74.1 | 60.0 | 7.7 | 50.4 |
| o1-mini-2024-09-12 | 35.7 | 54.5 | 63.0 | 60.0 | 0.0 | 42.6 |
| Deepseek-R1 | 35.7 | 52.7 | 51.9 | 60.0 | 0.0 | 40.1 |
| gpt-4o-2024-11-20 | 28.6 | 21.8 | 37.0 | 50.0 | 0.0 | 27.5 |
| Ko-R1-7B-v2.0.3 | 7.1 | 56.4 | 29.6 | 40.0 | 0.0 | 26.6 |
| Qwen2.5-72B-Instruct | 35.7 | 29.1 | 37.0 | 30.0 | 0.0 | 26.4 |
| Ko-R1-7B-v1 | 0.0 | 60.0 | 22.2 | 40.0 | 0.0 | 24.4 |
| Exaone-3.5-32B-Instruct | 28.6 | 27.3 | 22.2 | 40.0 | 0.0 | 23.6 |
| gpt-4o-mini-2024-07-18 | 7.1 | 29.1 | 22.2 | 50.0 | 0.0 | 21.7 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 14.3 | 10.9 | 33.3 | 0.0 | 0.0 | 11.7 |
Note: The table reflects performance across multiple reasoning domains. OLAIR/ko-r1-7b-v2.0.3 is competitive in Math (56.4) relative to much larger models, but it lags higher-performing counterparts in Chemistry, Physics, and Puzzles.
4. Limitations
- The model can still fall into endless "thinking" loops on certain Korean inputs; a fix is in progress. A simple generation-time mitigation is sketched below.
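The snippet below continues the inference sketch from the Overview section (reusing `model` and `input_ids`). It uses standard transformers `generate()` arguments as a generic workaround for runaway reasoning; these are suggested values, not settings recommended by OLAIR.

```python
# Generic mitigation for endless "thinking" loops: cap the generation budget
# and discourage repetition. Values are illustrative, not official defaults.
output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,      # hard cap so a reasoning loop cannot generate forever
    repetition_penalty=1.1,   # penalize verbatim repetition
    no_repeat_ngram_size=8,   # block long repeated n-grams typical of loops
)
```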
5. Miscellaneous
How to Cite
To be added
Contact
[email protected]
Base Model
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B