Model Card: OLAIR/ko-r1-7b-v2.0.3
This model card describes the OLAIR/ko-r1-7b-v2.0.3 model, covering its training data, benchmark performance, and known limitations.
1. Overview
Model Name: OLAIR/ko-r1-7b-v2.0.3
Model Type: Large Language Model (LLM) for Korean language understanding and reasoning
Version: 2.0.3
This model is designed to provide Korean language capabilities with a focus on reasoning tasks. It is the second major version in its series, building on previous iterations with improvements to the training data and fine-tuning methodology.
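A minimal inference sketch using the Hugging Face transformers library is shown below. The chat-template usage, prompt, and generation settings are illustrative assumptions (and `device_map="auto"` assumes accelerate is installed); they are not settings documented by OLAIR.

```python
# Minimal inference sketch; the prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OLAIR/ko-r1-7b-v2.0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Korean prompt: "Find the number of subsets of A = {1, 2, 3} and explain your reasoning."
messages = [{"role": "user", "content": "집합 A = {1, 2, 3}의 부분집합의 개수를 구하고 풀이 과정을 설명하세요."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```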
2. Training Data
The model was fine-tuned on OLAIR's Open-R1-Ko-SFT-v2.0 dataset, a curated collection of Korean-language data optimized for supervised fine-tuning (SFT) to enhance reasoning and natural language understanding in Korean.
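The dataset can presumably be inspected with the Hugging Face datasets library. The repository id "OLAIR/Open-R1-Ko-SFT-v2.0" below is inferred from the dataset name above and may differ from the actual repository; check the dataset card for the exact id and schema.

```python
# Sketch of loading the SFT dataset; the repo id is an assumption based on the dataset name.
from datasets import load_dataset

dataset = load_dataset("OLAIR/Open-R1-Ko-SFT-v2.0", split="train")
print(dataset)      # number of rows and column names
print(dataset[0])   # one SFT example
```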
3. Benchmark Performance
The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across various domains. Below are the benchmark results for several models, including OLAIR/ko-r1-7b-v2.0.3:
| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---|---|---|---|---|---|---|
| o1-2024-12-17 | 42.9 | 74.5 | 77.8 | 70.0 | 30.8 | 59.2 |
| o3-mini-high | 35.7 | 72.7 | 70.4 | 70.0 | 23.1 | 54.4 |
| o3-mini-2025-01-31 | 35.7 | 74.5 | 74.1 | 60.0 | 7.7 | 50.4 |
| o1-mini-2024-09-12 | 35.7 | 54.5 | 63.0 | 60.0 | 0.0 | 42.6 |
| Deepseek-R1 | 35.7 | 52.7 | 51.9 | 60.0 | 0.0 | 40.1 |
| gpt-4o-2024-11-20 | 28.6 | 21.8 | 37.0 | 50.0 | 0.0 | 27.5 |
| Ko-R1-7B-v2.0.3 | 7.1 | 56.4 | 29.6 | 40.0 | 0.0 | 26.6 |
| Qwen2.5-72B-Instruct | 35.7 | 29.1 | 37.0 | 30.0 | 0.0 | 26.4 |
| Ko-R1-7B-v1 | 0.0 | 60.0 | 22.2 | 40.0 | 0.0 | 24.4 |
| Exaone-3.5-32B-Instruct | 28.6 | 27.3 | 22.2 | 40.0 | 0.0 | 23.6 |
| gpt-4o-mini-2024-07-18 | 7.1 | 29.1 | 22.2 | 50.0 | 0.0 | 21.7 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 14.3 | 10.9 | 33.3 | 0.0 | 0.0 | 11.7 |
Note: The table reflects performance across multiple reasoning domains. OLAIR/ko-r1-7b-v2.0.3 is competitive in Math (56.4) relative to much larger models, but it lags higher-performing counterparts in Chemistry, Physics, and Puzzles.
4. Limitations
- The model can still fall into endless "thinking" loops on certain Korean inputs; a fix is in progress. A simple generation-time mitigation is sketched below.
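The snippet below continues the inference sketch from the Overview section (reusing `model` and `input_ids`). It uses standard transformers `generate()` arguments as a generic workaround for runaway reasoning; these are suggested values, not settings recommended by OLAIR.

```python
# Generic mitigation for endless "thinking" loops: cap the generation budget
# and discourage repetition. Values are illustrative, not official defaults.
output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,      # hard cap so a reasoning loop cannot generate forever
    repetition_penalty=1.1,   # penalize verbatim repetition
    no_repeat_ngram_size=8,   # block long repeated n-grams typical of loops
)
```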
5. Miscellaneous
How to Cite
To be added
Contact
[email protected]
Base Model
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B