Model Card: OLAIR/ko-r1-7b-v2.0.3

This document describes the OLAIR/ko-r1-7b-v2.0.3 model, including its training data, intended use, performance benchmarks, limitations, and ethical considerations.


1. Overview

Model Name: OLAIR/ko-r1-7b-v2.0.3
Model Type: Large Language Model (LLM) for Korean language understanding and reasoning
Version: 2.0.3

This model is designed to provide Korean-language capabilities with a focus on reasoning tasks. It is the second major version in its series, building on previous iterations with improvements in training data and fine-tuning methodology.
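The model can be used through the standard Hugging Face transformers interface. Below is a minimal usage sketch, assuming the checkpoint follows the usual causal-LM layout and ships a chat template; the generation settings are illustrative, not official recommendations:

```python
# Minimal usage sketch (assumptions: standard causal-LM checkpoint with a
# chat template; generation settings are illustrative, not tuned values).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OLAIR/ko-r1-7b-v2.0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Example Korean reasoning prompt:
# "A train travels 300 km in 2 hours. What is its average speed?"
messages = [{"role": "user", "content": "기차가 2시간 동안 300 km를 달렸다. 평균 속도는 얼마인가?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```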

2. Training Data

The model was fine-tuned on the OLAIR Open-R1-Ko-SFT-v2.0 dataset, a curated collection of Korean-language data optimized for supervised fine-tuning (SFT) to enhance reasoning and natural language understanding in Korean.
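For reference, the SFT data can be inspected with the datasets library. The repository id below is an assumption inferred from the dataset name; adjust it if the published id differs:

```python
# Sketch for inspecting the SFT data (assumption: the dataset is hosted on the
# Hugging Face Hub under the id below with a "train" split).
from datasets import load_dataset

ds = load_dataset("OLAIR/Open-R1-Ko-SFT-v2.0", split="train")
print(ds)       # column names and row count
print(ds[0])    # one example record
```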

3. Benchmark Performance

The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across various domains. Below are the benchmark results for several models, including OLAIR/ko-r1-7b-v2.0.3:

| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---|---|---|---|---|---|---|
| o1-2024-12-17 | 42.9 | 74.5 | 77.8 | 70.0 | 30.8 | 59.2 |
| o3-mini-high | 35.7 | 72.7 | 70.4 | 70.0 | 23.1 | 54.4 |
| o3-mini-2025-01-31 | 35.7 | 74.5 | 74.1 | 60.0 | 7.7 | 50.4 |
| o1-mini-2024-09-12 | 35.7 | 54.5 | 63.0 | 60.0 | 0.0 | 42.6 |
| Deepseek-R1 | 35.7 | 52.7 | 51.9 | 60.0 | 0.0 | 40.1 |
| gpt-4o-2024-11-20 | 28.6 | 21.8 | 37.0 | 50.0 | 0.0 | 27.5 |
| Ko-R1-7B-v2.0.3 | 7.1 | 56.4 | 29.6 | 40.0 | 0.0 | 26.6 |
| Qwen2.5-72B-Instruct | 35.7 | 29.1 | 37.0 | 30.0 | 0.0 | 26.4 |
| Ko-R1-7B-v1 | 0.0 | 60.0 | 22.2 | 40.0 | 0.0 | 24.4 |
| Exaone-3.5-32B-Instruct | 28.6 | 27.3 | 22.2 | 40.0 | 0.0 | 23.6 |
| gpt-4o-mini-2024-07-18 | 7.1 | 29.1 | 22.2 | 50.0 | 0.0 | 21.7 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 14.3 | 10.9 | 33.3 | 0.0 | 0.0 | 11.7 |

Note: OLAIR/ko-r1-7b-v2.0.3 is competitive in Math (56.4, above several larger models such as Qwen2.5-72B-Instruct and gpt-4o-2024-11-20), but it lags behind higher-performing counterparts in Chemistry, Physics, and Puzzles.

4. Limitations

  • The model remains vulnerable to certain Korean-language inputs that can trigger endless reasoning loops during generation. We are working on a fix; a mitigation sketch follows below.
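Until this is resolved, a practical mitigation is to hard-cap the generation length and penalize repetition. The sketch below reuses the model, tokenizer, and inputs from the usage example in Section 1; the parameter values are illustrative, not tuned recommendations:

```python
# Mitigation sketch for runaway reasoning loops (values are illustrative).
outputs = model.generate(
    inputs,
    max_new_tokens=2048,        # hard upper bound on "thinking" length
    repetition_penalty=1.1,     # discourages the model from looping
    eos_token_id=tokenizer.eos_token_id,
)
```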

5. Miscellaneous

How to Cite

To be added

Contact

[email protected]
Model Details

Model size: 7.61B parameters
Tensor type: F32 (safetensors)