---
library_name: transformers
license: mit
datasets:
- OLAIR/Open-R1-Ko-SFT-v2.0
language:
- ko
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
---
## 1. Overview
- **Model Name:** OLAIR/ko-r1-14b-v2.0.3
- **Model Type:** Large Language Model (LLM) for Korean language understanding and reasoning
- **Version:** 2.0.3
This model is designed to provide Korean language capabilities with a focus on reasoning tasks. It is a scaled-up version of [OLAIR/ko-r1-7b-v2.0.3](https://huggingface.co./OLAIR/ko-r1-7b-v2.0.3) for experimental purposes.
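A minimal inference sketch using the 🤗 `transformers` chat-template API (the model card does not publish an official usage snippet, so the generation settings and the Korean example prompt below are illustrative assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "OLAIR/ko-r1-14b-v2.0.3"


def build_messages(question: str) -> list[dict]:
    # Wrap a single user turn in the chat format consumed by apply_chat_template.
    return [{"role": "user", "content": question}]


def generate_answer(question: str, max_new_tokens: int = 1024) -> str:
    # Loads the ~14B checkpoint; requires a GPU with sufficient memory
    # (adjust device_map or add quantization to fit your hardware).
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated answer.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

Example call: `generate_answer("한국의 수도는 어디인가요?")` ("What is the capital of Korea?").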
## 2. Benchmark Performance
The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across five domains.
| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---------------------------------------|-----------|--------|---------|----------------------|---------|---------|
| o1-2024-12-17 | 57.14 | 78.18 | 77.78 | 80.00 | 84.62 | 75.54 |
| o3-mini-high | 57.14 | 81.82 | 77.78 | 70.00 | 69.23 | 71.19 |
| o3-mini-2025-01-31 | 50.00 | 80.00 | 70.37 | 50.00 | 76.92 | 65.46 |
| o1-mini-2024-09-12 | 42.86 | 56.36 | 70.37 | 60.00 | 15.38 | 48.99 |
| Deepseek-R1 | 50.00 | 54.55 | 62.96 | 70.00 | 7.69 | 49.04 |
| gpt-4o-2024-11-20 | 35.71 | 32.73 | 51.85 | 50.00 | 53.85 | 44.83 |
| **Ko-R1-14B-v2.0.3**                  | 28.57     | 50.90  | 48.14   | 30.00                | 30.77   | 37.68   |
| Exaone-3.5-32B-Instruct | 21.43 | 30.91 | 25.93 | 50.00 | 38.46 | 33.35 |
| Qwen2.5-72B-Instruct | 35.71 | 30.91 | 51.85 | 20.00 | 23.08 | 32.31 |
| **Ko-R1-7B-v2.0.3** | 7.14 | 61.82 | 40.74 | 40.00 | 0.00 | 29.94 |
| Ko-R1-7B-v1 | 7.14 | 63.64 | 37.04 | 40.00 | 0.00 | 29.56 |
| gpt-4o-mini-2024-07-18 | 21.43 | 29.09 | 37.04 | 50.00 | 0.00 | 27.51 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 28.57 | 16.36 | 33.33 | 10.00 | 15.38 | 20.73 |
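The Average column appears to be the unweighted mean of the five domain scores, which can be checked in a few lines of Python (values taken from the Ko-R1-14B-v2.0.3 row above):

```python
# Sanity check: Average = unweighted mean of the five HRC domain scores.
scores = {
    "Chemistry": 28.57,
    "Math": 50.90,
    "Physics": 48.14,
    "Physics Word Puzzles": 30.00,
    "Puzzles": 30.77,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 37.68, matching the reported Average
```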
**Comparison of Average Scores**
Ko-R1-14B-v2.0.3 outperforms substantially larger non-reasoning models such as Exaone-3.5-32B-Instruct, Qwen2.5-72B-Instruct, and gpt-4o-mini-2024-07-18.
<p align="center"><img src="https://cdn-uploads.huggingface.co/production/uploads/60d3e619b8448e1785bbda2a/GA8UI4jr4GlSRrFJBa8nl.png" width="525"/></p>
**Training Rewards**
Even when trained on the same dataset, the larger model achieves consistently higher training rewards.
<p align="center"><img src="https://cdn-uploads.huggingface.co/production/uploads/60d3e619b8448e1785bbda2a/XAeOlEBnMSp9pQP0I1IU7.png" width="525"/></p>
## 3. Limitations
- The model can still be unstable on certain Korean inputs, occasionally falling into endless reasoning loops. A fix is in progress.
## ETC
**How to Cite**
```
To be added
```
**Contact**
```
[email protected]
```