---
library_name: transformers
license: mit
datasets:
- OLAIR/Open-R1-Ko-SFT-v2.0
language:
- ko
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
---

## 1. Overview

**Model Name:** OLAIR/ko-r1-14b-v2.0.3

**Model Type:** Large Language Model (LLM) for Korean language understanding and reasoning

**Version:** 2.0.3

This model is designed to provide Korean language capabilities with a focus on reasoning tasks. It is a scaled-up version of [OLAIR/ko-r1-7b-v2.0.3](https://huggingface.co./OLAIR/ko-r1-7b-v2.0.3), built for experimental purposes.

## 2. Benchmark Performance

The model was evaluated on the HAE-RAE Reasoning Challenge (HRC), which measures reasoning ability across five domains.

| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---------------------------------------|-----------|--------|---------|----------------------|---------|---------|
| o1-2024-12-17 | 57.14 | 78.18 | 77.78 | 80.00 | 84.62 | 75.54 |
| o3-mini-high | 57.14 | 81.82 | 77.78 | 70.00 | 69.23 | 71.19 |
| o3-mini-2025-01-31 | 50.00 | 80.00 | 70.37 | 50.00 | 76.92 | 65.46 |
| o1-mini-2024-09-12 | 42.86 | 56.36 | 70.37 | 60.00 | 15.38 | 48.99 |
| Deepseek-R1 | 50.00 | 54.55 | 62.96 | 70.00 | 7.69 | 49.04 |
| gpt-4o-2024-11-20 | 35.71 | 32.73 | 51.85 | 50.00 | 53.85 | 44.83 |
| **Ko-R1-14B-v2.0.3** | 28.57 | 50.90 | 48.14 | 30.00 | 30.77 | 37.68 |
| Exaone-3.5-32B-Instruct | 21.43 | 30.91 | 25.93 | 50.00 | 38.46 | 33.35 |
| Qwen2.5-72B-Instruct | 35.71 | 30.91 | 51.85 | 20.00 | 23.08 | 32.31 |
| **Ko-R1-7B-v2.0.3** | 7.14 | 61.82 | 40.74 | 40.00 | 0.00 | 29.94 |
| Ko-R1-7B-v1 | 7.14 | 63.64 | 37.04 | 40.00 | 0.00 | 29.56 |
| gpt-4o-mini-2024-07-18 | 21.43 | 29.09 | 37.04 | 50.00 | 0.00 | 27.51 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 28.57 | 16.36 | 33.33 | 10.00 | 15.38 | 20.73 |

**Comparison of Average Scores**

Ko-R1-14B-v2.0.3 outperforms substantially larger non-reasoning models such as Exaone-3.5-32B-Instruct, Qwen2.5-72B-Instruct, and gpt-4o-mini-2024-07-18.
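For reference, the Average column is consistent with the unweighted mean of the five domain scores; for example, Ko-R1-14B-v2.0.3: (28.57 + 50.90 + 48.14 + 30.00 + 30.77) / 5 ≈ 37.68.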
**Training Rewards**
Even when trained on the same dataset, the larger model simply learns more, as reflected in its training rewards.
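Below is a minimal inference sketch using the standard `transformers` API. The chat-template usage and the generation settings (`max_new_tokens`, `temperature`) are illustrative assumptions, not officially recommended values.

```python
# Minimal inference sketch (settings are assumptions, not official recommendations).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OLAIR/ko-r1-14b-v2.0.3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Korean reasoning prompt ("If a train travels at 80 km/h for 3.5 hours, how far does it go?"),
# formatted with the tokenizer's chat template.
messages = [{"role": "user", "content": "기차가 시속 80km로 3시간 30분 동안 달리면 총 몇 km를 이동하나요?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# max_new_tokens is set generously because the model emits long reasoning traces.
outputs = model.generate(inputs, max_new_tokens=2048, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```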
## 3. Limitations
- The model is still prone to entering endless reasoning loops on some Korean inputs. We are working to fix this; a possible generation-side workaround is sketched below.
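As a practical workaround (our assumption, not an official fix), generation can be hard-capped so that runaway reasoning traces terminate. This builds on the usage sketch above and reuses its `model`, `tokenizer`, and `inputs`.

```python
# Workaround sketch: hard-cap generation so runaway reasoning traces terminate.
# The specific limits below are assumptions, not officially recommended values.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    max_new_tokens=4096,          # upper bound on the reasoning trace length
    repetition_penalty=1.1,       # mildly discourage the model from looping
    eos_token_id=tokenizer.eos_token_id,
)
outputs = model.generate(inputs, generation_config=gen_config)
```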
## 4. ETC
**How to Cite**
```
To be added
```
**Contact**
```
spthsrbwls123@yonsei.ac.kr
```