---
library_name: transformers
tags: []
---

# Introduction

We release **STILL-3-1.5B-preview**, a slow-thinking reasoning model that achieves 39.33% accuracy on the AIME benchmark. We apply reinforcement learning to a 1.5B model and observe that performance keeps improving as the number of training steps increases. To make our work easier to reproduce and to advance research in this area, we open-source our code, model, and data.

Code: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs

# Evaluation

We evaluated the model on four benchmarks: MATH, AIME, OMNI, and LiveAOPS. For MATH and AIME, we used sampling-based decoding with a temperature of 0.6 and a top-p of 0.95; each question was sampled 64 times and the scores were averaged. For OMNI and LiveAOPS (August-November 2024), we randomly sampled a subset of problems whose answers are integers to facilitate automated evaluation, and used greedy decoding. The trained model, STILL-3-1.5B-preview, improves on the backbone on all four benchmarks, raising the average score from 42.91 to 49.33. On AIME, accuracy increased from 28.67% to 39.33%, a relative improvement of 37.18%.

| Model | MATH | AIME | OMNI | LiveAOPS | Avg. |
| --- | :---: | :---: | :---: | :---: | :---: |
| Backbone | 84.04 | 28.67 | 25.60 | 33.33 | 42.91 |
| STILL-3-1.5B-preview | **85.48** | **39.33** | **33.00** | **39.50** | **49.33** |

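The arithmetic behind the reported numbers can be checked with a short sketch. This is not the authors' evaluation code: the helper names `average_accuracy` and `relative_improvement` and the toy correctness flags are assumptions made for illustration. Only the benchmark scores are taken from the table.

```python
from statistics import mean


def average_accuracy(per_question_correct):
    """Percent accuracy: mean over questions of the fraction of correct samples.

    `per_question_correct` holds one list of booleans per question,
    one boolean per sampled answer (e.g. 64 samples per question).
    """
    return 100.0 * mean(
        mean(1.0 if ok else 0.0 for ok in flags) for flags in per_question_correct
    )


def relative_improvement(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical toy data: 2 questions with 4 samples each (not real results).
toy = [[True, True, False, True], [False, False, True, False]]
toy_score = average_accuracy(toy)  # (0.75 + 0.25) / 2 -> 50.0

# Scores copied from the table: MATH, AIME, OMNI, LiveAOPS.
backbone = [84.04, 28.67, 25.60, 33.33]
still3 = [85.48, 39.33, 33.00, 39.50]

aime_gain = relative_improvement(backbone[1], still3[1])
print(f"AIME relative improvement: {aime_gain:.2f}%")
print(f"Average: {mean(backbone):.2f} -> {mean(still3):.2f}")
```

Averaging over many samples per question reduces the variance that a single sampled answer would introduce, which matters on a small benchmark such as AIME.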
# Quick Start

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "RUC-AIBOX/STILL-3-1.5B-preview"

# Load the tokenizer (for the chat template) and the vLLM engine
tokenizer = AutoTokenizer.from_pretrained(model_path)
llm = LLM(model=model_path, tensor_parallel_size=1, dtype="bfloat16")

# Input text
question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"

# Wrap the question in the model's chat template
input_prompts = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

# Sampling params (same temperature and top-p as the MATH/AIME evaluation).
# Pass stop=[...] here if you need custom stop strings.
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
    seed=42,
    skip_special_tokens=False,
)

# Completion
responses = llm.generate(input_prompts, sampling_params)
print(responses[0].outputs[0].text)
```