---
license: mit
datasets:
  - sequelbox/Raiden-DeepSeek-R1
language:
  - en
base_model:
  - google/gemma-2-2b-it
pipeline_tag: text-generation
library_name: mlx
tags:
  - gguf
  - reasoning
  - chain-of-thought
  - CoT
  - Gemma
---

# Model Summary

This model is a fine-tuned version of gemma-2-2b-it, optimized for instruction following and reasoning. It was trained with MLX and LoRA on the sequelbox/Raiden-DeepSeek-R1 dataset, which consists of 62.9k examples generated by DeepSeek-R1. Fine-tuning ran for 600 iterations to improve the model's ability to reason through complex problems.

# Model Details

## Capabilities

This model improves upon gemma-2-2b-it with additional instruction-following and reasoning capabilities derived from DeepSeek-R1-generated examples. It answers simple questions with a straightforward response and generates long chain-of-thought reasoning for more complex problems. It is well suited for:

- Question answering
- Reasoning-based tasks
- Coding
- Running on consumer hardware
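A minimal sketch of prompting the model, assuming it follows the standard Gemma-2 chat format (the mlx-lm load/generate calls are shown as a hypothetical usage comment, since the exact model path depends on where you host it):

```python
# Build a Gemma-2 chat prompt by hand using the published turn markers.
# Loading via mlx-lm's tokenizer chat template would produce the same shape.

def build_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma-2 turn markers."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_prompt("What is 17 * 24?")
print(prompt)

# Hypothetical usage with mlx-lm (the model path is a placeholder):
# from mlx_lm import load, generate
# model, tokenizer = load("path/to/this-model")
# print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```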

## Limitations

- Chain-of-thought reasoning is sometimes not triggered for complex problems when it probably should be. You can nudge the model by asking it to show its thoughts; it will then generate think tags and begin reasoning.
- On harder-than-average reasoning problems, the model can get stuck in long "thinking" loops without ever reaching a conclusive answer.
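The nudge described above can be sketched as a simple prompt transform; the exact wording of the appended request is an illustration, not a fixed API:

```python
# Sketch of nudging the model into chain-of-thought mode by appending
# an explicit request to show its reasoning before answering.

def nudge(question: str) -> str:
    """Append a reasoning request so the model emits think tags."""
    return f"{question}\n\nShow your thoughts before answering."

print(nudge("How many primes are there below 100?"))
```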