Model Overview

This model is a LoRA (Low-Rank Adaptation) fine-tune of the Qwen2.5-3B base model, trained with the MLX framework. Fine-tuning ran for 600 iterations over the isaiahbjork/chain-of-thought dataset (7,143 examples) and is intended to improve performance on tasks that require multi-step reasoning and problem-solving.
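As a quick usage sketch, the fine-tuned weights can be loaded for local inference on Apple silicon with the mlx-lm Python package. The snippet below is a minimal sketch, assuming mlx-lm is installed (`pip install mlx-lm`) and that the published weights are in a format mlx-lm can load; exact `generate()` arguments may differ between mlx-lm versions, and the prompt is purely illustrative.

```python
# Minimal inference sketch using the mlx-lm package (assumed installed via `pip install mlx-lm`).
from mlx_lm import load, generate

# Download and load the fine-tuned model and tokenizer from the Hugging Face Hub.
model, tokenizer = load("ApatheticWithoutTheA/Qwen-2.5-3B-Reasoning")

# Illustrative multi-step reasoning prompt.
prompt = "A train travels 60 km in 45 minutes. What is its average speed in km/h? Think step by step."
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```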

Model Architecture

  • Base Model: Qwen2.5-3B
  • Model Type: Causal Language Model
  • Architecture: Transformer with Rotary Position Embedding (RoPE), SwiGLU activation, RMSNorm normalization, attention QKV bias, and tied word embeddings
  • Parameters: 3.09 billion
  • Layers: 36
  • Attention Heads: 16 query heads and 2 key/value heads (grouped-query attention, GQA); see the sketch below
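To make the attention configuration concrete, the short sketch below works through the grouped-query attention layout using only the head counts listed above: 16 query heads share 2 key/value heads, so each key/value head serves a group of 8 query heads.

```python
# Grouped-query attention (GQA) layout from the architecture list above.
num_attention_heads = 16      # query heads
num_key_value_heads = 2       # key/value heads shared across query heads

# Each key/value head serves a group of query heads.
heads_per_kv_group = num_attention_heads // num_key_value_heads
print(heads_per_kv_group)     # 8 query heads per key/value head

# Relative KV-cache size versus standard multi-head attention (all else equal):
# the cache stores keys/values per KV head, so GQA shrinks it by this factor.
kv_cache_reduction = num_attention_heads / num_key_value_heads
print(kv_cache_reduction)     # 8.0x smaller KV cache
```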

Fine-Tuning Details

  • Technique: Low-Rank Adaptation (LoRA)
  • Framework: MLX
  • Dataset: isaiahbjork/chain-of-thought
  • Dataset Size: 7,143 examples
  • Iterations: 600

LoRA fine-tunes the model efficiently by freezing the base weights and training only a small set of added low-rank adapter matrices, which reduces memory and compute requirements while preserving performance. The MLX framework was used to run this process, taking advantage of Apple silicon hardware for training.
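The parameter savings can be seen with a short back-of-the-envelope sketch. The snippet below is illustrative only: the weight shape, rank, and scaling factor are assumed values, not the ones used for this adapter.

```python
import numpy as np

# Illustrative LoRA update; dimensions and rank are assumed, not taken from this adapter.
d, k, r = 2048, 2048, 8            # weight shape (d x k) and LoRA rank
alpha = 16                         # LoRA scaling factor (assumed)

W = np.zeros((d, k))               # frozen base weight (stand-in values)
A = np.random.randn(r, k) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # trainable low-rank factor, zero-initialized so W is unchanged at start

# Effective weight after adaptation: only A and B are ever updated.
W_eff = W + (alpha / r) * (B @ A)

full_params = d * k                # parameters touched by full fine-tuning of this matrix
lora_params = r * (d + k)          # parameters touched by LoRA
print(f"trainable fraction: {lora_params / full_params:.2%}")  # ~0.78% for these shapes
```

Only the adapter matrices are updated over the 600 training iterations; the base model's 3.09 billion parameters stay frozen.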
