---
library_name: transformers
tags:
- math
license: mit
datasets:
- openai/gsm8k
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
---

# DeepMath-7B-M

## Model Overview

DeepMath-7B-M is a fine-tuned version of [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) on the [GSM8K dataset](https://huggingface.co./datasets/gsm8k). The model targets mathematical reasoning and problem solving, with a focus on arithmetic, algebra, and grade-school word problems.

## Model Details

- **Base Model:** DeepSeek-R1-Distill-Qwen-1.5B
- **Fine-Tuning Dataset:** GSM8K
- **Parameters:** 1.5 billion
- **Task:** Mathematical Question Answering (Math QA)
- **Repository:** [codewithdark/deepmath-7b-m](https://huggingface.co./codewithdark/deepmath-7b-m)
- **Commit Message:** "Full merged model for math QA"

## Training Details

- **Dataset:** GSM8K (Grade School Math 8K), a high-quality dataset of grade-school math word problems with step-by-step solutions
- **Fine-Tuning Framework:** Hugging Face Transformers & PyTorch
- **Optimization Techniques:**
  - AdamW optimizer
  - Learning-rate scheduling
  - Gradient accumulation
  - Mixed-precision training (FP16)
- **Training Steps:** Multiple epochs on a high-performance GPU cluster

A hedged sketch of this fine-tuning setup is provided at the end of this card.

## Capabilities & Performance

DeepMath-7B-M is designed for:

- Solving word problems with step-by-step reasoning
- Performing algebraic and arithmetic computations
- Understanding multi-step problem structures
- Generating structured solutions with explanations

## Usage

You can load and use the model via the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codewithdark/deepmath-7b-m")
model = AutoModelForCausalLM.from_pretrained("codewithdark/deepmath-7b-m")

input_text = "A farmer has 5 chickens and each lays 3 eggs a day. How many eggs in total after a week?"
inputs = tokenizer(input_text, return_tensors="pt")

# Leave room for the step-by-step solution in the generated continuation.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- May struggle with complex mathematical proofs
- Performance is largely limited to GSM8K-style problems
- May reflect biases present in the training data

## Future Work

- Extending training to more diverse math datasets
- Exploring larger models for improved accuracy
- Fine-tuning on physics and higher-level mathematical reasoning datasets

## License

This model is released under the MIT License.

## Citation

If you use this model, please cite:

```bibtex
@misc{DeepMath-7B-M,
  author = {Ahsan},
  title  = {DeepMath-7B-M: Fine-Tuned DeepSeek-R1-Distill-Qwen-1.5B on GSM8K},
  year   = {2025},
  url    = {https://huggingface.co./codewithdark/deepmath-7b-m}
}
```
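
## Fine-Tuning Sketch (Illustrative)

The training details above list AdamW, learning-rate scheduling, gradient accumulation, and FP16 mixed precision, but the actual training script and hyperparameters are not published with this card. The sketch below shows one plausible way such a fine-tune could be set up with the Hugging Face `Trainer` on GSM8K; the batch size, learning rate, scheduler choice, epoch count, and sequence length are illustrative assumptions, not the values actually used for DeepMath-7B-M.

```python
# Illustrative sketch only: hyperparameter values are assumptions,
# not the settings actually used to train DeepMath-7B-M.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# GSM8K rows have "question" and "answer" fields; join them into one
# causal-LM training sequence per example.
def to_features(example):
    text = example["question"] + "\n" + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=1024)

train_set = load_dataset("openai/gsm8k", "main", split="train").map(
    to_features, remove_columns=["question", "answer"]
)

args = TrainingArguments(
    output_dir="deepmath-gsm8k-finetune",
    per_device_train_batch_size=4,   # assumed; not stated on the card
    gradient_accumulation_steps=8,   # gradient accumulation, per the card
    learning_rate=2e-5,              # assumed; not stated on the card
    lr_scheduler_type="cosine",      # learning-rate scheduling, per the card
    num_train_epochs=3,              # "multiple epochs"; exact count assumed
    fp16=True,                       # mixed-precision (FP16) training, per the card
    optim="adamw_torch",             # AdamW optimizer, per the card
    logging_steps=50,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_set,
    # mlm=False gives standard causal-LM labels (padding masked out).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

After training, the merged model and tokenizer can be saved with `trainer.save_model(...)` and `tokenizer.save_pretrained(...)` and then pushed to the Hub.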