SmolLM2-360M-Grpo-r999

SmolLM2-360M-Grpo-r999 is fine-tuned from SmolLM2-360M-Instruct. SmolLM2 demonstrates significant advances over its predecessor, SmolLM1, particularly in instruction following, knowledge, and reasoning. The 360M base model was trained on 2 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, and The Stack, along with newly filtered datasets curated by the SmolLM2 team. The instruct version was developed through supervised fine-tuning (SFT) on a combination of public and curated datasets.

How to Use

Transformers

# Install the dependency first: pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-360M-Grpo-r999"
device = "cuda"  # for GPU usage, or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# For multiple GPUs, install accelerate and use:
#   model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Format the prompt with the model's chat template; add_generation_prompt
# appends the assistant header so the model produces a reply.
messages = [{"role": "user", "content": "What is gravity?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(input_text)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
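
For multi-turn use, the same chat template is applied to the running message history. The sketch below is an illustrative continuation of the snippet above; the follow-up question and generation settings are assumptions, not values documented for this model.

# Append the model's reply and a follow-up question, then re-apply the
# chat template to the full history before generating again.
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Explain it in one sentence for a child."})

chat_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
chat_inputs = tokenizer.encode(chat_text, return_tensors="pt").to(device)
chat_outputs = model.generate(chat_inputs, max_new_tokens=100, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))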

Limitations of SmolLM2-360M-Grpo-r999

  1. Model Size: While 360M parameters provide enhanced capabilities, the model still has limitations in handling highly complex reasoning tasks or long-context dependencies compared to larger models.

  2. Bias and Inaccuracy: Despite fine-tuning on diverse datasets, the model may generate biased, inaccurate, or factually incorrect responses, particularly for niche topics or specialized knowledge areas.

  3. Context Length: The model might struggle with very long conversations or extended prompts, potentially leading to truncation or loss of contextual coherence; one way to trim older turns is sketched after this list.

  4. Fine-Tuning Specificity: Performance on specialized domains may require additional fine-tuning with domain-specific datasets.

  5. Generalization: The model may not generalize to rare queries or unseen tasks as effectively as larger models, sometimes providing generic or incomplete answers.

  6. Limited Multi-Turn Conversations: While it supports multi-turn interactions, its ability to retain and use context over extended conversations is not as strong as that of larger models.
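
One common mitigation for the context-length and multi-turn limitations above is to trim the oldest turns before the prompt exceeds the model's window. The helper below is a hypothetical sketch rather than part of this card: trim_history and the reserve margin are made-up names, and it assumes the tokenizer, model, and messages objects from the usage snippet.

# Illustrative sketch: drop the oldest turns until the chat-formatted prompt
# fits within the model's context window, keeping a small reserve for the reply.
def trim_history(messages, tokenizer, max_tokens, reserve=128):
    trimmed = list(messages)
    while len(trimmed) > 1:
        prompt = tokenizer.apply_chat_template(trimmed, tokenize=False, add_generation_prompt=True)
        if len(tokenizer.encode(prompt)) <= max_tokens - reserve:
            break
        trimmed.pop(0)  # discard the oldest message first
    return trimmed

# Usage, assuming tokenizer, model, and messages from the snippet above:
window = getattr(model.config, "max_position_embeddings", 2048)
messages = trim_history(messages, tokenizer, max_tokens=window)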

Intended Use of SmolLM2-360M-Grpo-r999

  1. General-purpose Conversational AI – Ideal for chatbots, virtual assistants, and interactive applications requiring basic reasoning and knowledge retrieval.

  2. Education & Tutoring – Supports answering educational queries, explaining concepts, and aiding learning across multiple domains.

  3. Content Generation – Can generate short-form text, summaries, and brainstorming ideas for writing assistants or creativity tools.

  4. Code Assistance – Fine-tuned on programming datasets, making it useful for debugging, explaining code, and assisting developers.

  5. Instruction Following – Optimized for following structured commands, making it suitable for task-based applications.

  6. Prototyping & Experimentation – Lightweight model for fast deployment in new AI applications, balancing performance with efficiency.

  7. Low-Resource Environments – Runs on edge devices, mobile apps, and local servers where larger models are infeasible; see the reduced-precision sketch after this list.

  8. Research & Development – Can be used as a base model for further fine-tuning or model optimizations.
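
As an illustration of the low-resource use case in item 7, the checkpoint can be loaded in reduced precision to shrink its memory footprint. This is a minimal sketch under the assumption that a recent PyTorch build supports bfloat16 inference on the target CPU; the prompt and decoding settings are placeholders, not recommendations from this card.

# Minimal CPU-only sketch: load the weights in bfloat16 to roughly halve
# memory versus the default F32 checkpoint (assumes recent torch/transformers).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-360M-Grpo-r999"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)
model.eval()

messages = [{"role": "user", "content": "Give one tip for writing clear code."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))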
