Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Jaward's activity

posted an update 25 minutes ago

Post

made a few improvements on custom grpo trainer:
- added sequence similarity reward (seems to work)
- improved vllm support (5x inference speed)
- adjusted reward scores (this helped with format/accuracy)
- can now push to hf hub (already pushed mine lol: Jaward/smollm2_360m_grpo_gsm8k_reasoner)

Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb
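The post does not say how the sequence similarity reward is computed, so the following is only a minimal sketch of one way such a reward could look. It assumes the reward is a character-level similarity ratio between the completion and a reference answer, using Python's standard `difflib`. The function name `sequence_similarity_reward` and the `max_reward` parameter are hypothetical and are not taken from the linked notebook.

```python
from difflib import SequenceMatcher


def sequence_similarity_reward(completion: str, reference: str,
                               max_reward: float = 1.0) -> float:
    """Hypothetical reward: similarity ratio in [0, 1], scaled by max_reward.

    This is an illustrative assumption, not the notebook's actual reward.
    """
    completion = completion.strip()
    reference = reference.strip()
    # An empty completion or reference earns nothing.
    if not completion or not reference:
        return 0.0
    # ratio() returns 2*M/T, where M is the number of matched characters
    # and T is the total length of both strings.
    ratio = SequenceMatcher(None, completion, reference).ratio()
    return max_reward * ratio


if __name__ == "__main__":
    reference = "The answer is 42."
    for completion in ["The answer is 42.", "The answer is 41.", "no idea"]:
        print(completion, "->", sequence_similarity_reward(completion, reference))
```

A dense score like this gives partial credit to near-miss completions, which a strict exact-match accuracy reward would score as zero.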

updated a model about 1 hour ago

Jaward/smollm2_360m_grpo_gsm8k_reasoner

Text Generation • Updated about 1 hour ago

published a model about 1 hour ago

Jaward/smollm2_360m_grpo_gsm8k_reasoner

Text Generation • Updated about 1 hour ago

liked a dataset 1 day ago

facebook/natural_reasoning

Viewer • Updated 7 days ago • 1.15M • 3.78k • 252

replied to their post 10 days ago

Bro, if you had read the repo you would see that this implementation is for educational purposes; it wasn't done because it's easy. Not to mention that unsloth uses trl's GRPO trainer, which is super slow on CPU and does not scale for models under 500M params. I tried it on both CPU and GPU. This custom implementation cuts most of the heavy lifting, allowing you to train and scale faster even on CPU, and it comes with a bunch of custom configs and a simplified GRPO trainer in under 500 lines of code. There's a lot one can learn from it.

posted an update 12 days ago

Post

3832

Finally, here it is: a faster, custom, scalable GRPO trainer for smaller models with < 500M params. It can train on a CPU with 8GB of RAM and also supports GPU for sanity's sake (includes support for vllm + flash attention). It uses SmolLM2-135M/360M-Instruct as ref & base models. Experience your own “aha” moment 🐳 on 8GB of RAM.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb

2 replies

liked a model 17 days ago

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated Dec 2, 2024 • 105k • 399

liked a model 21 days ago

HuggingFaceTB/SmolLM2-135M-Instruct

Text Generation • Updated 22 days ago • 159k • 142

posted an update 24 days ago

Post

3410

ByteDance drops OmniHuman🔥
This is peak SOTA performance: flawless natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA-level talking heads, only this time with hands and body motion.
Project: https://omnihuman-lab.github.io/

3 replies

upvoted 2 papers 24 days ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published 25 days ago • 54

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published 25 days ago • 183

posted an update 28 days ago

Post

1497

The beauty of GRPO is that it doesn’t care whether the rewards are rule-based or learned. The hack: let the data self-normalize. Trajectories in a batch compete against their mean: no value model, no extra params, just clean, efficient RL that cuts memory usage by 50% while maintaining SOTA performance. Btw, it was introduced 9 months prior to R1: arxiv.org/pdf/2402.03300
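The self-normalization described above can be sketched in a few lines. This is a simplified illustration, not the trainer's code: it assumes one scalar reward per sampled completion, a single group of completions for one prompt, and normalization by the group's population standard deviation. The function name `group_relative_advantages` and the `eps` parameter are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    No value model is involved; the group mean acts as the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions sampled for one prompt; two were judged correct.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Completions that beat the group mean get a positive advantage and the rest get a negative one, regardless of whether the rewards came from a rule or a learned model. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.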

1 reply

upvoted an article 28 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28

• 782

liked a model about 1 month ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 5 days ago • 4.63M • 10.5k

liked a Space about 1 month ago

542

DeepSeek-R1 WebGPU

🧠

Next-generation reasoning model that runs locally in-browser

upvoted a paper about 1 month ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 106

reacted to mlabonne's post with 🧠 about 1 month ago

Post

5247

🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course

liked a model about 1 month ago

unsloth/phi-4-GGUF

Text Generation • Updated Jan 13 • 35k • 155

posted an update about 2 months ago

Post

1875

A minimal single-script implementation of knowledge distillation in LLMs. It uses GPT-2 (124M) as the student model and GPT-2 Medium (340M) as the teacher, trained with reverse Kullback-Leibler (KL) divergence on a small chunk of openwebtext.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/llm_knowledge_distillation.ipynb
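To make the reverse KL objective concrete, here is a minimal sketch for a single token position, written with the standard library only. It assumes "reverse" means KL(student || teacher), i.e. the expectation is taken under the student's distribution. The helper names `log_softmax` and `reverse_kl` are hypothetical and the logits are made-up numbers. The linked notebook may compute this differently (for example over batches of tensors).

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) = sum_i q_i * (log q_i - log p_i).

    q is the student distribution and p is the teacher distribution.
    """
    log_q = log_softmax(student_logits)
    log_p = log_softmax(teacher_logits)
    return sum(math.exp(lq) * (lq - lp) for lq, lp in zip(log_q, log_p))


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]   # illustrative logits over a 3-token vocab
    student = [0.5, 0.5, 0.5]
    print("reverse KL:", reverse_kl(student, teacher))
    print("forward KL:", reverse_kl(teacher, student))  # roles swapped
```

Because the weighting comes from the student, reverse KL penalizes the student most for putting probability on tokens the teacher considers unlikely, whereas forward KL penalizes it for missing tokens the teacher considers likely.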

liked a model about 2 months ago

deepseek-ai/DeepSeek-V3

Text Generation • Updated 5 days ago • 3.29M • 3.57k