忍者

byteprobe

AI & ML interests

RL | NLP | LLM | multimodal | agent

Recent Activity

liked a model 15 days ago

tomg-group-umd/huginn-0125

upvoted a collection 15 days ago

Nomic Embed v2

liked a model 15 days ago

nomic-ai/nomic-embed-text-v2-moe

byteprobe's activity

upvoted a collection 15 days ago

Nomic Embed v2

Collection • Multilingual Embedding Models • 4 items • Updated 13 days ago • 13

upvoted 16 papers 15 days ago

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Paper • 2501.17703 • Published about 1 month ago • 55

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published 23 days ago • 54

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published 21 days ago • 120

Baichuan-Omni-1.5 Technical Report

Paper • 2501.15368 • Published Jan 26 • 61

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published 29 days ago • 56

GuardReasoner: Towards Reasoning-based LLM Safeguards

Paper • 2501.18492 • Published 29 days ago • 82

The Differences Between Direct Alignment Algorithms are a Blur

Paper • 2502.01237 • Published 25 days ago • 111

s1: Simple test-time scaling

Paper • 2501.19393 • Published 28 days ago • 107

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 108

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published 28 days ago • 37

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published 26 days ago • 183

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 24 days ago • 195

upvoted an article 15 days ago

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • Jan 23 • 143

upvoted a paper 15 days ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 18 days ago • 140

upvoted an article 15 days ago

1 Billion Classifications

Article • 16 days ago • 39