Shyam Sunder Kumar

theainerd

AI & ML interests

Natural Language Processing

Organizations

Neuropark · Speech Recognition Community Event Version 2 · Open-Source AI Meetup · Social Post Explorers · Hugging Face Discord Community

theainerd's activity

upvoted an article 1 day ago

SigLIP 2: A better multilingual vision language encoder

reacted to AdinaY's post with 🔥 1 day ago
Wan2.1 🔥📹 a new OPEN video model by the Alibaba Wan team!

Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1

✨Apache 2.0
✨8.19GB VRAM, runs on most GPUs
✨Multi-Tasking: T2V, I2V, Video Editing, T2I, V2A
✨Text Generation: Supports Chinese & English
✨Powerful Video VAE: Encode/decode 1080P w/ temporal precision
reacted to burtenshaw's post with 🔥 2 days ago
Now the Hugging Face agent course is getting real, with frameworks like smolagents, LlamaIndex, and LangChain!

🔗 Follow the org for updates https://huggingface.co./agents-course

This week we are releasing the first framework unit in the course and it’s on smolagents. This is what the unit covers:

- why you should use smolagents vs another library
- how to build agents that use code
- how to build multi-agent systems
- how to use vision language models for browser use

The team has been working flat out on this for a few weeks, led by @sergiopaniego and supported by smolagents author @m-ric.
reacted to stefan-it's post with 👍 3 days ago
She arrived 😍

[Expect more models soon...]
reacted to cogwheelhead's post with 👍 8 days ago
My team and I have performed an in-depth investigation comparing o1 to R1 (and other reasoning models).

Link: https://toloka.ai/blog/r1-is-not-on-par-with-o1-and-the-difference-is-qualitative-not-quantitative

It started with us evaluating them on our own university-math benchmarks: U-MATH for problem-solving and μ-MATH for judging solution correctness (see the HF leaderboard: toloka/u-math-leaderboard)

tl;dr: R1 is certainly impressive, but we find that it lags behind in novelty adaptation and reliability:
* performance drops when updating benchmarks with fresh unseen tasks (e.g. AIME 2024 -> 2025)
* R1-o1 gap widens when evaluating niche subdomains (e.g. university-specific math instead of the more common Olympiad-style contests)
* same with going into altogether unconventional domains (e.g. chess) or skills (e.g. judgment instead of problem-solving)
* R1 also runs into failure modes way more often (e.g. making illegal chess moves or falling into endless generation loops)

Our point here is not to bash DeepSeek: they've done exceptional work, R1 is a game-changer, and we have no intention of downplaying that. R1's release is a perfect opportunity to study where all these models differ and to gain an understanding of how to move forward from here.
reacted to dreamerdeo's post with 🚀 9 days ago
🚀 Excited to share our technical report on the Southeast Asian multilingual model Sailor2 and its latest updates!

Our 49-page report details Sailor2's development journey, including multilingual data cleaning, small-model data mixture simulations, multi-stage continual pre-training, multi-stage post-training, and multi-cultural, multi-lingual evaluations. Sailor2 aims to make multilingual model pre-training efficient and accessible for the community.

🧭 We highlight Sailor2's impressive performance in low-resource language translation scenarios and its cultural understanding advantages in Southeast Asia, promoting practical applications for regional languages.

Model updates include: 
💡 More precise outputs: Reduced redundancy in model outputs through refined post-training data and optimization techniques. 
🌈 Handling longer texts: Expanded to handle up to 128K context length in Southeast Asian languages through long-text training. 
⚡️ Faster inference: Achieved 2.5x faster inference speed with speculative decoding. 
🌪️ More model sizes: Introduced new sizes of 3B and 14B through model pruning.
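
The 2.5x speed-up above comes from speculative decoding, where a small draft model proposes several tokens and the large target model verifies them in a single pass. Here is a minimal greedy-acceptance sketch with toy stand-in models; the function names and toy "models" are illustrative only, not Sailor2's actual implementation:

```python
def greedy_decode(model, prompt, max_new):
    """Baseline: one target-model pass per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(model(tokens))
    return tokens

def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Draft proposes k tokens; target verifies them in one pass,
    keeping the longest agreeing prefix plus one target token.
    Output is identical to greedy decoding with `target` alone."""
    tokens = list(prompt)
    target_passes = 0  # proxy for latency
    while len(tokens) < len(prompt) + max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx = list(tokens)
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) The target checks every position (one batched forward pass
        #    in a real model; a loop over positions in this toy version).
        target_passes += 1
        ctx = list(tokens)
        for t in proposal:
            expected = target(ctx)
            if t != expected:
                ctx.append(expected)  # correct the first mismatch
                break
            ctx.append(t)
        else:
            ctx.append(target(ctx))  # all accepted: take a bonus token
        tokens = ctx
    return tokens[:len(prompt) + max_new], target_passes

# Toy "models": each maps a token sequence to its greedy next token.
target = lambda ts: sum(ts) % 7
draft = target  # a perfect draft accepts k+1 tokens per target pass
```

With a perfect draft, each target pass yields k+1 tokens, so generating 8 tokens takes 2 passes instead of 8; with a weaker draft, fewer tokens are accepted per pass, but the output never changes.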

🌟 All models are Apache-licensed for commercial use; development tools (code, resources) are open-source.

📚 Technical report: Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs (2502.12982) 
🤖️ Models: sail/sailor2-language-models-674d7c9e6b4dbbd9a869906b 
💬 Demo: sail/Sailor2-20B-Chat 
📣 Sailor2 community: https://huggingface.co./sailor2