Ross Wightman

rwightman

AI & ML interests

Computer vision, transfer learning, semi/self supervised learning, robotics.

Recent Activity

liked a dataset about 12 hours ago

MLCommons/unsupervised_peoples_speech

upvoted an article 6 days ago

Open-R1: a fully open reproduction of DeepSeek-R1

reacted to merve's post with 🔥 10 days ago

Oof, what a week! 🥵 So many things have happened, let's recap! https://huggingface.co./collections/merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where the decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images

Articles

Timm ❤️ Transformers: Use any timm model with transformers

Trick or ResNet Treat

Mamba Out

Tiny Test Models

Searching for better (Full) ImageNet ViT Baselines

MobileNet Baselines

MobileNet-V4 (now in timm)

rwightman's activity

upvoted an article 6 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

6 days ago

• 570

upvoted an article 17 days ago

Article

MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era

19 days ago

• 40

upvoted an article 18 days ago

Article

Timm ❤️ Transformers: Use any timm model with transformers

18 days ago

• 37

upvoted 3 papers about 2 months ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 106

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 125

upvoted a collection about 2 months ago

Common Models

The first generation of models pretrained on Common Corpus. • 5 items • Updated Dec 5, 2024 • 28

upvoted 2 papers 2 months ago

MARS: Unleashing the Power of Variance Reduction for Training Large Models

Paper • 2411.10438 • Published Nov 15, 2024 • 13

Cautious Optimizers: Improving Training with One Line of Code

Paper • 2411.16085 • Published Nov 25, 2024 • 15

upvoted an article 2 months ago

Article

🤗 Serve any model with Inference Endpoints + Custom Handlers

Nov 22, 2024

• 3

upvoted 2 collections 4 months ago

RDNet

DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [ECCV 2024] • 9 items • Updated Oct 16, 2024 • 3

timm tiny test models

A collection of very small (~300-500k parameter) models at 160x160 resolution, for testing purposes. Trained on ImageNet-1k. • 13 items • Updated Oct 2, 2024 • 5

upvoted 2 articles 6 months ago

Article

MobileNet Baselines

Jul 26, 2024

• 23

Article

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Jul 25, 2024

• 18

upvoted a collection 6 months ago

🍃 MINT-1T

Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 58

upvoted 2 papers 7 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10, 2024 • 68

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 60

upvoted a collection 7 months ago

Cambrian Data

3 items • Updated Jun 25, 2024 • 10

upvoted a paper 8 months ago

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11, 2024 • 57

upvoted a collection 8 months ago

MobileCLIP Models + DataCompDR Data

MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4, 2024 • 26