Merve Noyan
merve
6188 followers · 226 following
https://github.com/merveenoyan/smol-vision
mervenoyann
merveenoyan
merve.bsky.social
AI & ML interests
VLMs, vision & co
Recent Activity
posted an update 1 day ago
Oof, what a week! 🥵 So many things have happened, let's recap! https://huggingface.co./collections/merve/jan-24-releases-6793d610774073328eac67a9
Multimodal 💬
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLLaMA3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, where the decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark
LLMs
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1, with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models and their datasets (SFT and reward ones too!)
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images
updated a collection 1 day ago
Jan 24 Releases
Articles
We now support VLMs in smolagents!
2 days ago • 27
SmolVLM Grows Smaller – Introducing the 250M & 500M Models!
3 days ago • 74
Introducing smolagents: simple agents that write actions in code.
26 days ago • 522
Welcome PaliGemma 2 – New vision language models by Google
Dec 5, 2024 • 128
SmolVLM - small yet mighty Vision Language Model
Nov 26, 2024 • 166
Llama can now see and run on your device - welcome Llama 3.2
Sep 25, 2024 • 182
Preference Optimization for Vision Language Models
Jul 10, 2024 • 55
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models
Jun 24, 2024 • 184
PaliGemma – Google's Cutting-Edge Open Vision Language Model
May 14, 2024 • 234
Vision Language Models Explained
Apr 11, 2024 • 246
Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳
Aug 25, 2023 • 26
Deploy MusicGen in no time with Inference Endpoints
Aug 4, 2023 • 4
Open-Source Text Generation & LLM Ecosystem at Hugging Face
Jul 17, 2023 • 2
Jupyter X Hugging Face
Mar 23, 2023 • 2
Using Machine Learning to Aid Survivors and Race through Time
Mar 3, 2023 • 6
Introducing Skops
Aug 12, 2022 • 1
Announcing the Hugging Face Fellowship Program
May 17, 2022 • 6
Hosting your Models and Datasets on Hugging Face Spaces using Streamlit
Oct 5, 2021 • 3
Showcase Your Projects in Spaces using Gradio
Oct 5, 2021 • 6
Organizations
merve's activity
liked a model 1 day ago
Qwen/Qwen2.5-Math-PRM-72B
Text Classification • Updated 9 days ago • 884 • 64
liked 2 datasets 1 day ago
nvidia/AceMath-Instruct-Training-Data
Viewer • Updated 9 days ago • 5.56M • 1.24k • 24
cais/hle
Viewer • Updated 3 days ago • 3k • 695 • 85
liked 7 models 1 day ago
deepseek-ai/DeepSeek-R1
Text Generation • Updated 3 days ago • 109k • 2.6k
MiniMaxAI/MiniMax-VL-01
Image-Text-to-Text • Updated about 13 hours ago • 1.84k • 220
VITA-MLLM/VITA-1.5
Video-Text-to-Text • Updated 10 days ago • 728 • 32
bytedance-research/UI-TARS-7B-DPO
Image-Text-to-Text • Updated about 12 hours ago • 4.85k • 61
bytedance-research/UI-TARS-72B-SFT
Image-Text-to-Text • Updated about 12 hours ago • 104 • 8
DAMO-NLP-SG/VideoLLaMA3-7B
Visual Question Answering • Updated 1 day ago • 676 • 17
declare-lab/TangoFlux
Text-to-Audio • Updated 4 days ago • 2.85k • 70
liked 2 Spaces 1 day ago
SmolVLM
Running on Zero • 31
SmolVLM 500M Instruct WebGPU
Running • 19
liked a Space 9 days ago
MatchAnything
Running on Zero • 116
liked 2 datasets 9 days ago
omkarthawakar/VRC-Bench
Viewer • Updated 13 days ago • 1k • 2.06k • 12
microsoft/PEACE
Viewer • Updated 16 days ago • 7.73k • 2.52k • 12
liked 5 models 9 days ago
ByteDance/Sa2VA-26B
Image-Text-to-Text • Updated 12 days ago • 124 • 10
lightblue/lb-reranker-0.5B-v1.0
Text Generation • Updated 5 days ago • 1.88k • 60
jxm/cde-small-v2
Feature Extraction • Updated 9 days ago • 3.52k • 70
jinaai/ReaderLM-v2
Text Generation • Updated 4 days ago • 18k • 415
MiniMaxAI/MiniMax-Text-01
Text Generation • Updated 9 days ago • 4.48k • 476