Merve Noyan

merve

https://github.com/merveenoyan/smol-vision

AI & ML interests

VLMs, vision & co

Recent Activity

posted an update 1 day ago

Oof, what a week! 🥵 So many things have happened, let's recap! https://huggingface.co/collections/merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM, the tiniest VLMs, which come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLLaMA3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images

updated a collection 1 day ago

Jan 24 Releases

Articles

We now support VLMs in smolagents!

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Introducing smolagents: simple agents that write actions in code.

Welcome PaliGemma 2 – New vision language models by Google

SmolVLM - small yet mighty Vision Language Model

Llama can now see and run on your device - welcome Llama 3.2

Preference Optimization for Vision Language Models

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Vision Language Models Explained

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Deploy MusicGen in no time with Inference Endpoints

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Jupyter X Hugging Face

Using Machine Learning to Aid Survivors and Race through Time

Introducing Skops

Announcing the Hugging Face Fellowship Program

Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Showcase Your Projects in Spaces using Gradio

merve's activity

liked a model 1 day ago

Qwen/Qwen2.5-Math-PRM-72B

Text Classification • Updated 9 days ago • 884 • 64

liked 2 datasets 1 day ago

nvidia/AceMath-Instruct-Training-Data

Viewer • Updated 9 days ago • 5.56M • 1.24k • 24

cais/hle

Viewer • Updated 3 days ago • 3k • 695 • 85

liked 7 models 1 day ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 3 days ago • 109k • 2.6k

MiniMaxAI/MiniMax-VL-01

Image-Text-to-Text • Updated about 13 hours ago • 1.84k • 220

VITA-MLLM/VITA-1.5

Video-Text-to-Text • Updated 10 days ago • 728 • 32

bytedance-research/UI-TARS-7B-DPO

Image-Text-to-Text • Updated about 12 hours ago • 4.85k • 61

bytedance-research/UI-TARS-72B-SFT

Image-Text-to-Text • Updated about 12 hours ago • 104 • 8

DAMO-NLP-SG/VideoLLaMA3-7B

Visual Question Answering • Updated 1 day ago • 676 • 17

declare-lab/TangoFlux

Text-to-Audio • Updated 4 days ago • 2.85k • 70

liked 2 Spaces 1 day ago

Running on Zero

SmolVLM

SmolVLM 500M Instruct WebGPU

liked a Space 9 days ago

Running on Zero

MatchAnything

liked 2 datasets 9 days ago

omkarthawakar/VRC-Bench

Viewer • Updated 13 days ago • 1k • 2.06k • 12

microsoft/PEACE

Viewer • Updated 16 days ago • 7.73k • 2.52k • 12

liked 5 models 9 days ago

ByteDance/Sa2VA-26B

Image-Text-to-Text • Updated 12 days ago • 124 • 10

lightblue/lb-reranker-0.5B-v1.0

Text Generation • Updated 5 days ago • 1.88k • 60

jxm/cde-small-v2

Feature Extraction • Updated 9 days ago • 3.52k • 70

jinaai/ReaderLM-v2

Text Generation • Updated 4 days ago • 18k • 415

MiniMaxAI/MiniMax-Text-01

Text Generation • Updated 9 days ago • 4.48k • 476