Merve Noyan

merve

AI & ML interests

VLMs, vision & co

Recent Activity

updated a collection 1 day ago
Jan 24 Releases
updated a collection 1 day ago
Jan 24 Releases
View all activity

Articles

Organizations

Hugging Face's profile picture Google's profile picture Deprem Yapay Zeka's profile picture Notebooks-explorers's profile picture SODA's profile picture Deprem Private's profile picture PyTorch Image Models's profile picture Turkish NLP Dataset Creators's profile picture Templates's profile picture Demo Crafters ๐Ÿค— 's profile picture Keras's profile picture tensorflow's profile picture Mukayese's profile picture HugGAN Community's profile picture EPFL VILAB's profile picture Hugging Face Fellows's profile picture Tools's profile picture Huggingface.js's profile picture scikit-learn's profile picture JAX โ™ฅ๏ธ Diffusers ๐Ÿงจ's profile picture HuggingFaceM4's profile picture 2023 Jan Offsite hackathon's profile picture HF Canonical Model Maintainers's profile picture scikit-learn's profile picture Huggingface Projects's profile picture fastai X Hugging Face Group 2022's profile picture boun-tabi-LMG's profile picture skops-tests's profile picture Kornia AI's profile picture Hugging Face H4's profile picture Keras Dreambooth Event's profile picture Turkish T5 - BERT - GPT-2's profile picture Blog-explorers's profile picture Hugging Face for Computer Vision's profile picture Hacktoberfest 2023's profile picture Hugging Face TB Research's profile picture adept-hf-collab's profile picture ZeroGPU Explorers's profile picture kotol's profile picture Magic Leap Community's profile picture Llava Hugging Face's profile picture MLX Community's profile picture Social Post Explorers's profile picture Top Contributors: Profile Followers's profile picture Dev Mode Explorers's profile picture Paris AI Running Club's profile picture yorg's profile picture CVPR2024's profile picture Les papiers de Merve's profile picture nltpt's profile picture s0409's profile picture Hugging Face FineVideo's profile picture mv's profile picture Cookbook Authors's profile picture open/ acc's profile picture Agents's profile picture University of Sydney's profile picture

merve's activity

posted an update 1 day ago
view post
Post
1741
Oof, what a week! ๐Ÿฅต So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal ๐Ÿ’ฌ
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with it's retrieval models ColSmol for multimodal RAG ๐Ÿ’—
- UI-TARS are new models by ByteDance to unlock agentic GUI control ๐Ÿคฏ in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark

LLMs ๐Ÿ“–
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! ๐Ÿคฏ
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)

Audio ๐Ÿ—ฃ๏ธ
- Llasa is a new speech synthesis model based on Llama that comes in 1B,3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation โฏ๏ธ
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- tencent released Hunyuan3D-2, new 3D asset generation from images
ยท
reacted to m-ric's post with ๐Ÿ‘€โค๏ธ 1 day ago
view post
Post
1306
Today we make the biggest release in smolagents so far: ๐˜„๐—ฒ ๐—ฒ๐—ป๐—ฎ๐—ฏ๐—น๐—ฒ ๐˜ƒ๐—ถ๐˜€๐—ถ๐—ผ๐—ป ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€, ๐˜„๐—ต๐—ถ๐—ฐ๐—ต ๐—ฎ๐—น๐—น๐—ผ๐˜„๐˜€ ๐˜๐—ผ ๐—ฏ๐˜‚๐—ถ๐—น๐—ฑ ๐—ฝ๐—ผ๐˜„๐—ฒ๐—ฟ๐—ณ๐˜‚๐—น ๐˜„๐—ฒ๐—ฏ ๐—ฏ๐—ฟ๐—ผ๐˜„๐˜€๐—ถ๐—ป๐—ด ๐—ฎ๐—ด๐—ฒ๐—ป๐˜๐˜€! ๐Ÿฅณ

Our agents can now casually open up a web browser, and navigate on it by scrolling, clicking elements on the webpage, going back, just like a user would.

The demo below shows Claude-3.5-Sonnet browsing GitHub for task: "Find how many commits the author of the current top trending repo did over last year."
Hi @mlabonne !

Go try it out, it's the most cracked agentic stuff I've seen in a while ๐Ÿคฏ (well, along with OpenAI's Operator who beat us by one day)

For more detail, read our announcement blog ๐Ÿ‘‰ https://huggingface.co./blog/smolagents-can-see
The code for the web browser example is here ๐Ÿ‘‰ https://github.com/huggingface/smolagents/blob/main/examples/vlm_web_browser.py
ยท