andito
posted an update about 13 hours ago
Introducing the world's smallest vision language model!

We're thrilled to share SmolVLM (256M & 500M), the smallest Visual Language Models ever built. Think: running on <1GB of GPU memory. You can fine-tune it on your laptop and run it on your toaster!

Why It's Game-Changing:
- Outperforms Larger Models: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. Over 300x reduction!
- Mighty Efficiency: The 256M version delivers 80% of our 2.2B model's performance, and the 500M version hits 90%.
- Lightning-Fast Search: SmolVLM integrates with ColPali for state-of-the-art retrieval speeds, on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.
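
The "over 300x" figure can be checked with quick arithmetic. A minimal sketch, assuming "80B" and "256M" mean 80e9 and 256e6 parameters; the names below are illustrative and not taken from any library:

```python
# Parameter counts as quoted in the post (assumption: B = 1e9, M = 1e6).
LARGE_MODEL_PARAMS = 80e9    # the earlier 80B-parameter model
SMALL_MODEL_PARAMS = 256e6   # SmolVLM 256M


def size_reduction(large: float, small: float) -> float:
    """Return how many times fewer parameters `small` has than `large`."""
    return large / small


ratio = size_reduction(LARGE_MODEL_PARAMS, SMALL_MODEL_PARAMS)
print(f"{ratio:.1f}x fewer parameters")  # prints "312.5x fewer parameters"
```

So the claim holds with some margin: 80B divided by 256M is 312.5.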

What's New Under the Hood:
- New Vision Encoder: Smaller overall size (400M -> 93M), but with higher resolution.
- Higher Pixels/Token: 4096 vs. 1820, for more efficient image processing.
- Smart Tokenization: Faster training and a performance boost.
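
To get a feel for what the pixels/token change means, here is a rough sketch. It assumes "pixels/token" is simply image pixels divided by the number of visual tokens, and it uses a hypothetical 512x512 input; it ignores any image splitting, resizing, or special tokens the real pipeline may add, so treat the counts as estimates only.

```python
import math

# Pixels represented by one visual token, as quoted in the post.
NEW_PIXELS_PER_TOKEN = 4096
OLD_PIXELS_PER_TOKEN = 1820


def estimated_image_tokens(width: int, height: int, pixels_per_token: int) -> int:
    """Estimate visual tokens for an image, rounding up to whole tokens."""
    return math.ceil(width * height / pixels_per_token)


# Hypothetical example image size, chosen for illustration only.
width, height = 512, 512
new_tokens = estimated_image_tokens(width, height, NEW_PIXELS_PER_TOKEN)
old_tokens = estimated_image_tokens(width, height, OLD_PIXELS_PER_TOKEN)
print(new_tokens, old_tokens)  # prints "64 145"
```

Under these assumptions the same image costs fewer than half as many tokens, which is where the "more efficient image processing" comes from.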

Check our blog: https://huggingface.co/blog/smolervlm
The models: HuggingFaceTB/smolvlm-256m-and-500m-6791fafc5bb0ab8acc960fb0
The demo: HuggingFaceTB/SmolVLM-256M-Demo