Introducing the world's smallest vision language model!
We're thrilled to share SmolVLM (256M & 500M), the smallest Visual Language Models ever built. Think: running on <1GB of GPU memory. You can fine-tune it on your laptop and run it on your toaster!
Why It's Game-Changing:
- Outperforms Larger Models: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. That's over a 300x reduction in size!
- Mighty Efficiency: The 256M version delivers 80% of our 2.2B model's performance, and the 500M version hits 90%.
- Lightning-Fast Search: SmolVLM integrates with ColPali for state-of-the-art retrieval speeds, on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.
What's New Under the Hood:
- New Vision Encoder: Smaller overall size (400M -> 93M), but with higher resolution.
- Higher Pixels/Token: 4096 vs. 1820, for more efficient image processing.
- Smart Tokenization: Faster training and a performance boost.
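To make a couple of these numbers concrete, here is a rough back-of-the-envelope sketch. It treats "pixels per token" as a simple area ratio and uses a 512x512 image purely as an illustrative assumption; the real models also split images into patches and add special tokens, so actual token counts will differ. The function names are hypothetical helpers, not part of any library.

```python
def param_ratio(large_params: float, small_params: float) -> float:
    """How many times larger the big model is, by parameter count."""
    return large_params / small_params


def image_tokens(width: int, height: int, pixels_per_token: int) -> int:
    """Approximate image tokens, assuming each token covers a fixed pixel budget.

    Rounds up, since a partially filled token still costs one token.
    """
    total_pixels = width * height
    return -(-total_pixels // pixels_per_token)  # integer ceiling division


if __name__ == "__main__":
    # 80B-parameter model vs. the 256M model from the post.
    print(f"size ratio: {param_ratio(80e9, 256e6):.1f}x")

    # Assumed 512x512 input, compared at the two pixels/token figures quoted above.
    for ppt in (4096, 1820):
        print(f"{ppt} pixels/token -> {image_tokens(512, 512, ppt)} tokens")
```

Under these assumptions, the same image costs less than half as many tokens at 4096 pixels/token as at 1820, which is where the "more efficient image processing" claim comes from.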
Check our blog: https://huggingface.co./blog/smolervlm
The models: HuggingFaceTB/smolvlm-256m-and-500m-6791fafc5bb0ab8acc960fb0
The demo: HuggingFaceTB/SmolVLM-256M-Demo