Update README.md
README.md CHANGED
@@ -37,6 +37,12 @@ SmolVLM can be used for inference on multimodal (image + text) tasks where the i

To fine-tune SmolVLM on a specific task, you can follow [the fine-tuning tutorial](https://github.com/huggingface/smollm/blob/main/vision/finetuning/Smol_VLM_FT.ipynb).

+## Evaluation
+
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smoller_vlm_benchmarks.png" alt="Benchmarks" style="width:90%;" />
+
+
### Technical Summary

SmolVLM leverages the lightweight SmolLM2 language model to provide a compact yet powerful multimodal experience. It introduces several changes compared to the larger SmolVLM 2.2B model:

@@ -167,15 +173,3 @@ The training data comes from [The Cauldron](https://huggingface.co/datasets/Hugg



-
-## Evaluation
-
-
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smoller_vlm_benchmarks.png" alt="Example Image" style="width:90%;" />
-
-
-| Size | Mathvista | MMMU | OCRBench | MMStar | AI2D | ChartQA_Test | Science_QA | TextVQA Val | DocVQA Val |
-|------|-----------|------|----------|--------|------|--------------|------------|-------------|------------|
-| 256M | 35.9      | 28.3 | 52.6     | 34.6   | 47   | 55.8         | 73.6       | 49.9        | 58.3       |
-| 500M | 40.1      | 33.7 | 61       | 38.3   | 59.5 | 63.2         | 79.7       | 60.5        | 70.5       |
-| 2.2B | 43.9      | 38.3 | 65.5     | 41.8   | 64   | 71.6         | 84.5       | 72.1        | 79.7       |
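The first hunk's context line notes that SmolVLM can be used for inference on multimodal (image + text) tasks, and the README links a separate fine-tuning tutorial. For reference, below is a minimal inference sketch using the standard Transformers vision-to-sequence API; the checkpoint id and the image path are assumptions (this diff does not say which SmolVLM size the README describes), so swap them for the actual values.

```python
# Minimal inference sketch (assumed checkpoint id and image path; adjust to your setup).
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

MODEL_ID = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumption: replace with the repo this README belongs to

# Load the processor, the model, and an input image (a URL or a local path both work).
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)
image = load_image("path/to/your_image.jpg")  # placeholder path

# Build a chat-style prompt with one image slot and a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate and decode the model's answer.
generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

Because the processor's chat template inserts the image placeholder tokens for you, the same pattern should carry over to the other SmolVLM checkpoints listed in the benchmark table.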