andito HF staff committed on
Commit 7a8ac21 · verified · 1 Parent(s): 401eccf

Update README.md

Files changed (1)
  1. README.md +6 -12
README.md CHANGED
@@ -37,6 +37,12 @@ SmolVLM can be used for inference on multimodal (image + text) tasks where the i
 
  To fine-tune SmolVLM on a specific task, you can follow [the fine-tuning tutorial](https://github.com/huggingface/smollm/blob/main/vision/finetuning/Smol_VLM_FT.ipynb).
 
+ ## Evaluation
+
+
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smoller_vlm_benchmarks.png" alt="Benchmarks" style="width:90%;" />
+
+
  ### Technical Summary
 
  SmolVLM leverages the lightweight SmolLM2 language model to provide a compact yet powerful multimodal experience. It introduces several changes compared to the larger SmolVLM 2.2B model:
@@ -167,15 +173,3 @@ The training data comes from [The Cauldron](https://huggingface.co/datasets/Hugg
 
 
 
-
- ## Evaluation
-
-
- <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smoller_vlm_benchmarks.png" alt="Example Image" style="width:90%;" />
-
-
- | Size | Mathvista | MMMU | OCRBench | MMStar | AI2D | ChartQA_Test | Science_QA | TextVQA Val | DocVQA Val |
- |-------|-----------|------|----------|--------|-------|--------------|------------|-------------|------------|
- | 256M | 35.9 | 28.3 | 52.6 | 34.6 | 47 | 55.8 | 73.6 | 49.9 | 58.3 |
- | 500M | 40.1 | 33.7 | 61 | 38.3 | 59.5 | 63.2 | 79.7 | 60.5 | 70.5 |
- | 2.2B | 43.9 | 38.3 | 65.5 | 41.8 | 64 | 71.6 | 84.5 | 72.1 | 79.7 |