finalf0 committed
Commit e0456e9 · verified · 1 Parent(s): c4a230f

Create README.md

Files changed (1)
  1. README.md +101 -0
README.md ADDED
@@ -0,0 +1,101 @@
+ ---
+ pipeline_tag: text-classification
+ ---
+
+ ## MiniCPM-V
+ **MiniCPM-V** is an efficient large multimodal model (LMM) with promising performance, designed for deployment. The model is built on MiniCPM-2.4B and SigLip-400M, connected by a perceiver resampler. Notable features of MiniCPM-V include:
+
+ - 🚀 **High Efficiency.**
+
+ MiniCPM-V can be **efficiently deployed on most GPU cards and personal computers**, and **even on edge devices such as mobile phones**. For visual encoding, we compress the image representations into 64 tokens via a perceiver resampler, which is significantly fewer than in other LMMs based on an MLP architecture (typically more than 512 tokens). This allows MiniCPM-V to operate with **much lower memory cost and higher speed during inference**.
+
+ - 🔥 **Promising Performance.**
+
+ MiniCPM-V achieves **state-of-the-art performance** on multiple benchmarks (including MMMU, MME, and MMBench) among models of comparable size, surpassing existing LMMs built on Phi-2. It even **achieves performance comparable to or better than the 9.6B Qwen-VL-Chat**.
+
+ - 🙌 **Bilingual Support.**
+
+ MiniCPM-V is **the first edge-deployable LMM supporting bilingual multimodal interaction in English and Chinese**. This is achieved by generalizing multimodal capabilities across languages, a technique from our ICLR 2024 spotlight [paper](https://arxiv.org/abs/2308.12038).
+
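The token compression described under High Efficiency can be illustrated with a toy sketch. This is not the MiniCPM-V implementation: it is a single-head cross-attention in plain Python with no learned projections or stacked layers, and it only shows why a fixed set of queries yields a fixed number of visual tokens regardless of how many patch features the image encoder produces. The patch count (576), the feature width (8), and the names `resample` and `random_matrix` are assumptions made for illustration; only the figure of 64 output tokens comes from the README.

```python
import math
import random

NUM_QUERIES = 64  # number of visual tokens after compression (from the README)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(patch_features, queries):
    """Toy single-head cross-attention: each query attends over all patches.

    Returns one vector per query, so the output length is set by the number
    of queries and does not depend on the number of input patches.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot-product similarity between this query and every patch.
        scores = [scale * sum(qi * pi for qi, pi in zip(q, patch))
                  for patch in patch_features]
        weights = softmax(scores)
        # Attention-weighted average of the patch features.
        outputs.append([
            sum(w * patch[j] for w, patch in zip(weights, patch_features))
            for j in range(dim)
        ])
    return outputs


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of Gaussian noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 8                                         # toy feature width (assumption)
patches = random_matrix(576, dim, rng)          # assumed encoder output size
queries = random_matrix(NUM_QUERIES, dim, rng)  # stand-in for learned queries

visual_tokens = resample(patches, queries)
print(len(patches), "->", len(visual_tokens))   # prints "576 -> 64"
```

In a real perceiver resampler the queries are trained parameters and the attention uses learned key, value, and output projections, but the shape argument is the same: the language model sees as many visual tokens as there are queries.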
+ <div align="center">
+
+ <table style="margin: 0px auto;">
+ <thead>
+ <tr>
+ <th align="left">Model</th>
+ <th>Size</th>
+ <th>MME</th>
+ <th nowrap="nowrap" >MMB dev (en)</th>
+ <th nowrap="nowrap" >MMB dev (zh)</th>
+ <th nowrap="nowrap" >MMMU val</th>
+ <th nowrap="nowrap" >CMMMU val</th>
+ </tr>
+ </thead>
+ <tbody align="center">
+ <tr>
+ <td align="left">LLaVA-Phi</td>
+ <td align="right">3.0B</td>
+ <td>1335</td>
+ <td>59.8</td>
+ <td>- </td>
+ <td>- </td>
+ <td>- </td>
+ </tr>
+ <tr>
+ <td nowrap="nowrap" align="left">MobileVLM</td>
+ <td align="right">3.0B</td>
+ <td>1289</td>
+ <td>59.6</td>
+ <td>- </td>
+ <td>- </td>
+ <td>- </td>
+ </tr>
+ <tr>
+ <td nowrap="nowrap" align="left" >Imp-v1</td>
+ <td align="right">3B</td>
+ <td>1434</td>
+ <td>66.5</td>
+ <td>- </td>
+ <td>- </td>
+ <td>- </td>
+ </tr>
+ <tr>
+ <td align="left" >Qwen-VL-Chat</td>
+ <td align="right" >9.6B</td>
+ <td>1487</td>
+ <td>60.6 </td>
+ <td>56.7 </td>
+ <td>35.9 </td>
+ <td>30.7 </td>
+ </tr>
+ <tr>
+ <td nowrap="nowrap" align="left" ><b>MiniCPM-V</b></td>
+ <td align="right">3B </td>
+ <td>1452 </td>
+ <td>67.3 </td>
+ <td>61.9 </td>
+ <td>34.7 </td>
+ <td>32.1 </td>
+ </tr>
+ </tbody>
+ </table>
+
+ </div>
+
+ <table>
+ <tr>
+ <td>
+ <p>
+ <img src="data/Mushroom_en.gif" width="400"/>
+ </p>
+ </td>
+ <td>
+ <p>
+ <img src="data/Snake_en.gif" width="400"/>
+ </p>
+ </td>
+ </tr>
+ </table>
+
+ ## Demo
+ Try out the [MiniCPM-V demo](http://120.92.209.146:80).