Update readme
README.md
CHANGED
@@ -7,7 +7,7 @@ pipeline_tag: visual-question-answering
 
 - 🔥 **State-of-the-art Performance.**
 
-MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc.) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a
+MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc.) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks**. Notably, MiniCPM-V 2.0 shows **strong OCR capability**, achieving **comparable performance to Gemini Pro in scene-text understanding**, and **state-of-the-art performance on OCRBench** among open-source models.
 
 - **Trustworthy Behavior.**
 
@@ -15,7 +15,7 @@ pipeline_tag: visual-question-answering
 
 - **High-Resolution Images at Any Aspect Ratio.**
 
-MiniCPM-V 2.0 can accept **1.8 million
+MiniCPM-V 2.0 can accept **1.8 million pixels (e.g., 1344x1344) images at any aspect ratio**. This enables better perception of fine-grained visual information such as small objects and optical characters, which is achieved via a recent technique from [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf).
 
 - ⚡️ **High Efficiency.**
 
@@ -25,7 +25,7 @@ pipeline_tag: visual-question-answering
 
 - **Bilingual Support.**
 
-MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24
+MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24].
 
 ## Evaluation <!-- omit in toc -->
 
@@ -81,7 +81,7 @@ pipeline_tag: visual-question-answering
 <td>- </td>
 <td>78.0</td>
 <td>88.4</td>
-<td>
+<td>645</td>
 <td>63.2</td>
 <td>1771.5</td>
 <td>75.1</td>
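The "1.8 million pixels (e.g., 1344x1344)" figure in the README text above is simple arithmetic, and a short sketch may make the "any aspect ratio" claim more concrete. The helper below is hypothetical and is not taken from the MiniCPM-V code base; it only assumes that the limit is a total pixel budget equal to 1344 × 1344, which is an interpretation of the README wording rather than a documented rule.

```python
# Hypothetical helper for illustration only; not part of MiniCPM-V.
# Assumption: the README's "1.8 million pixels" means a total pixel
# budget equal to the 1344x1344 example it gives.
MAX_PIXELS = 1344 * 1344  # 1,806,336 pixels, roughly 1.8 million


def fits_pixel_budget(width: int, height: int, max_pixels: int = MAX_PIXELS) -> bool:
    """Return True if a width x height image has at most max_pixels pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width * height <= max_pixels


if __name__ == "__main__":
    # Same pixel count, very different aspect ratios.
    for w, h in [(1344, 1344), (2688, 672), (672, 2688), (1920, 1080)]:
        print(f"{w}x{h}: {w * h} pixels, fits={fits_pixel_budget(w, h)}")
```

Under this assumption, a square 1344x1344 image and a 4:1 panorama of 2688x672 use exactly the same budget, while a 1920x1080 frame slightly exceeds it.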