finalf0 committed on
Commit fe77168 • 1 Parent(s): 0017ac3

Update readme

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -7,7 +7,7 @@ pipeline_tag: visual-question-answering
 
  - 🔥 **State-of-the-art Performance.**
 
- MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehesive evaluation over 11 popular benchmarks**. Notably, MiniCPM-V 2.0 shows **strong OCR capability**, achieving **comparable performance to Gemini Pro in scene-text understanding**, and **state-of-the-art performance on OCRBench** among open-source models.
 
  - 🏆 **Trustworthy Behavior.**
 
@@ -15,7 +15,7 @@ pipeline_tag: visual-question-answering
 
  - 🌟 **High-Resolution Images at Any Aspect Raito.**
 
- MiniCPM-V 2.0 can accept **1.8 million pixel (e.g., 1344x1344) images at any aspect ratio**. This enables better perception of fine-grained visual information such as small objects and optical characters, which is achieved via a recent technique from [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf).
 
  - ⚡️ **High Efficiency.**
 
@@ -25,7 +25,7 @@ pipeline_tag: visual-question-answering
 
  - 🙌 **Bilingual Support.**
 
- MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24 Spotlight].
 
  ## Evaluation <!-- omit in toc -->
 
@@ -81,7 +81,7 @@ pipeline_tag: visual-question-answering
  <td>- </td>
  <td>78.0</td>
  <td>88.4</td>
- <td>516</td>
  <td>63.2</td>
  <td>1771.5</td>
  <td>75.1</td>
 
 
  - 🔥 **State-of-the-art Performance.**
 
+ MiniCPM-V 2.0 achieves **state-of-the-art performance** on multiple benchmarks (including OCRBench, TextVQA, MME, MMB, MathVista, etc) among models under 7B parameters. It even **outperforms strong Qwen-VL-Chat 9.6B, CogVLM-Chat 17.4B, and Yi-VL 34B on OpenCompass, a comprehensive evaluation over 11 popular benchmarks**. Notably, MiniCPM-V 2.0 shows **strong OCR capability**, achieving **comparable performance to Gemini Pro in scene-text understanding**, and **state-of-the-art performance on OCRBench** among open-source models.
 
  - 🏆 **Trustworthy Behavior.**
 
 
 
  - 🌟 **High-Resolution Images at Any Aspect Raito.**
 
+ MiniCPM-V 2.0 can accept **1.8 million pixels (e.g., 1344x1344) images at any aspect ratio**. This enables better perception of fine-grained visual information such as small objects and optical characters, which is achieved via a recent technique from [LLaVA-UHD](https://arxiv.org/pdf/2403.11703.pdf).
 
  - ⚡️ **High Efficiency.**
 
 
 
  - 🙌 **Bilingual Support.**
 
+ MiniCPM-V 2.0 **supports strong bilingual multimodal capabilities in both English and Chinese**. This is enabled by generalizing multimodal capabilities across languages, a technique from [VisCPM](https://arxiv.org/abs/2308.12038) [ICLR'24].
 
  ## Evaluation <!-- omit in toc -->
 
 
  <td>- </td>
  <td>78.0</td>
  <td>88.4</td>
+ <td>645</td>
  <td>63.2</td>
  <td>1771.5</td>
  <td>75.1</td>