soldni committed
Commit 6180ba1
1 Parent(s): 817de9b

Update README.md

Files changed (1): README.md (+38, -1)
README.md CHANGED
@@ -21,7 +21,12 @@ tags:
 
 # MolmoE 1B
 
-Molmo is an open vision-language model developed by the Allen Institute for AI. Molmo models are trained on PixMo, a dataset of 1 million, highly-curated image-text pairs. It has state-of-the-art performance among multimodal models with a similar size while being fully open-source. You can find all models in the Molmo family [here](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).
+
+Molmo is a family of open vision-language models developed by the Allen Institute for AI.
+Molmo models are trained on PixMo, a dataset of 1 million highly curated image-text pairs.
+They achieve state-of-the-art performance among multimodal models of similar size while being fully open-source.
+You can find all models in the Molmo family [here](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).
+**Learn more** about the Molmo family [in our announcement blog post](https://molmo.allenai.org/blog).
 
 MolmoE-1B is a multimodal Mixture-of-Experts LLM with 1.5B active and 7.2B total parameters, released in September 2024 (0924) and based on [OLMoE-1B-7B-0924](https://huggingface.co/allenai/OLMoE-1B-7B-0924).
 It nearly matches the performance of GPT-4V on both academic benchmarks and human evaluation, and achieves state-of-the-art performance among similarly-sized open multimodal models.
@@ -90,6 +95,38 @@ print(generated_text)
 # wooden deck. The deck's planks, which are a mix of light and dark brown with ...
 ```
 
+## Evaluations
+
+| Model                       | Average Score on 11 Academic Benchmarks | Human Preference Elo Rating |
+|-----------------------------|------------------------------------------|-----------------------------|
+| Molmo 72B                   | 81.2                                     | 1077                        |
+| Molmo 7B-D                  | 77.3                                     | 1056                        |
+| Molmo 7B-O                  | 74.6                                     | 1051                        |
+| **MolmoE 1B** (this model)  | **68.6**                                 | **1032**                    |
+| GPT-4o                      | 78.5                                     | 1079                        |
+| GPT-4V                      | 71.1                                     | 1041                        |
+| Gemini 1.5 Pro              | 78.3                                     | 1074                        |
+| Gemini 1.5 Flash            | 75.1                                     | 1054                        |
+| Claude 3.5 Sonnet           | 76.7                                     | 1069                        |
+| Claude 3 Opus               | 66.4                                     | 971                         |
+| Claude 3 Haiku              | 65.3                                     | 999                         |
+| Qwen VL2 72B                | 79.4                                     | 1037                        |
+| Qwen VL2 7B                 | 73.7                                     | 1025                        |
+| Intern VL2 LLAMA 76B        | 77.1                                     | 1018                        |
+| Intern VL2 8B               | 69.4                                     | 953                         |
+| Pixtral 12B                 | 69.5                                     | 1016                        |
+| Phi3.5-Vision 4B            | 59.7                                     | 982                         |
+| PaliGemma 3B                | 50.0                                     | 937                         |
+| LLaVA OneVision 72B         | 76.6                                     | 1051                        |
+| LLaVA OneVision 7B          | 72.0                                     | 1024                        |
+| Cambrian-1 34B              | 66.8                                     | 953                         |
+| Cambrian-1 8B               | 63.4                                     | 952                         |
+| xGen-MM-Interleave 4B       | 59.5                                     | 979                         |
+| LLaVA-1.5 13B               | 43.9                                     | 960                         |
+| LLaVA-1.5 7B                | 40.7                                     | 951                         |
+
+*Benchmarks: AI2D test, ChartQA test, VQA v2.0 test, DocVQA test, InfographicVQA test, TextVQA val, RealWorldQA, MMMU val, MathVista testmini, CountBenchQA, Flickr Count (we collected this new dataset, which is significantly harder than CountBenchQA).*
+
 ## License and Use
 
 This model is licensed under Apache 2.0. It is intended for research and educational use.
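
For context, the second hunk's header shows `print(generated_text)` as its leading context line, i.e. the additions land right after the card's quickstart snippet. A minimal sketch of that usage follows, assuming the custom `processor.process` and `model.generate_from_batch` entry points exposed by the repository's `trust_remote_code` implementation as described on the model card; treat the exact signatures as assumptions rather than a verbatim copy of the README.

```python
# Minimal sketch of the quickstart that ends in `print(generated_text)` above.
# Assumption: `processor.process` and `model.generate_from_batch` come from the
# model's trust_remote_code implementation; exact signatures may differ.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/MolmoE-1B-0924"
processor = AutoProcessor.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Any RGB image works; this fetches a placeholder photo.
image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}  # add batch dim

# Generate, then decode only the newly produced tokens.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated_tokens = output[0, inputs["input_ids"].size(1):]
generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_text)
```

On the card's example image, this prints a description along the lines of the `# wooden deck. The deck's planks, ...` context line in the hunk above.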