Molmo on text tasks
Hi,
First of all, great work! I have some questions regarding Molmo.
Q1. I don't see a demo for text-only tasks, and I don't see text benchmarks (like MMLU and HumanEval) in your tech report. Does that mean I cannot use Molmo as a regular LLM?
Q2. I see that the pre-training stage is performed on image captioning tasks. Does that mean Molmo will not perform well on more complicated reasoning tasks? Or is such captioning sufficient for a VLM to understand image tasks?
Best Regards,
Nuo
Hi,
A1. Currently our demo only supports inputs with images. However, we have also tested our models on language-only tasks and they work pretty well, so you should be able to use them as regular LLMs. We will release the numbers on text benchmarks in the full tech report.
A2. Our training data mixture also includes other data sources that involve more reasoning. Reasoning ability also depends a lot on the LLM backbone used in the model. If your tasks require complex reasoning, you might want to try the 72B model.