Molmo on text tasks

#18
by lorinma - opened

Hi,
First of all, great work! I have some questions regarding Molmo.
Q1. I don't see a demo for text-only tasks, and I don't see text benchmarks (such as MMLU and HumanEval) in your tech report. Does that mean I cannot use Molmo as a regular LLM?
Q2. I see that the pre-training stage is performed on image captioning tasks. Does that mean Molmo will not perform well on more complicated reasoning tasks? Or is such captioning sufficient for a VLM to understand image tasks?

Best Regards,
Nuo

lorinma changed discussion title from Curious Molmo on text tasks to Molmo on text tasks

Hi,

A1. Currently, our demo only supports inputs with images. However, we have also tested our models on language-only tasks, and they work pretty well, so you should be able to use them as regular LLMs. We will release the numbers on text benchmarks in the full tech report.
A2. Our training data mixture also includes other data sources that involve more reasoning. Reasoning ability also depends a lot on the LLM backbone used in the model. If your tasks require complex reasoning, you might want to try the 72B model.

lorinma changed discussion status to closed
