allenai/Molmo-7B-D-0924


soldni committed on Sep 25

Commit d0262c9 • 1 Parent(s): f75a7cd

Update README.md

Files changed (1)

README.md +60 -3


@@ -1,3 +1,60 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# Molmo 7B-D
+## Quick Start
+To run Molmo, first install dependencies:
+```bash
+pip install einops torch torchvision pillow transformers requests
+```
+Then load the processor and model, and generate a description of an image:
+```python
+from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
+from PIL import Image
+import requests
+# load the processor
+processor = AutoProcessor.from_pretrained(
+    'allenai/Molmo-7B-D-0924',
+    trust_remote_code=True,
+    torch_dtype='auto',
+    device_map='auto'
+)
+# load the model
+model = AutoModelForCausalLM.from_pretrained(
+    'allenai/Molmo-7B-D-0924',
+    trust_remote_code=True,
+    torch_dtype='auto',
+    device_map='auto'
+)
+# process the image and text
+inputs = processor.process(
+    images=[Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)],
+    text="Describe this image."
+)
+# move inputs to the correct device and make a batch of size 1
+inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
+# generate output; maximum 200 new tokens; stop generation when <|endoftext|> is generated
+output = model.generate_from_batch(
+    inputs,
+    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+    tokenizer=processor.tokenizer
+)
+# only get generated tokens; decode them to text
+generated_tokens = output[0, inputs['input_ids'].size(1):]
+generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
+# print the generated text
+print(generated_text)
+```
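
The last two steps of the Quick Start (adding a batch dimension, then keeping only the generated tokens) can be illustrated without downloading the model. The sketch below is a minimal stand-in that uses plain Python lists in place of tensors. The token ids are hypothetical, and it assumes, as the slicing in the README implies, that the generation output holds the prompt tokens followed by the newly generated ones.

```python
# Hypothetical token ids; real ids would come from processor.tokenizer.
prompt_ids = [101, 7592, 2023]   # stands in for inputs['input_ids'] (one sequence)
new_ids = [3899, 2006, 4799]     # stands in for tokens produced by generation

# Mimic unsqueeze(0): wrap the single sequence in a batch of size 1.
batch_input_ids = [prompt_ids]

# Mimic the generation output: each row is the prompt followed by the new tokens.
output = [row + new_ids for row in batch_input_ids]

# Analogous to inputs['input_ids'].size(1): the prompt length.
prompt_len = len(batch_input_ids[0])

# Analogous to output[0, prompt_len:]: drop the prompt, keep the new tokens.
generated_tokens = output[0][prompt_len:]
print(generated_tokens)  # [3899, 2006, 4799]
```

Slicing by the prompt length rather than searching for a marker token keeps the step independent of the prompt's content, which is why the README decodes only `generated_tokens`.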