Update README.md
README.md (CHANGED)
@@ -94,6 +94,33 @@ print(generated_text)
# perspective. The puppy is sitting on a wooden deck, which is composed ...
```

To make inference more efficient, run with autocast:

```python
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer
    )
```

We did most of our evaluations in this setting (autocast on, but float32 weights).

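To make the distinction concrete: with autocast the parameters themselves stay in float32, and only the operations executed inside the context run in bfloat16. A minimal illustration of this behaviour (not from the original README; the random tensor is just for demonstration):

```python
import torch

# The stored weights are still float32; autocast only changes the dtype
# used for matmuls and other autocast-eligible ops inside the context.
print(next(model.parameters()).dtype)  # torch.float32

with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    x = torch.randn(8, 8, device="cuda")
    y = x @ x  # matmul runs in bfloat16 under autocast
print(y.dtype)  # torch.bfloat16
```
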
To further reduce the memory requirements, the model can be run with bfloat16 weights:

```python
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer
)
```

Note that we have observed that this can (rarely) change the output of the model compared to running with float32 weights.

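If you want to check whether the bfloat16 weights change a particular generation, one option is to run the same batch in both precisions and compare the decoded text. This is only an illustrative sketch, not part of the README: the `run_generation` helper is hypothetical, and it reuses the `model`, `inputs`, and `processor` from the quick start above, decoding the output the same way as that example.

```python
import torch
from transformers import GenerationConfig

def run_generation(model, inputs, processor):
    # Same generation call as above, decoded to text for easy comparison.
    with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
        output = model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
            tokenizer=processor.tokenizer,
        )
    new_tokens = output[0, inputs["input_ids"].size(1):]
    return processor.tokenizer.decode(new_tokens, skip_special_tokens=True)

text_fp32 = run_generation(model, inputs, processor)    # float32 weights, autocast on

model.to(dtype=torch.bfloat16)                           # switch to bfloat16 weights
inputs["images"] = inputs["images"].to(torch.bfloat16)
text_bf16 = run_generation(model, inputs, processor)

print(text_fp32 == text_bf16)  # usually True; in rare cases the outputs differ
```
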
## Evaluations

| Model | Average Score on 11 Academic Benchmarks | Human Preference Elo Rating |