VRAM - how to run
#6
by SinanAkkoyun - opened
How do you run inference with your API?
May I ask what quantization your demo inference runs at?
This is an unquantized model. You can estimate the total VRAM required by adding up the sizes of the checkpoint files.
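A minimal sketch of that estimate, assuming the `huggingface_hub` package is installed; it sums the sizes of the `.safetensors` shards in the repo (actual VRAM use will be somewhat higher due to activations and the KV cache):

```python
from huggingface_hub import HfApi

api = HfApi()
# Fetch repo file listing with per-file sizes
info = api.model_info("allenai/MolmoE-1B-0924", files_metadata=True)

total_bytes = sum(
    f.size
    for f in info.siblings
    if f.rfilename.endswith(".safetensors") and f.size is not None
)
print(f"Approximate weight size: {total_bytes / 1024**3:.1f} GiB")
```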
To run this on a 24GB GPU, you can try this env var, or use the 4-bit-per-weight quantized model I linked in the last comment here: https://huggingface.co./allenai/MolmoE-1B-0924/discussions/4
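As an alternative to the linked quantized checkpoint, a minimal sketch of loading the base model in 4-bit with `bitsandbytes` via `transformers` (this is a generic quantization approach, not necessarily what the demo uses):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/MolmoE-1B-0924",
    quantization_config=quant_config,
    trust_remote_code=True,  # Molmo uses custom modeling code
    device_map="auto",
)
```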