Edit model card

Script for creating this. llmcompressor installed from source since it depends on something that wasn't compiled into the release at the time.

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot, wrap_hf_model_class

MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"

# Load model.
model_class = wrap_hf_model_class(Qwen2VLForConditionalGeneration)
model = model_class.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Configure the simple PTQ quantization
recipe = QuantizationModifier(
  targets="Linear", 
  scheme="FP8_DYNAMIC", 
  ignore=["re:.*lm_head", "re:visual.*"]
)

# Apply the quantization algorithm.
oneshot(model=model, recipe=recipe)

# Save the model.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)
Downloads last month
4
Safetensors
Model size
8.29B params
Tensor type
BF16
·
F8_E4M3
·
Inference API
Unable to determine this model's library. Check the docs .