---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceM4/the_cauldron
- HuggingFaceM4/Docmatix
pipeline_tag: video-text-to-text
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
- google/siglip-base-patch16-512
- HuggingFaceTB/SmolVLM2-500M-Video-Instruct
tags:
- mlx
---
# HuggingFaceTB/SmolVLM2-500M-Video-Instruct-mlx-8bit-skip-vision
This model was converted to MLX format from [`HuggingFaceTB/SmolVLM2-500M-Video-Instruct`](https://huggingface.co./HuggingFaceTB/SmolVLM2-500M-Video-Instruct) using mlx-vlm version **0.1.13**.

In this 8-bit quantized version of the 500M model, the vision tower is left unquantized to avoid issues on iOS.

Refer to the [original model card](https://huggingface.co./HuggingFaceTB/SmolVLM2-500M-Video-Instruct) for more details on the model.
## Use with mlx
```bash
pip install -U mlx-vlm
```

```bash
python -m mlx_vlm.generate --model mlx-community/SmolVLM2-500M-Video-Instruct-mlx-8bit-skip-vision --image https://huggingface.co./datasets/huggingface/documentation-images/resolve/main/bee.jpg --prompt "Can you describe this image?"
```
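For programmatic use, the same generation can be driven from Python. The sketch below follows the `load`/`generate` API shown in the mlx-vlm README for the 0.1.x series; function names and signatures may differ in other versions, so verify against your installed release. It downloads the model weights on first run.

```python
# Sketch of programmatic use via the mlx-vlm Python API (mlx-vlm 0.1.x);
# names follow the mlx-vlm README and may change between versions.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/SmolVLM2-500M-Video-Instruct-mlx-8bit-skip-vision"

# Load the quantized model and its processor (downloads weights on first use)
model, processor = load(model_path)
config = load_config(model_path)

image = ["https://huggingface.co./datasets/huggingface/documentation-images/resolve/main/bee.jpg"]
prompt = "Can you describe this image?"

# Wrap the plain prompt in the model's chat template before generation
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(image))

output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```

Note that the vision tower in this repository is unquantized, so image encoding runs at full precision while the language model runs in 8-bit.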