---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceM4/the_cauldron
- HuggingFaceM4/Docmatix
pipeline_tag: video-text-to-text
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
- google/siglip-base-patch16-512
- HuggingFaceTB/SmolVLM2-500M-Video-Instruct
tags:
- mlx
---
# HuggingFaceTB/SmolVLM2-500M-Video-Instruct-mlx-8bit-skip-vision
This model was converted to MLX format from [`HuggingFaceTB/SmolVLM2-500M-Video-Instruct`](https://huggingface.co./HuggingFaceTB/SmolVLM2-500M-Video-Instruct) using mlx-vlm version **0.1.13**.

In this 8-bit quantized version of the 500M model, the vision tower is left unquantized to avoid issues on iOS.

Refer to the [original model card](https://huggingface.co./HuggingFaceTB/SmolVLM2-500M-Video-Instruct) for more details on the model.
## Use with mlx
```bash
pip install -U mlx-vlm
```

```bash
python -m mlx_vlm.generate --model mlx-community/SmolVLM2-500M-Video-Instruct-mlx-8bit-skip-vision --image https://huggingface.co./datasets/huggingface/documentation-images/resolve/main/bee.jpg --prompt "Can you describe this image?"
```
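For programmatic use, the same generation can be driven from Python. The sketch below follows the `load`/`generate` API shown in the mlx-vlm README for the 0.1.x series; function names and signatures may differ in other versions, so verify against your installed release. It downloads the model weights on first run.

```python
# Sketch of programmatic use via the mlx-vlm Python API (mlx-vlm 0.1.x);
# names follow the mlx-vlm README and may change between versions.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/SmolVLM2-500M-Video-Instruct-mlx-8bit-skip-vision"

# Load the quantized model and its processor (downloads weights on first use)
model, processor = load(model_path)
config = load_config(model_path)

image = ["https://huggingface.co./datasets/huggingface/documentation-images/resolve/main/bee.jpg"]
prompt = "Can you describe this image?"

# Wrap the plain prompt in the model's chat template before generation
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(image))

output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```

Note that the vision tower in this repository is unquantized, so image encoding runs at full precision while the language model runs in 8-bit.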