---
pipeline_tag: image-text-to-text
---

This is the Florence-VL 8B SFT checkpoint described in [Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion](https://huggingface.co/papers/2412.04424).

Code: https://github.com/JiuhaiChen/Florence-VL