---
library_name: transformers
tags:
- vidore
model-index:
- name: colphi3.5
  results: []
datasets:
- vidore/colpali_train_set
base_model:
- microsoft/Phi-3.5-vision-instruct
pipeline_tag: feature-extraction
license: mit
---
# ColPhi3.5

This model was fine-tuned from microsoft/Phi-3.5-vision-instruct on the vidore/colpali_train_set dataset.
## Model description

ColPhi3.5 is based on a novel model architecture and training strategy that uses Vision Language Models (VLMs) to efficiently index documents from their visual features. It is a Phi3.5-V-4B extension that generates ColBERT-style multi-vector representations of text and images. The approach was introduced in the paper *ColPali: Efficient Document Retrieval with Vision Language Models*.
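At retrieval time, ColBERT-style multi-vector representations are compared with late interaction (MaxSim): each query token embedding is matched against every document patch embedding, the best match per query token is kept, and the maxima are summed. The snippet below is a minimal sketch of this scoring step, assuming the model has already produced L2-normalized multi-vector embeddings; the tensor shapes and the `maxsim_score` helper are illustrative and not part of this repository.

```python
# Minimal MaxSim (late-interaction) scoring sketch for ColBERT-style
# multi-vector embeddings. Shapes are illustrative placeholders.
import torch


def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """Score one query against one document page.

    query_emb: (n_query_tokens, dim) L2-normalized query token embeddings.
    doc_emb:   (n_doc_patches, dim) L2-normalized document patch embeddings.
    """
    # Pairwise cosine similarities between query tokens and document patches.
    sim = query_emb @ doc_emb.T  # (n_query_tokens, n_doc_patches)
    # For each query token, keep its best-matching patch, then sum over tokens.
    return sim.max(dim=-1).values.sum()


# Toy usage with random tensors standing in for model outputs.
torch.manual_seed(0)
q = torch.nn.functional.normalize(torch.randn(16, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(1030, 128), dim=-1)
print(maxsim_score(q, d))
```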
## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed