---
license: apache-2.0
language:
- en
---

# SOLO Model Card

## Model details

**Model type:**
SOLO is a 7B large vision-language model that uses a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and text as inputs, without using a separate pre-trained vision encoder.

**Model date:**
SOLO-7B was trained in June 2024.

**Paper or resources for more information:**
[Paper](https://arxiv.org/abs/2407.06438) & [GitHub](https://github.com/Yangyi-Chen/SOLO)

**Where to send questions or comments about the model:**
https://github.com/Yangyi-Chen/SOLO/issues

**Inference with Hugging Face**
Please see this [notebook](https://github.com/Yangyi-Chen/SOLO/blob/main/scripts/notebook/demo.ipynb) for an example of running inference with the model.
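
**Illustration: raw image patches**
As a rough illustration of what "raw image patches (in pixels)" means in the model type description, the sketch below splits an image array into non-overlapping patches and flattens each one into a vector of pixel values. It is not taken from the SOLO codebase: the `patchify` helper, the patch size of 32, and the 64x96 dummy image are assumptions chosen for the example, and the linked notebook remains the reference for the actual preprocessing.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping pixel patches.

    Hypothetical helper for illustration only; not part of the SOLO repository.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, in raster order; each row holds that patch's raw pixels.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Dummy 64x96 RGB image; the patch size of 32 is an assumed value.
image = np.arange(64 * 96 * 3, dtype=np.float32).reshape(64, 96, 3)
patches = patchify(image, patch_size=32)
print(patches.shape)  # (6, 3072)
```

In a single-Transformer design of this kind, each flattened patch would typically be mapped to the model's embedding dimension and placed in the same sequence as the text tokens, which is what removes the need for a separate vision encoder.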