---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-Instruct-150K
- jxu124/refcoco
- jxu124/refcocog
- jxu124/refcocoplus
metrics:
- accuracy
language:
- en
---

# Model Summary

We propose Lenna, a language-enhanced reasoning detection assistant that uses the robust multimodal feature representation of multimodal large language models (MLLMs) while preserving location information for detection. This is achieved by adding a token to the MLLM vocabulary that carries no explicit semantic context but serves as a prompt for the detector to identify the corresponding position. To evaluate Lenna's reasoning capability, we construct the ReasonDet dataset, which measures performance on reasoning-based detection.

# Model Sources

- Repository: https://github.com/Meituan-AutoML/Lenna
- Paper: https://arxiv.org/abs/2312.02433

# How to Get Started with the Model

Model weights can be loaded with Hugging Face Transformers. Examples can be found on [GitHub](https://github.com/Meituan-AutoML/Lenna).
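
The extra-token mechanism described in the model summary can be illustrated with a small, dependency-free sketch. This is not the Lenna implementation: the token name `<DET>`, the toy vocabulary, and the helpers `extend_vocab` and `gather_token_states` are all hypothetical names chosen for illustration. It only shows the general idea of reserving a new vocabulary entry and reading out the hidden state at its position so that a detector could use it as a prompt.

```python
# Illustrative sketch only: the token name, vocabulary, and helper names below
# are hypothetical and are not taken from the Lenna codebase.
DET_TOKEN = "<DET>"  # assumed name for the extra location-prompt token


def extend_vocab(vocab, new_token):
    """Return a copy of vocab with new_token appended at the next free id.

    Assumes the existing ids are contiguous and start at 0.
    """
    extended = dict(vocab)
    if new_token not in extended:
        extended[new_token] = len(extended)
    return extended


def gather_token_states(token_ids, hidden_states, target_id):
    """Collect the hidden-state vector at every position holding target_id."""
    return [h for t, h in zip(token_ids, hidden_states) if t == target_id]


base_vocab = {"the": 0, "dog": 1, "is": 2, "at": 3}
vocab = extend_vocab(base_vocab, DET_TOKEN)
det_id = vocab[DET_TOKEN]

# Toy stand-in for model output: one token id and one 2-d hidden vector
# per sequence position.
token_ids = [vocab[w] for w in ["the", "dog", "is", "at", DET_TOKEN]]
hidden_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]

# These vectors are what a detector head would receive as its prompt.
det_prompts = gather_token_states(token_ids, hidden_states, det_id)
print(det_id, det_prompts)
```

In a real pipeline the vocabulary extension would go through the tokenizer and the model's embedding table rather than a plain dictionary; the repository linked above is the place to look for how Lenna actually does this.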