Model Summary
We propose Lenna, a language-enhanced reasoning detection assistant that exploits the robust multimodal feature representations of MLLMs while preserving location information for detection. It does this by adding an extra token to the MLLM vocabulary. The token carries no explicit semantic content but serves as a prompt that tells the detector to identify the corresponding position. To evaluate Lenna's reasoning capability, we construct the ReasonDet dataset, which measures performance on reasoning-based detection.
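The extra-token mechanism described above can be illustrated with a small, dependency-free sketch. It is not Lenna's implementation. It assumes a hypothetical token name `<DET>`, a toy vocabulary, and that the MLLM's hidden state at each `<DET>` position is what gets handed to the detector as its query. All names here (`Vocab`, `resize_embeddings`, `gather_det_queries`) are illustrative only.

```python
import random

DET_TOKEN = "<DET>"  # hypothetical name for the extra, semantics-free token


class Vocab:
    """Toy token-to-id table standing in for an MLLM tokenizer vocabulary."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.index = {t: i for i, t in enumerate(self.tokens)}

    def add_token(self, tok):
        # Append the token once and return its id (idempotent).
        if tok not in self.index:
            self.index[tok] = len(self.tokens)
            self.tokens.append(tok)
        return self.index[tok]

    def encode(self, words):
        return [self.index[w] for w in words]


def resize_embeddings(emb, new_size, dim, seed=0):
    """Grow the embedding table with small random rows for newly added tokens."""
    rng = random.Random(seed)
    while len(emb) < new_size:
        emb.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return emb


def gather_det_queries(token_ids, hidden_states, det_id):
    """Return the hidden states at every position holding the <DET> token.

    Assumption: these vectors are what would be passed on as detector prompts.
    """
    return [h for t, h in zip(token_ids, hidden_states) if t == det_id]


# Usage: add the token, grow the embeddings, then pick out its hidden state.
vocab = Vocab(["the", "red", "cup", "is", "here"])
det_id = vocab.add_token(DET_TOKEN)  # new id: 5
emb = resize_embeddings([[0.0] * 4 for _ in range(5)], len(vocab.tokens), 4)
ids = vocab.encode(["the", "red", "cup", "is", DET_TOKEN])
hidden = [[float(i)] * 4 for i in range(len(ids))]  # stand-in for MLLM outputs
queries = gather_det_queries(ids, hidden, det_id)  # [[4.0, 4.0, 4.0, 4.0]]
```

With Hugging Face Transformers, the vocabulary step would typically use `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. How Lenna projects the token's hidden state into its detector is described in the paper and repository, not here.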
Model Sources
- Repository: https://github.com/Meituan-AutoML/Lenna
- Paper: https://arxiv.org/abs/2312.02433
How to Get Started with the Model
Model weights can be loaded with Hugging Face Transformers. Examples can be found in the GitHub repository.