Update README.md
README.md
CHANGED
@@ -1,3 +1,23 @@
 ---
 license: apache-2.0
+datasets:
+- liuhaotian/LLaVA-Instruct-150K
+- jxu124/refcoco
+- jxu124/refcocog
+- jxu124/refcocoplus
+metrics:
+- accuracy
+language:
+- en
 ---
+# Model Summary
+We propose Lenna, a language-enhanced reasoning detection assistant that uses the robust multimodal feature representations of MLLMs while preserving location information for detection.
+This is achieved by adding a <DET> token to the MLLM vocabulary; the token carries no explicit semantic context but serves as a prompt for the detector to identify the corresponding position.
+To evaluate the reasoning capability of Lenna, we construct the ReasonDet dataset, which measures its performance on reasoning-based detection.
+
+# Model Sources
+- Repository: https://github.com/Meituan-AutoML/Lenna
+- Paper: https://arxiv.org/abs/2312.02433
+
+# How to Get Started with the Model
+Model weights can be loaded with Hugging Face Transformers. Examples can be found on [GitHub](https://github.com/Meituan-AutoML/Lenna).