mtgv committed on
Commit dfea805 · 1 Parent(s): b8cb10f

Update README.md

Files changed (1)
  1. README.md +20 -0
README.md CHANGED
@@ -1,3 +1,23 @@
  ---
  license: apache-2.0
+ datasets:
+ - liuhaotian/LLaVA-Instruct-150K
+ - jxu124/refcoco
+ - jxu124/refcocog
+ - jxu124/refcocoplus
+ metrics:
+ - accuracy
+ language:
+ - en
  ---
+ # Model Summary
+ We propose Lenna, a language-enhanced reasoning detection assistant that uses the robust multimodal feature representations of multimodal large language models (MLLMs) while preserving location information for detection.
+ It does this by adding a `<DET>` token to the MLLM vocabulary. The token carries no explicit semantic content; instead, it serves as a prompt for the detector to identify the corresponding position.
+ To evaluate Lenna's reasoning capability, we construct the ReasonDet dataset, which measures performance on reasoning-based detection.
+
+ # Model Sources
+ - Repository: https://github.com/Meituan-AutoML/Lenna
+ - Paper: https://arxiv.org/abs/2312.02433
+
+ # How to Get Started with the Model
+ Model weights can be loaded with Hugging Face Transformers. Examples can be found on [GitHub](https://github.com/Meituan-AutoML/Lenna).
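
Reviewer note: the `<DET>` token idea in the Model Summary can be illustrated with a toy sketch. The example below is not taken from the Lenna repository and does not use Transformers; the vocabulary, the whitespace tokenizer, the helper names (`add_special_token`, `encode`, `det_positions`), and the stand-in hidden states are all hypothetical. It only shows the general pattern of registering an extra token, finding where it occurs in a sequence, and picking out the per-position feature that a downstream detector could, presumably, consume as its prompt.

```python
from typing import Dict, List

# Toy illustration only: nothing here comes from the Lenna codebase.
DET_TOKEN = "<DET>"


def add_special_token(vocab: Dict[str, int], token: str) -> int:
    """Append a token to the vocabulary if it is missing and return its id."""
    if token not in vocab:
        vocab[token] = len(vocab)
    return vocab[token]


def encode(text: str, vocab: Dict[str, int], unk_id: int) -> List[int]:
    """Whitespace tokenizer: map each word to its id, unknown words to unk_id."""
    return [vocab.get(word, unk_id) for word in text.split()]


def det_positions(token_ids: List[int], det_id: int) -> List[int]:
    """Return every sequence index at which the detection token occurs."""
    return [i for i, tok in enumerate(token_ids) if tok == det_id]


# A made-up five-word vocabulary, extended with the detection token.
vocab = {"<unk>": 0, "the": 1, "dog": 2, "is": 3, "at": 4}
det_id = add_special_token(vocab, DET_TOKEN)

# Encode a response that ends with the detection token.
token_ids = encode("the dog is at <DET>", vocab, unk_id=vocab["<unk>"])

# Stand-in hidden states: one small vector per token position.
hidden_states = [[float(i), float(i) * 0.5] for i in range(len(token_ids))]

# Select the features at the <DET> positions (assumed detector input).
positions = det_positions(token_ids, det_id)
det_embeddings = [hidden_states[p] for p in positions]
```

For actual loading and inference, the examples in the linked GitHub repository remain the reference.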