Update README.md

---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
---

# Model Card: Reflective LLaVA (ReflectiVA)

Multimodal LLMs (MLLMs) are the natural extension of large language models to handle multimodal inputs, combining text and image data. They have recently garnered attention due to their capability to address complex tasks involving both modalities. However, their effectiveness is limited to the knowledge acquired during training, which restricts their practical utility. In this work, we introduce a novel method to enhance the adaptability of MLLMs by integrating external knowledge sources. Our proposed model, Reflective LLaVA (`ReflectiVA`), utilizes reflective tokens to dynamically determine the need for external knowledge and to predict the relevance of information retrieved from an external database. The tokens are trained following a two-stage, two-model training recipe. This ultimately enables the MLLM to manage external knowledge while preserving fluency and performance on tasks where external knowledge is not needed.

We demonstrate the efficacy of `ReflectiVA` for knowledge-based visual question answering, highlighting its superior performance compared to existing methods.
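
To make the retrieval flow concrete, here is a minimal sketch of how reflective tokens can gate an external knowledge base at inference time. The token names (`<RET>`, `<NORET>`, `<REL>`), the helper callables, and the `mode` argument are illustrative assumptions for exposition, not the exact interface of the released code; see the ReflectiVA repository for the reference implementation.

```python
# Illustrative sketch only: token names, helpers, and the retriever are
# assumptions for exposition, not the official ReflectiVA interface.
from typing import Callable, List

RETRIEVAL_NEEDED = "<RET>"        # assumed token: external knowledge required
RETRIEVAL_NOT_NEEDED = "<NORET>"  # assumed token: answer from parametric knowledge
RELEVANT = "<REL>"                # assumed token: retrieved passage is relevant


def answer_with_reflectiva(
    question: str,
    image,
    generate: Callable[..., str],                   # wraps the MLLM's generate() call
    retrieve: Callable[[str, object], List[str]],   # external knowledge retriever (assumed)
) -> str:
    # Step 1: the model emits a reflective token saying whether external
    # knowledge is needed for this (image, question) pair.
    decision = generate(image=image, text=question, mode="retrieval_decision")

    if decision == RETRIEVAL_NOT_NEEDED:
        # No retrieval: answer directly, preserving fluency on standard tasks.
        return generate(image=image, text=question, mode="answer")

    # Step 2: retrieve candidate passages and keep only those the model
    # tags as relevant with its reflective tokens.
    passages = retrieve(question, image)
    relevant = [
        p for p in passages
        if generate(image=image, text=question, context=p, mode="relevance") == RELEVANT
    ]

    # Step 3: answer conditioned on the relevant external knowledge.
    context = "\n".join(relevant)
    return generate(image=image, text=question, context=context, mode="answer")
```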

In this model repository, you will find the Overall Model (stage two) weights of `ReflectiVA`.

For more information, visit our [ReflectiVA repository](https://github.com/aimagelab/ReflectiVA).
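
A minimal inference sketch is shown below. The model id, the use of the standard LLaVA classes from `transformers`, and the prompt template are assumptions based on the `library_name: transformers` and `image-text-to-text` tags on this card; the inference code in the ReflectiVA repository is the authoritative reference.

```python
# Hedged usage sketch: model id, model class, and prompt template are
# assumptions, not confirmed by this card.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "aimagelab/ReflectiVA"  # placeholder id (assumption)

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image_url = "https://example.com/landmark.jpg"  # any test image
image = Image.open(requests.get(image_url, stream=True).raw)
prompt = "USER: <image>\nWho designed this building? ASSISTANT:"  # assumed LLaVA-style template

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)

# skip_special_tokens=False keeps any reflective tokens visible in the output.
print(processor.decode(output_ids[0], skip_special_tokens=False))
```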
## Citation
If you make use of our work, please cite our repo:

```
...
  journal={arXiv},
  year={2024}
}
```

## Paper page

The paper can be found at https://huggingface.co/papers/2411.16863.