Update README.md

---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
---

# Model Card: Reflective LLaVA (ReflectiVA)

Multimodal LLMs (MLLMs) are the natural extension of large language models to handle multimodal inputs, combining text and image data. They have recently garnered attention due to their capability to address complex tasks involving both modalities. However, their effectiveness is limited to the knowledge acquired during training, which restricts their practical utility. In this work, we introduce a novel method to enhance the adaptability of MLLMs by integrating external knowledge sources. Our proposed model, Reflective LLaVA (`ReflectiVA`), utilizes reflective tokens to dynamically determine the need for external knowledge and to predict the relevance of information retrieved from an external database. The tokens are trained following a two-stage, two-model training recipe. This ultimately enables the MLLM to manage external knowledge while preserving fluency and performance on tasks where external knowledge is not needed.

We demonstrate the efficacy of `ReflectiVA` for knowledge-based visual question answering, highlighting its superior performance compared to existing methods.
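
To make the retrieval flow concrete, here is a minimal sketch of how reflective tokens can gate an external knowledge base at inference time. The token names (`<RET>`, `<NORET>`, `<REL>`), the helper callables, and the `mode` argument are illustrative assumptions for exposition, not the exact interface of the released code; see the ReflectiVA repository for the reference implementation.

```python
# Illustrative sketch only: token names, helpers, and the retriever are
# assumptions for exposition, not the official ReflectiVA interface.
from typing import Callable, List

RETRIEVAL_NEEDED = "<RET>"        # assumed token: external knowledge required
RETRIEVAL_NOT_NEEDED = "<NORET>"  # assumed token: answer from parametric knowledge
RELEVANT = "<REL>"                # assumed token: retrieved passage is relevant


def answer_with_reflectiva(
    question: str,
    image,
    generate: Callable[..., str],                   # wraps the MLLM's generate() call
    retrieve: Callable[[str, object], List[str]],   # external knowledge retriever (assumed)
) -> str:
    # Step 1: the model emits a reflective token saying whether external
    # knowledge is needed for this (image, question) pair.
    decision = generate(image=image, text=question, mode="retrieval_decision")

    if decision == RETRIEVAL_NOT_NEEDED:
        # No retrieval: answer directly, preserving fluency on standard tasks.
        return generate(image=image, text=question, mode="answer")

    # Step 2: retrieve candidate passages and keep only those the model
    # tags as relevant with its reflective tokens.
    passages = retrieve(question, image)
    relevant = [
        p for p in passages
        if generate(image=image, text=question, context=p, mode="relevance") == RELEVANT
    ]

    # Step 3: answer conditioned on the relevant external knowledge.
    context = "\n".join(relevant)
    return generate(image=image, text=question, context=context, mode="answer")
```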

In this model repository, you will find the Overall Model (stage two) weights of `ReflectiVA`.

For more information, visit our [ReflectiVA repository](https://github.com/aimagelab/ReflectiVA).
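
A minimal inference sketch is shown below. The model id, the use of the standard LLaVA classes from `transformers`, and the prompt template are assumptions based on the `library_name: transformers` and `image-text-to-text` tags on this card; the inference code in the ReflectiVA repository is the authoritative reference.

```python
# Hedged usage sketch: model id, model class, and prompt template are
# assumptions, not confirmed by this card.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "aimagelab/ReflectiVA"  # placeholder id (assumption)

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image_url = "https://example.com/landmark.jpg"  # any test image
image = Image.open(requests.get(image_url, stream=True).raw)
prompt = "USER: <image>\nWho designed this building? ASSISTANT:"  # assumed LLaVA-style template

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)

# skip_special_tokens=False keeps any reflective tokens visible in the output.
print(processor.decode(output_ids[0], skip_special_tokens=False))
```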
## Citation
If you make use of our work, please cite our repo:

```
...
  journal={arXiv},
  year={2024}
}
```

## Paper page

The paper can be found at https://huggingface.co/papers/2411.16863.