![efficiency](./assets/singleGPU.png)

## 🌈 Update

* **[2024.09.05]** LongLLaVA repo is published!🎉
## Architecture

## Results reproduction

### Evaluation

- Preparation

Get the model inference code from [GitHub](https://github.com/FreedomIntelligence/LongLLaVA).

```bash
git clone https://github.com/FreedomIntelligence/LongLLaVA.git
```

- Environment Setup

```bash
pip install -r requirements.txt
```

- Command Line Interface

```bash
python cli.py --model_dir path-to-longllava
```

- Model Inference

```python
from cli import Chatbot  # inference interface from the LongLLaVA repo

query = 'What does the picture show?'
image_paths = ['image_path1']  # image or video path

bot = Chatbot('path-to-longllava')  # path to the downloaded model weights
output = bot.inference(query, image_paths)
print(output)  # prints the model's answer
```
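
The `image_paths` list can also carry an ordered set of video frames. As a self-contained illustration, frames can be thinned to a fixed budget before being handed to the model; the `sample_frames` helper below is hypothetical, not part of the repo:

```python
def sample_frames(frame_paths, num_samples):
    """Evenly sample up to num_samples paths from an ordered list of frame paths.

    Hypothetical helper for illustration; not part of the LongLLaVA repo.
    """
    if num_samples <= 1 or len(frame_paths) <= num_samples:
        return list(frame_paths)
    # Evenly spaced indices from the first frame to the last, inclusive.
    step = (len(frame_paths) - 1) / (num_samples - 1)
    return [frame_paths[round(i * step)] for i in range(num_samples)]

frames = [f'frame_{i:04d}.jpg' for i in range(100)]
print(sample_frames(frames, 4))
# ['frame_0000.jpg', 'frame_0033.jpg', 'frame_0066.jpg', 'frame_0099.jpg']
```

The sampled list would then be passed as `image_paths` to `bot.inference`, in place of a single image path.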

## TO DO

- [ ] Release Data Construction Code

## Acknowledgement

- [LLaVA](https://github.com/haotian-liu/LLaVA): Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
## Citation

```
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.02889},
}
```