---
library_name: adapter-transformers
pipeline_tag: text-classification
---

<p align="center">
  <img src="https://s11.ax1x.com/2023/12/28/piqvDMV.png" width="250" style="margin-bottom: 0.2;"/>
</p>

<h2 align="center"><a href="https://arxiv.org/abs/2401.15947">MoE-LLaVA-Qwen1.5-1.8B×4-Top2: When Vision Meets a Small-scale Language Model and a Vietnamese Synthetic Dataset</a></h2>

# Introducing MoE-LLaVA-Qwen1.5-1.8B×4-Top2 for Vietnamese
We are excited to present MoE-LLaVA-Qwen1.5-1.8B×4-Top2, tailored for the Vietnamese language. This model is part of our ongoing effort to develop Vision Language Models (VLMs) for Vietnamese, a space where options are currently limited and dominated by larger models (~7B parameters). Our model activates only about 2.2B parameters per forward pass, significantly reducing the memory footprint, and it can be quantized for local execution.
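
The "×4-Top2" in the name reflects the Mixture-of-Experts design: four expert networks with a router that activates only the top-2 per token, which is why only ~2.2B parameters are live on each forward pass. The following is a minimal NumPy sketch of top-2 gating to illustrate the idea — it is not the MoE-LLaVA implementation, and all names in it (`top2_moe`, the toy linear experts) are illustrative:

```python
import numpy as np

def top2_moe(x, gate_w, experts):
    """Route each token to its top-2 experts (illustrative sketch).

    x: (tokens, dim) activations; gate_w: (dim, n_experts) router weights;
    experts: list of callables. Only the 2 selected experts run per token,
    so per-token compute scales with 2 experts, not all of them.
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    top2 = np.argsort(logits, axis=-1)[:, -2:]   # indices of the 2 best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # softmax over just the two selected experts' logits
        sel = logits[t, top2[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()
        for weight, e in zip(w, top2[t]):
            out[t] += weight * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
dim, n_experts = 8, 4
# four toy "experts": fixed random linear maps standing in for expert FFNs
mats = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
experts = [lambda v, m=m: v @ m for m in mats]
gate_w = rng.normal(size=(dim, n_experts))

x = rng.normal(size=(5, dim))     # 5 tokens
y = top2_moe(x, gate_w, experts)
print(y.shape)                    # (5, 8)
```

The point of the sketch is the sparsity: the router scores all four experts, but only two run per token, so adding experts grows capacity without growing per-token compute proportionally.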

For the COCO dataset, we utilized LLaVA-style prompts to generate data, following two prompting strategies:

- **Caption-based Prompting:** Utilizes accurate captions and bounding boxes from the original dataset.
- **Image-based Prompting:** Leverages images to generate captions and conversations.
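
The caption-based strategy can be pictured as a simple template fill over ground-truth annotations. The sketch below is hypothetical — the field names, instruction text, and output layout are illustrative of the common LLaVA conversation format, not the exact prompts used to build this dataset:

```python
def caption_prompt(caption, bboxes):
    """Build a LLaVA-style instruction sample from COCO-style annotations.

    caption: a ground-truth image caption (string).
    bboxes: list of (label, [x, y, w, h]) ground-truth boxes.
    Returns a chat-format dict of the kind used in visual instruction tuning.
    """
    objects = "; ".join(
        f"{label} at ({x}, {y}, {w}, {h})" for label, (x, y, w, h) in bboxes
    )
    return {
        "conversations": [
            # "<image>" marks where image tokens are spliced in
            {"from": "human",
             "value": "<image>\nMô tả chi tiết bức ảnh."},  # "Describe the image in detail."
            {"from": "gpt",
             "value": f"{caption} Objects: {objects}."},
        ]
    }

sample = caption_prompt(
    "Một con mèo nằm trên ghế sofa.",   # "A cat lying on a sofa."
    [("cat", [10, 20, 120, 80])],
)
print(sample["conversations"][0]["from"])  # human
```

Because the captions and boxes come from human annotations, this path trades generative diversity for grounded, low-hallucination targets; the image-based path recovers diversity by letting a model write the responses from the image itself.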

## Evaluation

- Coming soon 🫡

## Bias, Risks, and Limitations

The dataset may contain biases originating from its sources. Users should remain aware of these potential biases when utilizing the dataset.