qazimbhat1/Crystal-based-MLLM-7B

Text Generation

Inference Endpoints

qazimbhat1 committed on Jul 3

Commit 262ecb0 • 1 Parent(s): 1343bb5

Update README_code_model.md

Files changed (1)

README_code_model.md +4 -4

README_code_model.md CHANGED

@@ -71,7 +71,7 @@ aiming to assess perceptual and cognitive capability of MLLMs within 14 sub-task
 | CrystalChat-7B  | 1456.53 | **308.21** | 86.96 | 67.77 | **57.84** |
 | Vicuna-7B | **1481.12** | 302.85 | **87.174** | **67.97** | 56.49   |
-*Table 1: Comparison of different LLM backbones on visual language understanding benchmarks. All models are instruction-tuned on the general domain data (i.e. LLaVA)*
+**Table 3:** Comparison of different LLM backbones on visual language understanding benchmarks. All models are instruction-tuned on the general domain data (i.e. LLaVA)*
@@ -103,7 +103,7 @@ The dataset chosen was created by LLaVA with academic-task-oriented VQA data mix
 | VG [25]       | 86K  | Provide the bounding box coordinate of the region this sentence describes. |
 | **Total**     | **665K** |                                                                      |
-*Table 2. Instruction-following Data Mixture of LLaVA-1.5.*
+**Table 4:** Instruction-following Data Mixture of LLaVA-1.5.*
 #### Web2Code Data
@@ -130,7 +130,7 @@ DWU<sub>R</sub>: We refined the WebSRC question-answer data to improve its quali
 | **Avg DOM Depth** | 5.3±1.0 | 6.5±1.0 |
 | **Avg Unique Tags** | 13.6±2.7 | 13.5±2.5 |
-*Table 3. DWCG is a newly generated GPT-3.5-based dataset, while DWCG<sub>R</sub> is the refined dataset that utilizes WebSight and Pix2Code datasets*
+**Table 5:** DWCG is a newly generated GPT-3.5-based dataset, while DWCG<sub>R</sub> is the refined dataset that utilizes WebSight and Pix2Code datasets*
 ### Webpage Understanding Datasets
@@ -140,7 +140,7 @@ DWU<sub>R</sub>: We refined the WebSRC question-answer data to improve its quali
 | **Instruction** | ✓       | ✓               |
 | **Size**         | 243.5K | 51.5K           |
-*Table 4. Distribution of DWU and DWU<sub>R</sub> datasets. Both datasets include high-quality question-answer pairs for webpage understanding.*
+**Table 6:** Distribution of DWU and DWU<sub>R</sub> datasets. Both datasets include high-quality question-answer pairs for webpage understanding.*