Committed by qazimbhat1
Commit: 262ecb0
Parent(s): 1343bb5
Update README_code_model.md
README_code_model.md +4 -4
README_code_model.md (changed)
@@ -71,7 +71,7 @@ aiming to assess perceptual and cognitive capability of MLLMs within 14 sub-task
 | CrystalChat-7B | 1456.53 | **308.21** | 86.96 | 67.77 | **57.84** |
 | Vicuna-7B | **1481.12** | 302.85 | **87.174** | **67.97** | 56.49 |
 
-
+**Table 3:** Comparison of different LLM backbones on visual language understanding benchmarks. All models are instruction-tuned on the general domain data (i.e. LLaVA)*
 
 
 
@@ -103,7 +103,7 @@ The dataset chosen was created by LLaVA with academic-task-oriented VQA data mix
 | VG [25] | 86K | Provide the bounding box coordinate of the region this sentence describes. |
 | **Total** | **665K** | |
 
-
+**Table 4:** Instruction-following Data Mixture of LLaVA-1.5.*
 
 #### Web2Code Data
 
@@ -130,7 +130,7 @@ DWU<sub>R</sub>: We refined the WebSRC question-answer data to improve its quali
 | **Avg DOM Depth** | 5.3±1.0 | 6.5±1.0 |
 | **Avg Unique Tags** | 13.6±2.7 | 13.5±2.5 |
 
-
+**Table 5:** DWCG is a newly generated GPT-3.5-based dataset, while DWCG<sub>R</sub> is the refined dataset that utilizes WebSight and Pix2Code datasets*
 
 
 ### Webpage Understanding Datasets
@@ -140,7 +140,7 @@ DWU<sub>R</sub>: We refined the WebSRC question-answer data to improve its quali
 | **Instruction** | ✓ | ✓ |
 | **Size** | 243.5K | 51.5K |
 
-
+**Table 6:** Distribution of DWU and DWU<sub>R</sub> datasets. Both datasets include high-quality question-answer pairs for webpage understanding.*
 
 
 