CrystalChat-7B based multi-modal large language model (MLLM) mimics the training recipe used for the Vicuna-7B based [LLaVa-v1.5](https://huggingface.co/docs/transformers/main/model_doc/llava). CrystalChat-7B based MLLMs are fully transparent: all materials, including code, data, model checkpoints, intermediate results, and more are open-sourced in [Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs](https://arxiv.org/pdf/2406.20098). CrystalChat-7B-Web2Code is an MLLM specialized in webpage-image-to-HTML code generation.

## CrystalChat-Web2Code Features

**Convert hand-drawn images to a website**

| ![Image 1](images2/handdrawn.png) | ![Image 2](images2/crystal.png) |
|:----------------------:|:----------------------:|
| Hand Drawn Webpage | CrystalChat-Web2Code Rendering |

**Recreate a new webpage from an existing webpage**

Image 1: Original Webpage

<center><img src="images2/ori.png" alt="original webpage" /></center>

Image 2: CrystalChat-Web2Code Rendering

<center><img src="images2/crystalchat.png" alt="CrystalChat-Web2Code rendering" /></center>
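Chat-style MLLMs typically return the generated HTML wrapped in a Markdown code fence inside a longer response. A small helper along these lines — an illustrative sketch, not part of the released code — can pull the HTML out of the response so it can be saved and opened in a browser next to the input screenshot:

```python
import re


def extract_html(response: str) -> str:
    """Extract an HTML document from a model response.

    The response may wrap the code in a Markdown fence (```html ... ```);
    fall back to returning the raw text if no fence is found.
    """
    match = re.search(r"```(?:html)?\s*\n(.*?)```", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()


response = "Here is the page:\n```html\n<html><body><h1>Demo</h1></body></html>\n```"
html = extract_html(response)
# Write the extracted markup to disk so it can be rendered in a browser:
# with open("generated.html", "w") as f:
#     f.write(html)
```

The fallback branch matters in practice: some responses contain only raw markup with no fence, and stripping whitespace keeps the saved file clean either way.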
## Web2Code Dataset

Our Web2Code instruction tuning dataset construction and instruction generation process involves four key components:
**Table 6:** Distribution of DWU and DWU<sub>R</sub> datasets. Both datasets include high-quality question-answer pairs for webpage understanding.

## Loading Crystal

```python
from transformers import AutoModelForCausalLM, AutoTokenizer