victormiller committed
Commit a1dc112
1 parent: fbe1b69

Update README.md

Files changed (1): README.md (+18, -23)

README.md CHANGED
@@ -19,6 +19,24 @@ datasets:
 The CrystalChat-7B based multi-modal large language model (MLLM) mimics the training recipe used for the Vicuna-7B based [LLaVa-v1.5](https://huggingface.co/docs/transformers/main/model_doc/llava). CrystalChat-7B based MLLMs are entirely transparent, with all materials open-sourced, including code, data, model checkpoints, and intermediate results, at [Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs](https://arxiv.org/pdf/2406.20098). The CrystalChat-7B-Web2Code MLLM specializes in webpage image-to-HTML code generation.

 ## Web2Code Dataset
 Our Web2Code instruction tuning dataset construction and instruction generation process involves four key components:
@@ -142,29 +160,6 @@ The dataset chosen was created by LLaVA with academic-task-oriented VQA data mix
 **Table 6:** Distribution of DWU and DWU<sub>R</sub> datasets. Both datasets include high-quality question-answer pairs for webpage understanding.

-## Examples
-
-**Example 1: Hand drawn images**
-
-| ![Image 1](images2/handdrawn.png) | ![Image 2](images2/crystal.png) |
-|:----------------------:|:----------------------:|
-| Hand Drawn Webpage | CrystalChat-Web2Code Rendering |
-
-**Example 2: Recreate a webpage from an image**
-Image 1: Original Webpage
-<center><img src="images2/ori.png" alt="k2 eval table" /></center>
-
-Image 2: CrystalChat-Web2Code Rendering
-<center><img src="images2/crystalchat.png" alt="k2 eval table" /></center>
-
-**Image 3:** Hand-drawn webpage input to CrystalChat-7B-Web2Code generated output.

 ## Loading Crystal
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 The CrystalChat-7B based multi-modal large language model (MLLM) mimics the training recipe used for the Vicuna-7B based [LLaVa-v1.5](https://huggingface.co/docs/transformers/main/model_doc/llava). CrystalChat-7B based MLLMs are entirely transparent, with all materials open-sourced, including code, data, model checkpoints, and intermediate results, at [Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs](https://arxiv.org/pdf/2406.20098). The CrystalChat-7B-Web2Code MLLM specializes in webpage image-to-HTML code generation.

+## CrystalChat-Web2Code Features
+
+**Convert a hand-drawn image to a website**
+
+| ![Image 1](images2/handdrawn.png) | ![Image 2](images2/crystal.png) |
+|:----------------------:|:----------------------:|
+| Hand-Drawn Webpage | CrystalChat-Web2Code Rendering |
+
+**Recreate a webpage from an existing webpage**
+Image 1: Original Webpage
+<center><img src="images2/ori.png" alt="original webpage" /></center>
+
+Image 2: CrystalChat-Web2Code Rendering
+<center><img src="images2/crystalchat.png" alt="CrystalChat-Web2Code rendering" /></center>
+
 ## Web2Code Dataset
 Our Web2Code instruction tuning dataset construction and instruction generation process involves four key components:
 
 **Table 6:** Distribution of DWU and DWU<sub>R</sub> datasets. Both datasets include high-quality question-answer pairs for webpage understanding.

 ## Loading Crystal
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
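# The loading snippet above is truncated in this diff view. Below is a
# minimal, hedged sketch of loading the model with transformers; it is
# not the authors' exact code. The repo id "LLM360/CrystalChat" is an
# assumption -- substitute the actual checkpoint id -- and
# trust_remote_code=True is commonly required for custom architectures.
MODEL_ID = "LLM360/CrystalChat"  # assumed Hugging Face repo id

def load_crystal(model_id: str = MODEL_ID):
    """Return (tokenizer, model) for the given checkpoint."""
    # Imported here so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model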