---
license: apache-2.0
language:
- en
---
# InterleavedBench (EMNLP'24 Main Conference)

This is the official Hugging Face repo for the paper "**Holistic Evaluation for Interleaved Text-and-Image Generation**", accepted to the EMNLP 2024 Main Conference.

**Paper: https://arxiv.org/abs/2406.14643**

**Website: https://vt-nlp.github.io/InterleavedEval/**

## How to use InterleavedBench

### Repo hierarchy

- `interleaved_bench.json` is the main JSON file of the dataset.
- `zipped_images` is the directory of zipped images for each subset, including both the context images and the ground-truth images.
- `src/interleavedeval_gpt4o.py` is the Python script that runs InterleavedEval with GPT-4o. It takes the model prediction file as input.

### To get started

- Unzip the image files under `zipped_images`.
- Run inference on `interleaved_bench.json` with your model to obtain its outputs (both text and images).
- Use the script `src/interleavedeval_gpt4o.py` to evaluate the outputs.
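
The first two steps can be sketched in Python as follows. This is a minimal sketch, not part of the official tooling: the helper names `unzip_all` and `load_benchmark` and the output directory are hypothetical, and it assumes that the archives under `zipped_images` end in `.zip` and that `interleaved_bench.json` holds a JSON list of examples.

```python
import json
import zipfile
from pathlib import Path


def unzip_all(zip_dir, out_dir):
    """Extract every .zip archive found in zip_dir into out_dir.

    Returns the names of the archives that were extracted.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
        extracted.append(archive.name)
    return extracted


def load_benchmark(json_path):
    """Load the benchmark file (assumed to be a JSON list of example dicts)."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)
```

For instance, `unzip_all("zipped_images", "images")` followed by `load_benchmark("interleaved_bench.json")` would prepare the images and the examples for your own inference loop. The directory name `images` is only an illustration.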

### Important notes
- For image editing and subject-driven generation tasks, the scores on text-related aspects (text quality, text-image coherence) are directly set to 0. Please skip those scores when you compute the overall performance.
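
One way to skip those scores when aggregating is sketched below. It is an illustration under assumptions: the task keys in `IMAGE_ONLY_TASKS`, the aspect keys in `TEXT_ASPECTS`, and the shape of each result record are hypothetical and should be adapted to whatever the evaluation script actually writes out.

```python
from collections import defaultdict

# Hypothetical identifiers; replace them with the keys used in your results.
TEXT_ASPECTS = {"text_quality", "text_image_coherence"}
IMAGE_ONLY_TASKS = {"image_editing", "subject_driven_generation"}


def aspect_means(results):
    """Average each aspect over examples, ignoring the fixed 0 scores.

    results: list of dicts shaped like
        {"task_name": str, "scores": {aspect_name: number}}
    """
    collected = defaultdict(list)
    for record in results:
        skip_text = record["task_name"] in IMAGE_ONLY_TASKS
        for aspect, score in record["scores"].items():
            if skip_text and aspect in TEXT_ASPECTS:
                continue  # score is fixed to 0 for these tasks, so leave it out
            collected[aspect].append(score)
    return {aspect: sum(vals) / len(vals) for aspect, vals in collected.items()}
```

Averaging per aspect keeps the zeros from dragging down the text-related means. How the per-aspect means are combined into a single overall number is left to the reader.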


One example in `interleaved_bench.json` is as follows:

```
{
    "id": "wikihow_next_step_0_489157",
    "image": [
        "wiki_images_test/489157_0_0.png",
        "wiki_images_test/489157_0_1.png",
        "wiki_images_test/489157_0_2.png",
        "wiki_images_test/489157_0_3.png",
        "wiki_images_test/489157_0_4.png"
    ],
    "task_name": "wikihow_next_step",
    "conversations": [
        {
            "from": "human",
            "value": "In this task, you are given a high-level goal 'How to Make a Banana Shake': Banana shakes are a tasty way to get a lot of nutrients all at once. Bananas provide a creamy, smooth texture when turned into a drink. Bananas also fill empty stomachs, staving off hunger pangs and giving you a nice energy burst. In this article you'll find a few ways to make banana shakes, among the many possibilities. \n  You need to assist human user to complete this task via making a banana shake with kefir. Given the previous steps, you need to predict the subsequent 4 steps to help the user to finish the task. The previous steps are: \n <BEGIN>  Put 2 to 3 bananas in a bowl. <image>\n"
        },
        {
            "from": "gpt",
            "value": "Now put in a liter of kefir and a teaspoon of sugar. <image>\n Put 1 cup milk into the mix. <image>\n Using a blender, blend all ingredients together. <image>\n Relax with your fresh banana smoothie! <image>\n"
        }
    ],
    "goal": "How to Make a Banana Shake",
    "category": [
        "Food and Entertaining",
        "Drinks",
        "Smoothies Shakes and Milk",
        "Fruit Based Shakes"
    ],
    "dataset_id": "wikihow_selected_test_uni"
},
```
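
In the example above, the `image` list has five entries and the two turns contain five `<image>` placeholders in total (one in the human turn, four in the gpt turn). The sketch below assumes that the placeholders map to the image paths in order across turns. That mapping is inferred from this single example rather than documented, and `pair_images` is a hypothetical helper.

```python
def pair_images(example):
    """Split each turn at "<image>" and attach image paths in order.

    Returns one dict per turn with a list of (text, image_path) segments;
    image_path is None for trailing text that has no placeholder after it.
    """
    paths = iter(example["image"])
    turns = []
    for turn in example["conversations"]:
        chunks = turn["value"].split("<image>")
        # Every chunk except the last is followed by an image placeholder.
        segments = [(text.strip(), next(paths, None)) for text in chunks[:-1]]
        tail = chunks[-1].strip()
        if tail:
            segments.append((tail, None))
        turns.append({"from": turn["from"], "segments": segments})
    return turns
```

Applied to the example, the human turn yields one segment paired with `wiki_images_test/489157_0_0.png`, and the gpt turn yields four segments paired with the remaining four paths.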

### Reference

If you find our work useful or interesting, please cite:
```
@article{liu_holistic_2024,
  author       = {Minqian Liu and
                  Zhiyang Xu and
                  Zihao Lin and
                  Trevor Ashby and
                  Joy Rimchala and
                  Jiaxin Zhang and
                  Lifu Huang},
  title        = {Holistic Evaluation for Interleaved Text-and-Image Generation},
  journal      = {CoRR},
  volume       = {abs/2406.14643},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2406.14643},
  doi          = {10.48550/ARXIV.2406.14643},
  eprinttype   = {arXiv},
  eprint       = {2406.14643},
  timestamp    = {Tue, 16 Jul 2024 16:17:50 +0200}
}
```