---
license: cc
language:
- en
base_model:
- intfloat/e5-base-v2
tags:
- retrieval
- question answering
---

<div align="center">
  <img src="https://github.com/SapienzaNLP/zebra/blob/master/assets/zebra.png?raw=true" width="100" height="100">
</div>

<div align="center">
  <h1>ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering</h1>
</div>

<div style="display:flex; justify-content: center; align-items: center; flex-direction: row;">
    <a href="https://2024.emnlp.org/"><img src="https://img.shields.io/badge/EMNLP-2024-4b44ce"></a> &nbsp; &nbsp; 
    <a href="https://arxiv.org/abs/2410.05077"><img src="https://img.shields.io/badge/arXiv-paper-b31b1b.svg"></a> &nbsp; &nbsp; 
    <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/"><img src="https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg"></a> &nbsp; &nbsp;
    <a href="https://huggingface.co/collections/sapienzanlp/zebra-66e3ec50c8ce415ea7572d0e"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-FCD21D"></a> &nbsp; &nbsp;
    <a href="https://github.com/SapienzaNLP/zebra"><img src="https://img.shields.io/badge/GitHub-Repo-121013?logo=github&logoColor=white"></a> &nbsp; &nbsp;
</div>

<div align="center"> A retrieval augmentation framework for zero-shot commonsense question answering with LLMs. </div>

## 🛠️ Installation

Installation from PyPI

```bash
pip install zebra-qa
```

Installation from source

```bash
git clone https://github.com/sapienzanlp/zebra.git
cd zebra
conda create -n zebra python=3.10
conda activate zebra
pip install -e .
```

## 🚀 Quick Start

ZEBRA is a plug-and-play retrieval augmentation framework for **Commonsense Question Answering**. \
It is composed of three pipeline stages: *example retrieval*, *knowledge generation* and *informed reasoning*.

- Example retrieval: given a question, we retrieve relevant examples of question-knowledge pairs from a large collection.
- Knowledge generation: we prompt an LLM to generate useful explanations for the given input question by leveraging the relationships in the retrieved question-knowledge pairs.
- Informed reasoning: we prompt the same LLM for the question answering task by taking advantage of the previously generated explanations.
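To make the data flow between the three stages concrete, here is a minimal, self-contained sketch. It is not ZEBRA's implementation: word overlap stands in for the trained retriever, the `Example` class, the function names and the prompt wording are all hypothetical, and the LLM calls are left out (the explanation is hard-coded).

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A question-knowledge pair (hypothetical container, not a ZEBRA class)."""
    question: str
    knowledge: list[str]


def retrieve_examples(question, collection, k=1):
    """Stage 1: rank examples by word overlap with the input question.

    Word overlap is only a stand-in for a learned dense retriever.
    """
    query_words = set(question.lower().split())

    def overlap(example):
        return len(query_words & set(example.question.lower().split()))

    return sorted(collection, key=overlap, reverse=True)[:k]


def knowledge_prompt(question, choices, examples):
    """Stage 2: show the retrieved pairs, then ask for explanations (assumed wording)."""
    blocks = [
        f"Question: {ex.question}\nExplanations: {' '.join(ex.knowledge)}"
        for ex in examples
    ]
    blocks.append(f"Question: {question}\nChoices: {'; '.join(choices)}\nExplanations:")
    return "\n\n".join(blocks)


def answer_prompt(question, choices, explanations):
    """Stage 3: ask for the answer, conditioned on the generated explanations."""
    return (
        f"Question: {question}\n"
        f"Choices: {'; '.join(choices)}\n"
        f"Explanations: {' '.join(explanations)}\n"
        "Answer:"
    )


# Toy collection of question-knowledge pairs (made up for this sketch).
collection = [
    Example("Where do you charge a phone?", ["Phones are charged at power outlets."]),
    Example("What do you do when your friend is sad?", ["Comforting a sad friend shows empathy."]),
    Example("What is used to cut paper?", ["Scissors are used to cut paper."]),
]

question = "If your friend is upset, what is the best way to support them?"
choices = [
    "Listen to them and offer comfort.",
    "Tell them they are overreacting.",
    "Ignore them and walk away.",
]

retrieved = retrieve_examples(question, collection, k=1)
stage2 = knowledge_prompt(question, choices, retrieved)

# In the real pipeline an LLM would generate these from the stage-2 prompt.
explanations = ["Listening and offering comfort shows empathy and understanding."]
stage3 = answer_prompt(question, choices, explanations)

print(stage2)
print()
print(stage3)
```

The point of the sketch is the hand-off: stage 1 selects examples, stage 2 turns them into a few-shot prompt for explanations, and stage 3 reuses those explanations when asking for the answer.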

Here is an example of how to use ZEBRA for question answering:

```python
from zebra import Zebra

# Load Zebra with language model, retriever, document index and explanations.
zebra = Zebra(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    retriever="sapienzanlp/zebra-retriever-e5-base-v2",
    document_index="sapienzanlp/zebra-kb",
)

# Provide the questions and, for each question, its list of answer choices.
questions = [
    "What should you do if you see someone hurt and in need of help?",
    "If your friend is upset, what is the best way to support them?",
    "What should you do if your phone battery is running low in a public place?",
    "What should you do if you are running late for an important meeting?",
]

choices = [
    ["Walk away.", "Call for help.", "Take a photo for social media."],
    ["Listen to them and offer comfort.", "Tell them they are overreacting.", "Ignore them and walk away."],
    ["Borrow a stranger's phone.", "Use public charging station.", "Leave your phone unattended while it charges."],
    ["Rush through traffic.", "Call and inform them you will be late.", "Do not show up at all."],
]

# Generate knowledge and perform question answering.
zebra_output = zebra.pipeline(questions=questions, choices=choices)
```

The output contains, for each question, a list of generated explanations and the predicted answer:

```text
  ZebraOutput(
    explanations=[
      [
        "Walking away would be neglecting the person's need for help and potentially putting them in danger.",
        'Calling for help, such as 911, is the most effective way to get the person the assistance they need.',
        "Taking a photo for social media might spread awareness, but it's not a direct way to help the person in need."
      ],
      [
        'Listening and offering comfort shows empathy and understanding.', 
        "Telling someone they're overreacting can be dismissive and unhelpful.", 
        'Ignoring someone in distress can be hurtful and unkind.'
      ],
      [
        "Borrow a stranger's phone: Unwise, as it's a security risk and may lead to theft or damage.", 
        "Use public charging station: Safe and convenient, as it's a designated charging area.", 
        'Leave your phone unattended while it charges: Not recommended, as it may be stolen or damaged.'
      ],
      [
        'Rush through traffic: This option is risky and may lead to accidents or stress.', 
        'Call and inform them you will be late: This is the most likely option, as it shows respect for the meeting and allows for adjustments.', 
        'Do not show up at all: This is unacceptable, as it shows disrespect for the meeting and may damage relationships.'
      ],
    ],
    answers=[
      "Call for help.",
      "Listen to them and offer comfort.",
      "Use public charging station.",
      "Call and inform them you will be late."
    ],
  )
```
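The explanations and answers are aligned with the input questions by position, so they can be zipped back together for display. The sketch below assumes the two fields shown in the printed output (`explanations` and `answers`) are accessible as attributes; `OutputStandIn` and `format_results` are hypothetical names used only so the example runs without loading a model.

```python
from dataclasses import dataclass


@dataclass
class OutputStandIn:
    """Stand-in exposing the two fields visible in the printed output (assumption)."""
    explanations: list[list[str]]
    answers: list[str]


def format_results(questions, output):
    """Pair each question with its predicted answer and its explanations."""
    lines = []
    for question, explanations, answer in zip(questions, output.explanations, output.answers):
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        for explanation in explanations:
            lines.append(f"  - {explanation}")
    return lines


# Values copied from the first question of the example output above.
questions = ["What should you do if you see someone hurt and in need of help?"]
output = OutputStandIn(
    explanations=[[
        "Walking away would be neglecting the person's need for help and potentially putting them in danger.",
        "Calling for help, such as 911, is the most effective way to get the person the assistance they need.",
        "Taking a photo for social media might spread awareness, but it's not a direct way to help the person in need.",
    ]],
    answers=["Call for help."],
)

report = format_results(questions, output)
print("\n".join(report))
```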

You can also call `zebra.pipeline` with `return_dict=True` to have ZEBRA additionally return the retrieved examples along with their explanations.

## Models and Data

Models and data can be found in the following [Hugging Face Collection 🤗](https://huggingface.co/collections/sapienzanlp/zebra-66e3ec50c8ce415ea7572d0e).

## 📊 Performance

We evaluate ZEBRA on 8 well-established commonsense question answering datasets. The following table reports the accuracy of each model without / with ZEBRA; the higher score of each pair is in bold.

| Model                    | CSQA            | ARC-C           | ARC-E           | OBQA            | PIQA            | QASC            | CSQA2           | WG              | AVG             |
| ------------------------ | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
| Mistral-7B-Instruct-v0.2 | 68.2 / **73.3** | 72.4 / **75.2** | 85.8 / **87.4** | 68.8 / **75.8** | 76.1 / **80.2** | 66.1 / **68.3** | 58.5 / **67.5** | 55.8 / **60.7** | 68.9 / **73.5** |
| Phi3-small-8k-Instruct   | 77.2 / **80.9** | 90.4 / **91.6** | 96.9 / **97.7** | 90.4 / **91.2** | 86.6 / **88.1** | **83.5** / 81.0 | 68.0 / **74.6** | 79.1 / **81.0** | 84.0 / **85.8** |
| Meta-Llama-3-8b-Instruct | 73.9 / **78.7** | 79.4 / **83.5** | 91.7 / **92.9** | 73.4 / **79.6** | 78.3 / **84.0** | 78.2 / **79.1** | 64.3 / **69.4** | 56.2 / **63.2** | 74.4 / **78.8** |
| Phi3-mini-128k-Instruct  | 73.4 / **74.8** | 85.7 / **88.0** | 95.4 / **96.0** | 82.8 / **87.8** | 80.4 / **84.2** | **74.7** / 73.9 | 59.3 / **64.6** | 67.3 / **72.9** | 77.4 / **80.5** |
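
As a quick way to read the table, the short script below computes the average gain per model from the AVG column. The numbers are copied from the table; the variable names are illustrative only.

```python
# Average accuracy (AVG column) without / with ZEBRA, copied from the table above.
avg_accuracy = {
    "Mistral-7B-Instruct-v0.2": (68.9, 73.5),
    "Phi3-small-8k-Instruct": (84.0, 85.8),
    "Meta-Llama-3-8b-Instruct": (74.4, 78.8),
    "Phi3-mini-128k-Instruct": (77.4, 80.5),
}

# Gain in accuracy points, rounded to one decimal to match the table's precision.
gains = {
    model: round(after - before, 1)
    for model, (before, after) in avg_accuracy.items()
}

# Print models from largest to smallest average gain.
for model, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {gain:+.1f} points")
```

On average, the gains range from 1.8 points (Phi3-small-8k-Instruct) to 4.6 points (Mistral-7B-Instruct-v0.2).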

You can also download the official paper results at the following [Google Drive Link](https://drive.google.com/file/d/1l7bY-TkqnmVQn5M5ynQfT-0upMcRlMnT/view?usp=drive_link).

## Cite this work

If you use any part of this work, please consider citing the paper as follows:

```bibtex
@inproceedings{molfese-etal-2024-zebra,
    title = "{ZEBRA}: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering",
    author = "Molfese, Francesco Maria  and
      Conia, Simone  and
      Orlando, Riccardo  and
      Navigli, Roberto",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1251",
    doi = "10.18653/v1/2024.emnlp-main.1251",
    pages = "22429--22444"
}
```

## 🪪 License

The data and software are licensed under [Creative Commons Attribution-NonCommercial-ShareAlike 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

## Acknowledgements

We gratefully acknowledge CREATIVE (CRoss-modal understanding and gEnerATIon of Visual and tExtual content) for supporting this work. Simone Conia gratefully acknowledges the support of Future AI Research ([PNRR MUR project PE0000013-FAIR](https://fondazione-fair.it/en/)), which has fully funded his fellowship at Sapienza University of Rome since October 2023.