Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


llama2_tifa_question_generation - GGUF
- Model creator: https://huggingface.co/tifa-benchmark/
- Original model: https://huggingface.co/tifa-benchmark/llama2_tifa_question_generation/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [llama2_tifa_question_generation.Q2_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q2_K.gguf) | Q2_K | 2.36GB |
| [llama2_tifa_question_generation.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_XS.gguf) | IQ3_XS | 2.6GB |
| [llama2_tifa_question_generation.IQ3_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_S.gguf) | IQ3_S | 2.75GB |
| [llama2_tifa_question_generation.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_S.gguf) | Q3_K_S | 2.75GB |
| [llama2_tifa_question_generation.IQ3_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_M.gguf) | IQ3_M | 2.9GB |
| [llama2_tifa_question_generation.Q3_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K.gguf) | Q3_K | 3.07GB |
| [llama2_tifa_question_generation.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_M.gguf) | Q3_K_M | 3.07GB |
| [llama2_tifa_question_generation.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_L.gguf) | Q3_K_L | 3.35GB |
| [llama2_tifa_question_generation.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ4_XS.gguf) | IQ4_XS | 3.4GB |
| [llama2_tifa_question_generation.Q4_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_0.gguf) | Q4_0 | 3.56GB |
| [llama2_tifa_question_generation.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ4_NL.gguf) | IQ4_NL | 3.58GB |
| [llama2_tifa_question_generation.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K_S.gguf) | Q4_K_S | 3.59GB |
| [llama2_tifa_question_generation.Q4_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K.gguf) | Q4_K | 3.8GB |
| [llama2_tifa_question_generation.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K_M.gguf) | Q4_K_M | 3.8GB |
| [llama2_tifa_question_generation.Q4_1.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_1.gguf) | Q4_1 | 3.95GB |
| [llama2_tifa_question_generation.Q5_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_0.gguf) | Q5_0 | 4.33GB |
| [llama2_tifa_question_generation.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K_S.gguf) | Q5_K_S | 4.33GB |
| [llama2_tifa_question_generation.Q5_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K.gguf) | Q5_K | 4.45GB |
| [llama2_tifa_question_generation.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K_M.gguf) | Q5_K_M | 4.45GB |
| [llama2_tifa_question_generation.Q5_1.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_1.gguf) | Q5_1 | 4.72GB |
| [llama2_tifa_question_generation.Q6_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q6_K.gguf) | Q6_K | 5.15GB |
| [llama2_tifa_question_generation.Q8_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q8_0.gguf) | Q8_0 | 6.67GB |

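Any GGUF-compatible runtime can load the files above. The snippet below is a minimal sketch, not part of this repo: it assumes `huggingface_hub` and `llama-cpp-python` are installed, picks the Q4_K_M file as an example, and reuses the prompt format described in the original model card further down.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Illustrative only: runtime choice and generation settings are not prescribed by this repo.
# Download one of the quantized files listed above (Q4_K_M as an example).
gguf_path = hf_hub_download(
    repo_id="RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf",
    filename="llama2_tifa_question_generation.Q4_K_M.gguf",
)

llm = Llama(model_path=gguf_path, n_ctx=2048)

# Same LLaMA 2 prompt format as in the original model card below.
system = ("Given an image description, generate one or two multiple-choice questions "
          "that verifies if the image description is correct.\n"
          "Classify each concept into a type (object, human, animal, food, activity, attribute, "
          "counting, color, material, spatial, location, shape, other), "
          "and then generate a question for each type.\n")
prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\nDescription: a blue rabbit and a red plane [/INST] Entities:"

out = llm(prompt, max_tokens=512, temperature=0.0)
print(out["choices"][0]["text"])
```
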
Original model description:
---
license: apache-2.0
inference: true
widget:
- text: "<s>[INST] <<SYS>>\nGiven an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n\n<</SYS>>\n\nDescription: a blue rabbit and a red plane [/INST] Entities:"
pipeline_tag: text-generation
tags:
- text-generation-inference
- llama2
- text-to-image
datasets:
- TIFA
language:
- en
---
Project page: <https://tifa-benchmark.github.io/>

This is the text parsing and question generation model for the ICCV 2023 paper [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](https://arxiv.org/abs/2303.11897).

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We then calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image.

This fine-tuned LLaMA 2 model is the substitute for the GPT-3 model in the paper. It can parse an arbitrary prompt into visual entities, attributes, relations, etc., and generate question-answer tuples for each of them. See the examples below.

# QuickStart

All code is from <https://github.com/Yushi-Hu/tifa>. Clone that repo to use this model together with the other modules (e.g., VQA) provided in TIFA.

Please follow the prompt format below; it gives the best performance.

```python
import torch
import transformers

# prepare the LLaMA 2 model
model_name = "tifa-benchmark/llama2_tifa_question_generation"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)


# format the prompt following the LLaMA 2 style
def create_qg_prompt(caption):
    INTRO_BLURB = "Given an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n"
    formated_prompt = f"<s>[INST] <<SYS>>\n{INTRO_BLURB}\n<</SYS>>\n\n"
    formated_prompt += f"Description: {caption} [/INST] Entities:"
    return formated_prompt


test_caption = "a blue rabbit and a red plane"

# create the prompt
prompt = create_qg_prompt(test_caption)

# text completion
sequences = pipeline(
    prompt, do_sample=False, num_beams=5, num_return_sequences=1, max_length=512)
output = sequences[0]['generated_text'][len(prompt):]
output = output.split('\n\n')[0]

# output
print(output)

#### Expected output ###
# rabbit, plane
# Activites:
# Colors: blue, red
# Counting:
# Other attributes:
# About rabbit (animal):
# Q: is this a rabbit?
# Choices: yes, no
# A: yes
# About rabbit (animal):
# Q: what animal is in the picture?
# Choices: rabbit, dog, cat, fish
# A: rabbit
# About plane (object):
# Q: is this a plane?
# Choices: yes, no
# A: yes
# About plane (object):
# Q: what type of vehicle is this?
# Choices: plane, car, motorcycle, bus
# A: plane
# About blue (color):
# Q: is the rabbit blue?
# Choices: yes, no
# A: yes
# About blue (color):
# Q: what color is the rabbit?
# Choices: blue, red, yellow, green
# A: blue
# About red (color):
# Q: is the plane red?
# Choices: yes, no
# A: yes
# About red (color):
# Q: what color is the plane?
# Choices: red, blue, yellow, green
# A: red
```

# Use this LM with the tifascore package

tifascore provides extra functions to parse this output into question-answer tuples. First install tifascore following <https://github.com/Yushi-Hu/tifa>, then use it as below:

```python
from tifascore import get_llama2_pipeline, get_llama2_question_and_answers

pipeline = get_llama2_pipeline("tifa-benchmark/llama2_tifa_question_generation")

print(get_llama2_question_and_answers(pipeline, "a blue rabbit and a red plane"))

#### Expected output ###
# [{'caption': 'a blue rabbit and a red plane', 'element': 'rabbit', 'question': 'what animal is in the picture?', 'choices': ['rabbit', 'dog', 'cat', 'fish'], 'answer': 'rabbit', 'element_type': 'animal/human'}, {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'is this a plane?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'object'}, {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'what type of vehicle is this?', 'choices': ['plane', 'car', 'motorcycle', 'bus'], 'answer': 'plane', 'element_type': 'object'}, {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'is the rabbit blue?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'what color is the rabbit?', 'choices': ['blue', 'red', 'yellow', 'green'], 'answer': 'blue', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'is the plane red?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'what color is the plane?', 'choices': ['red', 'blue', 'yellow', 'green'], 'answer': 'red', 'element_type': 'color'}]
```
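
In TIFA, each of these questions is then posed to a VQA model on the generated image, and faithfulness is scored by how many of them the VQA model answers correctly. A minimal sketch of that scoring step is below; the `vqa_answer` callable is a placeholder for whichever VQA model you use, not a tifascore API.

```python
def tifa_score(qa_tuples, image, vqa_answer):
    """Faithfulness = fraction of generated questions the VQA model answers correctly."""
    if not qa_tuples:
        return 0.0
    correct = 0
    for qa in qa_tuples:
        # Ask the VQA model (placeholder callable) to pick one of the multiple-choice
        # options for this image, then compare against the expected answer.
        prediction = vqa_answer(image, qa["question"], qa["choices"])
        correct += int(prediction == qa["answer"])
    return correct / len(qa_tuples)
```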

## Bibtex
```
@article{hu2023tifa,
  title={Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering},
  author={Hu, Yushi and Liu, Benlin and Kasai, Jungo and Wang, Yizhong and Ostendorf, Mari and Krishna, Ranjay and Smith, Noah A},
  journal={arXiv preprint arXiv:2303.11897},
  year={2023}
}
```