Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


llama2_tifa_question_generation - GGUF
- Model creator: https://huggingface.co/tifa-benchmark/
- Original model: https://huggingface.co/tifa-benchmark/llama2_tifa_question_generation/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [llama2_tifa_question_generation.Q2_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q2_K.gguf) | Q2_K | 2.36GB |
| [llama2_tifa_question_generation.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_XS.gguf) | IQ3_XS | 2.6GB |
| [llama2_tifa_question_generation.IQ3_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_S.gguf) | IQ3_S | 2.75GB |
| [llama2_tifa_question_generation.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_S.gguf) | Q3_K_S | 2.75GB |
| [llama2_tifa_question_generation.IQ3_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_M.gguf) | IQ3_M | 2.9GB |
| [llama2_tifa_question_generation.Q3_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K.gguf) | Q3_K | 3.07GB |
| [llama2_tifa_question_generation.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_M.gguf) | Q3_K_M | 3.07GB |
| [llama2_tifa_question_generation.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_L.gguf) | Q3_K_L | 3.35GB |
| [llama2_tifa_question_generation.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ4_XS.gguf) | IQ4_XS | 3.4GB |
| [llama2_tifa_question_generation.Q4_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_0.gguf) | Q4_0 | 3.56GB |
| [llama2_tifa_question_generation.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ4_NL.gguf) | IQ4_NL | 3.58GB |
| [llama2_tifa_question_generation.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K_S.gguf) | Q4_K_S | 3.59GB |
| [llama2_tifa_question_generation.Q4_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K.gguf) | Q4_K | 3.8GB |
| [llama2_tifa_question_generation.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K_M.gguf) | Q4_K_M | 3.8GB |
| [llama2_tifa_question_generation.Q4_1.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_1.gguf) | Q4_1 | 3.95GB |
| [llama2_tifa_question_generation.Q5_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_0.gguf) | Q5_0 | 4.33GB |
| [llama2_tifa_question_generation.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K_S.gguf) | Q5_K_S | 4.33GB |
| [llama2_tifa_question_generation.Q5_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K.gguf) | Q5_K | 4.45GB |
| [llama2_tifa_question_generation.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K_M.gguf) | Q5_K_M | 4.45GB |
| [llama2_tifa_question_generation.Q5_1.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_1.gguf) | Q5_1 | 4.72GB |
| [llama2_tifa_question_generation.Q6_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q6_K.gguf) | Q6_K | 5.15GB |
| [llama2_tifa_question_generation.Q8_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q8_0.gguf) | Q8_0 | 6.67GB |
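Any of the files above can be run with a GGUF-compatible runtime such as llama.cpp. The sketch below is a minimal illustration using the `huggingface_hub` and `llama-cpp-python` packages; the repo id and filename come from the table above, while the choice of the Q4_K_M quant, the context size, and the sampling settings are assumptions, not recommendations from this repo.

```python
# Illustrative sketch (not part of this repo): download one quant and run it
# with llama-cpp-python. Assumes `pip install huggingface_hub llama-cpp-python`.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Repo id and filename are taken from the table above.
model_path = hf_hub_download(
    repo_id="RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf",
    filename="llama2_tifa_question_generation.Q4_K_M.gguf",
)

llm = Llama(model_path=model_path, n_ctx=2048)  # context size is an assumption

# Same LLaMA 2 prompt format as in the QuickStart section below.
prompt = (
    "<s>[INST] <<SYS>>\nGiven an image description, generate one or two "
    "multiple-choice questions that verifies if the image description is correct.\n"
    "Classify each concept into a type (object, human, animal, food, activity, "
    "attribute, counting, color, material, spatial, location, shape, other), "
    "and then generate a question for each type.\n\n<</SYS>>\n\n"
    "Description: a blue rabbit and a red plane [/INST] Entities:"
)
out = llm(prompt, max_tokens=512, temperature=0.0)
print(out["choices"][0]["text"])
```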



Original model description:
---
license: apache-2.0
inference: true
widget:
- text: "<s>[INST] <<SYS>>\nGiven an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n\n<</SYS>>\n\nDescription: a blue rabbit and a red plane [/INST] Entities:"
pipeline_tag: text-generation
tags:
- text-generation-inference
- llama2
- text-to-image
datasets:
- TIFA
language:
- en
---
Project page: <https://tifa-benchmark.github.io/>

This is the text parsing and question generation model for the ICCV 2023 paper [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](https://arxiv.org/abs/2303.11897).

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Given a text input, we automatically generate several question-answer pairs using a language model, then compute image faithfulness by checking whether existing VQA models can answer these questions using the generated image.

Specifically, this fine-tuned LLaMA 2 model is the substitute for the GPT-3 model in the paper. It can parse an arbitrary prompt into visual entities, attributes, relations, etc., and generate question-answer tuples for each of them. See the examples below.
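
In code, the metric reduces to VQA accuracy over the generated question-answer pairs. The sketch below is a minimal illustration of that idea, not the official implementation; `vqa_answer` is a hypothetical stand-in for any multiple-choice VQA model.

```python
# Minimal sketch of the TIFA scoring idea (not the official implementation).
# Each QA pair is a dict with "question", "choices", and "answer", as produced
# by this model; `vqa_answer` is a hypothetical multiple-choice VQA callable.
def tifa_score(qa_pairs, vqa_answer, image):
    correct = sum(
        vqa_answer(image, qa["question"], qa["choices"]) == qa["answer"]
        for qa in qa_pairs
    )
    return correct / len(qa_pairs)
```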

# QuickStart

All code is from <https://github.com/Yushi-Hu/tifa>. Clone that repo to use this model together with the other modules (e.g. VQA) provided in TIFA.

Please follow the prompt format below, which gives the best performance.

```python
import torch
import transformers

# prepare the LLaMA 2 model
model_name = "tifa-benchmark/llama2_tifa_question_generation"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)


# format the prompt following the LLaMA 2 chat style
def create_qg_prompt(caption):
    INTRO_BLURB = "Given an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n"
    formatted_prompt = f"<s>[INST] <<SYS>>\n{INTRO_BLURB}\n<</SYS>>\n\n"
    formatted_prompt += f"Description: {caption} [/INST] Entities:"
    return formatted_prompt


test_caption = "a blue rabbit and a red plane"

# create the prompt
prompt = create_qg_prompt(test_caption)

# text completion
sequences = pipeline(
    prompt, do_sample=False, num_beams=5, num_return_sequences=1, max_length=512)
output = sequences[0]['generated_text'][len(prompt):]
output = output.split('\n\n')[0]

# print the generated questions
print(output)

#### Expected output ###
# rabbit, plane
# Activites:
# Colors: blue, red
# Counting:
# Other attributes:
# About rabbit (animal):
# Q: is this a rabbit?
# Choices: yes, no
# A: yes
# About rabbit (animal):
# Q: what animal is in the picture?
# Choices: rabbit, dog, cat, fish
# A: rabbit
# About plane (object):
# Q: is this a plane?
# Choices: yes, no
# A: yes
# About plane (object):
# Q: what type of vehicle is this?
# Choices: plane, car, motorcycle, bus
# A: plane
# About blue (color):
# Q: is the rabbit blue?
# Choices: yes, no
# A: yes
# About blue (color):
# Q: what color is the rabbit?
# Choices: blue, red, yellow, green
# A: blue
# About red (color):
# Q: is the plane red?
# Choices: yes, no
# A: yes
# About red (color):
# Q: what color is the plane?
# Choices: red, blue, yellow, green
# A: red
```

# Use this LM with the tifascore package

The tifascore package provides extra functions for parsing this output into structured question-answer tuples. First install tifascore following <https://github.com/Yushi-Hu/tifa>; usage is shown below.

```python
from tifascore import get_llama2_pipeline, get_llama2_question_and_answers

pipeline = get_llama2_pipeline("tifa-benchmark/llama2_tifa_question_generation")

print(get_llama2_question_and_answers(pipeline, "a blue rabbit and a red plane"))

#### Expected output ###
# [{'caption': 'a blue rabbit and a red plane', 'element': 'rabbit', 'question': 'what animal is in the picture?', 'choices': ['rabbit', 'dog', 'cat', 'fish'], 'answer': 'rabbit', 'element_type': 'animal/human'}, {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'is this a plane?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'object'}, {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'what type of vehicle is this?', 'choices': ['plane', 'car', 'motorcycle', 'bus'], 'answer': 'plane', 'element_type': 'object'}, {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'is the rabbit blue?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'what color is the rabbit?', 'choices': ['blue', 'red', 'yellow', 'green'], 'answer': 'blue', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'is the plane red?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'what color is the plane?', 'choices': ['red', 'blue', 'yellow', 'green'], 'answer': 'red', 'element_type': 'color'}]
```
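
These tuples plug directly into the `tifa_score` sketch shown earlier. The snippet below is hypothetical wiring, with a dummy `vqa_answer` standing in for a real VQA model:

```python
# Hypothetical wiring (continues the tifa_score sketch above); a real setup
# would replace this dummy with an actual VQA model applied to the image.
def vqa_answer(image, question, choices):
    return choices[0]  # dummy: always picks the first choice

qa_pairs = get_llama2_question_and_answers(pipeline, "a blue rabbit and a red plane")
print(tifa_score(qa_pairs, vqa_answer, "generated_image.png"))
```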

## Bibtex
```
@article{hu2023tifa,
  title={Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering},
  author={Hu, Yushi and Liu, Benlin and Kasai, Jungo and Wang, Yizhong and Ostendorf, Mari and Krishna, Ranjay and Smith, Noah A},
  journal={arXiv preprint arXiv:2303.11897},
  year={2023}
}
```