	Quantization made by Richard Erkhov.

	[Github](https://github.com/RichardErkhov)

	[Discord](https://discord.gg/pvy7H8DZMG)

	[Request more models](https://github.com/RichardErkhov/quant_request)


	llama2_tifa_question_generation - GGUF
	- Model creator: https://huggingface.co/tifa-benchmark/
	- Original model: https://huggingface.co/tifa-benchmark/llama2_tifa_question_generation/


	| Name | Quant method | Size |
	| ---- | ---- | ---- |
	| [llama2_tifa_question_generation.Q2_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q2_K.gguf) | Q2_K | 2.36GB |
	| [llama2_tifa_question_generation.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_XS.gguf) | IQ3_XS | 2.6GB |
	| [llama2_tifa_question_generation.IQ3_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_S.gguf) | IQ3_S | 2.75GB |
	| [llama2_tifa_question_generation.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_S.gguf) | Q3_K_S | 2.75GB |
	| [llama2_tifa_question_generation.IQ3_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ3_M.gguf) | IQ3_M | 2.9GB |
	| [llama2_tifa_question_generation.Q3_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K.gguf) | Q3_K | 3.07GB |
	| [llama2_tifa_question_generation.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_M.gguf) | Q3_K_M | 3.07GB |
	| [llama2_tifa_question_generation.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q3_K_L.gguf) | Q3_K_L | 3.35GB |
	| [llama2_tifa_question_generation.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ4_XS.gguf) | IQ4_XS | 3.4GB |
	| [llama2_tifa_question_generation.Q4_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_0.gguf) | Q4_0 | 3.56GB |
	| [llama2_tifa_question_generation.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.IQ4_NL.gguf) | IQ4_NL | 3.58GB |
	| [llama2_tifa_question_generation.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K_S.gguf) | Q4_K_S | 3.59GB |
	| [llama2_tifa_question_generation.Q4_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K.gguf) | Q4_K | 3.8GB |
	| [llama2_tifa_question_generation.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_K_M.gguf) | Q4_K_M | 3.8GB |
	| [llama2_tifa_question_generation.Q4_1.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q4_1.gguf) | Q4_1 | 3.95GB |
	| [llama2_tifa_question_generation.Q5_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_0.gguf) | Q5_0 | 4.33GB |
	| [llama2_tifa_question_generation.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K_S.gguf) | Q5_K_S | 4.33GB |
	| [llama2_tifa_question_generation.Q5_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K.gguf) | Q5_K | 4.45GB |
	| [llama2_tifa_question_generation.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_K_M.gguf) | Q5_K_M | 4.45GB |
	| [llama2_tifa_question_generation.Q5_1.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q5_1.gguf) | Q5_1 | 4.72GB |
	| [llama2_tifa_question_generation.Q6_K.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q6_K.gguf) | Q6_K | 5.15GB |
	| [llama2_tifa_question_generation.Q8_0.gguf](https://huggingface.co/RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf/blob/main/llama2_tifa_question_generation.Q8_0.gguf) | Q8_0 | 6.67GB |
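
As a rough aid for choosing a file from the table above, the sketch below picks the largest quantization whose file size fits a given budget and builds its download URL. The helper names (`pick_quant`, `download_url`) are hypothetical, only a subset of the table is listed, and file size is used as a stand-in for memory needs, which in practice are somewhat higher at runtime.

```python
# Sizes in GB copied from the table above; only a subset is listed for brevity.
QUANT_SIZES_GB = {
    "Q2_K": 2.36,
    "Q3_K_M": 3.07,
    "Q4_K_M": 3.8,
    "Q5_K_M": 4.45,
    "Q6_K": 5.15,
    "Q8_0": 6.67,
}

REPO = "RichardErkhov/tifa-benchmark_-_llama2_tifa_question_generation-gguf"
BASE_NAME = "llama2_tifa_question_generation"


def pick_quant(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the largest quant method whose file fits the budget, or None."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def download_url(quant):
    """Build a direct-download URL using the Hub's 'resolve/main' path."""
    return f"https://huggingface.co/{REPO}/resolve/main/{BASE_NAME}.{quant}.gguf"


choice = pick_quant(4.0)
print(choice)
print(download_url(choice))
```

Lower-bit files are smaller but generally lose more quality, so picking the largest file that fits is only one reasonable default.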




	Original model description:
	---
	license: apache-2.0
	inference: true
	widget:
	- text: "<s>[INST] <<SYS>>\nGiven an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n\n<</SYS>>\n\nDescription: a blue rabbit and a red plane [/INST] Entities:"
	pipeline_tag: text-generation
	tags:
	- text-generation-inference
	- llama2
	- text-to-image
	datasets:
	- TIFA
	language:
	- en
	---
	Project page: <https://tifa-benchmark.github.io/>

	This is the text parsing and question generation model for the ICCV 2023 paper [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](https://arxiv.org/abs/2303.11897).

	We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image.
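
To make the scoring idea concrete, here is a minimal sketch that assumes the faithfulness score is simply the fraction of generated questions the VQA model answers correctly. `tifa_score` and `first_choice_vqa` are hypothetical names, and the stand-in "VQA model" ignores the image entirely; a real one would take the generated image as input.

```python
def tifa_score(qa_pairs, answer_fn):
    """Fraction of questions whose predicted answer matches the expected one."""
    if not qa_pairs:
        return 0.0
    correct = sum(
        1
        for qa in qa_pairs
        if answer_fn(qa["question"], qa["choices"]) == qa["answer"]
    )
    return correct / len(qa_pairs)


# Hypothetical stand-in for a VQA model: it always picks the first choice.
def first_choice_vqa(question, choices):
    return choices[0]


qa_pairs = [
    {"question": "is this a rabbit?", "choices": ["yes", "no"], "answer": "yes"},
    {
        "question": "what color is the rabbit?",
        "choices": ["blue", "red", "yellow", "green"],
        "answer": "blue",
    },
    {"question": "is the plane blue?", "choices": ["yes", "no"], "answer": "no"},
]

print(tifa_score(qa_pairs, first_choice_vqa))
```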

	This fine-tuned LLaMA 2 model substitutes for the GPT-3 model used in the paper. It parses an arbitrary prompt into visual entities, attributes, and relations, and generates question-answer tuples for each of them. See the examples below.


	# QuickStart

	All code is from <https://github.com/Yushi-Hu/tifa>. Clone that repo to use this model together with the other modules (e.g. VQA) provided in TIFA.

	Follow the prompt format shown below; it gives the best performance.


	```python
	import torch
	import transformers

	# prepare the LLaMA 2 model
	model_name = "tifa-benchmark/llama2_tifa_question_generation"
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model_name,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)


	# format the prompt following the LLaMA 2 style
	def create_qg_prompt(caption):
	    INTRO_BLURB = "Given an image description, generate one or two multiple-choice questions that verifies if the image description is correct.\nClassify each concept into a type (object, human, animal, food, activity, attribute, counting, color, material, spatial, location, shape, other), and then generate a question for each type.\n"
	    formatted_prompt = f"<s>[INST] <<SYS>>\n{INTRO_BLURB}\n<</SYS>>\n\n"
	    formatted_prompt += f"Description: {caption} [/INST] Entities:"
	    return formatted_prompt


	test_caption = "a blue rabbit and a red plane"

	# create prompt
	prompt = create_qg_prompt(test_caption)

	# text completion
	sequences = pipeline(
	    prompt, do_sample=False, num_beams=5, num_return_sequences=1, max_length=512)
	output = sequences[0]['generated_text'][len(prompt):]
	output = output.split('\n\n')[0]

	# output
	print(output)

	#### Expected output ###
	# rabbit, plane
	# Activites:
	# Colors: blue, red
	# Counting:
	# Other attributes:
	# About rabbit (animal):
	# Q: is this a rabbit?
	# Choices: yes, no
	# A: yes
	# About rabbit (animal):
	# Q: what animal is in the picture?
	# Choices: rabbit, dog, cat, fish
	# A: rabbit
	# About plane (object):
	# Q: is this a plane?
	# Choices: yes, no
	# A: yes
	# About plane (object):
	# Q: what type of vehicle is this?
	# Choices: plane, car, motorcycle, bus
	# A: plane
	# About blue (color):
	# Q: is the rabbit blue?
	# Choices: yes, no
	# A: yes
	# About blue (color):
	# Q: what color is the rabbit?
	# Choices: blue, red, yellow, green
	# A: blue
	# About red (color):
	# Q: is the plane red?
	# Choices: yes, no
	# A: yes
	# About red (color):
	# Q: what color is the plane?
	# Choices: red, blue, yellow, green
	# A: red
	```
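
The raw completion follows a regular `About ... / Q: / Choices: / A:` layout, so it can be turned into structured records with a small parser. The sketch below is an illustration only and is not the tifascore implementation; `parse_qg_output` is a hypothetical helper, and it assumes each question block ends with its `A:` line.

```python
import re

# Matches header lines such as "About rabbit (animal):".
ABOUT_RE = re.compile(r"^About (?P<element>.+) \((?P<type>[^)]+)\):$")


def parse_qg_output(text):
    """Parse 'About / Q: / Choices: / A:' blocks into a list of dicts."""
    questions = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = ABOUT_RE.match(line)
        if match:
            current = {"element": match["element"], "element_type": match["type"]}
        elif current is None:
            continue  # skip the entity and attribute summary lines
        elif line.startswith("Q: "):
            current["question"] = line[len("Q: "):]
        elif line.startswith("Choices: "):
            current["choices"] = [c.strip() for c in line[len("Choices: "):].split(",")]
        elif line.startswith("A: "):
            current["answer"] = line[len("A: "):]
            # Keep the block only if it is complete.
            if {"question", "choices"} <= current.keys():
                questions.append(current)
            current = None
    return questions


sample = """rabbit, plane
Activites:
Colors: blue, red
About rabbit (animal):
Q: is this a rabbit?
Choices: yes, no
A: yes
About blue (color):
Q: what color is the rabbit?
Choices: blue, red, yellow, green
A: blue"""

for item in parse_qg_output(sample):
    print(item)
```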

	# Use this LM with the tifascore package

	tifascore provides extra functions, for example to parse this output. First install tifascore following the instructions at <https://github.com/Yushi-Hu/tifa>. Usage is shown below.

	```python
	from tifascore import get_llama2_pipeline, get_llama2_question_and_answers

	pipeline = get_llama2_pipeline("tifa-benchmark/llama2_tifa_question_generation")

	print(get_llama2_question_and_answers(pipeline, "a blue rabbit and a red plane"))

	#### Expected output ###
	# [{'caption': 'a blue rabbit and a red plane', 'element': 'rabbit', 'question': 'what animal is in the picture?', 'choices': ['rabbit', 'dog', 'cat', 'fish'], 'answer': 'rabbit', 'element_type': 'animal/human'}, {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'is this a plane?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'object'}, {'caption': 'a blue rabbit and a red plane', 'element': 'plane', 'question': 'what type of vehicle is this?', 'choices': ['plane', 'car', 'motorcycle', 'bus'], 'answer': 'plane', 'element_type': 'object'}, {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'is the rabbit blue?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'blue', 'question': 'what color is the rabbit?', 'choices': ['blue', 'red', 'yellow', 'green'], 'answer': 'blue', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'is the plane red?', 'choices': ['yes', 'no'], 'answer': 'yes', 'element_type': 'color'}, {'caption': 'a blue rabbit and a red plane', 'element': 'red', 'question': 'what color is the plane?', 'choices': ['red', 'blue', 'yellow', 'green'], 'answer': 'red', 'element_type': 'color'}]
	```
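
Because each record carries an `element_type`, results can be broken down by type once answers are available. The following sketch assumes a list of predicted answers aligned one-to-one with the question records; `accuracy_by_type` is a hypothetical helper, not part of tifascore, and the predictions are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_type(questions, predictions):
    """Per-element-type accuracy for predictions aligned with questions."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for qa, pred in zip(questions, predictions):
        totals[qa["element_type"]] += 1
        hits[qa["element_type"]] += int(pred == qa["answer"])
    return {etype: hits[etype] / totals[etype] for etype in totals}


# Abbreviated records in the same shape as the tifascore output above.
questions = [
    {"element_type": "animal/human", "answer": "rabbit"},
    {"element_type": "object", "answer": "yes"},
    {"element_type": "color", "answer": "blue"},
    {"element_type": "color", "answer": "red"},
]
predictions = ["rabbit", "yes", "blue", "green"]  # made-up VQA answers

print(accuracy_by_type(questions, predictions))
```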

	## BibTeX
	```
	@article{hu2023tifa,
	title={Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering},
	author={Hu, Yushi and Liu, Benlin and Kasai, Jungo and Wang, Yizhong and Ostendorf, Mari and Krishna, Ranjay and Smith, Noah A},
	journal={arXiv preprint arXiv:2303.11897},
	year={2023}
	}
	```