---
license: apache-2.0
inference: false
pipeline_tag: text-generation
tags:
- text-generation-inference
- llama2
- text-to-image
datasets:
- TIFA
language:
- en
---

This is the text parsing and question generation model for the ICCV 2023 paper [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](https://arxiv.org/abs/2303.11897).

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We then calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image.

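
To make the scoring step concrete, here is a minimal sketch of the idea (not the API of the TIFA codebase; `qa_pairs` and the VQA function are hypothetical stand-ins): the faithfulness score is the fraction of generated questions that a VQA model answers correctly on the generated image.

```python
# Minimal sketch of TIFA scoring. `qa_pairs` stands in for the output of the
# question-generation model described below; `vqa_answer` stands in for any VQA
# model applied to the generated image. Neither is the actual tifa codebase API.

def tifa_score(image_path, qa_pairs, vqa_answer):
    """Fraction of generated questions the VQA model answers correctly."""
    correct = sum(
        vqa_answer(image_path, question, choices) == answer
        for question, choices, answer in qa_pairs
    )
    return correct / len(qa_pairs)

# illustrative question-answer tuples for the caption "a blue rabbit and a red plane"
qa_pairs = [
    ("is there a rabbit?", ["yes", "no"], "yes"),
    ("what color is the rabbit?", ["blue", "red", "white"], "blue"),
    ("is there a plane?", ["yes", "no"], "yes"),
    ("what color is the plane?", ["blue", "red", "green"], "red"),
]

# stand-in VQA model that always picks the first choice, just to make the sketch runnable
def dummy_vqa(image_path, question, choices):
    return choices[0]

print(tifa_score("generated_image.png", qa_pairs, dummy_vqa))  # e.g. 0.75
```
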
This fine-tuned LLaMA 2 model is the substitute for the GPT-3 model used in the paper. It parses an arbitrary prompt into visual entities, attributes, relations, etc., and generates question-answer tuples for each of them. See the examples below.

# QuickStart

All of the code is from <https://github.com/Yushi-Hu/tifa>. Clone that repo to use this model together with the other modules (e.g. VQA) provided in TIFA.

Please follow the prompt format used during fine-tuning (the exact template is provided in the TIFA repo); it gives the best performance.

```python
import torch
import transformers

# load the fine-tuned LLaMA 2 question-generation model
# (point model_name at this model's Hugging Face repo id or your local checkpoint)
model_name = "/gscratch/tial/yushihu/tifa-all/llama2/results/llama2/final_question_generation_checkpoint"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# prompt formatting
# NOTE: the instruction below is an illustrative placeholder; for best results use
# the exact fine-tuning prompt template provided in the TIFA repo linked above.
instruction = (
    "Given an image description, parse it into visual elements and generate "
    "question-answer pairs that verify the description."
)

test_caption = "a blue rabbit and a red plane"
prompt = f"[INST] <<SYS>>\n{instruction}\n<</SYS>>\n\nDescription: {test_caption} [/INST]"

# greedy decoding; the completion contains the parsed elements and question-answer tuples
sequences = pipeline(
    prompt,
    do_sample=False,
    max_new_tokens=512,
    return_full_text=False,
)
print(sequences[0]["generated_text"])
```

To generate question-answer tuples for a different caption, just swap in a new caption and rebuild the prompt (reusing the `pipeline` and `instruction` defined above):

```python
test_caption = "two dogs playing frisbee in a park"
prompt = f"[INST] <<SYS>>\n{instruction}\n<</SYS>>\n\nDescription: {test_caption} [/INST]"

print(pipeline(prompt, do_sample=False, max_new_tokens=512, return_full_text=False)[0]["generated_text"])
```

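
When evaluating many prompts at once (e.g. a whole benchmark of captions), the `transformers` text-generation pipeline also accepts a list of prompts. A minimal sketch, again reusing the `pipeline` and `instruction` defined above (the prompt template remains an illustrative placeholder, not the exact fine-tuning template):

```python
captions = [
    "a blue rabbit and a red plane",
    "two dogs playing frisbee in a park",
]

# one formatted prompt per caption
prompts = [
    f"[INST] <<SYS>>\n{instruction}\n<</SYS>>\n\nDescription: {c} [/INST]"
    for c in captions
]

# the pipeline returns one list of generations per input prompt
outputs = pipeline(prompts, do_sample=False, max_new_tokens=512, return_full_text=False)
for caption, generations in zip(captions, outputs):
    print(caption)
    print(generations[0]["generated_text"])
```
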
## Bibtex
```
@article{hu2023tifa,
  title={TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering},
  author={Hu, Yushi and Liu, Benlin and Kasai, Jungo and Wang, Yizhong and Ostendorf, Mari and Krishna, Ranjay and Smith, Noah A},
  journal={arXiv preprint arXiv:2303.11897},
  year={2023}
}
```