OpenGVLab/InternVL-Chat-V1-1

@@ -85,7 +85,12 @@ generation_config = dict(
 )
 question = "请详细描述图片"
-response = model.chat(tokenizer, pixel_values, question, generation_config)
+response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None)
+print(question, response)
+question = "请根据图片写一首诗"
+response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history)
+print(question, response)
 ```
 ## Examples
@@ -96,7 +101,7 @@ As you can see, although the Lynyrd Skynyrd in the image has some letters that a
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/-jQ8jCctx1VjkzVxzChQa.png)
-This model can also conduct in-depth analysis of AAAI's official website and identify important information in the web page.
+This model can also conduct an in-depth analysis of AAAI's official website and identify important information on the web page.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/08W04RdT3PmzJuGwFU3--.png)
@@ -106,17 +111,17 @@ This model can also conduct in-depth analysis of AAAI's official website and ide
 \* Training set observed.
-| MathVista<br>(testmini) | MMB<br>(dev/test) | MMB−CN<br>(dev/test) | MMMU<br>(val/test)                                                                 | CMMMU<br>(val/test) | MMVP | MME            | POPE | Tiny LVLM | SEEDv1<br>(image) | LLaVA Wild | MM−Vet |
-| ----------------------- | ----------------- | -------------------- | ---------------------------------------------------------------------------------- | ------------------- | ---- | -------------- | ---- | --------- | ----------------- | ---------- | ------ |
-| 34.5                    | 76.7&nbsp;/&nbsp;75.4       | 71.9&nbsp;/&nbsp;70.3          | 39.1&nbsp;/&nbsp;35.3                                                                        | 34.8&nbsp;/&nbsp;34.0         | 44.7 | 1675.1&nbsp;/&nbsp;348.6 | 87.1 | 343.2     | 73.2              | 73.2       | 46.7   |
+| MathVista<br>(testmini) | MMB<br>(dev/test)     | MMB−CN<br>(dev/test)  | MMMU<br>(val/test)     | CMMMU<br>(val/test)   | MMVP | MME                      | POPE | Tiny LVLM | SEEDv1<br>(image) | LLaVA Wild | MM−Vet |
+| ----------------------- | --------------------- | --------------------- | ---------------------- | --------------------- | ---- | ------------------------ | ---- | --------- | ----------------- | ---------- | ------ |
+| 34.5                    | 76.7&nbsp;/&nbsp;75.4 | 71.9&nbsp;/&nbsp;70.3 | 39.1&nbsp;/&nbsp;35.3  | 34.8&nbsp;/&nbsp;34.0 | 44.7 | 1675.1&nbsp;/&nbsp;348.6 | 87.1 | 343.2     | 73.2              | 73.2       | 46.7   |
 **Image Captioning & Visual Question Answering**
 \* Training set observed.
-| COCO<br>(test) | Flickr30K<br>(test) | NoCaps<br>(val) | VQAv2<br>(testdev) | OKVQA<br>(val) | TextVQA<br>(val) | VizWiz<br>(val/test) | AI2D<br>(test) | GQA<br>(test) | ScienceQA<br>(image) |
-| -------------- | ------------------- | --------------- | ------------------ | -------------- | ---------------- | -------------------- | -------------- | ------------- | -------------------- |
-| 142.2\*        | 85.3                | 120.8           | 80.9\*             | 64.1\*         | 65.9             | 59.0&nbsp;/&nbsp;57.3          | 72.2\*         | 62.5\*        | 90.1\*               |
+| COCO<br>(test) | Flickr30K<br>(test) | NoCaps<br>(val) | VQAv2<br>(testdev) | OKVQA<br>(val) | TextVQA<br>(val) | VizWiz<br>(val/test)  | AI2D<br>(test) | GQA<br>(test) | ScienceQA<br>(image) |
+| -------------- | ------------------- | --------------- | ------------------ | -------------- | ---------------- | --------------------- | -------------- | ------------- | -------------------- |
+| 142.2\*        | 85.3                | 120.8           | 80.9\*             | 64.1\*         | 65.9             | 59.0&nbsp;/&nbsp;57.3 | 72.2\*         | 62.5\*        | 90.1\*               |
 - We found that incorrect images were used for training and testing in `AI2D`, meaning that for problems where `abcLabel` is True, `abc_images` were not utilized. We have now corrected the images used for testing, but the results may still be somewhat lower as a consequence.
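
Note on the first hunk: the change makes `model.chat` return a `(response, history)` pair and accept a `history` argument, which allows multi-turn conversation about the same image. The two questions in the diff translate roughly to "Please describe the image in detail" and "Please write a poem based on the image". Below is a minimal sketch of that pattern generalized to any number of turns. `run_conversation` and `StubChatModel` are hypothetical names introduced here for illustration. The stub only mimics the call signature shown in the diff, and its `history` format (a list of question/response pairs) is an assumption, since the diff does not show what the real model stores.

```python
from typing import Any, List, Optional, Tuple

# Assumed history format for the stub only: a list of (question, response) pairs.
History = List[Tuple[str, str]]


def run_conversation(model: Any, tokenizer: Any, pixel_values: Any,
                     questions: List[str], generation_config: Any):
    """Ask each question in order, feeding the returned history back in.

    Mirrors the call shape in the diff:
        response, history = model.chat(tokenizer, pixel_values, question,
                                       generation_config, history=history)
    """
    history = None  # the first turn starts without history, as in the diff
    responses = []
    for question in questions:
        response, history = model.chat(
            tokenizer, pixel_values, question, generation_config, history=history
        )
        responses.append(response)
    return responses, history


class StubChatModel:
    """Hypothetical stand-in for the real model, used only to run the sketch."""

    def chat(self, tokenizer: Any, pixel_values: Any, question: str,
             generation_config: Any, history: Optional[History] = None):
        history = list(history or [])  # copy so the caller's list is not mutated
        response = f"reply {len(history) + 1} to: {question}"
        history.append((question, response))
        return response, history


# Usage with the stub; with the real model, pass the loaded model, tokenizer,
# pixel_values and generation_config from the README instead.
questions = ["请详细描述图片", "请根据图片写一首诗"]
responses, history = run_conversation(StubChatModel(), None, None, questions, {})
for q, r in zip(questions, responses):
    print(q, r)
```

Passing `history=None` on the first turn and the returned `history` on later turns is the only behavior taken from the diff; everything else in the sketch is scaffolding.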