Update README.md
README.md
CHANGED
@@ -85,7 +85,12 @@ generation_config = dict(
 )

 question = "请详细描述图片"
-response = model.chat(tokenizer, pixel_values, question, generation_config)
 ```

 ## Examples

@@ -96,7 +101,7 @@ As you can see, although the Lynyrd Skynyrd in the image has some letters that a

 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/-jQ8jCctx1VjkzVxzChQa.png)

-This model can also conduct in-depth analysis of AAAI's official website and identify important information

 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/08W04RdT3PmzJuGwFU3--.png)

@@ -106,17 +111,17 @@ This model can also conduct in-depth analysis of AAAI's official website and ide

 \* Training set observed.

-| MathVista<br>(testmini) | MMB<br>(dev/test)
-| ----------------------- |
-| 34.5 | 76.7 / 75.4

 **Image Captioning & Visual Question Answering**

 \* Training set observed.

-| COCO<br>(test) | Flickr30K<br>(test) | NoCaps<br>(val) | VQAv2<br>(testdev) | OKVQA<br>(val) | TextVQA<br>(val) | VizWiz<br>(val/test)
-| -------------- | ------------------- | --------------- | ------------------ | -------------- | ---------------- |
-| 142.2\* | 85.3 | 120.8 | 80.9\* | 64.1\* | 65.9 | 59.0 / 57.3

 - We found that incorrect images were used for training and testing in `AI2D`, meaning that for problems where `abcLabel` is True, `abc_images` were not utilized. We have now corrected the images used for testing, but the results may still be somewhat lower as a consequence.

 )

 question = "请详细描述图片"
+response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None)
+print(question, response)
+
+question = "请根据图片写一首诗"
+response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history)
+print(question, response)
 ```

 ## Examples

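The first hunk changes `model.chat` from a single-turn call to a multi-turn one: the call now returns `(response, history)`, and the returned `history` is passed back on the next turn. The two prompts translate to "Please describe the image in detail" and "Please write a poem based on the image". The sketch below illustrates that threading pattern only. `StubChatModel` and `run_dialogue` are hypothetical names, and the history layout (a list of question/response pairs) is an assumption; the real model's internal format is not shown in this diff.

```python
from typing import List, Optional, Tuple

# Assumed history layout: one (question, response) pair per completed turn.
History = List[Tuple[str, str]]


class StubChatModel:
    """Hypothetical stand-in that mimics only the call shape shown in the diff."""

    def chat(self, tokenizer, pixel_values, question, generation_config,
             history: Optional[History] = None) -> Tuple[str, History]:
        # Copy so the caller's list is never mutated in place.
        history = list(history) if history else []
        # A real model would generate text; the stub reports how much context it saw.
        response = f"reply #{len(history) + 1} (saw {len(history)} earlier turn(s))"
        history.append((question, response))
        return response, history


def run_dialogue(model, tokenizer, pixel_values, questions, generation_config):
    """Ask each question in order, threading history from one turn to the next."""
    history = None  # history=None starts a fresh conversation, as in the diff
    responses = []
    for question in questions:
        response, history = model.chat(
            tokenizer, pixel_values, question, generation_config, history=history
        )
        responses.append(response)
    return responses, history


# The two prompts from the diff; tokenizer and pixel_values are unused by the stub.
questions = ["请详细描述图片", "请根据图片写一首诗"]
responses, history = run_dialogue(StubChatModel(), None, None, questions, dict())
for question, response in zip(questions, responses):
    print(question, response)
```

Passing `history=None` on the first turn and the returned value afterwards keeps each turn's context explicit, so a new conversation about the same image can be started simply by resetting `history` to `None`.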

 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/-jQ8jCctx1VjkzVxzChQa.png)

+This model can also conduct an in-depth analysis of AAAI's official website and identify important information on the web page.

 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/08W04RdT3PmzJuGwFU3--.png)


 \* Training set observed.

+| MathVista<br>(testmini) | MMB<br>(dev/test) | MMB−CN<br>(dev/test) | MMMU<br>(val/test) | CMMMU<br>(val/test) | MMVP | MME | POPE | Tiny LVLM | SEEDv1<br>(image) | LLaVA Wild | MM−Vet |
+| ----------------------- | --------------------- | --------------------- | ---------------------- | --------------------- | ---- | ------------------------ | ---- | --------- | ----------------- | ---------- | ------ |
+| 34.5 | 76.7 / 75.4 | 71.9 / 70.3 | 39.1 / 35.3 | 34.8 / 34.0 | 44.7 | 1675.1 / 348.6 | 87.1 | 343.2 | 73.2 | 73.2 | 46.7 |

 **Image Captioning & Visual Question Answering**

 \* Training set observed.

+| COCO<br>(test) | Flickr30K<br>(test) | NoCaps<br>(val) | VQAv2<br>(testdev) | OKVQA<br>(val) | TextVQA<br>(val) | VizWiz<br>(val/test) | AI2D<br>(test) | GQA<br>(test) | ScienceQA<br>(image) |
+| -------------- | ------------------- | --------------- | ------------------ | -------------- | ---------------- | --------------------- | -------------- | ------------- | -------------------- |
+| 142.2\* | 85.3 | 120.8 | 80.9\* | 64.1\* | 65.9 | 59.0 / 57.3 | 72.2\* | 62.5\* | 90.1\* |

 - We found that incorrect images were used for training and testing in `AI2D`, meaning that for problems where `abcLabel` is True, `abc_images` were not utilized. We have now corrected the images used for testing, but the results may still be somewhat lower as a consequence.

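The `AI2D` note describes an image-selection mistake: samples with `abcLabel` set to True should have been paired with images from `abc_images`, but were not. A minimal sketch of the intended selection rule follows. Only the names `abcLabel` and `abc_images` come from the note; the sample dictionary layout, the `image` key, the directory paths, and the function name `select_ai2d_image` are assumptions made for illustration and may not match the actual data pipeline.

```python
from pathlib import Path


def select_ai2d_image(sample: dict, image_root: Path, abc_image_root: Path) -> Path:
    """Return the image path a sample should use.

    Assumes each sample is a dict with an `image` file name and an optional
    boolean `abcLabel` flag (a hypothetical layout, not confirmed by the README).
    """
    # Samples flagged with abcLabel use the abc_images variant of the image.
    root = abc_image_root if sample.get("abcLabel", False) else image_root
    return root / sample["image"]


# Hypothetical directory layout for the two image variants.
IMAGE_ROOT = Path("ai2d/images")
ABC_IMAGE_ROOT = Path("ai2d/abc_images")

flagged = {"image": "1.png", "abcLabel": True}
plain = {"image": "2.png", "abcLabel": False}

print(select_ai2d_image(flagged, IMAGE_ROOT, ABC_IMAGE_ROOT))
print(select_ai2d_image(plain, IMAGE_ROOT, ABC_IMAGE_ROOT))
```

The bug described in the note corresponds to always using the default image directory regardless of the flag. Because the note says only the test images were corrected, a model trained with the wrong images may still score lower on `AI2D` than it otherwise would.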