---
language:
- en
base_model:
- Salesforce/blip-image-captioning-base
pipeline_tag: image-to-text
tags:
- art
license: apache-2.0
metrics:
- bleu
library_name: transformers
datasets:
- phiyodr/coco2017
---
### Fine-Tuned Image Captioning Model

This is a fine-tuned version of BLIP for visual question answering on images. The model is fine-tuned on the Stanford Online Products dataset, which comprises 120k product images from an online retail platform. The dataset is enriched with answers from LLMs and then used to fine-tune the model.

This experimental model can be used to answer questions about product images in the retail industry. Example use cases include product metadata enrichment and validation of human-generated product descriptions.

Examples:

| Input Image | Model Output |
|---|---|
| ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/wstIrphXfPqDNTC84x9IB.jpeg) | chips nachos |
| ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/-Z87gp9zWg2FiLTUCu8Ir.jpeg) | a man in a suit walking across a crosswalk |
| ![image/png](https://cdn-uploads.huggingface.co/production/uploads/672d17c98e098bf429c83670/YcSs_CFcRj-Tb4woXIArC.png) | bush ' s best white beans |
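
Outputs like the ones shown above can likely be produced with the standard BLIP classes in `transformers`. The following is a minimal sketch, not the author's confirmed inference code. It assumes the checkpoint loads with `BlipProcessor` and `BlipForConditionalGeneration`, as the base model does. The default `model_id` is the base model listed in the metadata and stands in for this repository's ID, which is not given here. The helper names `load_captioner` and `describe_image`, the optional `prompt` argument, and the file name `product.jpg` are illustrative assumptions.

```python
def load_captioner(model_id="Salesforce/blip-image-captioning-base"):
    """Load a BLIP processor and model.

    Assumption: the default is the base model from the card metadata;
    replace it with this repository's model ID to use the fine-tuned weights.
    """
    # Imported here so the helper below can be reused without loading transformers.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return processor, model


def describe_image(image, processor, model, prompt=None, max_new_tokens=30):
    """Generate text for one PIL image, optionally conditioned on a text prefix."""
    image = image.convert("RGB")  # BLIP expects 3-channel input
    if prompt is None:
        inputs = processor(images=image, return_tensors="pt")
    else:
        # Assumption: whether this checkpoint expects a prompt is not stated in the card.
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


# Example usage (downloads weights on first run; "product.jpg" is a hypothetical file):
#   from PIL import Image
#   processor, model = load_captioner()
#   print(describe_image(Image.open("product.jpg"), processor, model))
```

Decoded text may contain tokenizer spacing, as in the `bush ' s best white beans` example, so downstream consumers may want to normalize it before storing it as product metadata.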