---
tags:
- image-to-text
- image-captioning
- endpoints-template
license: bsd-3-clause
library_name: generic
---

# Fork of [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) for an `image-captioning` task on 🤗 Inference Endpoints.

This repository implements a `custom` task for `image-captioning` on 🤗 Inference Endpoints. The code for the customized pipeline is in [pipeline.py](https://huggingface.co/florentgbelidji/blip_captioning/blob/main/pipeline.py).
To deploy this model as an Inference Endpoint, you have to select `Custom` as the task so that the `pipeline.py` file is used. _Double-check that it is selected._
### Expected request payload
The `image` field holds the base64-encoded image.
```json
{
  "image": "/9j/4AAQSkZJRgA.....",
  "text": "a photography of a"
}
```
## Run Request
Below is an example of how to run a request using Python and `requests`.

1. Download a sample image. Any online image works.
```bash
wget https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg
```
2. Run the request.

```python
import base64

import requests

# Adjust the path to wherever the image was downloaded.
with open("/content/demo.jpg", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode()

ENDPOINT_URL = ""  # URL of your Inference Endpoint
HF_TOKEN = ""  # your Hugging Face access token

headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}


def query(payload):
    # POST the JSON payload to the endpoint and return the decoded response.
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    return response.json()


output = query({
    "inputs": {
        "images": [encoded_string],  # base64-encoded image
        "texts": ["a photography of"]  # optional; depends on the logic in pipeline.py
    }
})
print(output)
```

Example parameters depending on the decoding strategy:

1. Beam search

``` 
        "parameters": {
                   "num_beams":5,
                   "max_length":20
        }
```

2. Nucleus sampling

``` 
        "parameters": {
                   "num_beams":1,
                   "max_length":20,
                   "do_sample": True,
                   "top_k":50,
                   "top_p":0.95
        }
```

3. Contrastive search

``` 
        "parameters": {
                   "penalty_alpha":0.6,
                   "top_k":4,
                   "max_length":512
        }
```

See the [generate()](https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/text_generation#transformers.GenerationMixin.generate) documentation for additional details.
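The parameter fragments above do not show where they sit in the request body. The following is a minimal sketch, assuming `parameters` is a top-level key next to `inputs` (the usual layout for custom handlers on Inference Endpoints, but not confirmed here for `pipeline.py`). `build_payload` is a hypothetical helper, and the image string is a placeholder rather than real image data.

```python
import json


def build_payload(encoded_image, prompt=None, parameters=None):
    """Assemble a request body (hypothetical helper, not part of pipeline.py)."""
    inputs = {"images": [encoded_image]}
    if prompt is not None:
        inputs["texts"] = [prompt]
    payload = {"inputs": inputs}
    if parameters:
        # Assumption: generation settings go in a top-level "parameters" key.
        payload["parameters"] = parameters
    return payload


# Beam search settings taken from the example above.
beam_search = {"num_beams": 5, "max_length": 20}

# "<base64-encoded image>" is a placeholder for the real encoded string.
payload = build_payload("<base64-encoded image>", "a photography of", beam_search)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed to the `query` function from the request example. Swapping `beam_search` for the nucleus sampling or contrastive search settings changes only the `parameters` entry.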


Expected output:
```python
{'captions': ['a photography of a woman and her dog on the beach']}
```
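To use the captions in downstream code, the response can be unpacked as in the sketch below. It assumes a successful response is a dictionary with a `captions` list, as in the output above, and treats anything else as an error. `extract_captions` is a hypothetical helper, not part of `pipeline.py`.

```python
def extract_captions(output):
    """Return the list of captions from a response (hypothetical helper)."""
    if isinstance(output, dict) and "captions" in output:
        return list(output["captions"])
    # Assumption: any other shape (for example an error message) is a failure.
    raise ValueError(f"Unexpected response: {output!r}")


# Sample response copied from the expected output above.
sample = {"captions": ["a photography of a woman and her dog on the beach"]}
for caption in extract_captions(sample):
    print(caption)
```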