---
language:
- tr
---

To build the model, I used ResNet18 as the image encoder and Turkish DistilBERT as the text encoder.
Turkish DistilBERT: [dbmdz/distilbert-base-turkish-cased]

You can find more information (and code 🎉) on how to train or use the model on my [github].

[dbmdz/distilbert-base-turkish-cased]: https://huggingface.co/dbmdz/distilbert-base-turkish-cased

[github]: https://github.com/kesimeg/turkish-clip

# How to use the model?

To use the model, you can use the `Net` class in model.py, as in the example below:

```Python
from model import Net
import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image
from transformers import AutoTokenizer

model = Net()
# map_location is needed when loading the weights on a CPU-only machine
model.load_state_dict(torch.load("clip_model.pt", map_location=torch.device('cpu')))
model.eval()

tokenizer = AutoTokenizer.from_pretrained("dbmdz/distilbert-base-turkish-cased")

transform = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

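def predict(img, text_vec):
  # Convert to RGB so grayscale or RGBA images match the 3-channel normalization
  image = transform(img.convert("RGB")).unsqueeze(0)
  token_list = tokenizer(text_vec, padding=True, return_tensors="pt")

  text = token_list["input_ids"]
  mask = token_list["attention_mask"]

  # Inference only, so gradients are not needed
  with torch.no_grad():
    image_emb, text_emb = model(image, text, mask)
  # Softmax over the image-text similarities: one probability per description
  print(F.softmax(torch.matmul(image_emb, text_emb.T), dim=1))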

img = Image.open("dog.png") # A dog image

text_vec = ["Çimenler içinde bir köpek.","Bir köpek.","Çimenler içinde bir kuş."] # Descriptions
predict(img,text_vec) # Probabilities for each description

```
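
The last line of `predict` can be read as: take the dot product of the image embedding with each text embedding, then apply a softmax over those scores. Below is a minimal sketch of that scoring step in plain Python. The embeddings are made-up toy vectors and `rank_descriptions` is a hypothetical helper, not part of model.py; real embeddings come from the model.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_descriptions(image_vec, text_vecs):
    # Hypothetical helper: dot-product similarity between the image
    # embedding and every text embedding, turned into probabilities
    sims = [sum(a * b for a, b in zip(image_vec, t)) for t in text_vecs]
    return softmax(sims)


# Toy embeddings (assumed values, for illustration only)
image_vec = [1.0, 0.0, 1.0]
text_vecs = [
    [1.0, 0.0, 1.0],  # most similar to the image
    [1.0, 0.0, 0.0],  # partially similar
    [0.0, 1.0, 0.0],  # unrelated
]

probs = rank_descriptions(image_vec, text_vecs)
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

The probabilities are relative to the descriptions passed in: they always sum to 1, so adding or removing a description changes the values of the others.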