---
language:
- tr
---

To build the model, I used ResNet18 for the image part and Turkish DistilBERT for the text part.

Turkish DistilBERT: [dbmdz/distilbert-base-turkish-cased]

You can find more information (and code 🎉) on how to train or use the model on my [github].

[dbmdz/distilbert-base-turkish-cased]: https://huggingface.co/dbmdz/distilbert-base-turkish-cased

[github]: https://github.com/kesimeg/turkish-clip

# How to use the model

To use the model, you can use the `Net` class in `model.py`, as in the example below:

```Python
from model import Net  # Net is the class defined in model.py
import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image
from transformers import AutoTokenizer

model = Net()
# map_location is needed when loading the weights on a CPU-only machine
model.load_state_dict(torch.load("clip_model.pt", map_location=torch.device("cpu")))
model.eval()

tokenizer = AutoTokenizer.from_pretrained("dbmdz/distilbert-base-turkish-cased")

# Resize to 224x224 and normalize with the ImageNet mean and std
transform = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)


def predict(img, descriptions):
    # Add a batch dimension to the preprocessed image
    image = transform(img).unsqueeze(0)
    # Pad the descriptions to a common length so they can be batched
    tokens = tokenizer(descriptions, padding=True, return_tensors="pt")

    with torch.no_grad():  # inference only, no gradients needed
        image_vec, text_vec = model(image, tokens["input_ids"], tokens["attention_mask"])

    # Softmax over the image-text similarity scores: one probability per description
    probs = F.softmax(torch.matmul(image_vec, text_vec.T), dim=1)
    print(probs)
    return probs


# A dog image; convert to RGB because the transform expects 3 channels
img = Image.open("dog.png").convert("RGB")

# Descriptions: "A dog in the grass.", "A dog.", "A bird in the grass."
descriptions = ["Çimenler içinde bir köpek.", "Bir köpek.", "Çimenler içinde bir kuş."]
predict(img, descriptions)  # probabilities for each description
```
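
For intuition, the last step of `predict` can be sketched without PyTorch: each description gets a score equal to the dot product of its vector with the image vector, and a softmax turns those scores into probabilities that sum to 1. This is only a rough illustration. The three-dimensional vectors are made-up toy values (not real model outputs), and `dot` and `softmax` are hypothetical helpers that are not part of the repository.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (assumed values for illustration only)
image_vec = [0.9, 0.1, 0.4]
text_vecs = [
    [0.8, 0.2, 0.5],  # stands in for description 0
    [0.7, 0.1, 0.0],  # stands in for description 1
    [0.1, 0.9, 0.3],  # stands in for description 2
]

# One similarity score per description, then softmax over the descriptions
scores = [dot(image_vec, t) for t in text_vecs]
probs = softmax(scores)

# Index of the description that matches the image best
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

With the real model, the same idea applies to the tensor printed by `predict`: the largest value in the row marks the description the model considers the best match for the image.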