kesimeg/turkish-clip

	---
	language:
	- tr
	---

	To build the model, I used ResNet18 as the image encoder and Turkish DistilBERT as the text encoder.
	Turkish DistilBERT: [dbmdz/distilbert-base-turkish-cased]

	You can find more information (and code 🎉) on how to train or use the model on my [GitHub][github].

	[dbmdz/distilbert-base-turkish-cased]: https://huggingface.co/dbmdz/distilbert-base-turkish-cased

	[github]: https://github.com/kesimeg/turkish-clip

	# How to use the model?

	To use the model, you can use the `Net` class in model.py, as in the example below:

	```Python
	from model import Net
	import torch
	import torch.nn.functional as F
	from torchvision import transforms
	from PIL import Image
	from transformers import AutoTokenizer

	model = Net()
	# map_location is needed when loading the weights on a CPU-only machine
	model.load_state_dict(torch.load("clip_model.pt", map_location=torch.device('cpu')))
	model.eval()

	tokenizer = AutoTokenizer.from_pretrained("dbmdz/distilbert-base-turkish-cased")

	# Resize, convert to a tensor, and normalize with the standard ImageNet statistics
	transform = transforms.Compose(
	    [
	        transforms.Resize((224, 224)),
	        transforms.ToTensor(),
	        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
	    ]
	)

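	def predict(img, text_vec):
	    # Preprocess the image and add a batch dimension
	    image = transform(img).unsqueeze(0)
	    # Tokenize all descriptions, padded to a common length, as PyTorch tensors
	    tokens = tokenizer(text_vec, padding=True, return_tensors="pt")

	    # Inference only, so gradients are not needed
	    with torch.no_grad():
	        image_vec, text_emb = model(image, tokens["input_ids"], tokens["attention_mask"])
	    # Image-text similarities, turned into probabilities over the descriptions
	    print(F.softmax(torch.matmul(image_vec, text_emb.T), dim=1))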

	img = Image.open("dog.png").convert("RGB") # A dog image (converted to 3 channels for Normalize)

	text_vec = ["Çimenler içinde bir köpek.","Bir köpek.","Çimenler içinde bir kuş."] # Descriptions
	predict(img,text_vec) # Probabilities for each description

	```
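
The printed tensor holds one probability per description, in the order the descriptions were given. The following is a minimal sketch of that final scoring step only, assuming dot-product similarity followed by a softmax, as in `predict` above. The vectors are made-up toy values, not real model outputs, and the helper names `softmax` and `rank_descriptions` are hypothetical and not part of model.py.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_descriptions(image_vec, text_vecs, descriptions):
    """Score each description by its dot product with the image vector,
    then return (description, probability) pairs, most likely first."""
    scores = [sum(a * b for a, b in zip(image_vec, t)) for t in text_vecs]
    probs = softmax(scores)
    return sorted(zip(descriptions, probs), key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional embeddings, invented for illustration only
image_vec = [0.9, 0.1, 0.3]
text_vecs = [
    [0.8, 0.2, 0.4],
    [0.7, 0.0, 0.1],
    [0.1, 0.9, 0.2],
]
descriptions = ["Çimenler içinde bir köpek.", "Bir köpek.", "Çimenler içinde bir kuş."]

ranked = rank_descriptions(image_vec, text_vecs, descriptions)
for description, prob in ranked:
    print(f"{prob:.3f}  {description}")
```

With real embeddings, the same ranking can be read off the tensor printed by `predict`: the highest value marks the description the model considers the best match for the image.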