Konthee committed
Commit 30359ae · verified · 1 Parent(s): bd07a27

Update README.md


## Usage:

- How to load the text encoder (with CLIP's image encoder)
- How to use the model

## Example of model usage in these tasks:

- Zero-shot classification

- Text-Image retrieval

## Citation (todo: discuss with Boat)

Files changed (1)
  1. README.md +127 -7
README.md CHANGED
<br>

## How to use

- #### Install python package

```bash
pip install thai2transformers==0.1.2
```

- ### Preprocessing

Texts are preprocessed with the following rules, implemented in [process_transformers](https://github.com/vistec-AI/thai2transformers/blob/master/thai2transformers/preprocess.py); a usage sketch follows the list:

- Replace HTML forms of characters with the actual characters, such as `&nbsp;` with a space and `<br />` with a line break [[Howard and Ruder, 2018]](https://arxiv.org/abs/1801.06146).
- Remove empty brackets ((), {}, and []) that sometimes come up as a result of text extraction, such as from Wikipedia.
- Word-level tokenization using [[Phatthiyaphaibun et al., 2020]](https://zenodo.org/record/4319685#.YA4xEGQzaDU)'s `newmm` dictionary-based maximal-matching tokenizer.
- Replace repetitive words; this is done post-tokenization, unlike [[Howard and Ruder, 2018]](https://arxiv.org/abs/1801.06146), since there is no delimitation by space in Thai as in English.
- Replace spaces with `<_>`. The SentencePiece tokenizer combines spaces with other tokens. Since spaces serve as punctuation in Thai, such as sentence boundaries similar to periods in English, combining them with other tokens would omit an important feature for tasks such as word tokenization and sentence breaking. Therefore, we opt to explicitly mark spaces with `<_>`.
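
For illustration, a minimal sketch of applying this cleaning step with the linked helper (the sample string is made up):

```python
from thai2transformers.preprocess import process_transformers

# Made-up input containing an HTML escape; the rules above
# replace the escape and mark the space with <_>.
sample = "สวัสดี&nbsp;ครับ"
print(process_transformers(sample))
```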

<br>

- #### How to load the text encoder

```python
from transformers import AutoModel, AutoProcessor
from thai2transformers.preprocess import process_transformers

model = AutoModel.from_pretrained("openthaigpt/CLIPTextCamembertModelWithProjection-contrastive", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("openthaigpt/CLIPTextCamembertModelWithProjection-contrastive", trust_remote_code=True)

input_text = ["This is dog",
              "how are you today",
              "สวัสดีครับ วันนี้อากาศร้อนมาก"]
processed_input_text = [process_transformers(text) for text in input_text]

text_tokens = processor(text=processed_input_text, padding=True, return_tensors="pt")
embedding = model(**text_tokens).text_embeds

print(embedding, embedding.shape)
```
- #### Output:
```python
tensor([[ 0.0318,  0.0341, -0.1317,  ..., -0.2763, -0.2103,  0.0968],
        [ 0.0579, -0.1373, -0.0293,  ..., -0.3926, -0.2002, -0.0497],
        [ 0.0303,  0.0440,  0.0217,  ..., -0.3282, -0.0100, -0.0757]],
       grad_fn=<MmBackward0>) torch.Size([3, 512])
```
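
The projected embeddings are not yet normalized; a minimal sketch (continuing the snippet above, no new names assumed) of comparing the three sentences to each other by cosine similarity:

```python
import torch

# Normalize, then pairwise dot products give cosine similarities.
normalized = embedding / embedding.norm(dim=1, keepdim=True)
similarity = torch.mm(normalized, normalized.t())  # shape [3, 3]
print(similarity)
```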

## Example of model usage

- ### Zero-shot classification

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoProcessor, CLIPModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load image model and processor.
image_processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
image_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)

# Load text model and processor.
text_processor = AutoProcessor.from_pretrained("openthaigpt/CLIPTextCamembertModelWithProjection-contrastive", trust_remote_code=True)
text_model = AutoModel.from_pretrained("openthaigpt/CLIPTextCamembertModelWithProjection-contrastive", trust_remote_code=True).to(device)

class_labels = ['แมว', 'หมา', 'นก']
label2id = {label: i for i, label in enumerate(class_labels)}

# Embed and normalize the class labels.
inputs = text_processor(text=class_labels, padding=True, return_tensors="pt")
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
text_embeddings = text_model(**inputs).text_embeds
text_embeddings /= text_embeddings.norm(dim=1, keepdim=True)

# Embed and normalize the images (`images` is a list of PIL images).
inputs = image_processor(images=images, return_tensors="pt")
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
image_embeddings = image_model.get_image_features(**inputs)
image_embeddings /= image_embeddings.norm(dim=1, keepdim=True)

# Cosine similarity between each image and each label.
similarities = torch.mm(image_embeddings, text_embeddings.t())
logits = F.softmax(similarities, dim=1)
indices = torch.argmax(logits, dim=1)

logits = logits.detach().cpu()
indices = indices.detach().cpu()

predict = [class_labels[i] for i in indices]
```
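
The snippet above assumes an `images` list is already in memory; one minimal way to build it with Pillow (the file names are placeholders, not part of the original example):

```python
from PIL import Image

# Placeholder paths; substitute your own image files.
image_files = ["cat.jpg", "dog.jpg", "bird.jpg"]
images = [Image.open(path).convert("RGB") for path in image_files]
```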

- ### Text-Image retrieval

```python
import faiss
import torch
from transformers import AutoModel, AutoProcessor, CLIPModel
from thai2transformers.preprocess import process_transformers

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load image model and processor.
image_processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
image_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)

# Load text model and processor.
text_processor = AutoProcessor.from_pretrained("openthaigpt/CLIPTextCamembertModelWithProjection-contrastive", trust_remote_code=True)
text_model = AutoModel.from_pretrained("openthaigpt/CLIPTextCamembertModelWithProjection-contrastive", trust_remote_code=True).to(device)

text_input = ['แมวสีส้ม', 'หมาสีดำ', 'นกสีขาว']
processed_input_text = [process_transformers(text) for text in text_input]

# Embed and normalize the texts.
inputs = text_processor(text=processed_input_text, padding=True, return_tensors="pt")
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
text_embeddings = text_model(**inputs).text_embeds
text_embeddings /= text_embeddings.norm(dim=1, keepdim=True)

# Embed and normalize the images (`images` is a list of PIL images).
inputs = image_processor(images=images, return_tensors="pt")
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
image_embeddings = image_model.get_image_features(**inputs)
image_embeddings /= image_embeddings.norm(dim=1, keepdim=True)

# faiss expects float32 numpy arrays.
text_embeddings = text_embeddings.detach().cpu().numpy()
image_embeddings = image_embeddings.detach().cpu().numpy()

d = text_embeddings.shape[1]  # embedding dimension
n = text_embeddings.shape[0]  # number of text-image pairs

text_index = faiss.IndexFlatIP(d)
image_index = faiss.IndexFlatIP(d)
text_index.add(text_embeddings)
image_index.add(image_embeddings)

# Image-search recall@k: does text query i retrieve image i?
distances, retrieved_indices = image_index.search(text_embeddings, k=5)
recall_image_search = sum(1.0 if i in indices else 0.0
                          for i, indices in zip(range(n), retrieved_indices)
                          ) / float(n)

# Text-search recall@k: does image query i retrieve text i?
distances, retrieved_indices = text_index.search(image_embeddings, k=5)
recall_text_search = sum(1.0 if i in indices else 0.0
                         for i, indices in zip(range(n), retrieved_indices)
                         ) / float(n)
```
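
Beyond computing recall, the same index supports ad-hoc lookups; a minimal sketch (continuing the snippet above) that retrieves the top-3 images for a single new Thai query:

```python
# Embed one query ("white bird"), normalize, and search the image index.
query = [process_transformers('นกสีขาว')]
inputs = text_processor(text=query, padding=True, return_tensors="pt")
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
query_embedding = text_model(**inputs).text_embeds
query_embedding /= query_embedding.norm(dim=1, keepdim=True)
query_embedding = query_embedding.detach().cpu().numpy()

scores, top_indices = image_index.search(query_embedding, k=3)
print(top_indices[0])  # indices of the best-matching images
```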

### Authors
* Konthee Boonmeeprakob ([email protected])