AruniAnkur committed on
Commit 4c4d5fa · verified · 1 parent: af44f50

added model of fine tuning
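In `distilbert_finetuing.ipynb`, the `Category` codes `BT1` to `BT6` are converted to class ids 0 to 5 with `Series.map`. `map` silently turns any code that is missing from the dict into NaN, so a typo in the CSV only surfaces later, when the column is turned into a tensor. A minimal sketch of a stricter encoder, assuming plain pandas in place of Modin; `LABEL2ID`, `ID2LABEL` and `encode_labels` are illustrative names, not part of the notebook, and the whitespace/case normalisation is an assumption about how the raw data might be dirty:

```python
import pandas as pd

# Bloom's-taxonomy level codes -> class ids, same mapping as the notebook
LABEL2ID = {f"BT{level}": level - 1 for level in range(1, 7)}
ID2LABEL = {idx: code for code, idx in LABEL2ID.items()}


def encode_labels(categories: pd.Series) -> pd.Series:
    """Map BT1..BT6 to 0..5 and raise instead of producing NaN for unknown codes."""
    # Assumption: codes with stray whitespace or lowercase letters are still valid
    cleaned = categories.astype(str).str.strip().str.upper()
    encoded = cleaned.map(LABEL2ID)
    unknown = sorted(cleaned[encoded.isna()].unique())
    if unknown:
        raise ValueError(f"Unmapped Category values: {unknown}")
    # Every value matched, so the column can safely be stored as integers
    return encoded.astype("int64")


if __name__ == "__main__":
    sample = pd.Series(["BT1", " bt6", "BT3"])
    print(encode_labels(sample).tolist())  # [0, 5, 2]
```

With this helper, `df["Category"] = encode_labels(df["Category"])` would replace the mapping cell, and the same dicts can be passed as `id2label=ID2LABEL, label2id=LABEL2ID` to `DistilBertForSequenceClassification.from_pretrained(...)` so that the model config carries readable label names.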

Files changed (2)
  1. distilbert_finetuing.ipynb +1184 -0
  2. t5_training.ipynb +269 -0
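In `distilbert_finetuing.ipynb`, the split cell passes only `inputs['input_ids']` to `train_test_split`, so the tokenizer's `attention_mask` never reaches the `TensorDataset`s. Unless the training loop rebuilds the mask some other way, the model then attends to the `[PAD]` positions added by `padding=True`. `train_test_split` already accepts several arrays at once, e.g. `train_test_split(inputs['input_ids'], inputs['attention_mask'], labels, test_size=0.2, random_state=42)` (optionally with `stratify=labels.numpy()` to keep the BT-level proportions equal in both splits), which is the smallest fix. The sketch below shows the underlying idea with NumPy only: one shared random permutation indexes every array, so rows stay aligned. `aligned_split` is a hypothetical helper, and its shuffling and rounding will not reproduce scikit-learn's split exactly.

```python
import numpy as np


def aligned_split(*arrays, test_size=0.2, seed=42):
    """Shuffle once and apply the same indices to every array.

    Returns [a_train, a_test, b_train, b_test, ...], the same output
    order that sklearn's train_test_split uses.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of rows")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(np.ceil(n * test_size))  # round the test share up
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    out = []
    for a in arrays:
        # The same index arrays are used for every input, so row i of the
        # ids, the mask and the labels always describe the same question
        out.extend([a[train_idx], a[test_idx]])
    return out


if __name__ == "__main__":
    ids = np.arange(10).reshape(10, 1)  # stand-in for input_ids
    mask = np.arange(10) * 10           # stand-in for attention_mask
    labels = np.arange(10) % 6          # stand-in for class ids 0..5
    tr_ids, te_ids, tr_mask, te_mask, tr_y, te_y = aligned_split(ids, mask, labels)
    print(len(tr_ids), len(te_ids))     # 8 2
```

Whichever split is used, the training tensors can then be wrapped as `TensorDataset(train_ids, train_mask, train_labels)`, each batch unpacks as `(input_ids, attention_mask, labels)`, and the mask is passed on as `model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)`.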
distilbert_finetuing.ipynb ADDED
@@ -0,0 +1,1184 @@
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#!pip install \"modin[all]\" # Install Ray and Dask\n",
+ "# !pip install torch\n",
+ "# !pip install intel-extension-for-pytorch\n",
+ "# !pip install transformers\n",
+ "# !pip install datasets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>Questions</th>\n",
+ " <th>Category</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>About what proportion of the population of the...</td>\n",
+ " <td>BT1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>Correctly label the brain lobes indicated on t...</td>\n",
+ " <td>BT1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>Define compound interest.</td>\n",
+ " <td>BT1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>Define four types of traceability</td>\n",
+ " <td>BT1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>Define mercantilism.</td>\n",
+ " <td>BT1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>...</th>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8762</th>\n",
+ " <td>Distinguish between different types of soil st...</td>\n",
+ " <td>BT4</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8763</th>\n",
+ " <td>Invent a blockchain-based solution for transpa...</td>\n",
+ " <td>BT6</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8764</th>\n",
+ " <td>Compare the advantages and disadvantages of us...</td>\n",
+ " <td>BT4</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8765</th>\n",
+ " <td>Describe the purpose of the \"volatile\" keyword...</td>\n",
+ " <td>BT1</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8766</th>\n",
+ " <td>Explain the concept of noise in communication ...</td>\n",
+ " <td>BT2</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>8767 rows × 2 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Questions Category\n",
+ "0 About what proportion of the population of the... BT1\n",
+ "1 Correctly label the brain lobes indicated on t... BT1\n",
+ "2 Define compound interest. BT1\n",
+ "3 Define four types of traceability BT1\n",
+ "4 Define mercantilism. BT1\n",
+ "... ... ...\n",
+ "8762 Distinguish between different types of soil st... BT4\n",
+ "8763 Invent a blockchain-based solution for transpa... BT6\n",
+ "8764 Compare the advantages and disadvantages of us... BT4\n",
+ "8765 Describe the purpose of the \"volatile\" keyword... BT1\n",
+ "8766 Explain the concept of noise in communication ... BT2\n",
+ "\n",
+ "[8767 rows x 2 columns]"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import modin.pandas as pd\n",
+ "df = pd.read_csv('blooms_taxonomy_dataset.csv')\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mapping = {\"BT1\": 0, \"BT2\": 1, \"BT3\": 2, \"BT4\": 3, \"BT5\": 4, \"BT6\": 5}\n",
+ "df[\"Category\"] = df[\"Category\"].map(mapping)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>Questions</th>\n",
+ " <th>Category</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>About what proportion of the population of the...</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>Correctly label the brain lobes indicated on t...</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>Define compound interest.</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>Define four types of traceability</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>Define mercantilism.</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>...</th>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8762</th>\n",
+ " <td>Distinguish between different types of soil st...</td>\n",
+ " <td>3</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8763</th>\n",
+ " <td>Invent a blockchain-based solution for transpa...</td>\n",
+ " <td>5</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8764</th>\n",
+ " <td>Compare the advantages and disadvantages of us...</td>\n",
+ " <td>3</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8765</th>\n",
+ " <td>Describe the purpose of the \"volatile\" keyword...</td>\n",
+ " <td>0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8766</th>\n",
+ " <td>Explain the concept of noise in communication ...</td>\n",
+ " <td>1</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>8767 rows × 2 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Questions Category\n",
+ "0 About what proportion of the population of the... 0\n",
+ "1 Correctly label the brain lobes indicated on t... 0\n",
+ "2 Define compound interest. 0\n",
+ "3 Define four types of traceability 0\n",
+ "4 Define mercantilism. 0\n",
+ "... ... ...\n",
+ "8762 Distinguish between different types of soil st... 3\n",
+ "8763 Invent a blockchain-based solution for transpa... 5\n",
+ "8764 Compare the advantages and disadvantages of us... 3\n",
+ "8765 Describe the purpose of the \"volatile\" keyword... 0\n",
+ "8766 Explain the concept of noise in communication ... 1\n",
+ "\n",
+ "[8767 rows x 2 columns]"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/opt/anaconda3/envs/pytorch_env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "{'input_ids': tensor([[ 101, 2055, 2054, ..., 0, 0, 0],\n",
+ " [ 101, 11178, 3830, ..., 0, 0, 0],\n",
+ " [ 101, 9375, 7328, ..., 0, 0, 0],\n",
+ " ...,\n",
+ " [ 101, 12826, 1996, ..., 0, 0, 0],\n",
+ " [ 101, 6235, 1996, ..., 0, 0, 0],\n",
+ " [ 101, 4863, 1996, ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, ..., 0, 0, 0],\n",
+ " [1, 1, 1, ..., 0, 0, 0],\n",
+ " [1, 1, 1, ..., 0, 0, 0],\n",
+ " ...,\n",
+ " [1, 1, 1, ..., 0, 0, 0],\n",
+ " [1, 1, 1, ..., 0, 0, 0],\n",
+ " [1, 1, 1, ..., 0, 0, 0]])}"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from transformers import DistilBertTokenizer\n",
+ "import torch\n",
+ "\n",
+ "# Load the DistilBERT tokenizer\n",
+ "tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')\n",
+ "\n",
+ "# Tokenize the 'Questions' column (DistilBERT has 512 position embeddings, so cap max_length there)\n",
+ "inputs = tokenizer(list(df['Questions']), padding=True, truncation=True, return_tensors='pt', max_length=512)\n",
+ "inputs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "torch.Size([8767, 123])"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "inputs['input_ids'].size()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "tensor([0, 0, 0, ..., 3, 0, 1])"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "labels = torch.tensor(df['Category'].values)\n",
+ "labels"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']\n",
+ "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from transformers import DistilBertForSequenceClassification\n",
+ "\n",
+ "# Load the model with a classification head\n",
+ "model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=6) # 6 classes: 0 to 5\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "# Split the data into training and validation sets\n",
+ "train_inputs, val_inputs, train_labels, val_labels = train_test_split(inputs['input_ids'], labels, test_size=0.2, random_state=42)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from torch.utils.data import DataLoader, TensorDataset\n",
+ "\n",
+ "# Create datasets for training and validation\n",
+ "train_dataset = TensorDataset(train_inputs, train_labels)\n",
+ "val_dataset = TensorDataset(val_inputs, val_labels)\n",
+ "\n",
+ "# Create DataLoader for both training and validation\n",
+ "train_dataloader = DataLoader(train_dataset, batch_size=20, shuffle=True)\n",
+ "val_dataloader = DataLoader(val_dataset, batch_size=20)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "cpu\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/opt/anaconda3/envs/pytorch_env/lib/python3.11/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
+ "source": [
+ "from transformers import AdamW  # deprecated; the warning recommends torch.optim.AdamW\n",
+ "from torch.optim.lr_scheduler import StepLR\n",
+ "\n",
+ "# Set up the optimizer\n",
+ "optimizer = AdamW(model.parameters(), lr=0.0001)\n",
+ "\n",
+ "# Training settings (the training loop itself runs in the next cell)\n",
+ "epochs = 1\n",
+ "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')\n",
+ "model.to(device)\n",
+ "\n",
+ "print(device)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "tensor(0.1266, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2361, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0948, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0170, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5257, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0933, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1646, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2118, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0173, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1543, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3518, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5005, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3083, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1673, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0377, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1693, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3132, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3724, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0699, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1015, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0627, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0439, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3108, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1622, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2091, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1177, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5044, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0834, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1307, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0162, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1507, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4310, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1047, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3400, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5385, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0468, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0655, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0421, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2367, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1999, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3367, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5989, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0349, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4536, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2197, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2861, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1133, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2491, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2210, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1425, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1268, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2085, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2444, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3229, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1340, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2742, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2652, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1091, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3718, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1806, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1180, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1474, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2807, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2696, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4681, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0877, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3703, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4087, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5539, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1504, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0107, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5127, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5999, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1659, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0303, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2197, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2298, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3073, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3306, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2281, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0406, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1882, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2777, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3764, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2865, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1368, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3605, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1100, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2140, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4161, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2829, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2951, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2776, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0665, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4622, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1903, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1492, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3531, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1535, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4230, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2674, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1988, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1032, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.6737, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0771, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0759, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2127, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2328, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4041, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3188, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2907, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1548, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2523, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3066, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2681, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1790, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1407, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4857, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3541, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2105, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2170, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3173, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1405, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2956, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5343, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3510, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1565, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.7312, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4818, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3232, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2504, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0905, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2030, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3142, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4711, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0577, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1709, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1811, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4690, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1305, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1392, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1633, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1361, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2246, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1142, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4056, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0341, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.7735, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5424, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0938, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2202, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0883, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5231, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3891, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0318, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2012, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2682, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4051, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0735, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0473, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0671, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3305, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2791, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3031, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1154, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1411, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2358, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4483, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1316, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4731, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1665, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0311, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2365, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5279, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4144, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1594, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2623, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2407, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4914, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2589, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3578, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1238, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3464, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1637, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1750, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4039, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3257, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3095, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1030, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2661, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3043, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4696, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2800, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1741, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1582, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0720, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5691, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2497, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3357, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2267, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1167, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0201, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1358, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1345, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.8850, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0556, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0690, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3296, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1559, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3681, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1394, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2133, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2564, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3522, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3458, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2390, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2744, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0902, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3074, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2031, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1170, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5067, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2392, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1138, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4484, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1577, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2137, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1273, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1333, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1629, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1824, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.8445, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2046, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1296, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1347, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.6210, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2479, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3683, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2815, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4198, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5143, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1253, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3922, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2052, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3182, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3578, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2138, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2801, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4023, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2817, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1442, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5465, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0325, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4592, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2917, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4769, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5182, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2828, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2595, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5020, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1517, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3279, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1594, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0840, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3132, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1184, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0184, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2888, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0821, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2481, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0216, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2419, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3978, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1400, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0140, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4252, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0495, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4713, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0973, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1307, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0592, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4353, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3089, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1569, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2282, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4177, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0643, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4958, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3452, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1051, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4404, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3820, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1086, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2805, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4529, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1772, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1061, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1318, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3808, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3329, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1924, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3695, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2400, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2193, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1588, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1683, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3439, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2541, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2351, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2033, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0757, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1629, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3000, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.6601, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1748, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4209, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0594, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2206, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2674, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.0595, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2141, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1375, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4534, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2570, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2481, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4599, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2221, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.2963, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1427, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.4567, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1509, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3520, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3681, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.5287, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3123, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.3609, grad_fn=<NllLossBackward0>)\n",
+ "tensor(0.1110, grad_fn=<NllLossBackward0>)\n",
783
+ "tensor(0.2717, grad_fn=<NllLossBackward0>)\n",
784
+ "tensor(0.1092, grad_fn=<NllLossBackward0>)\n",
785
+ "tensor(0.2693, grad_fn=<NllLossBackward0>)\n",
786
+ "tensor(0.2787, grad_fn=<NllLossBackward0>)\n",
787
+ "tensor(0.1664, grad_fn=<NllLossBackward0>)\n",
788
+ "tensor(0.0727, grad_fn=<NllLossBackward0>)\n",
789
+ "tensor(0.0400, grad_fn=<NllLossBackward0>)\n",
790
+ "tensor(0.1332, grad_fn=<NllLossBackward0>)\n",
791
+ "tensor(0.4125, grad_fn=<NllLossBackward0>)\n",
792
+ "tensor(0.3152, grad_fn=<NllLossBackward0>)\n",
793
+ "tensor(0.4981, grad_fn=<NllLossBackward0>)\n",
794
+ "tensor(0.1758, grad_fn=<NllLossBackward0>)\n",
795
+ "tensor(0.1878, grad_fn=<NllLossBackward0>)\n",
796
+ "tensor(1.1352, grad_fn=<NllLossBackward0>)\n",
797
+ "Epoch 1 | Loss: 0.25651482065232134\n"
798
+ ]
799
+ }
800
+ ],
801
+ "source": [
802
+ "for epoch in range(epochs):\n",
803
+ " model.train()\n",
804
+ " total_loss = 0\n",
805
+ " for batch in train_dataloader:\n",
806
+ " input_ids, labels = batch\n",
807
+ " input_ids, labels = input_ids.to(device), labels.to(device)\n",
808
+ "\n",
809
+ " # Zero the gradients\n",
810
+ " optimizer.zero_grad()\n",
811
+ "\n",
812
+ " # Forward pass\n",
813
+ " outputs = model(input_ids, labels=labels)\n",
814
+ " loss = outputs.loss\n",
815
+ " total_loss += loss.item()\n",
816
+ "\n",
817
+ " # Backward pass\n",
818
+ " loss.backward()\n",
819
+ " optimizer.step()\n",
820
+ " print(loss)\n",
821
+ " print(f\"Epoch {epoch + 1} | Loss: {total_loss / len(train_dataloader)}\")"
822
+ ]
823
+ },
824
+ {
825
+ "cell_type": "code",
826
+ "execution_count": 36,
827
+ "metadata": {},
828
+ "outputs": [
829
+ {
830
+ "name": "stdout",
831
+ "output_type": "stream",
832
+ "text": [
833
+ "Validation Accuracy: 78.96%\n"
834
+ ]
835
+ }
836
+ ],
837
+ "source": [
838
+ "model.eval()\n",
839
+ "correct_predictions = 0\n",
840
+ "total_predictions = 0\n",
841
+ "\n",
842
+ "with torch.no_grad():\n",
843
+ " for batch in val_dataloader:\n",
844
+ " input_ids, labels = batch\n",
845
+ " input_ids, labels = input_ids.to(device), labels.to(device)\n",
846
+ " # Forward pass\n",
847
+ " outputs = model(input_ids)\n",
848
+ " predictions = torch.argmax(outputs.logits, dim=-1)\n",
849
+ "\n",
850
+ " correct_predictions += (predictions == labels).sum().item()\n",
851
+ " total_predictions += labels.size(0)\n",
852
+ "\n",
853
+ "accuracy = correct_predictions / total_predictions\n",
854
+ "print(f\"Validation Accuracy: {accuracy * 100:.2f}%\")"
855
+ ]
856
+ },
857
+ {
858
+ "cell_type": "code",
859
+ "execution_count": 37,
860
+ "metadata": {},
861
+ "outputs": [
862
+ {
863
+ "name": "stdout",
864
+ "output_type": "stream",
865
+ "text": [
866
+ "3\n"
867
+ ]
868
+ }
869
+ ],
870
+ "source": [
871
+ "def predict(text):\n",
872
+ " inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)\n",
873
+ " input_ids = inputs['input_ids'].to(device)\n",
874
+ " \n",
875
+ " model.eval()\n",
876
+ " with torch.no_grad():\n",
877
+ " outputs = model(input_ids)\n",
878
+ " prediction = torch.argmax(outputs.logits, dim=-1)\n",
879
+ " return prediction.item()\n",
880
+ "\n",
881
+ "# Example prediction\n",
882
+ "question = \"Compare two dog food commercials. What is the difference between them and how do they both sell their products?\"\n",
883
+ "print(predict(question))\n"
884
+ ]
885
+ },
886
+ {
887
+ "cell_type": "code",
888
+ "execution_count": 47,
889
+ "metadata": {},
890
+ "outputs": [
891
+ {
892
+ "name": "stdout",
893
+ "output_type": "stream",
894
+ "text": [
895
+ "Remembering: 0.6210\n",
896
+ "Understanding: 0.2401\n",
897
+ "Applying: 0.0801\n",
898
+ "Analyzing: 0.0533\n",
899
+ "Evaluating: 0.0028\n",
900
+ "Creating: 0.0026\n"
901
+ ]
902
+ }
903
+ ],
904
+ "source": [
905
+ "from torch.nn.functional import softmax\n",
906
+ "\n",
907
+ "# The mapping of class labels to numeric labels\n",
908
+ "mapping = {\"Remembering\": 0, \"Understanding\": 1, \"Applying\": 2, \"Analyzing\": 3, \"Evaluating\": 4, \"Creating\": 5}\n",
909
+ "\n",
910
+ "# Reverse the mapping to get the class name from the index\n",
911
+ "reverse_mapping = {v: k for k, v in mapping.items()}\n",
912
+ "\n",
913
+ "def predict(text):\n",
914
+ " # Tokenize the input text\n",
915
+ " inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)\n",
916
+ " input_ids = inputs['input_ids'].to(device)\n",
917
+ " \n",
918
+ " model.eval()\n",
919
+ " with torch.no_grad():\n",
920
+ " # Get the raw logits from the model\n",
921
+ " outputs = model(input_ids)\n",
922
+ " logits = outputs.logits\n",
923
+ " \n",
924
+ " # Apply softmax to get probabilities\n",
925
+ " probabilities = softmax(logits, dim=-1)\n",
926
+ " \n",
927
+ " # Convert probabilities to a list or dictionary of class probabilities\n",
928
+ " probabilities = probabilities.squeeze().cpu().numpy()\n",
929
+ " \n",
930
+ " # Map the probabilities to the class labels using the reverse mapping\n",
931
+ " class_probabilities = {reverse_mapping[i]: prob for i, prob in enumerate(probabilities)}\n",
932
+ " \n",
933
+ " return class_probabilities\n",
934
+ "\n",
935
+ "# Example prediction\n",
936
+ "question = \"State and explain rules of inference.\"\n",
937
+ "class_probabilities = predict(question)\n",
938
+ "\n",
939
+ "# Display the probabilities for each class label\n",
940
+ "for class_label, prob in class_probabilities.items():\n",
941
+ " print(f\"{class_label}: {prob:.4f}\")\n"
942
+ ]
943
+ },
944
+ {
945
+ "cell_type": "code",
946
+ "execution_count": 48,
947
+ "metadata": {},
948
+ "outputs": [
949
+ {
950
+ "data": {
951
+ "text/plain": [
952
+ "('./fine_tuned_distilbert/tokenizer_config.json',\n",
953
+ " './fine_tuned_distilbert/special_tokens_map.json',\n",
954
+ " './fine_tuned_distilbert/vocab.txt',\n",
955
+ " './fine_tuned_distilbert/added_tokens.json')"
956
+ ]
957
+ },
958
+ "execution_count": 48,
959
+ "metadata": {},
960
+ "output_type": "execute_result"
961
+ }
962
+ ],
963
+ "source": [
964
+ "model.save_pretrained('./fine_tuned_distilbert')\n",
965
+ "\n",
966
+ "# Save the tokenizer\n",
967
+ "tokenizer.save_pretrained('./fine_tuned_distilbert')"
968
+ ]
969
+ },
970
+ {
971
+ "cell_type": "code",
972
+ "execution_count": 49,
973
+ "metadata": {},
974
+ "outputs": [],
975
+ "source": [
976
+ "from transformers import DistilBertForSequenceClassification, DistilBertTokenizer\n",
977
+ "\n",
978
+ "# Load the saved model and move it to the device used for the inputs\n",
979
+ "model = DistilBertForSequenceClassification.from_pretrained('./fine_tuned_distilbert').to(device)\n",
980
+ "\n",
981
+ "# Load the saved tokenizer\n",
982
+ "tokenizer = DistilBertTokenizer.from_pretrained('./fine_tuned_distilbert')\n"
983
+ ]
984
+ },
985
+ {
986
+ "cell_type": "code",
987
+ "execution_count": 50,
988
+ "metadata": {},
989
+ "outputs": [
990
+ {
991
+ "name": "stdout",
992
+ "output_type": "stream",
993
+ "text": [
994
+ "Remembering: 0.0049\n",
995
+ "Understanding: 0.0040\n",
996
+ "Applying: 0.3104\n",
997
+ "Analyzing: 0.2497\n",
998
+ "Evaluating: 0.3769\n",
999
+ "Creating: 0.0542\n"
1000
+ ]
1001
+ }
1002
+ ],
1003
+ "source": [
1004
+ "# Example of using the loaded model for prediction\n",
1005
+ "def predict_with_loaded_model(text):\n",
1006
+ " # Tokenize the input text\n",
1007
+ " inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)\n",
1008
+ " input_ids = inputs['input_ids'].to(device)\n",
1009
+ "\n",
1010
+ " model.eval()\n",
1011
+ " with torch.no_grad():\n",
1012
+ " outputs = model(input_ids)\n",
1013
+ " logits = outputs.logits\n",
1014
+ " probabilities = softmax(logits, dim=-1)\n",
1015
+ " \n",
1016
+ " # Map probabilities to class labels\n",
1017
+ " probabilities = probabilities.squeeze().cpu().numpy()\n",
1018
+ " class_probabilities = {reverse_mapping[i]: prob for i, prob in enumerate(probabilities)}\n",
1019
+ " \n",
1020
+ " return class_probabilities\n",
1021
+ "\n",
1022
+ "# Example usage with the saved model\n",
1023
+ "question = \"The accuracy of each position in a sequence of GGTACTGAT is 98%, 95%, 97%, 97%, 98%, 99%, 94%, 93%, and 97% respectively.(a) What is the average PHRED quality score of this sequence?\"\n",
1024
+ "class_probabilities = predict_with_loaded_model(question)\n",
1025
+ "\n",
1026
+ "# Display class probabilities\n",
1027
+ "for class_label, prob in class_probabilities.items():\n",
1028
+ " print(f\"{class_label}: {prob:.4f}\")"
1029
+ ]
1030
+ },
1031
+ {
1032
+ "cell_type": "code",
1033
+ "execution_count": 55,
1034
+ "metadata": {},
1035
+ "outputs": [],
1036
+ "source": [
1037
+ "e = ['@ What are the key differences between classification and regression tasks in supervised learning, and how do you determine which algorithm to use for a specific problem?',\n",
1038
+ " '@ How does clustering differ from dimensionality reduction, and can you provide real-world examples of where each is applied?',\n",
1039
+ " '@ What are common evaluation metrics for classification models, and how do precision, recall, and F1-score relate to each other?',\n",
1040
+ " '@ How do convolutional neural networks (CNNs) and recurrent neural networks (RNNs) differ in their architecture and applications?',\n",
1041
+ " '@ What steps can be taken to identify and mitigate bias in machine learning models, and why is this an important consideration?']"
1042
+ ]
1043
+ },
1044
+ {
1045
+ "cell_type": "code",
1046
+ "execution_count": 56,
1047
+ "metadata": {},
1048
+ "outputs": [
1049
+ {
1050
+ "name": "stdout",
1051
+ "output_type": "stream",
1052
+ "text": [
1053
+ "{'Remembering': 0.10612957, 'Understanding': 0.019418646, 'Applying': 0.06178399, 'Analyzing': 0.06437193, 'Evaluating': 0.02016813, 'Creating': 0.7281277}\n",
1054
+ "{'Remembering': 0.0023775953, 'Understanding': 0.007248114, 'Applying': 0.030584276, 'Analyzing': 0.03784482, 'Evaluating': 0.011662786, 'Creating': 0.9102824}\n",
1055
+ "{'Remembering': 0.77779603, 'Understanding': 0.00137261, 'Applying': 0.030797651, 'Analyzing': 0.01779477, 'Evaluating': 0.015782129, 'Creating': 0.15645678}\n",
1056
+ "{'Remembering': 0.0041304147, 'Understanding': 0.0012872498, 'Applying': 0.0071271434, 'Analyzing': 0.08727108, 'Evaluating': 0.012631507, 'Creating': 0.8875526}\n",
1057
+ "{'Remembering': 0.02713421, 'Understanding': 0.0032449323, 'Applying': 0.0559042, 'Analyzing': 0.021534933, 'Evaluating': 0.015711982, 'Creating': 0.8764698}\n"
1058
+ ]
1059
+ }
1060
+ ],
1061
+ "source": [
1062
+ "for i in e:\n",
1063
+ " class_probabilities = predict_with_loaded_model(i)\n",
1064
+ " print(class_probabilities)"
1065
+ ]
1066
+ },
1067
+ {
1068
+ "cell_type": "code",
1069
+ "execution_count": 67,
1070
+ "metadata": {},
1071
+ "outputs": [],
1072
+ "source": [
1073
+ "weights = {\n",
1074
+ " 'Remembering': 0.5,\n",
1075
+ " 'Understanding': 0.5,\n",
1076
+ " 'Applying': 0.5,\n",
1077
+ " 'Analyzing': 0.5,\n",
1078
+ " 'Evaluating': 0.5,\n",
1079
+ " 'Creating': 0.5,\n",
1080
+ "}"
1081
+ ]
1082
+ },
1083
+ {
1084
+ "cell_type": "code",
1085
+ "execution_count": 68,
1086
+ "metadata": {},
1087
+ "outputs": [],
1088
+ "source": [
1089
+ "questions = [\n",
1090
+ " {'Remembering': 0.10612957, 'Understanding': 0.019418646, 'Applying': 0.06178399, 'Analyzing': 0.06437193, 'Evaluating': 0.02016813, 'Creating': 0.7281277},\n",
1091
+ " {'Remembering': 0.0023775953, 'Understanding': 0.007248114, 'Applying': 0.030584276, 'Analyzing': 0.03784482, 'Evaluating': 0.011662786, 'Creating': 0.9102824},\n",
1092
+ " {'Remembering': 0.77779603, 'Understanding': 0.00137261, 'Applying': 0.030797651, 'Analyzing': 0.01779477, 'Evaluating': 0.015782129, 'Creating': 0.15645678},\n",
1093
+ " {'Remembering': 0.0041304147, 'Understanding': 0.0012872498, 'Applying': 0.0071271434, 'Analyzing': 0.08727108, 'Evaluating': 0.012631507, 'Creating': 0.8875526},\n",
1094
+ " {'Remembering': 0.02713421, 'Understanding': 0.0032449323, 'Applying': 0.0559042, 'Analyzing': 0.021534933, 'Evaluating': 0.015711982, 'Creating': 0.8764698}\n",
1095
+ "]"
1096
+ ]
1097
+ },
1098
+ {
1099
+ "cell_type": "code",
1100
+ "execution_count": 69,
1101
+ "metadata": {},
1102
+ "outputs": [
1103
+ {
1104
+ "name": "stdout",
1105
+ "output_type": "stream",
1106
+ "text": [
1107
+ "2.49999998975 18.0 90.0\n",
1108
+ "Normalized Score of the Paper: 0.0278\n"
1109
+ ]
1110
+ }
1111
+ ],
1112
+ "source": [
1113
+ "def calculate_score(question, weights):\n",
1114
+ " score = sum(question[level] * weight for level, weight in weights.items())\n",
1115
+ " return score\n",
1116
+ "\n",
1117
+ "total_score = sum(calculate_score(q, weights) for q in questions)\n",
1118
+ "max_score_per_question = sum([weights[level] for level in weights]) * 6 \n",
1119
+ "max_total_score = max_score_per_question * len(questions) \n",
1120
+ "normalized_score = (total_score - 0) / (max_total_score - 0)\n",
1121
+ "print(total_score, max_score_per_question, max_total_score)\n",
1122
+ "print(f\"Normalized Score of the Paper: {normalized_score:.4f}\")"
1123
+ ]
1124
+ },
1125
+ {
1126
+ "cell_type": "code",
1127
+ "execution_count": null,
1128
+ "metadata": {},
1129
+ "outputs": [],
1130
+ "source": []
1131
+ },
1132
+ {
1133
+ "cell_type": "code",
1134
+ "execution_count": 70,
1135
+ "metadata": {},
1136
+ "outputs": [
1137
+ {
1138
+ "name": "stdout",
1139
+ "output_type": "stream",
1140
+ "text": [
1141
+ "{'Remembering': 0.10612957, 'Understanding': 0.019418646, 'Applying': 0.06178399, 'Analyzing': 0.06437193, 'Evaluating': 0.02016813, 'Creating': 0.7281277}\n",
1142
+ "{'Remembering': 0.0023775953, 'Understanding': 0.007248114, 'Applying': 0.030584276, 'Analyzing': 0.03784482, 'Evaluating': 0.011662786, 'Creating': 0.9102824}\n",
1143
+ "{'Remembering': 0.77779603, 'Understanding': 0.00137261, 'Applying': 0.030797651, 'Analyzing': 0.01779477, 'Evaluating': 0.015782129, 'Creating': 0.15645678}\n",
1144
+ "{'Remembering': 0.0041304147, 'Understanding': 0.0012872498, 'Applying': 0.0071271434, 'Analyzing': 0.08727108, 'Evaluating': 0.012631507, 'Creating': 0.8875526}\n",
1145
+ "{'Remembering': 0.02713421, 'Understanding': 0.0032449323, 'Applying': 0.0559042, 'Analyzing': 0.021534933, 'Evaluating': 0.015711982, 'Creating': 0.8764698}\n"
1146
+ ]
1147
+ }
1148
+ ],
1149
+ "source": [
1150
+ "for i in e:\n",
1151
+ " class_probabilities = predict_with_loaded_model(i)\n",
1152
+ " print(class_probabilities)"
1153
+ ]
1154
+ },
1155
+ {
1156
+ "cell_type": "code",
1157
+ "execution_count": null,
1158
+ "metadata": {},
1159
+ "outputs": [],
1160
+ "source": []
1161
+ }
1162
+ ],
1163
+ "metadata": {
1164
+ "kernelspec": {
1165
+ "display_name": "Python 3 (ipykernel)",
1166
+ "language": "python",
1167
+ "name": "python3"
1168
+ },
1169
+ "language_info": {
1170
+ "codemirror_mode": {
1171
+ "name": "ipython",
1172
+ "version": 3
1173
+ },
1174
+ "file_extension": ".py",
1175
+ "mimetype": "text/x-python",
1176
+ "name": "python",
1177
+ "nbconvert_exporter": "python",
1178
+ "pygments_lexer": "ipython3",
1179
+ "version": "3.12.7"
1180
+ }
1181
+ },
1182
+ "nbformat": 4,
1183
+ "nbformat_minor": 4
1184
+ }
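The paper-scoring cell divides the weighted sum by `sum(weights) * 6` (18) per question. Each question's class probabilities sum to 1, so a question's weighted score can never exceed `max(weights)`. With the uniform 0.5 weights, every paper therefore scores 0.5 × n / (18 × n) = 1/36 ≈ 0.0278, no matter which levels the classifier predicts.

Below is a minimal sketch of a min-max normalization that is bounded by the smallest and largest weight. The 1-to-6 Bloom weights (`BLOOM_WEIGHTS`) and both function names are hypothetical and only illustrate the bounds. They are not the notebook's values.

```python
# Hypothetical weights: higher-order Bloom levels count more.
# These values are an illustrative assumption, not taken from the notebook.
BLOOM_WEIGHTS = {
    "Remembering": 1.0,
    "Understanding": 2.0,
    "Applying": 3.0,
    "Analyzing": 4.0,
    "Evaluating": 5.0,
    "Creating": 6.0,
}


def question_score(probs: dict, weights: dict) -> float:
    """Expected weight of one question under the classifier's softmax output."""
    return sum(probs[level] * w for level, w in weights.items())


def normalized_paper_score(questions: list, weights: dict) -> float:
    """Min-max normalize the summed question scores to the range [0, 1].

    Each question's probabilities sum to 1, so its score lies between
    min(weights) and max(weights); the paper total therefore lies between
    n * min(weights) and n * max(weights) for n questions.
    """
    if not questions:
        raise ValueError("need at least one question")
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # With uniform weights every paper gets the same score.
        raise ValueError("weights must not all be equal")
    total = sum(question_score(q, weights) for q in questions)
    n = len(questions)
    return (total - n * lo) / (n * (hi - lo))
```

With the notebook's `questions` list, `normalized_paper_score(questions, BLOOM_WEIGHTS)` should land near 1 for a paper dominated by "Creating" predictions and near 0 for one dominated by "Remembering". The exact weights remain a design decision for the author.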
t5_training.ipynb ADDED
@@ -0,0 +1,269 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "id": "d071d3d0-aa2f-4582-8e43-12f22e64bbee",
7
+ "metadata": {},
8
+ "outputs": [],
9
+ "source": [
10
+ "# !pip install torch\n",
11
+ "# !pip install intel-extension-for-pytorch\n",
12
+ "# !pip install transformers\n",
13
+ "# !pip install datasets\n",
14
+ "# !pip install onnxruntime\n",
15
+ "# !pip install neural_compressor"
16
+ ]
17
+ },
18
+ {
19
+ "cell_type": "code",
20
+ "execution_count": null,
21
+ "id": "2d21c5cb-8042-4d63-8534-eb686acf4bf6",
22
+ "metadata": {},
23
+ "outputs": [],
24
+ "source": [
25
+ "from transformers import T5ForConditionalGeneration, T5Tokenizer\n",
26
+ "from datasets import Dataset\n",
27
+ "from transformers import Trainer, TrainingArguments\n",
28
+ "\n",
29
+ "# Load pre-trained FLAN-T5 model and tokenizer\n",
30
+ "model_name = \"google/flan-t5-large\" # FLAN-T5 Large model\n",
31
+ "tokenizer = T5Tokenizer.from_pretrained(model_name)\n",
32
+ "model = T5ForConditionalGeneration.from_pretrained(model_name)\n",
33
+ "\n",
34
+ "# Example input-output pair for fine-tuning\n",
35
+ "data = {\n",
36
+ " \"input_text\": [\n",
37
+ " \"What are the key differences between classification and regression tasks in supervised learning, and how do you determine which algorithm to use for a specific problem? e How does clustering differ from dimensionality reduction, and can you provide real-world examples of where each is applied?\"\n",
38
+ " ],\n",
39
+ " \"output_text\": [\n",
40
+ " \"@ What are the key differences between classification and regression tasks in supervised learning, and how do you determine which algorithm to use for a specific problem? @ How does clustering differ from dimensionality reduction, and can you provide real-world examples of where each is applied?\"\n",
41
+ " ]\n",
42
+ "}\n",
43
+ "\n",
44
+ "# Convert the data to a Hugging Face dataset\n",
45
+ "dataset = Dataset.from_dict(data)\n",
46
+ "\n",
47
+ "# Tokenize the data\n",
48
+ "def preprocess_function(examples):\n",
49
+ " model_inputs = tokenizer(examples['input_text'], padding=\"max_length\", truncation=True, max_length=2048)\n",
50
+ " labels = tokenizer(examples['output_text'], padding=\"max_length\", truncation=True, max_length=2048)\n",
51
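+ " model_inputs['labels'] = [[t if t != tokenizer.pad_token_id else -100 for t in seq] for seq in labels['input_ids']] # mask padding with -100 so the loss ignores it\n",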
52
+ " return model_inputs"
53
+ ]
54
+ },
55
+ {
56
+ "cell_type": "code",
57
+ "execution_count": null,
58
+ "id": "2e0d06e8-f50a-4a22-93b7-44152f06e462",
59
+ "metadata": {},
60
+ "outputs": [],
61
+ "source": [
62
+ "tokenized_datasets = dataset.map(preprocess_function, batched=True)\n",
63
+ "\n",
64
+ "# Set up the training arguments\n",
65
+ "training_args = TrainingArguments(\n",
66
+ " output_dir=\"./flan_t5_results\", # Output directory for model checkpoints\n",
67
+ " eval_strategy=\"epoch\", # Evaluation strategy to use\n",
68
+ " learning_rate=2e-5, # Learning rate for fine-tuning\n",
69
+ " per_device_train_batch_size=1, # Batch size for training\n",
70
+ " num_train_epochs=1, # Number of epochs\n",
71
+ " weight_decay=0.01, # Weight decay for regularization\n",
72
+ " save_steps=10, # Save model every 10 steps\n",
73
+ " save_total_limit=1, # Limit the number of saved models\n",
74
+ " fp16=False, # Disable mixed precision\n",
75
+ " use_cpu=True # Force CPU-only training\n",
76
+ ")\n",
77
+ "\n",
78
+ "# Initialize the Trainer class\n",
79
+ "trainer = Trainer(\n",
80
+ " model=model,\n",
81
+ " args=training_args,\n",
82
+ " train_dataset=tokenized_datasets,\n",
83
+ " eval_dataset=tokenized_datasets # Use the same dataset for evaluation since we only have one data point\n",
84
+ ")\n",
85
+ "\n",
86
+ "# Start training (this will fine-tune the model on the given example)\n",
87
+ "trainer.train()\n",
88
+ "\n",
89
+ "# Save the fine-tuned model\n",
90
+ "#trainer.save_model(\"./flan_t5_finetuned\")\n",
91
+ "model.save_pretrained(\"./flan_t5_finetuned\")\n",
92
+ "tokenizer.save_pretrained(\"./flan_t5_finetuned\")\n",
93
+ "\n",
94
+ "# Evaluate the model on the training data (for a single example)\n",
95
+ "model.eval()\n",
96
+ "inputs = tokenizer(\"What are the key differences between classification and regression tasks in supervised learning, and how do you determine which algorithm to use for a specific problem? e How does clustering differ from dimensionality reduction, and can you provide real-world examples of where each is applied?\", return_tensors=\"pt\", padding=True)\n",
97
+ "outputs = model.generate(inputs['input_ids'], max_length=1024)\n",
98
+ "\n",
99
+ "# Decode the generated output\n",
100
+ "generated_output = tokenizer.decode(outputs[0], skip_special_tokens=True)\n",
101
+ "print(generated_output)"
102
+ ]
103
+ },
104
+ {
105
+ "cell_type": "code",
106
+ "execution_count": null,
107
+ "id": "d4b97afe-f09a-4bee-9139-ed9802da712e",
108
+ "metadata": {
109
+ "scrolled": true
110
+ },
111
+ "outputs": [],
112
+ "source": [
113
+ "from transformers import T5ForConditionalGeneration, T5Tokenizer\n",
114
+ "from neural_compressor.quantization import fit\n",
115
+ "from neural_compressor.config import PostTrainingQuantConfig\n",
116
+ "\n",
117
+ "# Load your FP32 model\n",
118
+ "model_path = \"./flan_t5_finetuned\"\n",
119
+ "model = T5ForConditionalGeneration.from_pretrained(model_path)\n",
120
+ "tokenizer = T5Tokenizer.from_pretrained(model_path)\n",
121
+ "\n",
122
+ "# Define the quantization configuration\n",
123
+ "quant_config = PostTrainingQuantConfig(approach='dynamic') # Dynamic INT8 quantization (activations quantized at runtime)\n",
124
+ "\n",
125
+ "# Quantize the model\n",
126
+ "q_model = fit(model=model, conf=quant_config)\n",
127
+ "\n",
128
+ "# Save the quantized model\n",
129
+ "quantized_model_path = \"./flan_t5_quantized_fp16\"\n",
130
+ "q_model.save_pretrained(quantized_model_path)\n",
131
+ "tokenizer.save_pretrained(quantized_model_path)\n",
132
+ "\n",
133
+ "print(f\"Quantized model saved at: {quantized_model_path}\")"
134
+ ]
135
+ },
136
+ {
137
+ "cell_type": "code",
138
+ "execution_count": null,
139
+ "id": "a152f3d9-7042-479b-b3ba-ff5c957be518",
140
+ "metadata": {},
141
+ "outputs": [],
142
+ "source": [
143
+ "import torch\n",
144
+ "from transformers import T5ForConditionalGeneration, T5Tokenizer\n",
145
+ "import os\n",
146
+ "\n",
147
+ "# Load the FP16 model\n",
148
+ "model_path = \"./flan_t5_fp16\"\n",
149
+ "model = T5ForConditionalGeneration.from_pretrained(model_path)\n",
150
+ "tokenizer = T5Tokenizer.from_pretrained(model_path)\n",
151
+ "\n",
152
+ "# Set the model to evaluation mode\n",
153
+ "model.eval()\n",
154
+ "\n",
155
+ "# Example input text\n",
156
+ "input_text = \"Translate English to French: How are you?\"\n",
157
+ "inputs = tokenizer(input_text, return_tensors=\"pt\", padding=True, truncation=True)\n",
158
+ "\n",
159
+ "# Prepare decoder input: <pad> token is used as the first decoder input\n",
160
+ "decoder_start_token_id = tokenizer.pad_token_id\n",
161
+ "decoder_input_ids = torch.tensor([[decoder_start_token_id]])\n",
162
+ "\n",
163
+ "# Create output directory if it doesn't exist\n",
164
+ "onnx_output_dir = \"./flant5\"\n",
165
+ "os.makedirs(onnx_output_dir, exist_ok=True)\n",
166
+ "\n",
167
+ "# Define the path for the ONNX model\n",
168
+ "onnx_model_path = os.path.join(onnx_output_dir, \"flan_t5_fp16.onnx\")\n",
169
+ "\n",
170
+ "# Export the model to ONNX\n",
171
+ "torch.onnx.export(\n",
172
+ " model, # Model to be converted\n",
173
+ " (inputs[\"input_ids\"], inputs[\"attention_mask\"], decoder_input_ids), # Input tuple\n",
174
+ " onnx_model_path, # Path to save the ONNX model\n",
175
+ " export_params=True, # Store the trained parameters\n",
176
+ " opset_version=13, # ONNX version\n",
177
+ " do_constant_folding=True, # Optimize constants\n",
178
+ " input_names=[\"input_ids\", \"attention_mask\", \"decoder_input_ids\"], # Input tensor names\n",
179
+ " output_names=[\"output\"], # Output tensor name\n",
180
+ " dynamic_axes={ # Dynamic shapes for batching\n",
181
+ " \"input_ids\": {0: \"batch_size\", 1: \"sequence_length\"},\n",
182
+ " \"attention_mask\": {0: \"batch_size\", 1: \"sequence_length\"},\n",
183
+ " \"decoder_input_ids\": {0: \"batch_size\", 1: \"decoder_sequence_length\"},\n",
184
+ " \"output\": {0: \"batch_size\", 1: \"decoder_sequence_length\"}\n",
185
+ " }\n",
186
+ ")\n",
187
+ "\n",
188
+ "print(f\"ONNX model saved at: {onnx_model_path}\")"
189
+ ]
190
+ },
191
+ {
192
+ "cell_type": "code",
193
+ "execution_count": null,
194
+ "id": "055abefb-2d0f-4819-b859-86b77270c0be",
195
+ "metadata": {},
196
+ "outputs": [],
197
+ "source": [
198
+ "import onnxruntime as ort\n",
199
+ "import numpy as np\n",
200
+ "from transformers import T5Tokenizer\n",
201
+ "\n",
202
+ "# Load the ONNX model and tokenizer\n",
203
+ "onnx_model_path = \"./flant5/flan_t5_fp16.onnx\" # path written by the export cell\n",
204
+ "tokenizer = T5Tokenizer.from_pretrained(\"./flan_t5_fp16\")\n",
205
+ "ort_session = ort.InferenceSession(onnx_model_path)\n",
206
+ "\n",
207
+ "# Input text for the model\n",
208
+ "input_text = \"Translate English to French: How are you?\"\n",
209
+ "inputs = tokenizer(input_text, return_tensors=\"np\", padding=True, truncation=True)\n",
210
+ "\n",
211
+ "# Ensure inputs are numpy arrays\n",
212
+ "input_ids = np.array(inputs[\"input_ids\"], dtype=np.int64)\n",
213
+ "attention_mask = np.array(inputs[\"attention_mask\"], dtype=np.int64)\n",
214
+ "\n",
215
+ "# Prepare the decoder input (<pad> token for initial input to the decoder)\n",
216
+ "decoder_start_token_id = tokenizer.pad_token_id\n",
217
+ "decoder_input_ids = np.array([[decoder_start_token_id]], dtype=np.int64)\n",
218
+ "\n",
219
+ "# ONNX model inputs\n",
220
+ "onnx_inputs = {\n",
221
+ " \"input_ids\": input_ids,\n",
222
+ " \"attention_mask\": attention_mask,\n",
223
+ " \"decoder_input_ids\": decoder_input_ids\n",
224
+ "}\n",
225
+ "\n",
226
+ "# Run the ONNX model\n",
227
+ "onnx_outputs = ort_session.run(None, onnx_inputs)\n",
228
+ "\n",
229
+ "# Convert logits to token IDs\n",
230
+ "logits = onnx_outputs[0] # Shape: [batch_size, decoder_sequence_length, vocab_size]\n",
231
+ "token_ids = np.argmax(logits, axis=-1) # Highest-scoring token per decoder position (a single step here, not full generation)\n",
232
+ "\n",
233
+ "# Decode the token IDs into text\n",
234
+ "decoded_output = tokenizer.decode(token_ids[0], skip_special_tokens=True)\n",
235
+ "\n",
236
+ "print(f\"ONNX Model Output: {decoded_output}\")\n"
237
+ ]
238
+ },
239
+ {
240
+ "cell_type": "code",
241
+ "execution_count": null,
242
+ "id": "a9110235-9c49-46ef-86e1-f446b3f12d67",
243
+ "metadata": {},
244
+ "outputs": [],
245
+ "source": []
246
+ }
247
+ ],
248
+ "metadata": {
249
+ "kernelspec": {
250
+ "display_name": "Python 3 (ipykernel)",
251
+ "language": "python",
252
+ "name": "python3"
253
+ },
254
+ "language_info": {
255
+ "codemirror_mode": {
256
+ "name": "ipython",
257
+ "version": 3
258
+ },
259
+ "file_extension": ".py",
260
+ "mimetype": "text/x-python",
261
+ "name": "python",
262
+ "nbconvert_exporter": "python",
263
+ "pygments_lexer": "ipython3",
264
+ "version": "3.12.7"
265
+ }
266
+ },
267
+ "nbformat": 4,
268
+ "nbformat_minor": 5
269
+ }
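The ONNX Runtime cell runs the decoder only once, with just the start token. As a result, `np.argmax` yields a single predicted token rather than a full output sequence.

Below is a minimal greedy-decoding sketch. It assumes that the exported graph returns logits as its first output. It also assumes that the graph accepts a growing `decoder_input_ids` length, which is what the dynamic axis on `decoder_input_ids` is meant to allow. `greedy_decode` and its `step_fn` callback are hypothetical helpers, not part of onnxruntime or transformers.

```python
from typing import Callable

import numpy as np


def greedy_decode(
    step_fn: Callable[[np.ndarray], np.ndarray],
    start_id: int,
    eos_id: int,
    max_new_tokens: int = 64,
) -> list[int]:
    """Greedy decoding for a single sequence (batch size 1).

    step_fn maps decoder ids of shape [1, t] to logits of shape [1, t, vocab].
    Only the logits at the last position are used to pick the next token.
    """
    decoder_ids = np.array([[start_id]], dtype=np.int64)
    generated: list[int] = []
    for _ in range(max_new_tokens):
        logits = step_fn(decoder_ids)
        next_id = int(np.argmax(logits[0, -1]))
        if next_id == eos_id:
            # Stop as soon as the model emits end-of-sequence.
            break
        generated.append(next_id)
        # Feed the chosen token back in as the next decoder input.
        decoder_ids = np.concatenate(
            [decoder_ids, np.array([[next_id]], dtype=np.int64)], axis=1
        )
    return generated
```

To use the sketch with the notebook's session, pass the following arguments:

- `step_fn=lambda dec: ort_session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask, "decoder_input_ids": dec})[0]`
- `start_id=tokenizer.pad_token_id`
- `eos_id=tokenizer.eos_token_id`

Then decode the returned ids with `tokenizer.decode(...)`. Without a key/value cache, the full model is recomputed at every step. This keeps the code simple but makes it slow for long outputs.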