WEBVTT

0:00:00.060 --> 0:00:07.762
OK good, so today's lecture is on unsupervised
machine translation, so what you have seen

0:00:07.762 --> 0:00:13.518
so far are different techniques for supervised
NMT, so you have.

0:00:13.593 --> 0:00:18.552
Parallel data, right? So let's say in an English corpus
you have one file and then in German you have

0:00:18.552 --> 0:00:23.454
another file which is aligned sentence to sentence,
and then you try to build systems around

0:00:23.454 --> 0:00:23.679
it.

0:00:24.324 --> 0:00:30.130
But what's different about this lecture is
that you assume that you have no parallel data

0:00:30.130 --> 0:00:30.663
at all.

0:00:30.663 --> 0:00:37.137
You only have monolingual data and the question
is how can we build systems to translate between

0:00:37.137 --> 0:00:39.405
these two languages right and so.

0:00:39.359 --> 0:00:44.658
This is a bit more realistic scenario because
you have so many languages in the world.

0:00:44.658 --> 0:00:50.323
You cannot expect to have parallel data between
every pair of languages, but in typical

0:00:50.323 --> 0:00:55.623
cases you have newspapers and so on, which
is like monolingual files, and the question

0:00:55.623 --> 0:00:57.998
is can we build something around them?

0:00:59.980 --> 0:01:01.651
Like I said, for today:

0:01:01.651 --> 0:01:05.893
First we'll start with the introduction,
so why do we need it?

0:01:05.893 --> 0:01:11.614
and also some intuition on how these models
work before going into the technical details.

0:01:11.614 --> 0:01:17.335
I also want to go through an example, which
kind of gives you more understanding of how

0:01:17.335 --> 0:01:19.263
people came up with these models.

0:01:20.820 --> 0:01:23.905
Then the rest of the lecture is going to be
in two parts.

0:01:23.905 --> 0:01:26.092
One is we're going to translate words.

0:01:26.092 --> 0:01:30.018
We're not going to care about how can we translate
the full sentence.

0:01:30.018 --> 0:01:35.177
But given two monolingual files, how can we
get a dictionary, basically, which is much easier

0:01:35.177 --> 0:01:37.813
than generating something at the sentence level?

0:01:38.698 --> 0:01:43.533
Then we're going to go into the harder case,
which is unsupervised sentence-level translation.

0:01:44.204 --> 0:01:50.201
And here what you'll see is what are the training
objectives which are quite different than the

0:01:50.201 --> 0:01:55.699
word translation, and also where it doesn't
work, because this is also quite important and

0:01:55.699 --> 0:02:01.384
it's one of the reasons why unsupervised MT is
not used anymore, because the limitations kind

0:02:01.384 --> 0:02:03.946
of take it away from the realistic use cases.

0:02:04.504 --> 0:02:06.922
And then that leads to the multilingual
model.

0:02:06.922 --> 0:02:07.115
So.

0:02:07.807 --> 0:02:12.915
What people are trying to do to build systems for
languages that do not have any parallel data

0:02:12.915 --> 0:02:17.693
is use multilingual models and combine them with
these training objectives to get better at

0:02:17.693 --> 0:02:17.913
it.

0:02:17.913 --> 0:02:18.132
So.

0:02:18.658 --> 0:02:24.396
People are not trying to build bilingual systems
currently for unsupervised machine translation,

0:02:24.396 --> 0:02:30.011
but I think it's good to know how they came
to this point and what they're doing now.

0:02:30.090 --> 0:02:34.687
You'll also see some overlapping patterns in
what people are using.

0:02:36.916 --> 0:02:41.642
So as I said before, and you've probably heard
it multiple times now, we have seven

0:02:41.642 --> 0:02:43.076
thousand languages around.

0:02:43.903 --> 0:02:49.460
There can be different dialects and so on, so it's
quite hard to define what counts as a language,

0:02:49.460 --> 0:02:54.957
but you can typically approximate it as seven
thousand, and that leads to about twenty-five million

0:02:54.957 --> 0:02:59.318
pairs, which is the obvious reason why we do
not have parallel data for all of them.

0:03:00.560 --> 0:03:06.386
So you want to build an MT system for all
possible language pairs, and the question is

0:03:06.386 --> 0:03:07.172
how can we?

0:03:08.648 --> 0:03:13.325
That's the typical use case, but there are actually
more interesting use cases than what you

0:03:13.325 --> 0:03:14.045
would expect.

0:03:14.614 --> 0:03:20.508
One is animal languages, which is a
real thing that's happening right now, not with

0:03:20.780 --> 0:03:26.250
dogs but with dolphins and so on, but I
couldn't find a picture that could show this,

0:03:26.250 --> 0:03:31.659
but if you are interested in stuff like this
you can check out the website where people

0:03:31.659 --> 0:03:34.916
are actually trying to understand how animals
speak.

0:03:35.135 --> 0:03:37.356
It's also a bit more about

0:03:37.297 --> 0:03:44.124
knowing what the animals want to say; it may
not be there yet, but still people are trying to

0:03:44.124 --> 0:03:44.661
do it.

0:03:45.825 --> 0:03:50.689
A more realistic thing that's happening is the
translation of programming languages.

0:03:51.371 --> 0:03:56.963
And so this is quite a good scenario
for unsupervised MT: you have

0:03:56.963 --> 0:04:02.556
a lot of code available online, right, in C++
and in Python, and the question is how can

0:04:02.556 --> 0:04:08.402
we translate by just looking at the code alone,
with no parallel functions and so on, and this

0:04:08.402 --> 0:04:10.754
actually works quite well right now, so you can

0:04:12.032 --> 0:04:16.111
see how these techniques were applied to do
programming language translation.

0:04:18.258 --> 0:04:23.882
And then you can also think of language as
something that is quite common so you can take

0:04:23.882 --> 0:04:24.194
off.

0:04:24.194 --> 0:04:29.631
Think of formal sentences in English as one
language and informal sentences in English

0:04:29.631 --> 0:04:35.442
as another language and then learn to translate
between them, and then it kind of becomes

0:04:35.442 --> 0:04:37.379
a style transfer problem, so.

0:04:38.358 --> 0:04:43.042
Although it's translation, you can consider
different characteristics of a language and

0:04:43.042 --> 0:04:46.875
then separate them as two different languages
and then try to map them.

0:04:46.875 --> 0:04:52.038
So it's not only about natural languages; you
can also do quite cool things by using unsupervised

0:04:52.038 --> 0:04:54.327
techniques, which are quite possible also.

0:04:56.256 --> 0:04:56.990
I am so.

0:04:56.990 --> 0:05:04.335
This is kind of the motivation for many of the
use cases that we have for unsupervised MT.

0:05:04.335 --> 0:05:11.842
But before we go into the modeling of these
systems, what I want you to do is look at these

0:05:11.842 --> 0:05:12.413
dummy.

0:05:13.813 --> 0:05:19.720
We have text and language one, text and language
two right, and nobody knows what these languages

0:05:19.720 --> 0:05:20.082
mean.

0:05:20.082 --> 0:05:23.758
They completely are made up right, and the
question is also.

0:05:23.758 --> 0:05:29.364
They're not parallel lines, so the first line
here and the first line is not a line, they're

0:05:29.364 --> 0:05:30.810
just monolingual files.

0:05:32.052 --> 0:05:38.281
And now think about how can you translate
the word M1 from language one to language two,

0:05:38.281 --> 0:05:41.851
and this kind of you see how we try to model
this.

0:05:42.983 --> 0:05:47.966
Would take your time and then think of how
can you translate more into language two?

0:06:41.321 --> 0:06:45.589
About the model, if you ask somebody who doesn't
know anything about machine translation right,

0:06:45.589 --> 0:06:47.411
and then you ask them to translate more.

0:07:01.201 --> 0:07:10.027
But it's also not quite easy if you think
of the way that I made this example is relatively

0:07:10.027 --> 0:07:10.986
easy, so.

0:07:11.431 --> 0:07:17.963
Basically, the first two sentences are these
two: A, B, C is E, and G cured up the U, V

0:07:17.963 --> 0:07:21.841
is L, A, A, C, S, and S, on and this is used
towards the German.

0:07:22.662 --> 0:07:25.241
And then when you join these two words, it's.

0:07:25.205 --> 0:07:32.445
English German the third line and the last
line, and then the fourth line is the first

0:07:32.445 --> 0:07:38.521
line, so German language, English, and then
speak English, speak German.

0:07:38.578 --> 0:07:44.393
So this is how I made made up the example
and what the intuition here is that you assume

0:07:44.393 --> 0:07:50.535
that the languages have a fundamental structure
right and it's the same across all languages.

0:07:51.211 --> 0:07:57.727
Doesn't matter what language you are thinking
of words kind of you have in the same way join

0:07:57.727 --> 0:07:59.829
together is the same way and.

0:07:59.779 --> 0:08:06.065
And plasma sign thinks the same way but this
is not a realistic assumption for sure but

0:08:06.065 --> 0:08:12.636
it's actually a decent one to make and if you
can think of this like if you can assume this

0:08:12.636 --> 0:08:16.207
then we can model systems in an unsupervised
way.

0:08:16.396 --> 0:08:22.743
So this is the intuition that I want to give,
and you can see that whenever assumptions fail,

0:08:22.743 --> 0:08:23.958
the systems fail.

0:08:23.958 --> 0:08:29.832
So in practice whenever we go far away from
these assumptions, the systems try to more

0:08:29.832 --> 0:08:30.778
time to fail.

0:08:33.753 --> 0:08:39.711
So the example that I gave was actually perfect
mapping right, so it never really sticks bad.

0:08:39.711 --> 0:08:45.353
They have the same number of words, same sentence
structure, perfect mapping, and so on.

0:08:45.353 --> 0:08:50.994
This doesn't happen, but let's assume that
this happens and try to see how we can moral.

0:08:53.493 --> 0:09:01.061
Okay, now let's go a bit more formal, so what
you want to do is unsupervise word translation.

0:09:01.901 --> 0:09:08.773
Here the task is that we have input data as
monolingual data, so a bunch of sentences in

0:09:08.773 --> 0:09:15.876
one file and a bunch of sentences another file
in two different languages, and the question

0:09:15.876 --> 0:09:18.655
is how can we get a bilingual word?

0:09:19.559 --> 0:09:25.134
So if you look at the picture you see that
it's just kind of projected down into two dimension

0:09:25.134 --> 0:09:30.358
planes, but it's basically when you map them
into a plot you see that the words that are

0:09:30.358 --> 0:09:35.874
parallel are closer together, and the question
is how can we do it just looking at two files?

0:09:36.816 --> 0:09:42.502
And you can say that what we want to basically
do is create a dictionary in the end given

0:09:42.502 --> 0:09:43.260
two fights.

0:09:43.260 --> 0:09:45.408
So this is the task that we want.

0:09:46.606 --> 0:09:52.262
And the first step on how we do this is to
learn word vectors, and this chicken is whatever

0:09:52.262 --> 0:09:56.257
techniques that you have seen before, but to
work glow or so on.

0:09:56.856 --> 0:10:00.699
So you take a monolingual data and try to
learn word embeddings.

0:10:02.002 --> 0:10:07.675
Then you plot them into a graph, and then
typically what you would see is that they're

0:10:07.675 --> 0:10:08.979
not aligned at all.

0:10:08.979 --> 0:10:14.717
One word space is somewhere, and one word
space is somewhere else, and this is what you

0:10:14.717 --> 0:10:18.043
would typically expect to see in the in the
image.

0:10:19.659 --> 0:10:23.525
Now our assumption was that both lines we
just have the same.

0:10:23.563 --> 0:10:28.520
Culture and so that we can use this information
to learn the mapping between these two spaces.

0:10:30.130 --> 0:10:37.085
So before how we do it, I think this is quite
famous already, and everybody knows it a bit

0:10:37.085 --> 0:10:41.824
more is that we're emitting capture semantic
relations right.

0:10:41.824 --> 0:10:48.244
So the distance between man and woman is approximately
the same as king and prince.

0:10:48.888 --> 0:10:54.620
It's also for world dances, country capital
and so on, so there are some relationships

0:10:54.620 --> 0:11:00.286
happening in the word emmering space, which
is quite clear for at least one language.

0:11:03.143 --> 0:11:08.082
Now if you think of this, let's say of the
English word embryng.

0:11:08.082 --> 0:11:14.769
Let's say of German word embryng and the way
the King Keene Man woman organized is same

0:11:14.769 --> 0:11:17.733
as the German translation of his word.

0:11:17.998 --> 0:11:23.336
This is the main idea is that although they
are somewhere else, the relationship is the

0:11:23.336 --> 0:11:28.008
same between the both languages and we can
use this to to learn the mapping.

0:11:31.811 --> 0:11:35.716
'S not only for these poor words where it
happens for all the words in the language,

0:11:35.716 --> 0:11:37.783
and so we can use this to to learn the math.

0:11:39.179 --> 0:11:43.828
This is the main idea is that both emittings
have a similar shape.

0:11:43.828 --> 0:11:48.477
It's only that they're just not aligned and
so you go to the here.

0:11:48.477 --> 0:11:50.906
They kind of have a similar shape.

0:11:50.906 --> 0:11:57.221
They're just in some different spaces and
what you need to do is to map them into a common

0:11:57.221 --> 0:11:57.707
space.

0:12:06.086 --> 0:12:12.393
The w, such that if it multiplied w with x,
they both become.

0:12:35.335 --> 0:12:41.097
That's true, but there are also many works
that have the relationship right, and we hope

0:12:41.097 --> 0:12:43.817
that this is enough to learn the mapping.

0:12:43.817 --> 0:12:49.838
So there's always going to be a bit of noise,
as in how when we align them they're not going

0:12:49.838 --> 0:12:51.716
to be exactly the same, but.

0:12:51.671 --> 0:12:57.293
What you can expect is that there are these
main works that allow us to learn the mapping,

0:12:57.293 --> 0:13:02.791
so it's not going to be perfect, but it's an
approximation that we make to to see how it

0:13:02.791 --> 0:13:04.521
works and then practice it.

0:13:04.521 --> 0:13:10.081
Also, it's not that the fact that women do
not have any relationship does not affect that

0:13:10.081 --> 0:13:10.452
much.

0:13:10.550 --> 0:13:15.429
A lot of words usually have, so it kind of
works out in practice.

0:13:22.242 --> 0:13:34.248
I have not heard about it, but if you want
to say something about it, I would be interested,

0:13:34.248 --> 0:13:37.346
but we can do it later.

0:13:41.281 --> 0:13:44.133
Usual case: This is supervised.

0:13:45.205 --> 0:13:49.484
First way to do a supervised work translation
where we have a dictionary right and that we

0:13:49.484 --> 0:13:53.764
can use that to learn the mapping, but in our
case we assume that we have nothing right so

0:13:53.764 --> 0:13:55.222
we only have monolingual data.

0:13:56.136 --> 0:14:03.126
Then we need unsupervised planning to figure
out W, and we're going to use guns to to find

0:14:03.126 --> 0:14:06.122
W, and it's quite a nice way to do it.

0:14:08.248 --> 0:14:15.393
So just before I go on how we use it to use
case, I'm going to go briefly on gas right,

0:14:15.393 --> 0:14:19.940
so we have two components: generator and discriminator.

0:14:21.441 --> 0:14:27.052
Gen data tries to generate something obviously,
and the discriminator tries to see if it's

0:14:27.052 --> 0:14:30.752
real data or something that is generated by
the generation.

0:14:31.371 --> 0:14:37.038
And there's like this two player game where
the winner decides to fool and the winner decides

0:14:37.038 --> 0:14:41.862
to market food and they try to build these
two components and try to learn WWE.

0:14:43.483 --> 0:14:53.163
Okay, so let's say we have two languages,
X and Y right, so the X language has N words

0:14:53.163 --> 0:14:56.167
with numbering dimensions.

0:14:56.496 --> 0:14:59.498
So what I'm reading is matrix is peak or something.

0:14:59.498 --> 0:15:02.211
Then we have target language why with m words.

0:15:02.211 --> 0:15:06.944
I'm also the same amount of things I mentioned
and then we have a matrix peak or.

0:15:07.927 --> 0:15:13.784
Basically what you're going to do is use word
to work and learn our word embedded.

0:15:14.995 --> 0:15:23.134
Now we have these X Mrings, Y Mrings, and
what you want to know is W, such that W X and

0:15:23.134 --> 0:15:24.336
Y are align.

0:15:29.209 --> 0:15:35.489
With guns you have two steps, one is a discriminative
step and one is the the mapping step and the

0:15:35.489 --> 0:15:41.135
discriminative step is to see if the embeddings
are from the source or mapped embedding.

0:15:41.135 --> 0:15:44.688
So it's going to be much scary when I go to
the figure.

0:15:46.306 --> 0:15:50.041
So we have a monolingual documents with two
different languages.

0:15:50.041 --> 0:15:54.522
From here we get our source language ambients
target language ambients right.

0:15:54.522 --> 0:15:57.855
Then we randomly initialize the transformation
metrics W.

0:16:00.040 --> 0:16:06.377
Then we have the discriminator which tries
to see if it's WX or Y, so it needs to know

0:16:06.377 --> 0:16:13.735
that this is a mapped one and this is the original
language, and so if you look at the lost function

0:16:13.735 --> 0:16:20.072
here, it's basically that source is one given
WX, so this is from the source language.

0:16:23.543 --> 0:16:27.339
Which means it's the target language em yeah.

0:16:27.339 --> 0:16:34.436
It's just like my figure is not that great,
but you can assume that they are totally.

0:16:40.260 --> 0:16:43.027
So this is the kind of the lost function.

0:16:43.027 --> 0:16:46.386
We have N source words, M target words, and
so on.

0:16:46.386 --> 0:16:52.381
So that's why you have one by M, one by M,
and the discriminator is to just see if they're

0:16:52.381 --> 0:16:55.741
mapped or they're from the original target
number.

0:16:57.317 --> 0:17:04.024
And then we have the mapping step where we
train W to fool the the discriminators.

0:17:04.564 --> 0:17:10.243
So here it's the same way, but what you're
going to just do is inverse the loss function.

0:17:10.243 --> 0:17:15.859
So now we freeze the discriminators, so it's
important to note that in the previous sect

0:17:15.859 --> 0:17:20.843
we freezed the transformation matrix, and here
we freezed your discriminators.

0:17:22.482 --> 0:17:28.912
And now it's to fool the discriminated rights,
so it should predict that the source is zero

0:17:28.912 --> 0:17:35.271
given the map numbering, and the source is
one given the target numbering, which is wrong,

0:17:35.271 --> 0:17:37.787
which is why we're attaining the W.
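To make those two alternating steps concrete, here is a minimal sketch in PyTorch (not code from the lecture; the embedding matrices and all hyperparameters are invented placeholders). The discriminator is trained to output 1 for mapped source embeddings WX and 0 for target embeddings Y, and W is then updated with the labels flipped so that it fools the frozen discriminator:

```python
# Minimal sketch of adversarial embedding alignment, for illustration only.
# X_emb, Y_emb stand for monolingual word embeddings (n x d, m x d); values here are random.
import torch
import torch.nn as nn

d = 300
X_emb = torch.randn(5000, d)          # source-language embeddings (placeholder)
Y_emb = torch.randn(6000, d)          # target-language embeddings (placeholder)

W = nn.Linear(d, d, bias=False)       # the mapping we want to learn
D = nn.Sequential(nn.Linear(d, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))  # discriminator

bce = nn.BCEWithLogitsLoss()
opt_D = torch.optim.SGD(D.parameters(), lr=0.1)
opt_W = torch.optim.SGD(W.parameters(), lr=0.1)

for step in range(1000):
    xs = X_emb[torch.randint(0, X_emb.size(0), (32,))]
    ys = Y_emb[torch.randint(0, Y_emb.size(0), (32,))]

    # 1) Discriminator step: W is frozen; D should say 1 for mapped source, 0 for target.
    logits = torch.cat([D(W(xs).detach()), D(ys)])
    labels = torch.cat([torch.ones(32, 1), torch.zeros(32, 1)])
    loss_D = bce(logits, labels)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) Mapping step: D is frozen; W is trained with inverted labels to fool D.
    logits = torch.cat([D(W(xs)), D(ys)])
    loss_W = bce(logits, 1 - labels)
    opt_W.zero_grad(); loss_W.backward(); opt_W.step()
```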

0:17:39.439 --> 0:17:46.261
Any questions on this okay so then how do
we know when to stop?

0:17:46.261 --> 0:17:55.854
We just train until we reach convergence right
and then we have our W hopefully train and

0:17:55.854 --> 0:17:59.265
map them into an airline space.

0:18:02.222 --> 0:18:07.097
The question is how can we evaluate this mapping?

0:18:07.097 --> 0:18:13.923
Does anybody know what we can use to mapping
or evaluate the mapping?

0:18:13.923 --> 0:18:15.873
How good is a word?

0:18:28.969 --> 0:18:33.538
We use as I said we use a dictionary, at least
in the end.

0:18:33.538 --> 0:18:40.199
We need a dictionary to evaluate, so this
is our only final, so we aren't using it at

0:18:40.199 --> 0:18:42.600
all in attaining data and the.

0:18:43.223 --> 0:18:49.681
Is one is to check what's the position for
our dictionary, just that.

0:18:50.650 --> 0:18:52.813
The first nearest neighbor and see if it's
there on.

0:18:53.573 --> 0:18:56.855
But this is quite strict because there's a
lot of noise in the emitting space right.

0:18:57.657 --> 0:19:03.114
Not always your first neighbor is going to
be the translation, so what people also report

0:19:03.114 --> 0:19:05.055
is precision at file and so on.

0:19:05.055 --> 0:19:10.209
So you take the finerest neighbors and see
if the translation is in there and so on.

0:19:10.209 --> 0:19:15.545
So the more you increase, the more likely
that there is a translation because where I'm

0:19:15.545 --> 0:19:16.697
being quite noisy.
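As a small illustration of this evaluation, here is a sketch that computes precision@1 and precision@5 by cosine nearest-neighbour search in the mapped space; the mapping W, the embedding matrices and the test dictionary are made-up placeholders:

```python
# Sketch of precision@k evaluation for a learned mapping W, illustration only.
import numpy as np

def precision_at_k(W, X_emb, Y_emb, test_dict, k=5):
    """test_dict: list of (source_word_index, gold_target_word_index) pairs."""
    mapped = X_emb @ W.T                                   # map source embeddings: WX
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    Y = Y_emb / np.linalg.norm(Y_emb, axis=1, keepdims=True)
    hits = 0
    for src_idx, gold_idx in test_dict:
        scores = Y @ mapped[src_idx]                       # cosine similarity to all target words
        topk = np.argsort(-scores)[:k]                     # k nearest target neighbours
        hits += int(gold_idx in topk)
    return hits / len(test_dict)

# Toy usage with random data (real use: trained embeddings plus a held-out dictionary).
d = 50
X_emb, Y_emb = np.random.randn(1000, d), np.random.randn(1200, d)
W = np.eye(d)
test_dict = [(3, 7), (10, 10), (42, 5)]
print(precision_at_k(W, X_emb, Y_emb, test_dict, k=1),
      precision_at_k(W, X_emb, Y_emb, test_dict, k=5))
```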

0:19:19.239 --> 0:19:25.924
What's interesting is that people have used
dictionary to to learn word translation, but

0:19:25.924 --> 0:19:32.985
the way of doing this is much better than using
a dictionary, so somehow our assumption helps

0:19:32.985 --> 0:19:36.591
us to to build better than a supervised system.

0:19:39.099 --> 0:19:42.985
So as you see on the top you have a question
at one five ten.

0:19:42.985 --> 0:19:47.309
These are the typical numbers that you report
for world translation.

0:19:48.868 --> 0:19:55.996
But guns are usually quite tricky to to train,
and it does not converge on on language based,

0:19:55.996 --> 0:20:02.820
and this kind of goes back to a assumption
that they kind of behave in the same structure

0:20:02.820 --> 0:20:03.351
right.

0:20:03.351 --> 0:20:07.142
But if you take a language like English and
some.

0:20:07.087 --> 0:20:12.203
Other languages are almost very lotus, so
it's quite different from English and so on.

0:20:12.203 --> 0:20:13.673
Then I've one language,.

0:20:13.673 --> 0:20:18.789
So whenever whenever our assumption fails,
these unsupervised techniques always do not

0:20:18.789 --> 0:20:21.199
converge or just give really bad scores.

0:20:22.162 --> 0:20:27.083
And so the fact is that the monolingual embryons
for distant languages are too far.

0:20:27.083 --> 0:20:30.949
They do not share the same structure, and
so they do not convert.

0:20:32.452 --> 0:20:39.380
And so I just want to mention that there is
a better retrieval technique than the nearest

0:20:39.380 --> 0:20:41.458
neighbor, which is called.

0:20:42.882 --> 0:20:46.975
But it's more advanced than mathematical,
so I didn't want to go in it now.

0:20:46.975 --> 0:20:51.822
But if your interest is in some quite good
retrieval segments, you can just look at these

0:20:51.822 --> 0:20:53.006
if you're interested.

0:20:55.615 --> 0:20:59.241
Okay, so this is about the the word translation.

0:20:59.241 --> 0:21:02.276
Does anybody have any questions of cure?

0:21:06.246 --> 0:21:07.501
Was the worst answer?

0:21:07.501 --> 0:21:12.580
It was a bit easier than a sentence right,
so you just assume that there's a mapping and

0:21:12.580 --> 0:21:14.577
then you try to learn the mapping.

0:21:14.577 --> 0:21:19.656
But now it's a bit more difficult because
you need to jump at stuff also, which is quite

0:21:19.656 --> 0:21:20.797
much more trickier.

0:21:22.622 --> 0:21:28.512
Task here is that we have our input as manually
well data for both languages as before, but

0:21:28.512 --> 0:21:34.017
now what we want to do is instead of translating
word by word we want to do sentence.

0:21:37.377 --> 0:21:44.002
We have word of work now and so on to learn
word amber inks, but sentence amber inks are

0:21:44.002 --> 0:21:50.627
actually not the site powered often, at least
when people try to work on Answer Voice M,

0:21:50.627 --> 0:21:51.445
E, before.

0:21:52.632 --> 0:21:54.008
Now they're a bit okay.

0:21:54.008 --> 0:21:59.054
I mean, as you've seen in the practice on
where we used places, they were quite decent.

0:21:59.054 --> 0:22:03.011
But then it's also the case on which data
it's trained on and so on.

0:22:03.011 --> 0:22:03.240
So.

0:22:04.164 --> 0:22:09.666
Sentence embedings are definitely much more
harder to get than were embedings, so this

0:22:09.666 --> 0:22:13.776
is a bit more complicated than the task that
you've seen before.

0:22:16.476 --> 0:22:18.701
Before we go into how U.

0:22:18.701 --> 0:22:18.968
N.

0:22:18.968 --> 0:22:19.235
M.

0:22:19.235 --> 0:22:19.502
T.

0:22:19.502 --> 0:22:24.485
Works, so this is your typical supervised
system right.

0:22:24.485 --> 0:22:29.558
So we have parallel data source sentence target
centers.

0:22:29.558 --> 0:22:31.160
We have a source.

0:22:31.471 --> 0:22:36.709
We have a target decoder and then we try to
minimize the cross center pillar on this viral

0:22:36.709 --> 0:22:37.054
data.

0:22:37.157 --> 0:22:39.818
And this is how we train our typical system.

0:22:43.583 --> 0:22:49.506
But now we do not have any parallel data,
and so the intuition here is that if we can

0:22:49.506 --> 0:22:55.429
learn language independent representations
at the end quota outputs, then we can pass

0:22:55.429 --> 0:22:58.046
it along to the decoder that we want.

0:22:58.718 --> 0:23:03.809
It's going to get more clear in the future,
but I'm trying to give a bit more intuition

0:23:03.809 --> 0:23:07.164
before I'm going to show you all the planning
objectives.

0:23:08.688 --> 0:23:15.252
So I assume that we have these different encoders
right, so it's not only two, you have a bunch

0:23:15.252 --> 0:23:21.405
of different source language encoders, a bunch
of different target language decoders, and

0:23:21.405 --> 0:23:26.054
also I assume that the encoder is in the same
representation space.

0:23:26.706 --> 0:23:31.932
If you give a sentence in English and the
same sentence in German, the embeddings are

0:23:31.932 --> 0:23:38.313
quite the same, so like the muddling when embeddings
die right, and so then what we can do is, depending

0:23:38.313 --> 0:23:42.202
on the language we want, pass it to the the
appropriate decode.

0:23:42.682 --> 0:23:50.141
And so the kind of goal here is to find out
a way to create language independent representations

0:23:50.141 --> 0:23:52.909
and then pass it to the decodement.

0:23:54.975 --> 0:23:59.714
Just keep in mind that you're trying to do
language independent for some reason, but it's

0:23:59.714 --> 0:24:02.294
going to be more clear once we see how it works.

0:24:05.585 --> 0:24:12.845
So in total we have three objectives that
we're going to try to train in our systems,

0:24:12.845 --> 0:24:16.981
so this is and all of them use monolingual
data.

0:24:17.697 --> 0:24:19.559
So there's no pilot data at all.

0:24:19.559 --> 0:24:24.469
The first one is denoising water encoding,
so it's more like you add noise to noise to

0:24:24.469 --> 0:24:27.403
the sentence, and then they construct the original.

0:24:28.388 --> 0:24:34.276
Then we have the on the flyby translation,
so this is where you take a sentence, generate

0:24:34.276 --> 0:24:39.902
a translation, and then learn the the word
smarting, which I'm going to show pictures

0:24:39.902 --> 0:24:45.725
stated, and then we have an adverse serial
planning to do learn the language independent

0:24:45.725 --> 0:24:46.772
representation.

0:24:47.427 --> 0:24:52.148
So somehow we'll fill in these three tasks
or retain on these three tasks.

0:24:52.148 --> 0:24:54.728
We somehow get an answer to President M.

0:24:54.728 --> 0:24:54.917
T.

0:24:56.856 --> 0:25:02.964
OK, so first we're going to do is denoising
what I'm cutting right, so as I said we add

0:25:02.964 --> 0:25:06.295
noise to the sentence, so we take our sentence.

0:25:06.826 --> 0:25:09.709
And then there are different ways to add noise.

0:25:09.709 --> 0:25:11.511
You can shuffle words around.

0:25:11.511 --> 0:25:12.712
You can drop words.

0:25:12.712 --> 0:25:18.298
Do whatever you want to do as long as there's
enough information to reconstruct the original

0:25:18.298 --> 0:25:18.898
sentence.

0:25:19.719 --> 0:25:25.051
And then we assume that the nicest one and
the original one are parallel data and train

0:25:25.051 --> 0:25:26.687
similar to the supervised.

0:25:28.168 --> 0:25:30.354
So we have a source sentence.

0:25:30.354 --> 0:25:32.540
We have a noisy source right.

0:25:32.540 --> 0:25:37.130
So here what basically happened is that the
word got shuffled.

0:25:37.130 --> 0:25:39.097
One word is dropped right.

0:25:39.097 --> 0:25:41.356
So this was a noise of source.

0:25:41.356 --> 0:25:47.039
And then we treat the noise of source and
source as a sentence bed basically.

0:25:49.009 --> 0:25:53.874
Way retainers optimizing the cross entropy
loss similar to.

0:25:57.978 --> 0:26:03.211
Basically a picture to show what's happening
and we have the nice resources.

0:26:03.163 --> 0:26:09.210
Now is the target and then we have the reconstructed
original source and original tag and since

0:26:09.210 --> 0:26:14.817
the languages are different we have our source
hand coded target and coded source coded.

0:26:17.317 --> 0:26:20.202
And for this task we only need monolingual
data.

0:26:20.202 --> 0:26:25.267
We don't need any pedal data because it's
just taking a sentence and shuffling it and

0:26:25.267 --> 0:26:27.446
reconstructing the the original one.

0:26:28.848 --> 0:26:31.058
And we are four different blocks.

0:26:31.058 --> 0:26:36.841
This is kind of very important to keep in
mind on how we change these connections later.

0:26:41.121 --> 0:26:49.093
Then this is more like the mathematical formulation
where you predict source given the noisy.
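For concreteness, a minimal sketch of such a noise function (word dropping plus a light local shuffle; the rates are invented for illustration). The (noisy, original) pair is then fed to the encoder-decoder like a parallel sentence pair with the usual cross-entropy loss:

```python
# Sketch of the noise model used for denoising autoencoding, illustration only.
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3):
    # Randomly drop words, but keep at least one token.
    kept = [t for t in tokens if random.random() > drop_prob] or tokens[:1]
    # Slightly shuffle word order: each position may move only a few steps.
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

sentence = "we reconstruct the original sentence from its corrupted version".split()
noisy = add_noise(sentence)
print(noisy)       # shuffled / shortened version used as the model input
print(sentence)    # training target: reconstruct this from the noisy input
```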

0:26:52.492 --> 0:26:55.090
So that was the nursing water encoding.

0:26:55.090 --> 0:26:58.403
The second step is on the flight back translation.

0:26:59.479 --> 0:27:06.386
So what we do is, we put our model inference
mode right, we take a source of sentences,

0:27:06.386 --> 0:27:09.447
and we generate a translation pattern.

0:27:09.829 --> 0:27:18.534
It might be completely wrong or maybe partially
correct or so on, but we assume that the moral

0:27:18.534 --> 0:27:20.091
knows of it and.

0:27:20.680 --> 0:27:25.779
Tend rate: T head right and then what we do
is assume that T head or not assume but T head

0:27:25.779 --> 0:27:27.572
and S are sentence space right.

0:27:27.572 --> 0:27:29.925
That's how we can handle the translation.

0:27:30.530 --> 0:27:38.824
So we train a supervised system on this sentence
bed, so we do inference and then build a reverse

0:27:38.824 --> 0:27:39.924
translation.

0:27:42.442 --> 0:27:49.495
Are both more concrete, so we have a false
sentence right, then we chamber the translation,

0:27:49.495 --> 0:27:55.091
then we give the general translation as an
input and try to predict the.

0:27:58.378 --> 0:28:03.500
This is how we would do in practice right,
so not before the source encoder was connected

0:28:03.500 --> 0:28:08.907
to the source decoder, but now we interchanged
connections, so the source encoder is connected

0:28:08.907 --> 0:28:10.216
to the target decoder.

0:28:10.216 --> 0:28:13.290
The target encoder is turned into the source
decoder.

0:28:13.974 --> 0:28:20.747
And given s we get t-hat and given t we get
s-hat, so this is the first time.

0:28:21.661 --> 0:28:24.022
On the second time step, what you're going
to do is reverse.

0:28:24.664 --> 0:28:32.625
So as that is here, t hat is here, and given
s hat we are trying to predict t, and given

0:28:32.625 --> 0:28:34.503
t hat we are trying.
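A rough sketch of one such on-the-fly back-translation round, purely illustrative: `src_to_tgt`, `tgt_to_src` and `train_step` stand for whatever encoder-decoder models and supervised update you actually use.

```python
# Sketch of on-the-fly back-translation, illustration only.
# src_to_tgt / tgt_to_src are assumed seq2seq models with .translate() for inference,
# and train_step(model, inputs, targets) doing one supervised cross-entropy update.

def back_translation_round(src_batch, tgt_batch, src_to_tgt, tgt_to_src, train_step):
    # Step 1: run the current models in inference mode to create synthetic pairs.
    t_hat = [src_to_tgt.translate(s) for s in src_batch]   # S -> T-hat (possibly wrong)
    s_hat = [tgt_to_src.translate(t) for t in tgt_batch]   # T -> S-hat

    # Step 2: treat (T-hat, S) and (S-hat, T) as pseudo-parallel data and train supervised,
    # i.e. given the synthetic side as input, predict the original monolingual sentence.
    train_step(tgt_to_src, inputs=t_hat, targets=src_batch)
    train_step(src_to_tgt, inputs=s_hat, targets=tgt_batch)
```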

0:28:36.636 --> 0:28:39.386
Is this clear you have any questions on?

0:28:45.405 --> 0:28:50.823
Bit more mathematically, we try to play the
class, give and take and so it's always the

0:28:50.823 --> 0:28:53.963
supervised NMP technique that we are trying
to do.

0:28:53.963 --> 0:28:59.689
But you're trying to create this synthetic
pass that kind of helpers to build an unsurprised

0:28:59.689 --> 0:29:00.181
system.

0:29:02.362 --> 0:29:08.611
Now also with maybe you can see here is that
if the source encoded and targeted encoded

0:29:08.611 --> 0:29:14.718
the language independent, we can always shuffle
the connections and the translations.

0:29:14.718 --> 0:29:21.252
That's why it was important to find a way
to generate language independent representations.

0:29:21.441 --> 0:29:26.476
And the way we try to force this language
independence is the gan step.

0:29:27.627 --> 0:29:34.851
So the third step kind of combines all of
them is where we try to use gun to make the

0:29:34.851 --> 0:29:37.959
encoded output language independent.

0:29:37.959 --> 0:29:42.831
So here it's the same picture but from a different
paper.

0:29:42.831 --> 0:29:43.167
So.

0:29:43.343 --> 0:29:48.888
We have X-rays, X-ray objects which is monolingual
in data.

0:29:48.888 --> 0:29:50.182
We add noise.

0:29:50.690 --> 0:29:54.736
Then we encode it using the source and the
target encoders right.

0:29:54.736 --> 0:29:58.292
Then we get the latent space Z source and
Z target right.

0:29:58.292 --> 0:30:03.503
Then we decode and try to reconstruct the
original one and this is the auto encoding

0:30:03.503 --> 0:30:08.469
loss which takes the X source which is the
original one and then the translated.

0:30:08.468 --> 0:30:09.834
Predicted output.

0:30:09.834 --> 0:30:16.740
So hello, it always is the auto encoding step
where the gun concern is in the between gang

0:30:16.740 --> 0:30:24.102
cord outputs, and here we have an discriminator
which tries to predict which language the latent

0:30:24.102 --> 0:30:25.241
space is from.

0:30:26.466 --> 0:30:33.782
So given Z source it has to predict that the
representation is from a language source and

0:30:33.782 --> 0:30:39.961
given Z target it has to predict the representation
from a language target.

0:30:40.520 --> 0:30:45.135
And our headquarters are kind of teaching
data right now, and then we have a separate

0:30:45.135 --> 0:30:49.803
network discriminator which tries to predict
which language the Latin spaces are from.

0:30:53.393 --> 0:30:57.611
And then this one is when we combined guns
with the other ongoing step.

0:30:57.611 --> 0:31:02.767
Then we had an on the fly back translation
step right, and so here what we're trying to

0:31:02.767 --> 0:31:03.001
do.

0:31:03.863 --> 0:31:07.260
Is the same, basically just exactly the same.

0:31:07.260 --> 0:31:12.946
But when we are doing the training, we are
at the adversarial laws here, so.

0:31:13.893 --> 0:31:20.762
We take our X source, gender and intermediate
translation, so why target and why source right?

0:31:20.762 --> 0:31:27.342
This is the previous time step, and then we
have to encode the new sentences and basically

0:31:27.342 --> 0:31:32.764
make them language independent or train to
make them language independent.

0:31:33.974 --> 0:31:43.502
And then the hope is that now if we do this
using monolingual data alone we can just switch

0:31:43.502 --> 0:31:47.852
connections and then get our translation.

0:31:47.852 --> 0:31:49.613
So the scale of.

0:31:54.574 --> 0:32:03.749
And so as I said before, guns are quite good
for vision right, so this is kind of like the

0:32:03.749 --> 0:32:11.312
cycle gun approach that you might have seen
in any computer vision course.

0:32:11.911 --> 0:32:19.055
Somehow protect that place at least not as
promising as for merchants, and so people.

0:32:19.055 --> 0:32:23.706
What they did is to enforce this language
independence.

0:32:25.045 --> 0:32:31.226
They try to use a shared encoder instead of
having these different encoders right, and

0:32:31.226 --> 0:32:37.835
so this is basically the same painting objectives
as before, but what you're going to do now

0:32:37.835 --> 0:32:43.874
is learn cross language language and then use
the single encoder for both languages.

0:32:44.104 --> 0:32:49.795
And this kind also forces them to be in the
same space, and then you can choose whichever

0:32:49.795 --> 0:32:50.934
decoder you want.

0:32:52.552 --> 0:32:58.047
You can use guns or you can just use a shared
encoder and type to build your unsupervised

0:32:58.047 --> 0:32:58.779
MTT system.

0:33:08.488 --> 0:33:09.808
These are now the.

0:33:09.808 --> 0:33:15.991
The enhancements that you can do on top of
your unsavoizant system is one you can create

0:33:15.991 --> 0:33:16.686
a shared.

0:33:18.098 --> 0:33:22.358
On top of the shared encoder you can ask are
your guns lost or whatever so there's a lot

0:33:22.358 --> 0:33:22.550
of.

0:33:24.164 --> 0:33:29.726
The other thing that is more relevant right
now is that you can create parallel data by

0:33:29.726 --> 0:33:35.478
word to word translation right because you
know how to do all supervised word translation.

0:33:36.376 --> 0:33:40.548
First step is to create parallel data, assuming
that word translations are quite good.

0:33:41.361 --> 0:33:47.162
And then you claim a supervised and empty
model on these more likely wrong model data,

0:33:47.162 --> 0:33:50.163
but somehow gives you a good starting point.

0:33:50.163 --> 0:33:56.098
So you build your supervised and empty system
on the word translation data, and then you

0:33:56.098 --> 0:33:59.966
initialize it before you're doing unsupervised
and empty.

0:34:00.260 --> 0:34:05.810
And the hope is that when you're doing the
back pain installation, it's a good starting

0:34:05.810 --> 0:34:11.234
point, but it's one technique that you can
do to to improve your anthropoids and the.
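As a sketch of that initialization trick (illustrative; `bilingual_dict` stands for whatever dictionary the unsupervised word translation step produced): translate monolingual sentences word by word to get noisy synthetic parallel data and pretrain the supervised model on it.

```python
# Sketch: build synthetic parallel data by word-by-word translation, illustration only.

def word_by_word_corpus(monolingual_sentences, bilingual_dict):
    """bilingual_dict: source word -> best target word from unsupervised word translation."""
    pairs = []
    for sent in monolingual_sentences:
        tokens = sent.split()
        # Keep unknown words as-is; the result is noisy but gives a usable starting point.
        translated = [bilingual_dict.get(tok, tok) for tok in tokens]
        pairs.append((" ".join(translated), sent))   # (synthetic side, original sentence)
    return pairs

demo_dict = {"haus": "house", "klein": "small"}          # made-up toy dictionary
print(word_by_word_corpus(["klein haus"], demo_dict))    # [('small house', 'klein haus')]
```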

0:34:17.097 --> 0:34:25.879
In the previous case we had: The way we know
when to stop was to see comedians on the gun

0:34:25.879 --> 0:34:26.485
training.

0:34:26.485 --> 0:34:28.849
Actually, all we want to do is when W.

0:34:28.849 --> 0:34:32.062
Comedians, which is quite easy to know when
to stop.

0:34:32.062 --> 0:34:37.517
But in a realistic case, we don't have any
parallel data right, so there's no validation.

0:34:37.517 --> 0:34:42.002
Or I mean, we might have test data in the
end, but there's no validation.

0:34:43.703 --> 0:34:48.826
How will we tune our hyper parameters in this
case because it's not really there's nothing

0:34:48.826 --> 0:34:49.445
for us to?

0:34:50.130 --> 0:34:53.326
Or the gold data in a sense like so.

0:34:53.326 --> 0:35:01.187
How do you think we can evaluate such systems
or how can we tune hyper parameters in this?

0:35:11.711 --> 0:35:17.089
So what you're going to do is use the back
translation technique.

0:35:17.089 --> 0:35:24.340
It's like a common technique where you have
nothing okay that is to use back translation

0:35:24.340 --> 0:35:26.947
somehow and what you can do is.

0:35:26.947 --> 0:35:31.673
The main idea is validate on how good the
reconstruction.

0:35:32.152 --> 0:35:37.534
So the idea is that if you have a good system
then the intermediate translation is quite

0:35:37.534 --> 0:35:39.287
good and going back is easy.

0:35:39.287 --> 0:35:44.669
But if it's just noise that you generate in
the forward step then it's really hard to go

0:35:44.669 --> 0:35:46.967
back, which is kind of the main idea.

0:35:48.148 --> 0:35:53.706
So the way it works is that we take a source
sentence, we generate a translation in target

0:35:53.706 --> 0:35:59.082
language right, and then again can state the
generated sentence and compare it with the

0:35:59.082 --> 0:36:01.342
original one, and if they're closer.

0:36:01.841 --> 0:36:09.745
It means that we have a good system, and if
they are far this is kind of like an unsupervised

0:36:09.745 --> 0:36:10.334
grade.
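A small sketch of that unsupervised validation criterion (illustrative; `src_to_tgt`, `tgt_to_src` and `bleu` stand for the two translation directions and any BLEU implementation): translate forward, translate back, and score the reconstruction against the original.

```python
# Sketch of round-trip (back-translation) validation without parallel data, illustration only.

def round_trip_score(valid_sentences, src_to_tgt, tgt_to_src, bleu):
    """valid_sentences: monolingual source sentences held out for model selection."""
    forward = [src_to_tgt.translate(s) for s in valid_sentences]    # S -> T-hat
    restored = [tgt_to_src.translate(t) for t in forward]           # T-hat -> S'
    # If the intermediate translations are good, going back should recover the originals,
    # so a higher reconstruction BLEU is used as an unsupervised model-selection signal.
    return bleu(hypotheses=restored, references=valid_sentences)
```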

0:36:17.397 --> 0:36:21.863
As far as the amount of data that you need.

0:36:23.083 --> 0:36:27.995
This was like the first initial resistance
on on these systems is that you had.

0:36:27.995 --> 0:36:32.108
They wanted to do English and French and they
had fifteen million.

0:36:32.108 --> 0:36:38.003
There was fifteen million more linguist sentences
so it's quite a lot and they were able to get

0:36:38.003 --> 0:36:40.581
thirty two blue on these kinds of setups.

0:36:41.721 --> 0:36:47.580
But unsurprisingly if you have zero point
one million pilot sentences you get the same

0:36:47.580 --> 0:36:48.455
performance.

0:36:48.748 --> 0:36:50.357
So it's a lot of training.

0:36:50.357 --> 0:36:55.960
It's a lot of monolingual data, but monolingual
data is relatively easy to obtain is the fact

0:36:55.960 --> 0:37:01.264
that the training is also quite longer than
the supervised system, but it's unsupervised

0:37:01.264 --> 0:37:04.303
so it's kind of the trade off that you are
making.

0:37:07.367 --> 0:37:13.101
The other thing to note is that it's English
and French, which is very close to our exemptions.

0:37:13.101 --> 0:37:18.237
Also, the monolingual data that they took
are kind of from similar domains and so on.

0:37:18.638 --> 0:37:27.564
So that's why they're able to build such a
good system, but you'll see later that it fails.

0:37:36.256 --> 0:37:46.888
Voice, and so mean what people usually do
is first build a system right using whatever

0:37:46.888 --> 0:37:48.110
parallel.

0:37:48.608 --> 0:37:55.864
Then they use monolingual data and do back
translation, so this is always being the standard

0:37:55.864 --> 0:38:04.478
way to to improve, and what people have seen
is that: You don't even need zero point one

0:38:04.478 --> 0:38:05.360
million right.

0:38:05.360 --> 0:38:10.706
You just need like ten thousand or so on and
then you do the monolingual back time station

0:38:10.706 --> 0:38:12.175
and you're still better.

0:38:12.175 --> 0:38:13.291
The answer is why.

0:38:13.833 --> 0:38:19.534
The question is it's really worth trying to
to do this or maybe it's always better to find

0:38:19.534 --> 0:38:20.787
some parallel data.

0:38:20.787 --> 0:38:26.113
I'll expand a bit of money on getting few
parallel data and then use it to start and

0:38:26.113 --> 0:38:27.804
find to build your system.

0:38:27.804 --> 0:38:33.756
So it was kind of the understanding that billing
wool and spoiled systems are not that really.

0:38:50.710 --> 0:38:54.347
The thing is that with unlabeled data.

0:38:57.297 --> 0:39:05.488
Not in an obtaining signal, so when we are
starting basically what we want to do is first

0:39:05.488 --> 0:39:13.224
get a good translation system and then use
an unlabeled monolingual data to improve.

0:39:13.613 --> 0:39:15.015
But if you start from U.

0:39:15.015 --> 0:39:15.183
N.

0:39:15.183 --> 0:39:20.396
Empty our model might be really bad like it
would be somewhere translating completely wrong.

0:39:20.760 --> 0:39:26.721
And then when you add your unlabeled data,
it basically might be harmful, or maybe the

0:39:26.721 --> 0:39:28.685
same as the supervised performance.

0:39:28.685 --> 0:39:35.322
So the hope is that by fine-tuning on
labeled data first we get a good initialization.

0:39:35.835 --> 0:39:38.404
And then use the unsupervised techniques to
get better.

0:39:38.818 --> 0:39:42.385
But if your starting point is really bad, then
it's not going to help.

0:39:45.185 --> 0:39:47.324
Yeah, so as we said before.

0:39:47.324 --> 0:39:52.475
This is kind of how the self-supervised training
usually works.

0:39:52.475 --> 0:39:54.773
First we have parallel data.

0:39:56.456 --> 0:39:58.062
Source language is X.

0:39:58.062 --> 0:39:59.668
Target language is Y.

0:39:59.668 --> 0:40:06.018
In the end we want a system that does X to
Y, not Y to X, but first we want to train a

0:40:06.018 --> 0:40:10.543
backward model, that is Y to X, so target language
to source.

0:40:11.691 --> 0:40:17.353
Then we take our monolingual target
sentences, use our backward model to generate

0:40:17.353 --> 0:40:21.471
synthetic source, and then we join them with
our original data.

0:40:21.471 --> 0:40:27.583
So now we have this noisy input, but always
the gold output, which is kind of really important

0:40:27.583 --> 0:40:29.513
when you're doing back-translation.

0:40:30.410 --> 0:40:36.992
And then you can concatenate these two datasets
and then you can train your X to Y translation

0:40:36.992 --> 0:40:44.159
system, and then you can always do this in multiple
steps, usually three or four steps, which kind

0:40:44.159 --> 0:40:48.401
of always improves things, and then you finally get your
best system.
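
A compact sketch of this iterative back-translation loop; `train_model` and `translate_corpus` are hypothetical wrappers around whatever NMT toolkit you use, not functions from a specific library:

```python
# Iterative back-translation: noisy synthetic inputs, always gold outputs.
def iterative_back_translation(parallel_xy, mono_y, mono_x, rounds=3):
    """parallel_xy: list of (x, y) gold pairs; mono_y / mono_x: monolingual sentences."""
    # Backward model Y -> X, trained on the flipped gold pairs.
    model_yx = train_model([(y, x) for x, y in parallel_xy])
    model_xy = None
    for _ in range(rounds):
        # Back-translate monolingual target: synthetic source, gold target.
        synth_x = translate_corpus(model_yx, mono_y)
        model_xy = train_model(parallel_xy + list(zip(synth_x, mono_y)))
        # Refresh the backward model the same way, using monolingual source.
        synth_y = translate_corpus(model_xy, mono_x)
        model_yx = train_model([(y, x) for x, y in parallel_xy] + list(zip(synth_y, mono_x)))
    return model_xy
```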

0:40:49.029 --> 0:40:54.844
The point that I'm trying to make is that
although for unsupervised NMT the scores that I've

0:40:54.844 --> 0:41:00.659
shown before were quite good, you probably
can get the same performance with fifty

0:41:00.659 --> 0:41:06.474
thousand sentences, and also the languages
that they've shown are quite similar and the

0:41:06.474 --> 0:41:08.654
texts were from the same domain.

0:41:14.354 --> 0:41:21.494
So, any questions on UNMT? Okay, yeah.

0:41:22.322 --> 0:41:28.982
So after this finding that back-translation was already
better than UNMT, what people have tried

0:41:28.982 --> 0:41:34.660
is to use this idea of multilinguality as you
have seen in the previous lecture.

0:41:34.660 --> 0:41:41.040
The question is how can we do this knowledge
transfer from a high-resource language to a low-

0:41:41.040 --> 0:41:42.232
resource language?

0:41:44.484 --> 0:41:51.074
One way to promote these language-independent
representations is to share the encoder and

0:41:51.074 --> 0:41:57.960
decoder across all the available
languages, and that kind of hopefully enables

0:41:57.960 --> 0:42:00.034
the knowledge transfer.

0:42:03.323 --> 0:42:08.605
When we're doing multilinguality, the two
questions we need to think of are: how does

0:42:08.605 --> 0:42:09.698
the encoder know?

0:42:09.698 --> 0:42:14.495
How does the encoder or decoder know which language
we're dealing with?

0:42:15.635 --> 0:42:20.715
You might already know the answer to this one,
and the second question is how can we promote

0:42:20.715 --> 0:42:24.139
the encoder to generate language independent
representations?

0:42:25.045 --> 0:42:32.580
By solving these two problems we can take
the help of high-resource languages to do unsupervised

0:42:32.580 --> 0:42:33.714
translations.

0:42:34.134 --> 0:42:40.997
A typical example would be: you want to do unsupervised
translation between English and Dutch, but you have

0:42:40.997 --> 0:42:47.369
parallel data between English and German, so
the question is can we use this parallel data

0:42:47.369 --> 0:42:51.501
to help build an unsupervised system between English
and Dutch?

0:42:56.296 --> 0:43:01.240
For the first one we take the help of language
embeddings for tokens, and this is

0:43:01.240 --> 0:43:05.758
a straightforward way to tell the model
which language it's dealing with.
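
A minimal PyTorch sketch of such input embeddings, where a learned language embedding is added to the token and position embeddings; the layer names and sizes here are illustrative, not taken from a specific checkpoint:

```python
import torch
import torch.nn as nn

class LanguageAwareEmbedding(nn.Module):
    """Token + position + language embeddings, summed before the encoder."""
    def __init__(self, vocab_size, n_positions, n_languages, d_model=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(n_positions, d_model)
        self.lang = nn.Embedding(n_languages, d_model)

    def forward(self, token_ids, lang_id):
        # token_ids: (batch, seq_len); lang_id: integer index of the language.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        lang_ids = torch.full_like(token_ids, lang_id)
        return self.tok(token_ids) + self.pos(positions) + self.lang(lang_ids)
```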

0:43:06.466 --> 0:43:11.993
And for the second one we're going to look
at some pre training objectives which are also

0:43:11.993 --> 0:43:17.703
kind of unsupervised so we need monolingual
data mostly and this kind of helps us to promote

0:43:17.703 --> 0:43:20.221
the language independent representation.

0:43:23.463 --> 0:43:29.954
So the first pre-training method that we'll
look at is XLM, which is quite famous, if

0:43:29.954 --> 0:43:32.168
you haven't heard of it yet.

0:43:32.552 --> 0:43:40.577
The way it works is that it's basically
a transformer encoder, so it's like

0:43:40.577 --> 0:43:42.391
just the encoder module.

0:43:42.391 --> 0:43:44.496
No, there's no decoder here.

0:43:44.884 --> 0:43:51.481
And what we're trying to do is mask out some tokens
in a sequence and try to predict these masked

0:43:51.481 --> 0:43:52.061
tokens.

0:43:52.061 --> 0:43:55.467
So this is called masked language modeling.

0:43:55.996 --> 0:44:05.419
The typical language modeling that you see is
the standard language modeling, where you predict

0:44:05.419 --> 0:44:08.278
the next token in the sequence.

0:44:08.278 --> 0:44:11.136
Then, here, we have the position embeddings.

0:44:11.871 --> 0:44:18.774
Then we have the token embeddings, and then
here we have the masked token, and then we have

0:44:18.774 --> 0:44:22.378
the transformer encoder blocks to predict the masked token.

0:44:24.344 --> 0:44:30.552
We do this for all languages using the same
transformer encoder, and this kind of helps

0:44:30.552 --> 0:44:36.760
us to push the sentence embeddings, or
the output of the encoder, into a common space

0:44:36.760 --> 0:44:37.726
for multiple languages.
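
A bare-bones sketch of the masked-language-modeling objective on token IDs; the 15% mask rate and the special IDs are illustrative assumptions, and a real implementation would also avoid masking special tokens and sometimes keep or randomize the selected tokens:

```python
import torch

def mlm_mask(token_ids, mask_id, p=0.15):
    """Return (inputs, labels) for masked language modeling.
    Only masked positions keep their original ID in `labels`; the rest are -100,
    the conventional ignore index for cross-entropy."""
    inputs = token_ids.clone()
    labels = token_ids.clone()
    mask = torch.rand(token_ids.shape) < p   # pick roughly 15% of positions
    labels[~mask] = -100                     # loss only on the masked positions
    inputs[mask] = mask_id                   # replace chosen tokens with [MASK]
    return inputs, labels

# The encoder then predicts the original tokens at the masked positions, e.g.:
# loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100)
```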

0:44:42.782 --> 0:44:49.294
So first we train an MLM on both the source
and target language sides, and then

0:44:49.294 --> 0:44:54.928
we use it as a starting point for the encoder
and decoder of a UNMT system.

0:44:55.475 --> 0:45:03.175
So we take the monolingual data, build a masked
language model on both source and target languages,

0:45:03.175 --> 0:45:07.346
and then use it to initialize the UNMT system.

0:45:09.009 --> 0:45:14.629
Here we look at two languages, but you can
also do it with one hundred languages at once.

0:45:14.629 --> 0:45:20.185
So there are pretrained checkpoints that you can
use, which have seen quite

0:45:20.185 --> 0:45:21.671
a lot of data.

0:45:21.671 --> 0:45:24.449
You can always use one as a starting point for your UNMT system, which in practice works well.
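
As a rough illustration, such a pretrained multilingual checkpoint can be loaded with Hugging Face Transformers; the model id below is quoted from memory and should be checked on the hub before use:

```python
# Illustrative only: load a multilingual XLM MLM checkpoint to initialize an encoder.
from transformers import AutoModel, AutoTokenizer

name = "xlm-mlm-100-1280"  # assumed id of the 100-language XLM MLM checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

n_params = sum(p.numel() for p in encoder.parameters())
print(f"Loaded {encoder.__class__.__name__} with {n_params:,} parameters")
```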

0:45:31.491 --> 0:45:36.759
One detail is that since this is an encoder block only, and your UNMT system is encoder-decoder,

0:45:40.347 --> 0:45:47.524
there's this cross-attention that's missing,
but you can always initialize that part randomly.

0:45:47.524 --> 0:45:48.364
It's fine.

0:45:48.508 --> 0:45:53.077
Not everything is initialized, but it's still
decent.

0:45:56.056 --> 0:46:02.141
Then the other one we have is mBART,
and here you see that this kind of builds on

0:46:02.141 --> 0:46:07.597
the unsupervised training objective, which
is the denoising auto-encoding.

0:46:08.128 --> 0:46:14.337
So what they do is they say that we don't
even need to do the on-the-fly back-translation,

0:46:14.337 --> 0:46:17.406
but you can do it later, after pre-training.

0:46:17.406 --> 0:46:24.258
We just do denoising auto-encoding
on all different languages, and that also gives

0:46:24.258 --> 0:46:32.660
you out-of-the-box good performance, so what
we basically have here is the transformer encoder-decoder.

0:46:34.334 --> 0:46:37.726
You are trying to generate a reconstructed
sequence.

0:46:37.726 --> 0:46:38.942
You need a decoder.

0:46:39.899 --> 0:46:42.022
So we give an input sentence.

0:46:42.022 --> 0:46:48.180
We try to predict the masked tokens, or rather
we try to reconstruct the original

0:46:48.180 --> 0:46:52.496
sentence from the input sequence, which was
corrupted.

0:46:52.496 --> 0:46:57.167
So this is the same denoising objective that
you have seen before.
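
A rough sketch of building one denoising training pair in the spirit of mBART's text infilling; the span length and the number of spans are simplified assumptions (mBART samples span lengths and also permutes sentences):

```python
import random

MASK = "<mask>"

def corrupt(tokens, span_len=3, n_spans=2):
    """Replace a few contiguous spans with a single <mask> each (simplified infilling)."""
    noisy = list(tokens)
    for _ in range(n_spans):
        if len(noisy) <= span_len:
            break
        start = random.randrange(len(noisy) - span_len)
        noisy[start:start + span_len] = [MASK]
    return noisy

# Encoder input = corrupted sentence, decoder target = original sentence.
sentence = "the cat sat on the mat because it was warm".split()
print(corrupt(sentence), "->", sentence)
```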

0:46:58.418 --> 0:46:59.737
This is for English.

0:46:59.737 --> 0:47:04.195
I think this is for Japanese and then once
we do it for all languages.

0:47:04.195 --> 0:47:09.596
I mean they have these variants with twenty-five,
fifty languages and so on, and then you can fine-tune

0:47:09.596 --> 0:47:11.794
it on your sentence- and document-level data.

0:47:13.073 --> 0:47:20.454
And so they show this for the supervised
setups, but you can also use it as an initialization

0:47:20.454 --> 0:47:25.058
for unsupervised MT built on top of it, which also
works in practice.

0:47:30.790 --> 0:47:36.136
Then we have these other approaches; so far we kind of
didn't see a direct benefit from the

0:47:36.136 --> 0:47:38.840
high-resource language, right. So, as I said,

0:47:38.878 --> 0:47:44.994
you can use English-German to help English
to Dutch, and if you want English-Catalan, you

0:47:44.994 --> 0:47:46.751
can use English to French.

0:47:48.408 --> 0:47:55.866
One typical way to do this is to use pivot
translation, where you go through a high-resource language.

0:47:55.795 --> 0:48:01.114
So here it's Finnish to Greek, so you translate,
say, from Finnish to English and then English

0:48:01.114 --> 0:48:03.743
to Greek, and then you get the translation.

0:48:04.344 --> 0:48:10.094
What's important is that you have these different
techniques and you can always think of which

0:48:10.094 --> 0:48:12.333
one to use given the data situation.

0:48:12.333 --> 0:48:18.023
So if it was like Finnish to Greek, maybe
pivoting is better because you might get a good Finnish

0:48:18.023 --> 0:48:20.020
to English and a good English to Greek system.
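
A minimal sketch of pivoting, where `fi_to_en` and `en_to_el` are hypothetical wrappers around two separately trained high-resource systems:

```python
def translate_fi_to_el(sentence_fi, fi_to_en, en_to_el):
    """Finnish -> Greek by pivoting through English."""
    pivot_en = fi_to_en(sentence_fi)   # Finnish -> English (high-resource pair)
    return en_to_el(pivot_en)          # English -> Greek (high-resource pair)
```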

0:48:20.860 --> 0:48:23.255
Sometimes it also depends on the language
pair.

0:48:23.255 --> 0:48:27.595
There might be some information loss and so
on, so there are quite a few variables you

0:48:27.595 --> 0:48:30.039
need to think of and decide which system to
use.

0:48:32.752 --> 0:48:39.654
Then there's zero-shot translation, which you've probably
also seen in the multilingual lecture, and how

0:48:39.654 --> 0:48:45.505
if you can improve the language independence
then your zero shot gets better.

0:48:45.505 --> 0:48:52.107
So maybe if you use the multilingual models
and do zero shot directly, it's quite good.

0:48:53.093 --> 0:48:58.524
So we have zero-shot and pivoting, and then
we have the unsupervised translation where

0:48:58.524 --> 0:49:00.059
we can translate between languages

0:49:00.600 --> 0:49:02.762
when there is no parallel data at all.

0:49:06.686 --> 0:49:07.565
So, to summarize:

0:49:07.565 --> 0:49:11.959
what we have seen so far is that we basically
have two monolingual corpora, and

0:49:15.255 --> 0:49:16.754
just from looking at

0:49:16.836 --> 0:49:19.307
these two files alone you can create a dictionary.

0:49:19.699 --> 0:49:26.773
You can build an unsupervised MT system, not
always, but if the domains are similar and the

0:49:26.773 --> 0:49:28.895
languages are similar.

0:49:28.895 --> 0:49:36.283
But for distant languages, the
unsupervised techniques usually don't work really

0:49:36.283 --> 0:49:36.755
well.

0:49:37.617 --> 0:49:40.297
What I would suggest

0:49:40.720 --> 0:49:46.338
would be that if you can get some parallel
data from somewhere or do bitext mining, which

0:49:46.338 --> 0:49:51.892
we have seen in the LASER practical,
then you can use that to initialize your

0:49:51.892 --> 0:49:57.829
system and then try a semi-supervised
MT system, and that would be better than

0:49:57.829 --> 0:50:00.063
just building an unsupervised one.

0:50:00.820 --> 0:50:06.546
With that, we are at the end.

0:50:07.207 --> 0:50:08.797
Are there any quick questions?

0:50:16.236 --> 0:50:25.070
[Largely inaudible audience question about large language models and translation.]

0:50:56.916 --> 0:51:03.798
They are trained on next-token prediction, and this somehow gives them
many abilities, not only translation, but other

0:51:03.798 --> 0:51:08.062
than that there are quite a few things that
they can do.

0:51:10.590 --> 0:51:17.706
But the translation in itself usually doesn't
work really well compared to a system you build

0:51:17.706 --> 0:51:20.878
specifically for your use case.

0:51:22.162 --> 0:51:27.924
I would guess that it's usually better than
the LLM, but you can always adapt the LLM to

0:51:27.924 --> 0:51:31.355
the task that you want, and then it could be
better.

0:51:32.152 --> 0:51:37.849
An LLM out of the box might not be the
best choice for your task.

0:51:37.849 --> 0:51:44.138
For me, the translation I'm working on
is more about translating software.

0:51:45.065 --> 0:51:50.451
And it's quite often a niche domain as well,
and if you use the LLM out of the box, it's

0:51:50.451 --> 0:51:53.937
actually quite bad compared to the systems
that we built.

0:51:54.414 --> 0:51:56.736
But you can do these different techniques
like prompting.

0:51:57.437 --> 0:52:03.442
What people usually do is few-shot prompting,
where they give similar translation pairs in

0:52:03.442 --> 0:52:08.941
the prompt and then ask it to translate, and
that kind of improves the performance

0:52:08.941 --> 0:52:09.383
a lot.
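
A rough sketch of such a few-shot translation prompt; the wording and the example pairs are purely illustrative, and in practice the examples would come from a translation memory or fuzzy matching:

```python
# Build a few-shot translation prompt for an LLM.
examples = [
    ("Click the Save button.", "Klicken Sie auf die Schaltfläche Speichern."),
    ("The file could not be opened.", "Die Datei konnte nicht geöffnet werden."),
]
source = "The settings were updated successfully."

prompt = "Translate from English to German.\n\n"
for en, de in examples:
    prompt += f"English: {en}\nGerman: {de}\n\n"
prompt += f"English: {source}\nGerman:"
print(prompt)  # this string is then sent to the LLM
```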

0:52:09.383 --> 0:52:15.135
So there are different techniques that you
can use to adapt your LLMs, and then it might

0:52:15.135 --> 0:52:16.399
be better than the

0:52:16.376 --> 0:52:17.742
task-specific system.

0:52:18.418 --> 0:52:22.857
But if you're looking for niche things, I
don't think LLMs are that good.

0:52:22.857 --> 0:52:26.309
But if you want to do, let's say, unsupervised
translation,

0:52:26.309 --> 0:52:30.036
in this case you can never be sure that they
haven't seen the data.

0:52:30.036 --> 0:52:35.077
First of all, whether they have seen data in
that language or not, and if the data is public,

0:52:35.077 --> 0:52:36.831
they probably did see the data.

0:52:40.360 --> 0:53:00.276
I feel like they have a pretty good understanding
of most languages.

0:53:04.784 --> 0:53:09.059
Depends on the language, but I'd be pretty surprised
if it works on a low-resource language.

0:53:09.059 --> 0:53:11.121
I would expect it to work on German and so on.

0:53:11.972 --> 0:53:13.633
But if you take a low-resource language,

0:53:14.474 --> 0:53:20.973
I don't think it works, and also there are quite
a few papers which have already showed that

0:53:20.973 --> 0:53:27.610
if you build a system yourself in the typical
way, it's quite a bit better than

0:53:27.610 --> 0:53:29.338
the LLM.

0:53:29.549 --> 0:53:34.883
But you can always do things with LLMs to
get better results.

0:53:37.557 --> 0:53:39.539
Any more questions?

0:53:41.421 --> 0:53:47.461
So if not, then we're going to end the lecture
here, and then on Thursday we're going to have

0:53:47.461 --> 0:53:51.597
document-level MT, which is also given by me, so
thanks for coming.