WEBVTT

0:00:01.721 --> 0:00:08.584
Hey then, welcome to today's lecture on language
modeling.

0:00:09.409 --> 0:00:21.608
Last time we had a different view on machine translation,
which was the evaluation part: it's important

0:00:21.608 --> 0:00:24.249
to evaluate and see how well a system works.

0:00:24.664 --> 0:00:33.186
We want to continue with building the MT system
and this will be the last part before we are

0:00:33.186 --> 0:00:36.668
going into the neural models on Thursday.

0:00:37.017 --> 0:00:45.478
So we had the broader view on statistical
machine translation.

0:00:45.385 --> 0:00:52.977
A week ago on Thursday we talked about statistical
machine translation and mainly the translation

0:00:52.977 --> 0:00:59.355
model, so how we model how probable is it that
one word is translated into another.

0:01:00.800 --> 0:01:15.583
However, there is another component when doing
generation tasks in general, and machine translation is one of them.

0:01:16.016 --> 0:01:23.797
There are several characteristics which you
only need to model on the target side in the

0:01:23.797 --> 0:01:31.754
traditional approach, where we talked about
the generation from a more semantic or syntactic

0:01:31.754 --> 0:01:34.902
representation into the actual words.

0:01:35.555 --> 0:01:51.013
And the challenge is that there are some constructs
which are only there in the target language.

0:01:52.132 --> 0:01:57.908
You cannot really get that from the translation, but
it's more something that needs to be modeled on

0:01:57.908 --> 0:01:58.704
the target side.

0:01:59.359 --> 0:02:05.742
And this is typically done by a language model,
and this concept of language modeling is,

0:02:06.326 --> 0:02:11.057
I guess you can assume, nowadays very important.

0:02:11.057 --> 0:02:20.416
You've read a lot about large language models
recently, and they are all somehow trained on

0:02:20.416 --> 0:02:22.164
the same idea behind this.

0:02:25.986 --> 0:02:41.802
What we'll look at today, if we go to the next slide,
is what a language model is, and today's

0:02:41.802 --> 0:02:42.992
focus.

0:02:43.363 --> 0:02:49.188
This was the common approach to the language
model for twenty or thirty years, so for a long

0:02:49.188 --> 0:02:52.101
time it was really the state of the art.

0:02:52.101 --> 0:02:58.124
And people have used that in many applications
in machine translation and automatic speech

0:02:58.124 --> 0:02:58.985
recognition.

0:02:59.879 --> 0:03:11.607
Again you are measuring the performance, but
this is purely the performance of the language

0:03:11.607 --> 0:03:12.499
model.

0:03:13.033 --> 0:03:23.137
And then we will see that the traditional
language model has a major drawback in how

0:03:23.137 --> 0:03:24.683
we can deal with unseen events.

0:03:24.944 --> 0:03:32.422
So if you model language you will see that
most of the sentences you have not really

0:03:32.422 --> 0:03:39.981
seen, and you're still able to assess if this
is good language or what a native speaker would say.

0:03:40.620 --> 0:03:45.092
And this is challenging if you just do
parameter estimation by counting.

0:03:45.605 --> 0:03:59.277
We are using two different techniques to do that,
smoothing and interpolation, and these are essential in

0:03:59.277 --> 0:04:01.735
order to build a good model.

0:04:01.881 --> 0:04:11.941
It also motivates why things might be easier
if we are going into neural models, as we will.

0:04:12.312 --> 0:04:18.203
And at the end we'll talk a bit about some
additional types of language models which are

0:04:18.203 --> 0:04:18.605
also used.

0:04:20.440 --> 0:04:29.459
So where are language models used, or how are
they used in machine translation?

0:04:30.010 --> 0:04:38.513
So the idea of a language model is that we
are modeling the fluency of language.

0:04:38.898 --> 0:04:49.381
So if you have, for example, the beginning of a sentence,
then you can estimate that some words

0:04:49.669 --> 0:05:08.929
are valid as the next word, but other
words are not.

0:05:09.069 --> 0:05:13.673
And we can do that.

0:05:13.673 --> 0:05:22.192
We have seen the noisy channel model.

0:05:22.322 --> 0:05:33.991
That we have seen two weeks ago, and
today we will look into how we can model P

0:05:33.991 --> 0:05:36.909
of Y, or how probable a target sentence is.

0:05:37.177 --> 0:05:44.192
Now this is completely independent of the
translation process.

0:05:44.192 --> 0:05:49.761
It only asks how fluent a sentence is and how you would express things.

0:05:51.591 --> 0:06:01.699
And this language model task has one really
big advantage, and I assume that is even the biggest

0:06:01.699 --> 0:06:02.935
advantage.

0:06:03.663 --> 0:06:16.345
The big advantage is the data we need to train
it. Normally we are doing supervised learning.

0:06:16.876 --> 0:06:20.206
So machine translation, as we will talk about,

0:06:20.206 --> 0:06:24.867
means we have the source sentence and target
sentence.

0:06:25.005 --> 0:06:27.620
They need to be aligned.

0:06:27.620 --> 0:06:31.386
We look into how we can model them.

0:06:31.386 --> 0:06:39.270
Generally, the problem with this is getting
aligned data. In machine translation you still have the advantage

0:06:39.270 --> 0:06:45.697
that there are quite huge amounts of this data
for many languages, not all but many, but other

0:06:45.697 --> 0:06:47.701
tasks are even more difficult.

0:06:47.701 --> 0:06:50.879
There's very little data where you have, for example, summaries.

0:06:51.871 --> 0:07:02.185
So the big advantage of a language model is
we're only modeling the sentences, so we only

0:07:02.185 --> 0:07:04.103
need pure text.

0:07:04.584 --> 0:07:11.286
And pure text, especially since we have the
Internet, is available in large amounts.

0:07:11.331 --> 0:07:17.886
Of course, it's still maybe only available
for some domains, some types of text.

0:07:18.198 --> 0:07:23.466
If you want to have data for speech about machine
translation,

0:07:23.466 --> 0:07:27.040
maybe there's only limited data for that.

0:07:27.027 --> 0:07:40.030
And also, if you go to some more
exotic languages, then you will have less

0:07:40.030 --> 0:07:40.906
data.

0:07:41.181 --> 0:07:46.803
And in language models we can now look: how
can we make use of these data?

0:07:47.187 --> 0:07:54.326
Nowadays this is often also framed as
self-supervised learning, because on the one

0:07:54.326 --> 0:08:00.900
hand, as we'll see, it's a kind of classification
task, or supervised learning, but we create the

0:08:00.900 --> 0:08:02.730
training data from the text itself.

0:08:02.742 --> 0:08:13.922
So it's not that we have this pair of data
text and labels, but we have only the text.

0:08:15.515 --> 0:08:21.367
So the question is how can we use this monolingual
data and how can we train our language model?

0:08:22.302 --> 0:08:35.086
The main goal is to produce fluent English,
so we want to somehow model whether something

0:08:35.086 --> 0:08:38.024
is a sentence of a language.

0:08:38.298 --> 0:08:44.897
So there is no clear separation between semantics
and syntax; in this case it is not about

0:08:44.897 --> 0:08:46.317
making a clear separation.

0:08:46.746 --> 0:08:50.751
So we will model them somehow in there.

0:08:50.751 --> 0:08:56.091
There will be some notion of semantics, some
notion of syntax.

0:08:56.076 --> 0:09:08.748
Because you want to model how fluent
or probable it is that a native speaker is producing

0:09:08.748 --> 0:09:12.444
that sentence.

0:09:12.512 --> 0:09:17.711
We are rarely saying things that are
semantically wrong, and therefore there is

0:09:17.711 --> 0:09:18.679
also some notion of semantics in there.

0:09:19.399 --> 0:09:24.048
So, for example, the house is small.

0:09:24.048 --> 0:09:30.455
It should have a higher probability than "the home
is small."

0:09:31.251 --> 0:09:38.112
Because home and house both have the same meaning in German,
but they are used differently in English.

0:09:38.112 --> 0:09:43.234
For example, it should be more probable that
the plane.

0:09:44.444 --> 0:09:51.408
So both are syntactically correct, but
one is semantically odd.

0:09:51.408 --> 0:09:58.372
But still, you will see the natural one much more
often, so it should get a higher probability.

0:10:03.883 --> 0:10:14.315
So more formally, the language model
should be some type of function that gives

0:10:14.315 --> 0:10:18.690
us the probability that this sentence occurs.

0:10:19.519 --> 0:10:27.312
Indicating that this is good English, or more generally
good language; of course you can do that for any language.

0:10:28.448 --> 0:10:37.609
And in earlier times people have even tried
to do that deterministically; that was especially

0:10:37.609 --> 0:10:40.903
used for dialogue systems.

0:10:40.840 --> 0:10:50.660
You have a very strict syntax so you can only
use commands like "turn off the radio"

0:10:50.690 --> 0:10:56.928
or something else, but you have a very strict
deterministic finite-state grammar defining which

0:10:56.928 --> 0:10:58.107
types of phrases are allowed.

0:10:58.218 --> 0:11:04.791
The problem, of course, if we're dealing with
language is that language is variable; we're

0:11:04.791 --> 0:11:10.183
not always speaking correct sentences, and so
this type of deterministic approach doesn't really work.

0:11:10.650 --> 0:11:22.121
That's why for already many, many years people
have looked into statistical language models and tried

0:11:22.121 --> 0:11:24.587
to model this statistically.

0:11:24.924 --> 0:11:35.096
So something like: what is the probability
of a sequence of words, and that is what we will model.

0:11:35.495 --> 0:11:43.076
The advantage of doing it statistically is
that we can use large text databases, so we

0:11:43.076 --> 0:11:44.454
can train on them.

0:11:44.454 --> 0:11:52.380
We don't have to define it by hand, and in most of these
cases we don't want to have the hard decision:

0:11:52.380 --> 0:11:55.481
this is a sentence of the language or not.

0:11:55.815 --> 0:11:57.914
That's why we want to have some type of probability:

0:11:57.914 --> 0:11:59.785
how probable is this part of the sentence?

0:12:00.560 --> 0:12:04.175
Because yeah, even for humans, it's
not always clear.

0:12:04.175 --> 0:12:06.782
Is this a sentence that you can use or not?

0:12:06.782 --> 0:12:12.174
I mean, I just in this presentation gave several
sentences, which are not correct English.

0:12:12.174 --> 0:12:17.744
So it might still happen that people speak
or write sentences that are not correct,

0:12:17.744 --> 0:12:19.758
and you want to deal with all of them.

0:12:20.020 --> 0:12:25.064
So that is then, of course, a big advantage
if you use more statistical models.

0:12:25.705 --> 0:12:35.810
The disadvantage is that you need suitably
large text databases, which might exist for

0:12:35.810 --> 0:12:37.567
many languages.

0:12:37.857 --> 0:12:46.511
Nowadays you see that there are of course issues:
you need large computational resources

0:12:46.511 --> 0:12:47.827
to deal with it.

0:12:47.827 --> 0:12:56.198
You need to run all these crawlers on
the internet, which can create enormous amounts

0:12:56.198 --> 0:12:57.891
of training data.

0:12:58.999 --> 0:13:08.224
So if we want to build this then the question
is of course how can we estimate the probability?

0:13:08.448 --> 0:13:10.986
So how probable is the sentence good morning?

0:13:11.871 --> 0:13:15.450
And you all know basic statistics.

0:13:15.450 --> 0:13:21.483
So the first idea: you have a large database
of sentences and you count.

0:13:21.901 --> 0:13:28.003
I made this a real example, so this was from
the TED talks.

0:13:28.003 --> 0:13:37.050
I guess most of you have heard about them,
and you can count in how many sentences

0:13:37.050 --> 0:13:38.523
"good morning" occurs.

0:13:38.718 --> 0:13:49.513
It happens a few times, so the probability of "good morning"
is three point something times ten to some negative power.

0:13:50.030 --> 0:13:53.755
Okay, so this is a very easy thing.

0:13:53.755 --> 0:13:58.101
We can directly estimate the language model.

0:13:58.959 --> 0:14:03.489
Does anybody see a problem why this might
not be the final solution?

0:14:06.326 --> 0:14:14.962
I think we would need a whole lot more sentences
to make anything useful of this.

0:14:15.315 --> 0:14:29.340
Because the probability of a talk starting
with "good morning" is intuitively much higher

0:14:29.340 --> 0:14:32.084
than that tiny number.

0:14:33.553 --> 0:14:41.700
So the probability represented in this way is
not how we usually think about it.

0:14:42.942 --> 0:14:55.038
The probability is even OK, but you're going
in the right direction with the large data.

0:14:55.038 --> 0:14:59.771
Yes, you can't score a new sentence.

0:15:00.160 --> 0:15:04.763
It's about the large data: so you said it's
hard to get enough data;

0:15:04.763 --> 0:15:05.931
it's actually impossible.

0:15:05.931 --> 0:15:11.839
I would say we are always saying sentences
which have never been said before, and we are able

0:15:11.839 --> 0:15:12.801
to deal with them.

0:15:13.133 --> 0:15:25.485
The problem is the sparsity of the data: there
will be a lot of perfectly fine English sentences we have never seen.

0:15:26.226 --> 0:15:31.338
And this is, of course, not what we want to
deal with.

0:15:31.338 --> 0:15:39.332
If we want to model that, we need to have
a model which can really estimate how good an unseen sentence is.

0:15:39.599 --> 0:15:47.970
And if we are just counting this way,
most of them will get a zero probability, which

0:15:47.970 --> 0:15:48.722
is not what we want.

0:15:49.029 --> 0:15:56.572
So we need to do things a bit differently.

0:15:56.572 --> 0:16:06.221
For the translation models we already had some
ideas for doing that.

0:16:06.486 --> 0:16:08.058
And that we can do here again.

0:16:08.528 --> 0:16:12.866
So we can especially use the chain rule.

0:16:12.772 --> 0:16:19.651
The chain rule follows from the definition of conditional
probability: the conditional probability

0:16:19.599 --> 0:16:26.369
of an event B given an event A is the probability
of A and B divided by the probability of A.

0:16:26.369 --> 0:16:32.720
Yes, I recently had an exam on automatic speech
recognition, and the examiner said this is not

0:16:32.720 --> 0:16:39.629
called the chain rule, because I used this terminology,
and he said it's just applying Bayes' rule the other way.

0:16:40.500 --> 0:16:56.684
But this is definitely the definition of
conditional probability.

0:16:57.137 --> 0:17:08.630
The conditional probability P(B|A) is defined as
P(A and B) divided by P(A).

0:17:08.888 --> 0:17:16.392
And that can be easily rewritten into P(A and B)
equals P(A) times P(B|A).

0:17:16.816 --> 0:17:35.279
And the nice thing is, we can easily extend
it, of course, to more variables, so we can

0:17:35.279 --> 0:17:38.383
have P(A, B, C) = P(A) P(B|A) P(C|A, B), and so on.

0:17:38.383 --> 0:17:49.823
So more generally you can do that for
any length of sequence.

0:17:50.650 --> 0:18:04.802
So if we are now going back to words, we can
model the probability of the sequence as the product

0:18:04.802 --> 0:18:08.223
of each word given its history.

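Written out, the decomposition being described here is the standard chain-rule factorization (a sketch, not copied from the lecture slides):

$$P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})$$
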
0:18:08.908 --> 0:18:23.717
Maybe it's more clear if we're looking at
real words: so if we have P of "its water

0:18:23.717 --> 0:18:26.914
is so transparent."

0:18:26.906 --> 0:18:39.136
So this way we are able to model the probability
of the whole sentence by

0:18:39.136 --> 0:18:42.159
looking at each word given its history.

0:18:42.762 --> 0:18:49.206
And of course the big advantage is that each
word occurs more often than the full sentence.

0:18:49.206 --> 0:18:54.991
So hopefully we have seen each; still, of course,
if the word doesn't occur,

0:18:54.991 --> 0:19:01.435
then this doesn't work, but we'll cover
dealing with that in most of the lecture

0:19:01.435 --> 0:19:01.874
today.

0:19:02.382 --> 0:19:08.727
So first of all, this is generally at least
easier than the thing we had before.

0:19:13.133 --> 0:19:23.531
Did that really make it easier? No, because
those histories get utterly long, and we have whole sentences in there.

0:19:23.943 --> 0:19:29.628
Yes exactly, so when we look at the last probability
here, we still have to have seen the full sentence.

0:19:30.170 --> 0:19:38.146
So if we want to model "transparent" given "its
water is so," we have to have seen the full sequence.

0:19:38.578 --> 0:19:48.061
So this first step didn't really help yet: for the
last word we still need to have seen the full sentence.

0:19:48.969 --> 0:19:52.090
However, it brings us a little bit nearer.

0:19:52.512 --> 0:19:59.673
So this is still a problem, and we will never
have seen all of these histories.

0:20:00.020 --> 0:20:08.223
So you can look at this: if you have a vocabulary
of V words,

0:20:08.223 --> 0:20:17.956
and, for example, the average sentence length
is n, you would need to see on the order of V to the power of n sequences.

0:20:18.298 --> 0:20:22.394
And we are quite sure we have never seen that
much data.

0:20:22.902 --> 0:20:26.246
So we cannot really compute this
probability this way.

0:20:26.786 --> 0:20:37.794
However, there's a trick how we can do that,
and that's the idea behind most of the language models.

0:20:38.458 --> 0:20:44.446
So instead of saying how often does this word
happen after exactly this history, we are trying

0:20:44.446 --> 0:20:50.433
to do some kind of clustering and cluster a
lot of different histories into the same class,

0:20:50.433 --> 0:20:55.900
and then we are modeling the probability of
the word given this class of histories.

0:20:56.776 --> 0:21:06.245
And then, of course, the big design decision
is how to model this, like how to cluster the histories.

0:21:06.666 --> 0:21:17.330
So how do we put all these histories together
so that we have seen each one of them often enough

0:21:17.330 --> 0:21:18.396
so that we can estimate it.

0:21:20.320 --> 0:21:25.623
So there are quite different types of things
people can do.

0:21:25.623 --> 0:21:33.533
You can add part-of-speech tags, you can use
semantic word classes, you can model the similarity,

0:21:33.533 --> 0:21:46.113
you can model grammatical context, and things
like that. However, like quite often in these statistical

0:21:46.113 --> 0:21:53.091
models, the best solution is a very simple one.

0:21:53.433 --> 0:21:58.455
And this is what most statistical models do.

0:21:58.455 --> 0:22:09.616
They are based on the so-called Markov assumption,
and that means we are assuming most of this history

0:22:09.616 --> 0:22:12.183
is not that important.

0:22:12.792 --> 0:22:25.895
So we are modeling the probability of "transparent"
given "is so," that is, we keep maybe only two

0:22:25.895 --> 0:22:29.534
words by having a fixed context length.

0:22:29.729 --> 0:22:38.761
So the class of the whole history from word
one to word i minus one is just the last two words.

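As a sketch, the Markov assumption just described replaces the full history by the last n minus one words:

$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$
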
0:22:39.679 --> 0:22:45.229
And by doing this classification, which of
course doesn't need any additional knowledge.

0:22:45.545 --> 0:22:51.176
It's very easy to calculate: we have now limited
our histories.

0:22:51.291 --> 0:23:00.906
So instead of an arbitrarily long history here,
we have here only a short, fixed one.

0:23:00.906 --> 0:23:10.375
Of course, even if we have two-grams, a lot of
them will still not occur.

0:23:10.930 --> 0:23:20.079
So it's a very simple trick to merge all these
histories into a few classes, and it is motivated by,

0:23:20.079 --> 0:23:24.905
of course, how language works: the nearest things
matter most.

0:23:24.944 --> 0:23:33.043
Like in a lot of sequences, words mainly depend
on the previous ones, and things which are far

0:23:33.043 --> 0:23:33.583
away matter less.

0:23:38.118 --> 0:23:47.361
In our product here everything is just modeled
not by the whole history but by the last n

0:23:47.361 --> 0:23:48.969
minus one words.

0:23:50.470 --> 0:23:54.322
And this is typically how people express it.

0:23:54.322 --> 0:24:01.776
They're therefore also talking about an n-gram
language model, because we are always looking

0:24:01.776 --> 0:24:06.550
at these chunks of n words and modeling the
probability.

0:24:07.527 --> 0:24:10.485
So again let's start with the most simple case.

0:24:10.485 --> 0:24:15.485
Even more extreme is the unigram case, where we're
ignoring the whole history.

0:24:15.835 --> 0:24:24.825
The probability of a sequence of words is
just the product of the probabilities of the words in

0:24:24.825 --> 0:24:25.548
there.

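In formula form (a standard statement of what is said here):

$$P(w_1, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i)$$
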
0:24:26.046 --> 0:24:32.129
And therefore we are removing the whole context.

0:24:32.129 --> 0:24:40.944
The most probable sequence would be something
like the most frequent word repeated, for example "the."

0:24:42.162 --> 0:24:44.694
Each word is the most probable by itself.

0:24:44.694 --> 0:24:49.684
It might not make sense, but it, of course,
can give you a bit of

0:24:49.629 --> 0:24:52.682
Intuition like which types of words should
be more frequent.

0:24:53.393 --> 0:25:00.012
And what you can do is train such a
model and then just automatically generate text.

0:25:00.140 --> 0:25:09.496
And this sequence is generated by sampling,
which we will come to later in the lecture too.

0:25:09.496 --> 0:25:16.024
The sampling is that you randomly pick a word,
but based on its probability.

0:25:16.096 --> 0:25:22.711
So if the probability of one word is zero
point two, then you'll pick it in twenty percent of the cases, and similarly for every other

0:25:22.711 --> 0:25:23.157
word.

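A minimal sketch of this sampling procedure in Python, with a made-up toy distribution (the words and probabilities below are invented for illustration, not the lecture's data):

    import random

    # Toy unigram distribution; invented for illustration.
    probs = {"the": 0.30, "is": 0.20, "house": 0.20, "small": 0.15, "a": 0.15}

    def sample_word(distribution):
        # random.choices draws proportionally to the weights, so a word with
        # probability 0.2 is picked in roughly twenty percent of the draws.
        words = list(distribution.keys())
        weights = list(distribution.values())
        return random.choices(words, weights=weights, k=1)[0]

    # A unigram model samples every word independently, which is why the
    # generated text shows no coherent structure.
    print(" ".join(sample_word(probs) for _ in range(10)))
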
0:25:23.483 --> 0:25:36.996
And if you look at such samples, you'll see here, for
example, that these seem to be frequently occurring

0:25:36.996 --> 0:25:38.024
words.

0:25:38.138 --> 0:25:53.467
But you see there's not really any coherent
structure, because each word is modeled

0:25:53.467 --> 0:25:55.940
independently.

0:25:57.597 --> 0:26:03.037
You can do better by going to
a bigram, so then we're having a bit of context.

0:26:03.037 --> 0:26:08.650
Of course, it's still very small, so the probability
of your word of the actual word only depends

0:26:08.650 --> 0:26:12.429
on the previous word and all the context before
there is ignored.

0:26:13.133 --> 0:26:18.951
This of course is still wrong, but
it models regular language significantly

0:26:18.951 --> 0:26:19.486
better.

0:26:19.779 --> 0:26:28.094
Some things here still don't really
make a lot of sense, but you're seeing some

0:26:28.094 --> 0:26:29.682
typical phrases.

0:26:29.949 --> 0:26:39.619
"In this hope" doesn't make sense, but "in this
issue" is frequent.

0:26:39.619 --> 0:26:51.335
Very nice is this "new
car parking lot" sequence: if you have the word

0:26:51.335 --> 0:26:53.634
"new," then the word "car"

0:26:53.893 --> 0:27:01.428
is also quite common; after "new car" the model
often puts "parking,"

0:27:01.428 --> 0:27:06.369
and often the continuation is "parking lot."

0:27:06.967 --> 0:27:12.417
And now it's very interesting because here
we see the two semantic meanings of "lot": you

0:27:12.417 --> 0:27:25.889
have a parking lot, but in general, if you just
ignore the history, the most common use

0:27:25.889 --> 0:27:27.353
is "a lot."

0:27:27.527 --> 0:27:33.392
So you see that it's really not using the
context from before, but only the immediately preceding

0:27:33.392 --> 0:27:33.979
context.

0:27:38.338 --> 0:27:41.371
So in general we can of course do that longer.

0:27:41.371 --> 0:27:43.888
We can do unigrams, bigrams, trigrams.

0:27:45.845 --> 0:27:52.061
People typically went up to four- or five-grams,
and then it's getting difficult.

0:27:52.792 --> 0:27:56.671
There are so many five-grams that it's getting
complicated.

0:27:56.671 --> 0:28:02.425
Storing all of them makes these models
so big that it's no longer working, and

0:28:02.425 --> 0:28:08.050
of course at some point the estimation of
the probabilities again gets too difficult

0:28:08.050 --> 0:28:09.213
for each of them.

0:28:09.429 --> 0:28:14.777
If you have a small corpus, of course you
will use a smaller n-gram length;

0:28:14.777 --> 0:28:16.466
with a larger corpus you will take a larger one.

0:28:18.638 --> 0:28:24.976
What is important to keep in mind is that,
of course, this assumption is wrong.

0:28:25.285 --> 0:28:36.608
So we have long-range dependencies, and if
we really want to model everything in language,

0:28:36.608 --> 0:28:37.363
then a fixed window is not enough.

0:28:37.337 --> 0:28:46.965
So here is like one of these extreme cases:
"the computer which has just been put into the machine

0:28:46.965 --> 0:28:49.423
room on the fifth floor crashed."

0:28:49.423 --> 0:28:55.978
Like somehow, there is a dependency between
computer and crash.

0:28:57.978 --> 0:29:10.646
However, in most situations these are typically
rare and normally most important things happen

0:29:10.646 --> 0:29:13.446
in the near context.

0:29:15.495 --> 0:29:28.408
But of course it's important to keep
in mind that you can't model these dependencies, so you

0:29:28.408 --> 0:29:29.876
can't capture everything.

0:29:33.433 --> 0:29:50.200
The next question is again how we can train
this, so we have to estimate these probabilities.

0:29:51.071 --> 0:30:00.131
And the question is how we do that, and again
the most simple thing works.

0:30:00.440 --> 0:30:03.168
The thing that works is exactly maximum likelihood
estimation.

0:30:03.168 --> 0:30:12.641
What gives you the right answer is: how
probable is it that this word follows the words one to i minus

0:30:12.641 --> 0:30:13.370
one?

0:30:13.370 --> 0:30:20.946
You just count how often this sequence
happens.

0:30:21.301 --> 0:30:28.165
So I guess this is what most of you would have
intuitively done, and this also works best.

0:30:28.568 --> 0:30:39.012
So it's not complicated to train: you once
have to go over your corpus, you have to count

0:30:39.012 --> 0:30:48.662
all bigrams and unigrams, and then you can
directly train the basic language model.

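A minimal sketch of this maximum likelihood training in Python, using an invented three-sentence toy corpus (the sentence markers <s> and </s> stand for sentence start and end; this is illustrative, not the lecture's data):

    from collections import Counter

    corpus = [["<s>", "good", "morning", "</s>"],
              ["<s>", "good", "afternoon", "</s>"],
              ["<s>", "good", "morning", "</s>"]]

    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))

    def p_ml(word, prev):
        # Maximum likelihood estimate: count(prev, word) / count(prev).
        return bigrams[(prev, word)] / unigrams[prev]

    print(p_ml("morning", "good"))  # 0.666..., i.e. two thirds
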
0:30:49.189 --> 0:30:50.651
Where is it difficult?

0:30:50.651 --> 0:30:58.855
There are two difficulties: the basic language
model doesn't work that well because of zero

0:30:58.855 --> 0:31:03.154
counts, and how we address that is the first; and the second:

0:31:03.163 --> 0:31:13.716
because we saw that especially if you go for
larger n, you have to store all these n-grams

0:31:13.716 --> 0:31:15.275
efficiently.

0:31:17.697 --> 0:31:21.220
So how can we do that?

0:31:21.220 --> 0:31:24.590
Here are some examples.

0:31:24.590 --> 0:31:33.626
For example, you have these sequences in your
training corpus.

0:31:33.713 --> 0:31:41.372
You see that this word happens after the sentence
start, and this sequence happens two times.

0:31:42.182 --> 0:31:45.651
We have three times

0:31:45.651 --> 0:31:58.043
a sentence start, so the probability is two thirds,
and the other probability one third.

0:31:58.858 --> 0:32:09.204
Here we have what is following: one continuation
twice and another once, so again two thirds and one third.

0:32:09.809 --> 0:32:20.627
And this is all that you need to know here
about it, so you can do this calculation.

0:32:23.723 --> 0:32:35.506
So the question then, of course, is what do
we really learn in these types of models?

0:32:35.506 --> 0:32:45.549
Here are examples from the Europarl corpus:
"the green," "the red," and "the blue," and here

0:32:45.549 --> 0:32:48.594
you have the probabilities of which word is next.

0:32:48.989 --> 0:33:01.897
You see that there is a lot more in there than just
syntax, because the initial phrase is always the

0:33:01.897 --> 0:33:02.767
same.

0:33:03.163 --> 0:33:10.132
For example, you see "the green paper" and "the
green group."

0:33:10.132 --> 0:33:16.979
It's from the European Parliament; then "the red cross,"
which is frequent.

0:33:17.197 --> 0:33:21.777
What you also see is that it's sometimes easy,
sometimes it's more difficult.

0:33:22.302 --> 0:33:28.345
So, for example, following "the red," in one
hundred cases it was "the red cross."

0:33:28.668 --> 0:33:48.472
So it seems to be easier to guess the next
word.

0:33:48.528 --> 0:33:55.152
So there are different types of information
encoded in that; you also know that sometimes

0:33:55.152 --> 0:33:58.675
you directly know how the speaker will continue.

0:33:58.675 --> 0:34:04.946
There's not a lot of new information in the next
word, but in other cases, like "blue," there's

0:34:04.946 --> 0:34:06.496
a lot of information.

0:34:11.291 --> 0:34:14.849
Another example is the Berkeley restaurant
sentences.

0:34:14.849 --> 0:34:21.059
It's collected at Berkeley and you have sentences
like "can you tell me about any good spaghetti

0:34:21.059 --> 0:34:21.835
restaurants."

0:34:21.835 --> 0:34:27.463
"Mid-priced Thai food is what I'm looking for," so
it's more like a dialogue system, and people

0:34:27.463 --> 0:34:31.215
have collected this data, and of course you
can also look

0:34:31.551 --> 0:34:46.878
into this and get the counts: so you count
the bigrams in the top table, so the column is the second word.

0:34:49.409 --> 0:34:52.912
So this is a bigram count: the row gives the first
word, the column the second word;

0:34:52.912 --> 0:34:54.524
this one here, for example, is one.

0:34:56.576 --> 0:35:12.160
"I want" has a high probability, but "want I"
a lot less, and there you see it, for

0:35:12.160 --> 0:35:17.004
example: so here you see what follows "I."

0:35:17.004 --> 0:35:23.064
It's very often "I want" or "I eat," but "I I," which
is not grammatical, is rare.

0:35:27.347 --> 0:35:39.267
These are the absolute counts of how often each word
occurs, and then you can see here the probabilities

0:35:39.267 --> 0:35:40.145
again.

0:35:42.422 --> 0:35:54.519
Then, if you want to score "I want Dutch
food," you get the sequence probability by multiplying

0:35:54.519 --> 0:35:55.471
all of them.

0:35:55.635 --> 0:36:00.281
And then you of course get a bit of interesting
insight from that.

0:36:00.281 --> 0:36:04.726
For example, semantic information is there.

0:36:04.726 --> 0:36:15.876
So, for example, if you compare "I want Dutch"
and "I want Chinese," it seems that one is more common in this data.

0:36:16.176 --> 0:36:22.910
You also see that the sentence often starts with "I."

0:36:22.910 --> 0:36:31.615
You have that "eat" after "to" is possible, but after
"want" alone it is not.

0:36:31.731 --> 0:36:39.724
And you cannot say "want eat," but you have to say
"want to eat," so there's grammatical information.

0:36:40.000 --> 0:36:51.032
So there is domain information and more in there. Now,
before we're going into measuring quality, are there

0:36:51.032 --> 0:36:58.297
any questions about language models and the
idea of modeling them?

0:37:02.702 --> 0:37:13.501
Hope that doesn't mean everybody is sleeping.
So when we're doing the training of these

0:37:13.501 --> 0:37:15.761
language models,

0:37:16.356 --> 0:37:26.429
you need to decide what the n-gram length
should be: should we use a trigram or a four-gram?

0:37:27.007 --> 0:37:34.040
So how can you now decide
which of the two models is better?

0:37:34.914 --> 0:37:40.702
And if you would have to do that, how would
you decide between taking language model A or

0:37:40.702 --> 0:37:41.367
language model B?

0:37:43.263 --> 0:37:53.484
I'd take some test text and see which model
assigns a higher probability to it.

0:37:54.354 --> 0:38:03.978
That's very good; that's even the second
idea, so the first thing maybe would have

0:38:03.978 --> 0:38:04.657
been:

0:38:05.925 --> 0:38:12.300
you take the language model, put it into
a machine translation system, and measure the end quality.

0:38:13.193 --> 0:38:18.773
Problems: first of all, you have to build a
whole system, which is very time consuming, and

0:38:18.773 --> 0:38:21.407
it might not only depend on the language model.

0:38:21.407 --> 0:38:24.730
On the other hand, that's of course what you
want in the end.

0:38:24.730 --> 0:38:30.373
It's the question whether you want to measure each
component individually or whether you want to do

0:38:30.373 --> 0:38:31.313
an end-to-end evaluation.

0:38:31.771 --> 0:38:35.463
What can also happen is this kind of mismatch:

0:38:35.463 --> 0:38:41.412
This is a very good language model, but it
somewhat doesn't really work well with your

0:38:41.412 --> 0:38:42.711
translation model.

0:38:43.803 --> 0:38:49.523
But of course it's very good to also have
this type of intrinsic evaluation, where the

0:38:49.523 --> 0:38:52.116
assumption is, as was pointed out:

0:38:52.116 --> 0:38:57.503
if it is good English it should get a
high probability, and a low one if it's bad English.

0:38:58.318 --> 0:39:07.594
And this is measured by taking a held-out
data set, so some data which you don't train

0:39:07.594 --> 0:39:12.596
on, and then calculating the probability of this data.

0:39:12.912 --> 0:39:26.374
Then you're just looking at the language models,
and you take the one that assigns the higher probability.

0:39:27.727 --> 0:39:33.595
You're not directly using the probability,
but you're taking the perplexity.

0:39:33.595 --> 0:39:40.454
The perplexity is two to the power of the
cross entropy, and you see in the cross entropy

0:39:40.454 --> 0:39:46.322
you're doing something like an average log probability
over the words.

0:39:46.846 --> 0:39:54.721
Now, how exactly is that defined? Perplexity
is typically what people refer to, or the cross entropy.

0:39:54.894 --> 0:40:02.328
The cross entropy is a negated average, and
then you have the log of the probability of

0:40:02.328 --> 0:40:03.246
the whole sequence.

0:40:04.584 --> 0:40:10.609
We are modeling this probability as the product
over each of the words.

0:40:10.609 --> 0:40:18.613
That's how the n-gram model was defined, and now
you hopefully can remember the rules of logarithms,

0:40:18.613 --> 0:40:23.089
so you can turn the product inside the logarithm into

0:40:23.063 --> 0:40:31.036
a sum: so the cross entropy is minus one
over n times the sum over all your words

0:40:31.036 --> 0:40:35.566
of the logarithm of the probability of each
word.

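Written out (a standard formulation consistent with the description above):

$$H = -\frac{1}{n} \sum_{i=1}^{n} \log_2 P(w_i \mid w_1, \ldots, w_{i-1}), \qquad \text{perplexity} = 2^{H}$$
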
0:40:36.176 --> 0:40:39.418
And then the perplexity is just two to
the power of that.

0:40:41.201 --> 0:40:44.706
Why can this be interpreted as a branching
factor?

0:40:44.706 --> 0:40:50.479
So it gives you a bit like the average,
like how many possibilities you have.

0:40:51.071 --> 0:41:02.249
Say you have a digit task and you have no idea,
so the probability of the next digit is

0:41:02.249 --> 0:41:03.367
one tenth.

0:41:03.783 --> 0:41:09.354
If you then calculate the perplexity, it
will be exactly ten.

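The digit example works out as follows: with every position having probability one tenth,

$$H = -\frac{1}{n} \sum_{i=1}^{n} \log_2 \frac{1}{10} = \log_2 10, \qquad 2^{H} = 2^{\log_2 10} = 10$$
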
0:41:09.849 --> 0:41:24.191
And that is what this perplexity gives you,
a meaningful interpretation: how much randomness

0:41:24.191 --> 0:41:27.121
is still in there?

0:41:27.307 --> 0:41:32.433
Of course, it's good to have a lower perplexity.

0:41:32.433 --> 0:41:36.012
Then we have less ambiguity in there.

0:41:35.976 --> 0:41:48.127
If you have a hundred words to choose from uniformly,
compared to only ten different ones,

0:41:48.127 --> 0:41:49.462
you have a harder task.

0:41:49.609 --> 0:41:53.255
Yes, I think so, it should be.

0:41:53.255 --> 0:42:03.673
You had here the logarithm and then two to the power,
and that should then cancel out.

0:42:03.743 --> 0:42:22.155
So which logarithm base you use is not that important,
because it's just a constant factor to reformulate.

0:42:23.403 --> 0:42:28.462
Yes, and yeah, so the best:

0:42:31.931 --> 0:42:50.263
the best model is always the one where you
have a high probability on your data.

0:42:51.811 --> 0:43:04.549
Here you see an example: the per-word probabilities
of "would like to commend the rapporteur on his

0:43:04.549 --> 0:43:05.408
work."

0:43:05.285 --> 0:43:14.116
You have then the log-two probabilities and
then the average, so this is not the perplexity

0:43:14.116 --> 0:43:18.095
but the cross entropy, as mentioned here.

0:43:18.318 --> 0:43:26.651
And then two to the power of that will give
you the perplexity of the sentence.

0:43:29.329 --> 0:43:40.967
And these metrics of perplexity are essential
in evaluating language models, as we'll also see nowadays.

0:43:41.121 --> 0:43:47.898
You also measure quality often in perplexity
or cross entropy, which tells you how good

0:43:47.898 --> 0:43:50.062
the model is at estimating the data.

0:43:50.010 --> 0:43:53.647
The better the model is, the more information
you have about what comes next.

0:43:55.795 --> 0:44:03.106
[Student question, partly unintelligible, about
whether the model only scores complete sentences.]

0:44:03.463 --> 0:44:12.512
You are doing that in this way implicitly,
because for the correct word,

0:44:12.512 --> 0:44:19.266
if you are modeling this one, the sum over
all next words is one.

0:44:20.020 --> 0:44:29.409
Therefore, you have that implicitly in there,
because in each position you're modeling the

0:44:29.409 --> 0:44:32.957
probability of which word comes next.

0:44:35.515 --> 0:44:43.811
You have a very large number of negative examples
because all the possible extensions which are

0:44:43.811 --> 0:44:49.515
not there are incorrect, which of course might
also be a problem.

0:44:52.312 --> 0:45:00.256
And the biggest challenge of these types of
models is how to model unseen events.

0:45:00.840 --> 0:45:04.973
So that can be unknown words, or it can be
unknown bigrams.

0:45:05.245 --> 0:45:10.096
So that's important: even if you've seen
all the words,

0:45:10.096 --> 0:45:17.756
if you have a bigram language model and
you haven't seen the bigram, you'll still get

0:45:17.756 --> 0:45:23.628
a zero probability, because the estimate is the
bigram count divided by the unigram count.

0:45:24.644 --> 0:45:35.299
If you have unknown words, the problem gets
even bigger, because one word typically causes

0:45:35.299 --> 0:45:37.075
a lot of zero counts.

0:45:37.217 --> 0:45:41.038
So, for example, say your vocabulary
consists of "go," "to," and "KIT."

0:45:41.341 --> 0:45:43.467
And you now have a sentence,

0:45:43.467 --> 0:45:47.941
say "I go to KIT," so you have one word, which
is here "I,"

0:45:47.887 --> 0:45:54.354
that is unknown. Then you have the probabilities:

0:45:54.354 --> 0:46:02.147
P of "I" given sentence start, and so on, up to
P of sentence end.

0:46:02.582 --> 0:46:09.850
To model this probability you always have
to take the count of these sequences divided

0:46:09.850 --> 0:46:19.145
by another count, and since the unknown word never occurs,
all of these n-grams with that word in the

0:46:19.145 --> 0:46:19.961
middle have a zero count.

0:46:20.260 --> 0:46:27.800
So all of these probabilities are directly
zero.

0:46:27.800 --> 0:46:33.647
You see that just by having a single unknown word.

0:46:34.254 --> 0:46:47.968
This tells you it might not always be better to
use larger n-grams, because if you have a large

0:46:47.968 --> 0:46:50.306
n-gram language model, more n-grams are unseen.

0:46:50.730 --> 0:46:57.870
So sometimes it's better to have a smaller
n-gram order, because the chance that you have

0:46:57.870 --> 0:47:00.170
seen the n-gram is higher.

0:47:00.170 --> 0:47:07.310
On the other hand, you want to have a larger
order, because the larger the order is, the

0:47:07.310 --> 0:47:09.849
longer the context you are modeling.

0:47:10.670 --> 0:47:17.565
So how can we address this type of problem?

0:47:17.565 --> 0:47:28.064
We address this type of problem by somehow
adjusting our counts.

0:47:29.749 --> 0:47:40.482
The problem we often have is that most of the
entries in the count table are zero, and if one

0:47:40.482 --> 0:47:45.082
of these n-grams occurs, you'll get a zero probability.

0:47:46.806 --> 0:48:06.999
So therefore we need to find some other ways
to estimate the probability of these types of events.

0:48:07.427 --> 0:48:11.619
So there are different ways of how to model
it and how to adjust it.

0:48:11.619 --> 0:48:15.326
The one here is to do smoothing, and that's
the first thing.

0:48:15.326 --> 0:48:20.734
So in smoothing you're saying: okay, we take
a bit of the probability mass from our seen

0:48:20.734 --> 0:48:23.893
events, and the mass we're taking
away

0:48:23.893 --> 0:48:26.567
we're distributing to all the other, unseen events.

0:48:26.946 --> 0:48:33.927
The nice thing in this case is that now each
event has a non-zero probability, and that is

0:48:33.927 --> 0:48:39.718
of course very helpful, because we don't have
zero probabilities anymore.

0:48:40.180 --> 0:48:48.422
Everything is smoothed out a bit, but at least
you have some kind of probability everywhere,

0:48:48.422 --> 0:48:50.764
because you take away some of the probability mass.

0:48:53.053 --> 0:49:05.465
You can also see that here: you have the n-gram
distribution, for example, and this is your

0:49:05.465 --> 0:49:08.709
original distribution.

0:49:08.648 --> 0:49:15.463
Then you are taking some mass away from here
and distributing this mass to all the other

0:49:15.463 --> 0:49:17.453
words that you haven't seen.

0:49:18.638 --> 0:49:26.797
And thereby you are now making sure that it's
possible to model these events at all.

0:49:28.828 --> 0:49:36.163
We're coming to more detail on how we can do
this type of smoothing, but one other idea

0:49:36.163 --> 0:49:41.164
you can use is to do some type of clustering.

0:49:41.501 --> 0:49:48.486
And that means if we can't model an n-gram,
for example because we haven't seen it,

0:49:49.349 --> 0:49:56.128
then we're not looking at the full thing,
but we're just modeling directly how probable

0:49:56.156 --> 0:49:58.162
'go to' is, or so.

0:49:58.162 --> 0:50:09.040
Or we model just the single word; that is the
idea of interpolation, where you're interpolating

0:50:09.040 --> 0:50:10.836
all these probabilities and thereby can model it.

0:50:11.111 --> 0:50:16.355
These are the two things which are helpful
in order to better calculate all these types of probabilities.

0:50:19.499 --> 0:50:28.404
Let's start with adjusting the counts, so
the idea is okay:

0:50:28.404 --> 0:50:38.119
we have not seen an event, and then its probability
is zero.

0:50:38.618 --> 0:50:50.902
Its probability is maybe not that high, but you
should always be aware that there might be new

0:50:50.902 --> 0:50:55.308
things happening, and you should somehow be able
to estimate them.

0:50:56.276 --> 0:50:59.914
So the idea is okay.

0:50:59.914 --> 0:51:09.442
We should also assign a positive probability
to events we have not seen.

0:51:10.590 --> 0:51:23.233
We are changing something: currently we worked
on empirical counts, so how often we have actually

0:51:23.233 --> 0:51:25.292
seen the n-grams.

0:51:25.745 --> 0:51:37.174
And now we are moving to expected counts:
how often would this occur in unseen data?

0:51:37.517 --> 0:51:39.282
So we are directly trying to model that.

0:51:39.859 --> 0:51:45.836
Of course, the empirical counts are a good
starting point, so if you've seen the word

0:51:45.836 --> 0:51:51.880
very often in your training data, it's a good
estimation of how often you would see it in

0:51:51.880 --> 0:51:52.685
the future.

0:51:52.685 --> 0:51:58.125
However, just because you haven't seen something
does not mean it cannot occur.

0:51:58.578 --> 0:52:10.742
So, does anybody have a very simple idea of
how you would start with smoothing?

0:52:10.742 --> 0:52:15.241
What count would you give?

0:52:21.281 --> 0:52:32.279
The problem is now in the probability calculation:
how do you handle a bigram with a zero

0:52:32.279 --> 0:52:33.135
count?

0:52:33.193 --> 0:52:39.209
So what count would you give in order to still
do this calculation?

0:52:39.209 --> 0:52:41.509
We have to smooth somehow.

0:52:44.884 --> 0:52:52.151
We could clump together all the rare words,
for example all the ones we have only seen once.

0:52:52.652 --> 0:52:56.904
And then we can just take the probability mass
of those and redistribute it.

0:52:56.936 --> 0:53:00.085
So we merge the rare ones.

0:53:00.085 --> 0:53:06.130
Yes, and then every unseen word is one of
them.

0:53:06.130 --> 0:53:13.939
Yeah, but it's not only about unseen words;
it's also about unseen n-grams.

0:53:14.874 --> 0:53:20.180
You can even start easier, and that's what
people do as the first thing.

0:53:20.180 --> 0:53:22.243
That's add-one smoothing.

0:53:22.243 --> 0:53:28.580
You'll see it's not working well, but a variation
of it works fine, and the idea is just, as here:

0:53:28.580 --> 0:53:30.644
we say we've seen everything once.

0:53:31.771 --> 0:53:39.896
That's similar to this, because you're clustering
the ones and the zeros together, and you just

0:53:39.896 --> 0:53:45.814
say you've seen everything once, or you have
seen everything twice, and so on.

0:53:46.386 --> 0:53:53.249
And if you've done that, there's no zero probability
anymore, because each event has happened at least once.

0:53:55.795 --> 0:54:02.395
If you otherwise have seen the bigram five
times, you would not now do five times but

0:54:02.395 --> 0:54:03.239
six times.

0:54:03.363 --> 0:54:09.117
So the nice thing is we have now seen everything
at least once, and the probability of the n-gram

0:54:09.117 --> 0:54:19.124
is the count plus one of how often you have seen
it, divided by the total plus the vocabulary size.

0:54:20.780 --> 0:54:23.763
However, there's one big, big problem with
it.

0:54:24.064 --> 0:54:38.509
Just imagine that you have a large vocabulary
of words, and you have a corpus of thirty million

0:54:38.509 --> 0:54:39.954
bigrams.

0:54:39.954 --> 0:54:42.843
So to put it simply:

0:54:43.543 --> 0:54:46.580
you've seen thirty million bigram events in
total.

0:54:47.247 --> 0:54:49.818
That is the count you are distributing.

0:54:49.818 --> 0:54:55.225
The problem is, though: how many possible
bigrams do you have?

0:54:55.225 --> 0:55:00.895
You have seven point five billion possible
bigrams, and each of them you are now counting

0:55:00.895 --> 0:55:04.785
as well, like you give each of them a count
of one.

0:55:04.785 --> 0:55:07.092
So each of them is treated as if it occurs.

0:55:07.627 --> 0:55:16.697
Then this number of possible bigrams is many
times larger than the number you really see.

0:55:17.537 --> 0:55:21.151
You're mainly doing equal distribution.

0:55:21.151 --> 0:55:26.753
Everything gets almost the same probability,
because the added counts dominate.

0:55:26.753 --> 0:55:31.541
Most of your probability mass is used for
smoothing.

0:55:32.412 --> 0:55:37.493
Because most of the probability mass has
to be distributed so that you give every

0:55:37.493 --> 0:55:42.687
bigram at least a count of one, and the real
counts are only the thirty million, so seven

0:55:42.687 --> 0:55:48.219
point five billion counts are distributed
equally over all the n-grams, and only thirty million

0:55:48.219 --> 0:55:50.026
are according to your actual frequencies.

0:55:50.210 --> 0:56:02.406
So you put far too much mass on your smoothing,
and you're doing some kind of extreme smoothing.

0:56:02.742 --> 0:56:08.986
So that of course is a bit bad then and will
give you not the best performance.

0:56:10.130 --> 0:56:16.160
However, there's a nice trick: to do probability
calculations we base them on counts, but to

0:56:16.160 --> 0:56:21.800
do this division we don't need integer counts.

0:56:22.302 --> 0:56:32.112
So we can also do that with floating point
values, and it is still a valid type of calculation.

0:56:32.392 --> 0:56:39.380
So we can give less probability mass to unseen
events.

0:56:39.380 --> 0:56:45.352
We don't have to give a full count of one.

0:56:45.785 --> 0:56:50.976
For our calculation we can also give zero point
zero two or something like that, so a very

0:56:50.976 --> 0:56:56.167
small value, and thereby we put less weight
on the smoothing and we are more

0:56:56.167 --> 0:56:58.038
focused on the actual corpus.

0:56:58.758 --> 0:57:03.045
And that is what people refer to as alpha
smoothing.

0:57:03.223 --> 0:57:12.032
You see that we are now adding not one to
the counts but only alpha, and then we are giving

0:57:12.032 --> 0:57:19.258
less probability to the unseen events and more
probability to the really seen ones.

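As a small sketch of both variants in Python (the counts and vocabulary size are made up; alpha = 1 gives add-one smoothing):

```python
from collections import Counter

def add_alpha_prob(w, h, bigrams, histories, vocab_size, alpha=1.0):
    # (count(h, w) + alpha) / (count(h) + alpha * V): every bigram, seen
    # or not, gets at least a pseudo-count of alpha.
    return (bigrams[(h, w)] + alpha) / (histories[h] + alpha * vocab_size)

bigrams = Counter({("i", "want"): 2, ("want", "to"): 2})
histories = Counter({"i": 2, "want": 2, "to": 2})
print(add_alpha_prob("want", "i", bigrams, histories, vocab_size=3))              # add-one: 0.6
print(add_alpha_prob("want", "i", bigrams, histories, vocab_size=3, alpha=0.02))  # close to the ML estimate 1.0
```
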
0:57:20.780 --> 0:57:24.713
The question is, of course: how do you find
this alpha?

0:57:24.713 --> 0:57:29.711
The idea is to use some held-out data and
optimize it on that.

0:57:30.951 --> 0:57:35.153
So what does it now really mean?

0:57:35.153 --> 0:57:40.130
This gives you a bit of an idea behind that.

0:57:40.700 --> 0:57:57.751
So here you have the n-grams grouped by count:
for example, all n-grams which occur one time.

0:57:57.978 --> 0:58:10.890
So, for example, you take the n-grams which
occur one time in the training data, and then
look at the test data.

0:58:11.371 --> 0:58:22.896
If you look at all the n-grams which occur
two times, you see how often they really occur;

0:58:22.896 --> 0:58:31.013
the same for the n-grams that occur zero
times.

0:58:32.832 --> 0:58:46.511
So if you are now doing the smoothing, you
can look at what probability you are estimating

0:58:46.511 --> 0:58:47.466
for them.

0:58:47.847 --> 0:59:00.963
You see that for all these n-grams you heavily
underestimate how often they occur in the test

0:59:00.963 --> 0:59:01.801
data.

0:59:02.002 --> 0:59:10.067
So what you want is to estimate this distribution
very well, so for each n-gram to estimate

0:59:10.067 --> 0:59:12.083
quite well how often it will occur.

0:59:12.632 --> 0:59:16.029
You're quite bad at that for all of them.

0:59:16.029 --> 0:59:22.500
You're apparently underestimating, and only
for the ones which you haven't seen

0:59:22.500 --> 0:59:24.845
you'll heavily overestimate.

0:59:25.645 --> 0:59:30.887
If you're doing alpha smoothing and optimize
the alpha to fit the zero counts, which is

0:59:30.887 --> 0:59:36.361
not completely fair because this alpha is now
optimized on the test counts, you see that you're

0:59:36.361 --> 0:59:37.526
doing a lot better.

0:59:37.526 --> 0:59:42.360
It's not perfect, but you're a lot better
in estimating how often they will occur.

0:59:45.545 --> 0:59:49.316
So this is one idea of doing it.

0:59:49.316 --> 0:59:57.771
Of course there's other ways and this is like
a large research direction.

0:59:58.318 --> 1:00:03.287
So there is this deleted estimation.

1:00:03.287 --> 1:00:11.569
What you are doing is splitting your training
data into two parts.

1:00:11.972 --> 1:00:19.547
You look at how many n-gram types occur exactly
r times, so which n-grams occur r times in

1:00:19.547 --> 1:00:20.868
the first part of your training data.

1:00:21.281 --> 1:00:27.716
And then you look for these ones.

1:00:27.716 --> 1:00:36.611
How often do they occur in the other part of your training data?

1:00:38.118 --> 1:00:45.214
And then you say: for this n-gram, the expected
count, how often we will see it,

1:00:45.214 --> 1:00:56.020
is this total divided by the number of types.
It is some type of clustering: you're putting

1:00:56.020 --> 1:01:04.341
all the n-grams which occur r times in your data
together, in order to estimate how often they occur.

1:01:05.185 --> 1:01:12.489
And then you do your final estimation by just
using those statistics from the other half of the data.

1:01:14.014 --> 1:01:25.210
So this is called deleted estimation, and thereby
you are now able to better estimate how often

1:01:25.210 --> 1:01:25.924
an n-gram really occurs.

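A rough sketch of that procedure in Python (only for n-grams appearing in the first half; the zero-count group would additionally need the full set of possible n-grams):

```python
from collections import Counter, defaultdict

def deleted_estimation(ngrams_part1, ngrams_part2):
    # Group n-grams by their count r in part 1, then estimate the expected
    # count of that group as the average count its members have in part 2.
    c1, c2 = Counter(ngrams_part1), Counter(ngrams_part2)
    groups = defaultdict(list)
    for ngram, r in c1.items():
        groups[r].append(ngram)
    return {r: sum(c2[ng] for ng in members) / len(members)
            for r, members in groups.items()}
```
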
1:01:28.368 --> 1:01:34.559
And again we can do the same: look and compare
it to the expected counts.

1:01:34.559 --> 1:01:37.782
Again we have exactly the same table.

1:01:38.398 --> 1:01:47.611
So then we have here how many n-gram types
exist for each count.

1:01:47.611 --> 1:01:55.361
So, for example, there are so many n-grams
which you have seen once.

1:01:55.835 --> 1:02:08.583
Then you look into your other half: how
often do these n-grams occur in your second part

1:02:08.583 --> 1:02:11.734
of the training data?

1:02:12.012 --> 1:02:22.558
For example, an unseen n-gram I expect to
occur a certain number of times, and an n-gram

1:02:22.558 --> 1:02:25.774
which occurs one time I expect to occur a bit
more often.

1:02:27.527 --> 1:02:42.564
Yeah, and the number of zero counts: you take
your unigrams and then just calculate how many

1:02:42.564 --> 1:02:45.572
possible bigrams there are, minus the seen ones.

1:02:45.525 --> 1:02:50.729
Yes, so in this case we are not assuming
a larger vocabulary, because then, of course,

1:02:50.729 --> 1:02:52.127
it's getting more difficult.

1:02:52.272 --> 1:02:54.730
So you're doing that given the current vocabulary.

1:02:54.730 --> 1:03:06.057
So yeah, how to deal with unknown words is
another problem.

1:03:06.057 --> 1:03:11.150
This is more about how to smooth the n-gram
counts.

1:03:18.198 --> 1:03:25.197
Yes, the last idea is the so-called Good-Turing
estimation, and the idea here is

1:03:25.197 --> 1:03:32.747
similar. There is a proper mathematical proof,
but you can show that a very good estimation

1:03:32.747 --> 1:03:34.713
for the expected counts

1:03:34.654 --> 1:03:42.339
is that you take r plus one times the number
of n-grams which occur one time more, divided

1:03:42.339 --> 1:03:46.011
by the number of n-grams which occur r times.

1:03:46.666 --> 1:03:49.263
So this is then the estimation of the expected count.

1:03:49.549 --> 1:04:05.911
So if you are looking now at an n-gram which
occurs r times, then you are looking at how many

1:04:05.911 --> 1:04:08.608
n-grams occur r plus one times.

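A minimal sketch of the Good-Turing adjustment just described (only reliable for small r; see the caveat about missing counts just below):

```python
from collections import Counter

def good_turing_counts(ngram_counts):
    # N_r: how many distinct n-grams were seen exactly r times.
    n_r = Counter(ngram_counts.values())
    # Adjusted count r* = (r + 1) * N_{r+1} / N_r, defined where N_{r+1} > 0.
    return {r: (r + 1) * n_r[r + 1] / n_r[r]
            for r in n_r if n_r[r + 1] > 0}

counts = Counter({"a b": 3, "b c": 1, "c d": 1, "d e": 1, "e f": 2})
print(good_turing_counts(counts))  # e.g. 1* = 2 * N_2 / N_1 = 2 * 1 / 3
```
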
1:04:09.009 --> 1:04:18.938
It's very simple: you only have to count, for
each count r, how many different

1:04:18.938 --> 1:04:23.471
bigrams are out there, and that is very easy.

1:04:23.903 --> 1:04:33.137
So if you are now looking at n-grams which
occur r times,

1:04:33.473 --> 1:04:46.626
it might be that there are some occurring r
times, but none occurring r plus one times, and then the estimate breaks.

1:04:46.866 --> 1:04:54.721
So what you normally do is use this formula for
small r, and for large r you do some curve

1:04:54.721 --> 1:04:55.524
fitting.

1:04:56.016 --> 1:05:07.377
In general this type of smoothing is important
for n-grams which occur rarely.

1:05:07.377 --> 1:05:15.719
If an n-gram occurs very often, the adjustment
hardly matters, so this is more important for rare events.

1:05:17.717 --> 1:05:25.652
So here again you see you have the counts
and then based on that you get the adjusted

1:05:25.652 --> 1:05:26.390
counts.

1:05:26.390 --> 1:05:34.786
This is shown here, and if you compare it to the
test counts you see that it really works quite well.

1:05:35.035 --> 1:05:41.093
Especially for the low counts it models very well
how often these n-grams really occur.

1:05:45.005 --> 1:05:50.018
Then, of course, the question is how good
does it work in language modeling?

1:05:50.018 --> 1:05:51.516
That is what we actually want to do.

1:05:52.372 --> 1:05:54.996
We can measure that perplexity.

1:05:54.996 --> 1:05:59.261
We learned that before, and then we have add-one smoothing.

1:05:59.579 --> 1:06:07.326
You saw that far too much probability mass
is put on the events which have zero probability.

1:06:07.667 --> 1:06:11.098
Then you have the alpha smoothing.

1:06:11.098 --> 1:06:16.042
It has a star here because it's not completely
fair:

1:06:16.042 --> 1:06:20.281
the alpha was optimized on the test data.

1:06:20.480 --> 1:06:25.904
But you see that the deleted estimation and
the Good-Turing give you a similar performance.

1:06:26.226 --> 1:06:29.141
So they seem to really work quite well.

1:06:32.232 --> 1:06:41.552
So this was all about assigning probability
mass to n-grams which we have not seen,

1:06:41.552 --> 1:06:50.657
in order to also estimate their probability.
Now let's go to the interpolation.

1:06:55.635 --> 1:07:00.207
Good, so now we have

1:07:00.080 --> 1:07:11.818
done this estimation, and the problem is we
have this general trade-off.

1:07:11.651 --> 1:07:19.470
We want to have a longer context, because then
we can model language better, because of

1:07:19.470 --> 1:07:21.468
long-range dependencies.

1:07:21.701 --> 1:07:26.745
On the other hand, we have limited data, so
we want to have short n-grams, because we

1:07:26.745 --> 1:07:28.426
see short n-grams more often.

1:07:29.029 --> 1:07:43.664
And the smoothing and the discounting we did
before always treats all n-grams the same way.

1:07:44.024 --> 1:07:46.006
So we didn't really look at the n-grams themselves.

1:07:46.006 --> 1:07:48.174
They were all just grouped by how often they
occur.

1:07:49.169 --> 1:08:00.006
However, sometimes this might not be very
helpful, so for example look at the n-grams

1:08:00.006 --> 1:08:06.253
Scottish beer drinkers and Scottish beer eaters.

1:08:06.686 --> 1:08:12.037
We have not seen either trigram, so you will
estimate the trigram probability by the

1:08:12.037 --> 1:08:14.593
probability you assign to the zero counts.

1:08:15.455 --> 1:08:26.700
However, the bigram probabilities you might
have seen, and they might be helpful.

1:08:26.866 --> 1:08:34.538
So 'beer drinkers' is more probable to see than
'Scottish beer drinkers', and 'beer drinkers'

1:08:34.538 --> 1:08:36.039
should be more probable than 'beer eaters'.

1:08:36.896 --> 1:08:39.919
So this type of information is somehow ignored.

1:08:39.919 --> 1:08:45.271
So if we have the Trigram language model,
we are only looking at trigrams divided by

1:08:45.271 --> 1:08:46.089
the bigrams.

1:08:46.089 --> 1:08:49.678
But if we have not seen the trigram, we are
not looking at whether

1:08:49.678 --> 1:08:53.456
we maybe have seen the bigram, so that we
could back off to it.

1:08:54.114 --> 1:09:01.978
And that is what people do in interpolation
and back off.

1:09:01.978 --> 1:09:09.164
The idea is: if we haven't seen the large
n-gram,

1:09:09.429 --> 1:09:16.169
then we go to a shorter sequence and try to
estimate the probability there.

1:09:16.776 --> 1:09:20.730
And this is the idea of interpolation.

1:09:20.730 --> 1:09:25.291
There's like two different ways of doing it.

1:09:25.291 --> 1:09:26.507
One is the following.

1:09:26.646 --> 1:09:29.465
The easiest thing is like okay.

1:09:29.465 --> 1:09:32.812
If we have bigrams, we have trigrams.

1:09:32.812 --> 1:09:35.103
If we have unigrams, why not use them all?

1:09:35.355 --> 1:09:46.544
I mean, of course, with the larger ones we have
the larger context, but the shorter ones are

1:09:46.544 --> 1:09:49.596
maybe better estimated.

1:09:50.090 --> 1:10:00.487
So you estimate the probability each time as a
weighted sum of the trigram, bigram and unigram probability of the word.

1:10:01.261 --> 1:10:07.052
And of course the weights need to sum to one,
because otherwise we don't have a probability distribution, but

1:10:07.052 --> 1:10:09.332
we can somehow optimize the weights.

1:10:09.332 --> 1:10:15.930
For example, on a held-out data set. And
thereby we now have a probability distribution

1:10:15.930 --> 1:10:17.777
which takes all of them into account.

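A minimal sketch of such a linear interpolation; the probability tables and weights here are placeholders, and the lambdas would really be tuned on held-out data:

```python
def interpolated_prob(w, h1, h2, uni, bi, tri, lambdas=(0.2, 0.3, 0.5)):
    # Weighted sum of unigram, bigram and trigram estimates;
    # the weights must sum to one so the result stays a distribution.
    l1, l2, l3 = lambdas
    return (l1 * uni.get(w, 0.0)
            + l2 * bi.get((h2, w), 0.0)
            + l3 * tri.get((h1, h2, w), 0.0))

uni = {"drinkers": 0.001}
bi = {("beer", "drinkers"): 0.05}
tri = {}  # "scottish beer drinkers" was never seen
print(interpolated_prob("drinkers", "scottish", "beer", uni, bi, tri))
```
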
1:10:18.118 --> 1:10:23.705
Think about the 'Scottish beer drinkers' example.

1:10:23.705 --> 1:10:33.763
The trigram probability will be the same for
both phrases, because they both occur zero

1:10:33.763 --> 1:10:34.546
times.

1:10:36.116 --> 1:10:45.332
But the bigram probability will hopefully
be different, because we might have seen 'beer

1:10:45.332 --> 1:10:47.611
drinkers' but rarely 'beer eaters'.

1:10:48.668 --> 1:10:57.296
The idea is that sometimes it's better to have
different models and combine them.

1:10:58.678 --> 1:10:59.976
Another idea, instead

1:11:00.000 --> 1:11:08.506
Of this overall interpolation is you can also
do this type of recursive interpolation.

1:11:08.969 --> 1:11:23.804
The probability of the word given its history
is lambda times the current n-gram language model probability,

1:11:24.664 --> 1:11:30.686
plus one minus lambda (the weights of the two
sum to one) times an interpolated probability

1:11:30.686 --> 1:11:36.832
from the n minus one gram, and then of course
it goes on recursively until you are at the unigram

1:11:36.832 --> 1:11:37.639
probability.

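Written out (in assumed notation, with h_k the last k words and lambda_k the weight for a history of length k), the recursion might look like:

```latex
P_I(w \mid h_k) = \lambda_k \, P(w \mid h_k) + (1 - \lambda_k) \, P_I(w \mid h_{k-1}),
\qquad P_I(w \mid h_0) = P(w)
```
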
1:11:38.558 --> 1:11:49.513
What you can also do: you don't have to use
the same weights for all words; you can

1:11:49.513 --> 1:12:06.020
make them depend on the history. For example,
for n-grams which you have seen very often, you put more

1:12:06.020 --> 1:12:10.580
weight on the trigrams.

1:12:13.673 --> 1:12:29.892
The other thing you can do is the back-off,
and the difference in back-off is that we are not

1:12:29.892 --> 1:12:32.656
interpolating.

1:12:32.892 --> 1:12:41.954
If we have seen the trigram, so if the trigram
count is bigger than zero, then we take

1:12:41.954 --> 1:12:48.412
the trigram probability, and only if we have
not seen it do we back off.

1:12:48.868 --> 1:12:54.092
So that is the difference.

1:12:54.092 --> 1:13:06.279
In interpolation we are always combining all the
n-gram probabilities; in back-off we fall back only when needed.

1:13:07.147 --> 1:13:09.941
Why do we need to do this just a minute?

1:13:09.941 --> 1:13:13.621
So why can't we here just take the probability
of the seen n-grams directly?

1:13:15.595 --> 1:13:18.711
Yes, because otherwise the probabilities don't
sum up to one.

1:13:19.059 --> 1:13:28.213
In order to make them still sum to one, we
have to take away a bit of the probability mass

1:13:28.213 --> 1:13:29.773
from the seen events.

1:13:29.709 --> 1:13:38.919
The difference is we are no longer distributing
it equally, as before, to the unseen events, but we

1:13:38.919 --> 1:13:40.741
are distributing it according to the lower-order model.

1:13:44.864 --> 1:13:56.220
For example, this can be done with Good-Turing:
the expected counts in Good-Turing, as we saw.

1:13:57.697 --> 1:13:59.804
The adjusted counts.

1:13:59.804 --> 1:14:04.719
They are always lower than the raw counts we
see here.

1:14:04.719 --> 1:14:14.972
You can see that, so you can
now take this difference and distribute this

1:14:14.972 --> 1:14:18.852
weight to the lower-order estimates.

1:14:23.323 --> 1:14:29.896
That is how we can distribute things.

1:14:29.896 --> 1:14:43.442
Then there is one last thing people are doing,
especially for deciding how much weight to back off with.

1:14:43.563 --> 1:14:55.464
And there's one method which is called
Witten-Bell smoothing.

1:14:55.315 --> 1:15:01.335
For the back-off, it might make sense to
look at the words themselves and

1:15:01.335 --> 1:15:04.893
see how probable it is that you need to back off.

1:15:05.425 --> 1:15:11.232
So look at these two words, 'spite' and 'constant'.

1:15:11.232 --> 1:15:15.934
Those occur exactly the same number of times
in the corpus.

1:15:16.316 --> 1:15:27.804
They would be treated exactly the same, because
both occur the same number of times, and the

1:15:27.804 --> 1:15:29.053
back-off weight would be the same.

1:15:29.809 --> 1:15:48.401
However, they shouldn't really be modeled the same.

1:15:48.568 --> 1:15:57.447
If you compare them: for 'constant' there are
four hundred different continuations of this

1:15:57.447 --> 1:16:01.282
word, while 'spite' is nearly always followed
by the same word.

1:16:02.902 --> 1:16:11.203
So if you're now seeing a new bigram starting
with 'constant' or 'spite'

1:16:11.203 --> 1:16:13.467
and then another word:

1:16:15.215 --> 1:16:25.606
for 'constant', it's very frequent that you see
new bigrams because there are many different

1:16:25.606 --> 1:16:27.222
combinations.

1:16:27.587 --> 1:16:35.421
Therefore, it might be good not only to look
at the counts of the n-grams, but also at how

1:16:35.421 --> 1:16:37.449
many extensions a word has.

1:16:38.218 --> 1:16:43.222
And this is done by Witten-Bell smoothing.

1:16:43.222 --> 1:16:51.032
The idea is we count how many possible extensions
a history has, in this case.

1:16:51.371 --> 1:17:01.966
So for 'spite' we had few possible extensions,
and for 'constant' we had a lot more.

1:17:02.382 --> 1:17:09.394
And then how much weight we put into our back-off
model depends

1:17:09.394 --> 1:17:13.170
on this number of possible extensions.

1:17:15.557 --> 1:17:29.583
We have it here: this is the weight you
put on your lower-order n-gram probability.

1:17:29.583 --> 1:17:46.596
And if you compare these two numbers: for 'spite'
you compute how many extensions 'spite' has,

1:17:46.596 --> 1:17:55.333
divided by the total, which is small, while
for 'constant' you get about zero point three.

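A small sketch of that back-off weight in the Witten-Bell style, assuming a table of bigram counts (the counts here are invented): T is the number of distinct continuations and N the total count of the history:

```python
def backoff_mass(history, bigram_counts):
    # Witten-Bell style: reserve T / (N + T) of the probability mass for
    # backing off, where T = distinct continuations, N = total occurrences.
    continuations = [c for (h, w), c in bigram_counts.items() if h == history]
    t, n = len(continuations), sum(continuations)
    return t / (n + t) if n + t else 1.0

bigram_counts = {("spite", "of"): 99,
                 ("constant", "rate"): 40, ("constant", "growth"): 30,
                 ("constant", "change"): 29}
print(backoff_mass("spite", bigram_counts))     # 1 / 100: rarely back off
print(backoff_mass("constant", bigram_counts))  # 3 / 102: back off more readily
```
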
1:17:55.815 --> 1:18:05.780
So for 'constant' you're putting a lot more weight
there, like it's not as bad to fall back to the back-off

1:18:05.780 --> 1:18:06.581
model.

1:18:06.581 --> 1:18:10.705
So for 'spite' a new continuation is really unusual.

1:18:10.730 --> 1:18:13.369
For 'constant' there's a lot of probability mass
on the back-off.

1:18:13.369 --> 1:18:15.906
The chance that you need it is quite
high.

1:18:20.000 --> 1:18:26.209
Similarly, but just from the other way around,
it's now looking at this probability distribution.

1:18:26.546 --> 1:18:37.103
So when we back off, the probability distribution
for the lower-order n-grams is calculated exactly

1:18:37.103 --> 1:18:40.227
the same way as the normal probability.

1:18:40.320 --> 1:18:48.254
However, they are used in a different way:
the lower-order n-grams are only used

1:18:48.254 --> 1:18:49.361
if we haven't seen the higher-order one.

1:18:50.410 --> 1:18:54.264
So it's like you're modeling something different.

1:18:54.264 --> 1:19:01.278
You're modeling how probable this n-gram is given
that we haven't seen the larger n-gram, and that

1:19:01.278 --> 1:19:04.361
is captured by the diversity of histories.

1:19:04.944 --> 1:19:14.714
For example, if you look at 'York', that's a
quite frequent word.

1:19:14.714 --> 1:19:18.530
It occurs many times.

1:19:19.559 --> 1:19:27.985
However, four hundred seventy-three times it
was preceded by the same word: it almost always follows 'New'.

1:19:29.449 --> 1:19:40.237
So if you now think about it, the unigram model
is only used when we back off, so the unigram

1:19:40.237 --> 1:19:49.947
probability of 'York' should be very, very low,
because it almost only follows 'New'. So

1:19:49.947 --> 1:19:56.292
you should have a lower probability for 'York'
than, for example, for 'foods', although you

1:19:56.292 --> 1:20:02.853
have seen both of them equally often, and
this is done by Kneser-Ney smoothing, where

1:20:02.853 --> 1:20:05.377
you are not counting the words themselves, but
you count the number of histories.

1:20:05.845 --> 1:20:15.233
So, the other way around from before: how
many different words appeared before it?

1:20:15.233 --> 1:20:28.232
Then, instead of the normal word counts, you
count these histories. You don't need to know all the formulas

1:20:28.232 --> 1:20:28.864
here.

1:20:28.864 --> 1:20:33.498
The more important thing is this intuition.

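A minimal sketch of such a continuation count, using a made-up set of observed bigram types: 'york' may be frequent, but with only one distinct history its continuation probability stays small:

```python
def continuation_prob(w, bigram_types):
    # Count distinct histories that precede w, normalized by the total
    # number of distinct bigram types (Kneser-Ney style unigram).
    histories = {h for (h, ww) in bigram_types if ww == w}
    return len(histories) / len(bigram_types)

bigram_types = {("new", "york"), ("in", "food"), ("the", "food"),
                ("fresh", "food"), ("some", "food")}
print(continuation_prob("york", bigram_types))  # 1 / 5
print(continuation_prob("food", bigram_types))  # 4 / 5
```
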
1:20:34.874 --> 1:20:44.646
Backing off already means that I haven't
seen the larger n-gram, and therefore

1:20:44.646 --> 1:20:49.704
it might be better to model it differently.

1:20:49.929 --> 1:20:56.976
So if there's a new n-gram ending in 'York',
that's very improbable compared

1:20:56.976 --> 1:20:57.297
to other words.

1:21:00.180 --> 1:21:06.130
And yeah, this modified Kneser-Ney smoothing
is what people mostly took into use.

1:21:06.130 --> 1:21:08.249
That's the default approach.

1:21:08.728 --> 1:21:20.481
It has an absolute discounting for the n-gram
counts, then a smoothing of the back-off weights, and

1:21:20.481 --> 1:21:27.724
it uses the counting of histories which we
just had.

1:21:28.028 --> 1:21:32.207
And there are even two versions of it: the
back-off and the interpolated one.

1:21:32.472 --> 1:21:34.264
So that may be interesting.

1:21:34.264 --> 1:21:40.216
Interestingly, it even works well for interpolation,
although the assumption is then no longer

1:21:40.216 --> 1:21:45.592
true, because you're using the lower-order n-grams
even if you've seen the higher-order n-grams.

1:21:45.592 --> 1:21:49.113
But since you're then focusing on the higher-order
n-grams, it still works.

1:21:49.929 --> 1:21:53.522
So here you see some results on the perplexities.

1:21:54.754 --> 1:22:00.262
So you see that normally the interpolated modified
Kneser-Ney gives you some of the best

1:22:00.262 --> 1:22:00.980
performance.

1:22:02.022 --> 1:22:08.032
You also see: the larger your n-gram order is,
with interpolation,

1:22:08.032 --> 1:22:15.168
the significantly better you get, so you gain
by looking at more than just the last word.

1:22:18.638 --> 1:22:32.725
Good, so much for these types of things, and
we will finish with some special things about

1:22:32.725 --> 1:22:34.290
language models.

1:22:38.678 --> 1:22:44.225
One thing we talked about is unknown words;
there are different ways of handling them, because

1:22:44.225 --> 1:22:49.409
in all the estimations we were still assuming
mostly that we have a fixed vocabulary.

1:22:50.270 --> 1:23:06.372
So you can often, for example, create an unknown
token and use that while training the statistical language model.

1:23:06.766 --> 1:23:16.292
This was mainly used in statistical language
processing before the newer models came up.

1:23:18.578 --> 1:23:30.573
What is also nice is that if you're going
to really large n-grams, it's more

1:23:30.573 --> 1:23:33.114
about efficiency.

1:23:33.093 --> 1:23:37.378
And then you have to remember whether you have
a real probability in your model.

1:23:37.378 --> 1:23:41.422
In a lot of situations it's not really important.

1:23:41.661 --> 1:23:46.964
It's more about ranking so which one is better
and if they don't sum up to one that's not

1:23:46.964 --> 1:23:47.907
that important.

1:23:47.907 --> 1:23:53.563
Of course then you cannot calculate any perplexity
anymore because if this is not a probability

1:23:53.563 --> 1:23:58.807
mass, then the thing we had about the negative
log-likelihood doesn't fit anymore, and that's not

1:23:58.807 --> 1:23:59.338
working.

1:23:59.619 --> 1:24:02.202
However, this simplification is also very helpful.

1:24:02.582 --> 1:24:13.750
And that is why the so-called stupid back-off
was presented: it removes all the complicated

1:24:13.750 --> 1:24:14.618
things.

1:24:15.055 --> 1:24:28.055
It just says: if we have seen the n-gram, we directly
take the relative frequency, and otherwise we back off with a fixed weight.

1:24:28.548 --> 1:24:41.867
There is no discounting anymore, so it's
very, very simple; however, it works, and you

1:24:41.867 --> 1:24:47.935
have to calculate a lot fewer statistics.

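A sketch of the rule, assuming a dict that maps word tuples of any order (including single words) to counts, and the commonly cited back-off factor of 0.4; note the result is a score, not a normalized probability:

```python
def stupid_backoff(words, counts, total_unigrams, alpha=0.4):
    # If the full n-gram was seen, return its relative frequency;
    # otherwise recurse on the shortened context, scaled by alpha.
    ngram, hist = tuple(words), tuple(words[:-1])
    if len(words) == 1:
        return counts.get(ngram, 0) / total_unigrams
    if counts.get(ngram, 0) > 0 and counts.get(hist, 0) > 0:
        return counts[ngram] / counts[hist]
    return alpha * stupid_backoff(words[1:], counts, total_unigrams, alpha)
```
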
1:24:50.750 --> 1:24:57.525
In addition, you can have other types of
language models.

1:24:57.525 --> 1:25:08.412
We had word-based language models, and they
normally go up to four-, five- or six-grams.

1:25:08.412 --> 1:25:10.831
Longer ones are too large.

1:25:11.531 --> 1:25:20.570
So what people have then looked also into
is what is referred to as part of speech language

1:25:20.570 --> 1:25:21.258
model.

1:25:21.258 --> 1:25:29.806
So instead of looking at the word sequence
you're modeling directly the part of speech

1:25:29.806 --> 1:25:30.788
sequence.

1:25:31.171 --> 1:25:34.987
Then of course you're only modeling
syntax.

1:25:34.987 --> 1:25:41.134
There's no semantic information anymore in
the part-of-speech tags, but now you might go

1:25:41.134 --> 1:25:47.423
to a larger context length, so you can do seven-
or nine-grams, and then you can capture some

1:25:47.423 --> 1:25:50.320
of the long-range dependencies in the word order.

1:25:52.772 --> 1:25:59.833
And there are other things people have done,
like cache language models. The idea in a cache

1:25:59.833 --> 1:26:07.052
language model is that words you have recently
seen are more probable to reoccur, which is useful

1:26:07.052 --> 1:26:11.891
if you want to model the dynamics of the
text.

1:26:12.152 --> 1:26:20.734
For example, I'm talking here about language
models, so in my presentation those words

1:26:20.734 --> 1:26:23.489
will reoccur a lot more often.

1:26:23.883 --> 1:26:37.213
You can do that by having a dynamic and a static
component, where the dynamic component

1:26:37.213 --> 1:26:41.042
looks, for example, at recent bigrams.

1:26:41.261 --> 1:26:49.802
And thereby, for example, once a word has been
generated, its language model probability is increased,

1:26:49.802 --> 1:26:52.924
and you're modeling that behavior.

1:26:56.816 --> 1:27:03.114
So the dynamic component is trained on the
text translated so far.

1:27:04.564 --> 1:27:12.488
It is trained on what you have just produced;
there's no human feedback there.

1:27:12.712 --> 1:27:25.466
The model sees its own output all the time,
and then it will repeat its errors, and that is, of course, a problem.

1:27:25.966 --> 1:27:31.506
A similar idea is the trigger language model,
where, if one word occurs,

1:27:31.506 --> 1:27:34.931
then you increase the probability of some other
words.

1:27:34.931 --> 1:27:40.596
So if you're talking about money, that will
increase the probability of 'bank', 'savings account',

1:27:40.596 --> 1:27:41.343
'dollar', and so on.

1:27:41.801 --> 1:27:47.352
Because then you have to somehow model this
dependency, but it's somehow also an idea of

1:27:47.352 --> 1:27:52.840
modeling long range dependency, because if
one word occurs very often in your document,

1:27:52.840 --> 1:27:58.203
you are somehow learning which other
words tend to occur, because they occur more often

1:27:58.203 --> 1:27:59.201
than by chance.

1:28:02.822 --> 1:28:10.822
Yes, then the last thing is, of course, especially
for languages which are morphologically

1:28:10.822 --> 1:28:11.292
rich.

1:28:11.292 --> 1:28:18.115
You can do something similar to BPE, so you
can split words into morphemes or so, and then model

1:28:18.115 --> 1:28:22.821
the morpheme sequence, because the individual
morphemes occur more often.

1:28:23.023 --> 1:28:26.877
However, the problem is of course that your
sequence length also gets longer.

1:28:27.127 --> 1:28:33.185
And so if you have a four-gram language model,
it's not counting the last three words but

1:28:33.185 --> 1:28:35.782
only the last three morphemes, which is a shorter context.

1:28:36.196 --> 1:28:39.833
So of course it's then a bit challenging to
know how to deal with that.

1:28:40.680 --> 1:28:51.350
What about languages like Finnish, where you
have a lot of morphology at the end of the word?

1:28:51.350 --> 1:28:58.807
Yeah, there you can typically do something
like that.

1:28:59.159 --> 1:29:02.157
It is not the one perfect solution.

1:29:02.157 --> 1:29:05.989
You have to do a bit of testing what is best.

1:29:06.246 --> 1:29:13.417
One way of dealing with a large vocabulary
that you haven't seen is to split these words

1:29:13.417 --> 1:29:20.508
into parts: either in a more linguistically
motivated way, into morphemes, or in a more

1:29:20.508 --> 1:29:25.826
statistically motivated way, like we have in
byte-pair encoding.

1:29:28.188 --> 1:29:33.216
The representation of your text is different.

1:29:33.216 --> 1:29:41.197
How you are later doing all the counting and
the statistics is the same.

1:29:41.197 --> 1:29:44.914
What changes is only what you assume to be your sequence unit.

1:29:45.805 --> 1:29:49.998
That's the same thing for the other things
we had here.

1:29:49.998 --> 1:29:55.390
Here you don't have words, but everything
you're doing is done exactly the same.

1:29:57.857 --> 1:29:59.457
Some practical issues.

1:29:59.457 --> 1:30:05.646
Typically you're doing things in log space,
and you're adding, because multiplying very

1:30:05.646 --> 1:30:09.819
small values sometimes gives you problems with
the calculation.

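A tiny sketch of why: multiplying many small probabilities underflows floating point, while summing their logs stays finite (the numbers are made up):

```python
import math

p = 1.0
for _ in range(400):
    p *= 1e-3              # product of 400 small probabilities
print(p)                   # 0.0 -- underflowed

log_p = sum(math.log(1e-3) for _ in range(400))
print(log_p)               # about -2763.1, perfectly representable
```
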
1:30:10.230 --> 1:30:16.687
The good thing is you mostly don't have to deal
with this yourself: there are very good toolkits,

1:30:16.687 --> 1:30:23.448
like SRILM or KenLM, to which you can
just give your data, and they will train the

1:30:23.448 --> 1:30:30.286
language model, do all the complicated math
behind it, and you are able to run them.

1:30:31.911 --> 1:30:39.894
So what you should keep from today is: what
is a language model, how we can do maximum likelihood

1:30:39.894 --> 1:30:44.199
training for that, and the different types of language models.

1:30:44.199 --> 1:30:49.939
Similar ideas we use for a lot of different
statistical models.

1:30:50.350 --> 1:30:52.267
where you always have the problem of unseen events.

1:30:53.233 --> 1:31:01.608
We will look at a different way of doing it
on Thursday, when we will continue with language models.