WEBVTT

0:00:01.121 --> 0:00:14.214
Okay, so welcome to today's lecture. On Tuesday
we started to talk about speech translation.

0:00:14.634 --> 0:00:27.037
And hopefully you got an idea of the basic
ideas we have in speech translation, the two

0:00:27.037 --> 0:00:29.464
major approaches.

0:00:29.829 --> 0:00:41.459
And the other one is the end-to-end system where
we have one large system which does everything

0:00:41.459 --> 0:00:42.796
together.

0:00:43.643 --> 0:00:58.459
Until now we mainly focused on text output, as
we'll see today, but you can extend these ideas

0:00:58.459 --> 0:01:01.138
to speech output as well.

0:01:01.441 --> 0:01:08.592
But since it's also a machine translation
lecture, we of course mainly focus on

0:01:08.592 --> 0:01:10.768
the translation challenges.

0:01:12.172 --> 0:01:25.045
And the main focus of today's lecture
is to look into what is challenging about speech

0:01:25.045 --> 0:01:26.845
translation.

0:01:27.627 --> 0:01:33.901
So a bit more focus on what is really
the difference to text translation and how we can address it.

0:01:34.254 --> 0:01:39.703
We'll start with the segmentation
problem.

0:01:39.703 --> 0:01:45.990
We touched on that already a bit, but it is especially
important for end-to-end systems.

0:01:46.386 --> 0:01:57.253
So the problem is that until now it was easy
to segment the input into sentences and then

0:01:57.253 --> 0:02:01.842
translate each sentence individually.

0:02:02.442 --> 0:02:17.561
When you're now translating audio, the challenge
is that you have just a sequence of audio input

0:02:17.561 --> 0:02:20.055
and there is no segmentation into sentences.

0:02:21.401 --> 0:02:27.834
So you have this difference that your audio
is a continuous stream, but the text is typically

0:02:27.834 --> 0:02:28.930
sentence based.

0:02:28.930 --> 0:02:31.667
So how can you bridge this gap?

0:02:31.667 --> 0:02:37.690
We'll see that this is really essential, and if
you're not using a decent system there,

0:02:37.690 --> 0:02:41.249
then you can lose a lot of quality and performance.

0:02:41.641 --> 0:02:44.267
That is what I also meant before.

0:02:44.267 --> 0:02:51.734
So if you have a more complex system out of
several units, it's really essential that they

0:02:51.734 --> 0:02:56.658
all work together, and it's very easy to lose
significant performance.

0:02:57.497 --> 0:03:13.029
The second challenge we'll talk about is disfluencies,
so the style of speaking is very different

0:03:13.029 --> 0:03:14.773
from text.

0:03:15.135 --> 0:03:24.727
So if you translate TED talks, those are normally
very good speakers.

0:03:24.727 --> 0:03:30.149
They will give you a very fluent text.

0:03:30.670 --> 0:03:36.692
When you want to translate a lecture, it might
be more difficult because it is less rehearsed.

0:03:37.097 --> 0:03:39.242
I mean, it's not that people are unprepared.

0:03:39.242 --> 0:03:42.281
They should be prepared in giving the lecture.

0:03:42.362 --> 0:03:48.241
But it's not that, I mean, a lecturer typically
rehearses like five times before

0:03:48.241 --> 0:03:52.682
he is giving this lecture, so that it will
be completely fluent.

0:03:52.682 --> 0:03:56.122
He might at some point notice that something is
not perfect.

0:03:56.122 --> 0:04:00.062
He'll want to rephrase, and he'll have to think
during the lecture.

0:04:00.300 --> 0:04:04.049
It might also be good that he's thinking, so
he's not going too fast, and things like that.

0:04:05.305 --> 0:04:07.933
If you then go to the other extreme, it's
meetings.

0:04:08.208 --> 0:04:15.430
If you have a lively discussion, of course,
people will interrupt, they will restart, they

0:04:15.430 --> 0:04:22.971
will think while they speak, and you know that
sometimes you tell people to first think and then speak

0:04:22.971 --> 0:04:26.225
because they are changing their opinion.

0:04:26.606 --> 0:04:31.346
So the question is how you can deal with this.

0:04:31.346 --> 0:04:37.498
And there again there might be solutions for
that, or at least ways to address it.

0:04:39.759 --> 0:04:46.557
Then for the output we will look into simultaneous
translation, which is at least not very important

0:04:46.557 --> 0:04:47.175
in text.

0:04:47.175 --> 0:04:53.699
There might be some cases, but normally you
have the full text available and then you're translating

0:04:53.699 --> 0:04:54.042
it.

0:04:54.394 --> 0:05:09.220
While for speech translation, since it's often
a live interaction, of course it's important.

0:05:09.149 --> 0:05:12.378
Otherwise it's hard to follow.

0:05:12.378 --> 0:05:19.463
You get what was said five minutes ago, and then
the slide is no longer matching and it's not as helpful.

0:05:19.739 --> 0:05:35.627
You have to wait very long before you can
answer because you have to first wait for what

0:05:35.627 --> 0:05:39.197
is happening there.

0:05:40.660 --> 0:05:46.177
And finally, we can talk a bit about presentation.

0:05:46.177 --> 0:05:54.722
For example, I mentioned that if you're generating
subtitles, it's not possible to show everything.

0:05:54.854 --> 0:06:01.110
So in professional subtitles there are clear
rules.

0:06:01.110 --> 0:06:05.681
A subtitle has to be shown for a certain number of seconds.

0:06:05.681 --> 0:06:08.929
It's a maximum of two lines.

0:06:09.549 --> 0:06:13.156
Because otherwise it's getting too long, you're
not able to read it anymore, and so on.

0:06:13.613 --> 0:06:19.826
So if you want to achieve that, of course,
you might have to adjust and select what you

0:06:19.826 --> 0:06:20.390
really want to show.

0:06:23.203 --> 0:06:28.393
So let's start with the segmentation.

0:06:28.393 --> 0:06:36.351
On the one hand it's an issue while training,
on the other hand at inference time.

0:06:38.678 --> 0:06:47.781
What is the problem? When we train, it's
relatively easy to separate our data at the sentence

0:06:47.781 --> 0:06:48.466
level.

0:06:48.808 --> 0:07:02.241
So if you have your example, you have the
audio and the text, then you typically know

0:07:02.241 --> 0:07:07.083
that this sentence is aligned.

0:07:07.627 --> 0:07:16.702
You can use this time information to cut
your audio, and then you can train.
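
NOTE
A minimal sketch of this data-preparation step (hypothetical code, not
the lecture's actual tooling): cutting the long talk into sentence-level
training pairs using the known time alignments. The array layout and the
16 kHz sampling rate are illustrative assumptions.
import numpy as np
def cut_training_examples(audio, alignments, sr=16000):
    # audio: 1-D numpy array holding the whole talk
    # alignments: list of (start_sec, end_sec, target_text) per sentence
    examples = []
    for start, end, text in alignments:
        segment = audio[int(start * sr):int(end * sr)]
        examples.append((segment, text))  # one training pair per sentence
    return examples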

0:07:18.018 --> 0:07:31.775
Because what we need for an end-to-end model
is an input-output pair, in this case an audio

0:07:31.775 --> 0:07:32.822
chunk with its text.

0:07:33.133 --> 0:07:38.551
And even if this is a long speech, it's easy
then since we have this time information to

0:07:38.551 --> 0:07:39.159
separate.

0:07:39.579 --> 0:07:43.866
But we are using therefore, of course, the
target side information.

0:07:45.865 --> 0:07:47.949
The problem is now in runtime.

0:07:47.949 --> 0:07:49.427
This is not possible.

0:07:49.427 --> 0:07:55.341
At training time we can do that based on the punctuation
marks and the sentence segmentation on the

0:07:55.341 --> 0:07:57.962
target side, because that tells us where to split.

0:07:57.962 --> 0:08:02.129
But at test time, during translation,
it is not possible.

0:08:02.442 --> 0:08:10.288
Because there is just a long audio signal,
and of course if you have your test data to

0:08:10.288 --> 0:08:15.193
split it manually: that has been done for some
experiments.

0:08:15.193 --> 0:08:22.840
It's fine, but it's not a realistic scenario
because if you really apply it in the real world,

0:08:22.840 --> 0:08:25.949
you won't have a manual segmentation.

0:08:26.266 --> 0:08:31.838
If a human has to do that, then he could just as well do the
translation, so you want to have a fully automatic

0:08:31.838 --> 0:08:32.431
pipeline.

0:08:32.993 --> 0:08:38.343
So the question is how we can deal with this
type of situation.

0:09:09.309 --> 0:09:20.232
So the question is: how can we deal with this
type of situation, and how can we segment the

0:09:20.232 --> 0:09:23.024
audio into some units?

0:09:23.863 --> 0:09:32.495
And here is one further really big advantage
of a cascaded system: Because how is this done

0:09:32.495 --> 0:09:34.259
in a cascaded system?

0:09:34.259 --> 0:09:38.494
We are splitting the audio based on some features.

0:09:38.494 --> 0:09:42.094
We can use similar ones which we'll discuss
later.

0:09:42.094 --> 0:09:43.929
Then we run the ASR.

0:09:43.929 --> 0:09:48.799
We have the transcript and then we can do
what we talked about last time.

0:09:49.069 --> 0:10:02.260
So if you have an audio signal, you can re-segment
the transcript as it was done in the training data.

0:10:02.822 --> 0:10:07.951
So here we have a big advantage.

0:10:07.951 --> 0:10:16.809
We can use a different segmentation for the
ASR and for the MT.

0:10:16.809 --> 0:10:21.316
Why is that a big advantage?

0:10:23.303 --> 0:10:34.067
I would say for the MT task it is more important,
because we can then do the sentence segmentation.

0:10:34.955 --> 0:10:37.603
Yeah, we can do the same thing.

0:10:37.717 --> 0:10:40.226
And for ASR, why is it not as important
there?

0:10:40.226 --> 0:10:40.814
Ideas, maybe?

0:10:43.363 --> 0:10:48.589
We don't need that much context.

0:10:48.589 --> 0:11:01.099
We only try to transcribe the words, and the
context to consider is mainly small.

0:11:03.283 --> 0:11:11.419
I would agree with that; more context helps, but there
is one more important point:

0:11:11.651 --> 0:11:16.764
ASR is monotone, so there's no reordering.

0:11:16.764 --> 0:11:22.472
The second part of the signal stays the second part; there is no reordering.

0:11:22.472 --> 0:11:23.542
In MT we do have reordering.

0:11:23.683 --> 0:11:29.147
And of course, if we are doing that, we cannot
reorder across boundaries between segments.

0:11:29.549 --> 0:11:37.491
For ASR it might be challenging if we split within
words, so the split is not perfect there either.

0:11:37.637 --> 0:11:40.846
But we need to do quite long range reordering.

0:11:40.846 --> 0:11:47.058
If you think about German, where the verb
has moved: the English verb is in one

0:11:47.058 --> 0:11:50.198
part, but the end of the sentence is in another.

0:11:50.670 --> 0:11:59.427
And of course we only have this advantage here
if a segment covers the full sentence.

0:12:01.441 --> 0:12:08.817
And this segmentation is important.

0:12:08.817 --> 0:12:15.294
Here are some motivations for that.

0:12:15.675 --> 0:12:25.325
What you are doing is you are taking the reference
text and using its segmentation.

0:12:26.326 --> 0:12:30.991
And then, of course, your segments are exactly
matching.

0:12:31.471 --> 0:12:42.980
If you're now using different segmentation
strategies, you're losing significantly in BLEU

0:12:42.980 --> 0:12:44.004
points.

0:12:44.004 --> 0:12:50.398
If the segmentation is bad, your results are a lot
worse.

0:12:52.312 --> 0:13:10.323
And interestingly, here you can see the result with
a human segmentation, and what people achieved in a competition.

0:13:10.450 --> 0:13:22.996
You can see that by working on the segmentation
and using better segmentation you can improve

0:13:22.996 --> 0:13:25.398
your performance.

0:13:26.006 --> 0:13:29.932
So it's really essential.

0:13:29.932 --> 0:13:41.712
One other interesting thing is if you're looking
into the difference between cascaded and end-to-end systems.

0:13:42.082 --> 0:13:49.145
So it really seems to be more important to
have a good segmentation for an end-to-end system,

0:13:49.109 --> 0:13:56.248
because there you
can't re-segment, while it is less important

0:13:56.248 --> 0:13:58.157
for a cascaded system.

0:13:58.157 --> 0:14:05.048
Of course, it's still important there too, but the difference
between the two segmentations is smaller.

0:14:06.466 --> 0:14:18.391
This was a shared task some years ago; these are
results of one system with different segmentations.

0:14:22.122 --> 0:14:31.934
So the question is how can we deal with this
in speech translation and what people look

0:14:31.934 --> 0:14:32.604
into?

0:14:32.752 --> 0:14:48.360
Now we want to use different techniques to
split the audio signal into segments.

0:14:48.848 --> 0:14:54.413
For the end-to-end system you have the disadvantage
that you can't change the segmentation later.

0:14:54.413 --> 0:15:00.407
Therefore, the quality of the segmentation might be more
important there.

0:15:00.660 --> 0:15:15.678
But in both cases, of course, the results are
better if you have a good segmentation.

0:15:17.197 --> 0:15:23.149
So any idea: if you had this task
now to split this audio,

0:15:23.149 --> 0:15:26.219
What type of tool would you use?

0:15:28.648 --> 0:15:41.513
You could use a neural network to segment,
for instance supervised.

0:15:41.962 --> 0:15:44.693
Yes, that's exactly the better system already.

0:15:44.693 --> 0:15:50.390
So for a long time people have done simpler
things because, as we'll come to, it's a bit challenging

0:15:50.390 --> 0:15:52.250
to create or have the data.

0:15:53.193 --> 0:16:00.438
The first thing is you use some tool out of
the box like voice activity detection which

0:16:00.438 --> 0:16:07.189
has been a whole research field, where
people detect when somebody is speaking.

0:16:07.647 --> 0:16:14.952
And then, with some threshold, you always
get the probability that somebody is

0:16:14.952 --> 0:16:16.273
speaking or not.

0:16:17.217 --> 0:16:19.889
Then you split your signal.

0:16:19.889 --> 0:16:26.762
It will not be perfect, but you transcribe
or translate each component.
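
NOTE
A minimal sketch of the out-of-the-box idea: a naive energy-based voice
activity detector with a tunable threshold. Dedicated VAD toolkits are
far more robust; this only illustrates the principle, and the parameter
values are assumptions.
import numpy as np
def energy_vad(audio, sr=16000, frame_ms=30, threshold=1e-4):
    frame_len = int(sr * frame_ms / 1000)
    n = len(audio) // frame_len
    frames = audio[:n * frame_len].reshape(n, frame_len)
    energy = (frames.astype(np.float64) ** 2).mean(axis=1)
    return energy > threshold  # True = "somebody is speaking" in this frame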

0:16:28.508 --> 0:16:39.337
But as you see, a supervised classification
task is even better, and that is now the most

0:16:39.337 --> 0:16:40.781
commonly used one.

0:16:41.441 --> 0:16:49.909
The second approach is doing that as a supervised
classification, and then you try to use this

0:16:49.909 --> 0:16:50.462
type of model.

0:16:50.810 --> 0:16:53.217
We're going into a bit more detail on how
to do that.

0:16:53.633 --> 0:17:01.354
So what you need to do first is, of course,
you have to have some labels whether this is

0:17:01.354 --> 0:17:03.089
an end of sentence.

0:17:03.363 --> 0:17:10.588
You do that by using the alignment between
the segments and the audio.

0:17:10.588 --> 0:17:12.013
You have the time information.

0:17:12.212 --> 0:17:15.365
Typically you have that not for each word, but
in these time steps.

0:17:15.365 --> 0:17:16.889
This word is said at this time.

0:17:17.157 --> 0:17:27.935
But what you typically have is: from this time
to this time

0:17:27.935 --> 0:17:34.654
we have the first segment, then the second segment.

0:17:35.195 --> 0:17:39.051
This is also used to train, for example, your
ASR system and everything.

0:17:41.661 --> 0:17:53.715
Based on that you can label each frame in
there so if you have a green or blue that is

0:17:53.715 --> 0:17:57.455
a speech segment.

0:17:58.618 --> 0:18:05.690
And these labels will then later help you
to extract exactly these types of segments.

0:18:07.067 --> 0:18:08.917
There's one big challenge.

0:18:08.917 --> 0:18:15.152
If you have two sentences which are directly
connected to each other, then if you're doing

0:18:15.152 --> 0:18:18.715
this labeling, you would not have a break in
between.

0:18:18.715 --> 0:18:23.512
If you then try to extract the segments, you cannot tell
whether there should be a break or not.

0:18:23.943 --> 0:18:31.955
So what you typically do is this: the last frame

0:18:31.955 --> 0:18:41.331
you mark as outside, although it's not really
outside.
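
NOTE
A sketch of how the frame labels could be built from the sentence/time
alignment, including the trick just described: the final frame of each
segment is marked "outside" so adjacent sentences stay separable. The
frame rate and data layout are illustrative assumptions.
def build_frame_labels(alignments, n_frames, frames_per_sec=100):
    labels = [0] * n_frames                   # 0 = outside speech
    for start_sec, end_sec in alignments:     # one entry per sentence
        first = int(start_sec * frames_per_sec)
        last = min(int(end_sec * frames_per_sec), n_frames)
        for i in range(first, last):
            labels[i] = 1                     # 1 = inside speech
        labels[last - 1] = 0                  # force a break, even between
    return labels                             # directly adjacent sentences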

0:18:43.463 --> 0:18:46.882
Yes, I guess you could also do that in more
of a BIO scheme.

0:18:46.882 --> 0:18:48.702
I mean, this is the most simple.

0:18:48.702 --> 0:18:51.514
It's like inside outside, so it's related
to that.

0:18:51.514 --> 0:18:54.988
Of course, you could have an extra start-of-segment
label, and so on.

0:18:54.988 --> 0:18:57.469
I guess this is just to make it more simple.

0:18:57.469 --> 0:19:00.226
You only have two labels, not a three-class problem.

0:19:00.226 --> 0:19:02.377
But yeah, you could do similar things.

0:19:12.432 --> 0:19:20.460
Could that down the road cause problems? Because
it could be an important part of a segment

0:19:20.460 --> 0:19:24.429
which has some meaning, and we drop something.

0:19:24.429 --> 0:19:28.398
The good thing is frames are normally very short.

0:19:28.688 --> 0:19:37.586
Like some milliseconds, so normally if you
remove some milliseconds you can still understand

0:19:37.586 --> 0:19:38.734
everything.

0:19:38.918 --> 0:19:46.999
Mean the speech signal is very repetitive,
and so you have information a lot of times.

0:19:47.387 --> 0:19:50.730
That's why, as we talked about last time, you
can try to shrink the input.

0:19:51.031 --> 0:20:00.995
If you now have a short stretch
that would be removed, that's not

0:20:00.995 --> 0:20:01.871
really a problem.

0:20:02.162 --> 0:20:06.585
Yeah, but it's not that a full letter is missing.

0:20:06.585 --> 0:20:11.009
It's only the last ending of the vowel.

0:20:11.751 --> 0:20:15.369
I think it doesn't really hurt.

0:20:15.369 --> 0:20:23.056
We have our audio signal and we have these
gaps that are not labeled.

0:20:23.883 --> 0:20:29.288
So the blue rectangles are the inside-speech
segments, and the gaps are outside, yes.

0:20:29.669 --> 0:20:35.736
So then you have the full signal, and your
labeling task is now a blue or

0:20:35.736 --> 0:20:36.977
white prediction.

0:20:36.977 --> 0:20:39.252
So that is your prediction task.

0:20:39.252 --> 0:20:44.973
You have the audio signal only and your prediction
task is like label one or zero.

0:20:45.305 --> 0:20:55.585
Once you do that then based on this labeling
you can extract each segment again like each

0:20:55.585 --> 0:20:58.212
consecutive blue area.

0:20:58.798 --> 0:21:05.198
You then maybe already remove the non-speaking parts
and do speech translation only on

0:21:05.198 --> 0:21:05.998
the parts.

0:21:06.786 --> 0:21:19.768
Which is good, because in training it was
done similarly.

0:21:20.120 --> 0:21:26.842
So the noise in between you never saw in
training, so it's good to throw it away.

0:21:29.649 --> 0:21:34.930
One challenge, of course, is now if you're
doing that, what is your input?

0:21:34.930 --> 0:21:40.704
You normally cannot do the sequence labeling
on the whole talk; it's too long.

0:21:40.704 --> 0:21:46.759
So if you're doing this prediction of the
label, you also have a window for which you

0:21:46.759 --> 0:21:48.238
do the segmentation.

0:21:48.788 --> 0:21:54.515
And that's the same as we had in punctuation
prediction.

0:21:54.515 --> 0:22:00.426
If we don't have good borders, random splits
are normally good.

0:22:00.426 --> 0:22:03.936
So what we do now is split the audio.

0:22:04.344 --> 0:22:09.134
So that would be our input, and then this part
would be our labels.

0:22:09.269 --> 0:22:15.606
This green would be the input and here we
want, for example, blue labels and then white.

0:22:16.036 --> 0:22:20.360
Here only do labors and here at the beginning
why maybe at the end why.

0:22:21.401 --> 0:22:28.924
So thereby you now always have a fixed window
for which you're doing this task of predicting.

0:22:33.954 --> 0:22:43.914
How do you build your classifier? That is based,
again,

0:22:43.914 --> 0:22:52.507
on this wav2vec model we mentioned last week.

0:22:52.752 --> 0:23:00.599
So in training you use labels to say whether
it's in speech or outside speech.

0:23:01.681 --> 0:23:17.740
Inference: You give them always the chance
and then predict whether this part like each

0:23:17.740 --> 0:23:20.843
label is afraid.

0:23:23.143 --> 0:23:29.511
It's a bit more complicated; one challenge is
that if you randomly split, you're losing

0:23:29.511 --> 0:23:32.028
your context for the first frame.

0:23:32.028 --> 0:23:38.692
It might be very hard to predict whether this
is now inside or outside, and also for the last frame.

0:23:39.980 --> 0:23:48.449
You often need a bit of context to tell whether this
is speech or not, and at the beginning you don't have it.

0:23:49.249 --> 0:23:59.563
So what you do is you put the audio in twice.

0:23:59.563 --> 0:24:08.532
You process it with two different splits.

0:24:08.788 --> 0:24:15.996
As shown, you have two shifted offsets,
so one pass is predicted with the other offset.

0:24:16.416 --> 0:24:23.647
And then you average the probabilities, so that
at each time you have, for at least one of

0:24:23.647 --> 0:24:25.127
the predictions, enough context.

0:24:25.265 --> 0:24:36.326
Because at the end of a window it might
be very hard to predict whether this is now

0:24:36.326 --> 0:24:39.027
speech or nonspeech.
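
NOTE
A sketch of the two-offset inference trick: run the classifier over
fixed windows twice, the second time shifted by half a window, and
average the per-frame probabilities, so every frame is far from a window
border in at least one pass. The model interface is an assumed stand-in.
import numpy as np
def predict_two_offsets(model, audio, win):
    def one_pass(offset):
        probs = np.empty(len(audio))
        edges = [0] + list(range(offset, len(audio), win)) + [len(audio)]
        for s, e in zip(edges, edges[1:]):
            if e > s:
                probs[s:e] = model(audio[s:e])  # per-frame speech prob.
        return probs
    return (one_pass(0) + one_pass(win // 2)) / 2.0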

0:24:39.939 --> 0:24:47.956
I think it is a hyperparameter, but you are
not optimizing it; you just take two shifts.

0:24:48.328 --> 0:24:54.636
You could of course try a lot of different shifts and
so on.

0:24:54.636 --> 0:24:59.707
The thing is, it's mainly a problem at the borders.

0:24:59.707 --> 0:25:04.407
If you don't do two offsets, you have no second prediction there.

0:25:05.105 --> 0:25:14.761
You could get better by doing that, but I would
be skeptical whether it really matters, and I also

0:25:14.761 --> 0:25:18.946
have not seen any experiments doing it.

0:25:19.159 --> 0:25:27.629
I guess the predictions are already good; you maybe
have some errors in there, but the gains would be small.

0:25:31.191 --> 0:25:37.824
So with this you have your segmentation.

0:25:37.824 --> 0:25:44.296
However, there is a problem in between.

0:25:44.296 --> 0:25:49.150
Once the model is wrong there, the segments are wrong.

0:25:49.789 --> 0:26:01.755
The normal thing would be, as a first step,
that you take some threshold and that you always

0:26:01.755 --> 0:26:05.436
label everything above it as speech.

0:26:06.006 --> 0:26:19.368
The problem when you are just using this
one threshold is that you might get very long or very short segments.

0:26:19.339 --> 0:26:23.954
Those are the challenges.

0:26:23.954 --> 0:26:31.232
Short segments mean you have no context.

0:26:31.232 --> 0:26:35.492
The quality will be bad.

0:26:37.077 --> 0:26:48.954
Therefore, people use this probabilistic divide-and-conquer
algorithm, so the main idea is to start

0:26:48.954 --> 0:26:56.744
with the whole segment, and now you split the
whole segment.

0:26:57.397 --> 0:27:09.842
Then you split there and then you continue
until each segment is smaller than the maximum

0:27:09.842 --> 0:27:10.949
length.

0:27:11.431 --> 0:27:23.161
But you can ignore some splits, and if you
split one segment into two parts you first

0:27:23.161 --> 0:27:23.980
trim it.

0:27:24.064 --> 0:27:40.197
So normally it's not only one single position,
it's a longer area of non-voice, so you try

0:27:40.197 --> 0:27:43.921
to find this longer region.

0:27:43.943 --> 0:27:51.403
Now your large segment is split into two smaller
segments.

0:27:51.403 --> 0:27:56.082
Now you are checking these segments.

0:27:56.296 --> 0:28:04.683
So if they are very, very short, it might
be good not to split at this point, because you're

0:28:04.683 --> 0:28:05.697
ending up with too-short segments.

0:28:06.006 --> 0:28:09.631
And this way you continue, and then
hopefully you'll end up with good segments.
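
NOTE
A sketch of this divide-and-conquer segmentation, simplified to cut at
the single lowest-probability frame rather than the middle of a whole
pause region. The probabilities would come from the frame classifier
above; the min/max lengths are illustrative assumptions.
def divide_and_conquer(probs, start, end, max_len=1500, min_len=50):
    # probs: per-frame speech probabilities for the whole talk
    if end - start <= max_len:
        return [(start, end)]
    # cut where the classifier is most confident there is no speech,
    # while never producing a segment shorter than min_len
    cut = min(range(start + min_len, end - min_len), key=lambda i: probs[i])
    return (divide_and_conquer(probs, start, cut, max_len, min_len) +
            divide_and_conquer(probs, cut, end, max_len, min_len))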

0:28:10.090 --> 0:28:19.225
So, of course, there's one challenge with
this approach, if you think about the later topic of

0:28:19.225 --> 0:28:20.606
low latency.

0:28:25.405 --> 0:28:31.555
So in this case you have to have the full
audio available.

0:28:32.132 --> 0:28:38.112
So you cannot do that continuously; I mean,
otherwise you would just do it always:

0:28:38.112 --> 0:28:45.588
if the probability is high enough you split, but
in this case you try to find a global optimum.

0:28:46.706 --> 0:28:49.134
It's a heuristic.

0:28:49.134 --> 0:28:58.170
You find a global solution for your whole
talk and not a local one.

0:28:58.170 --> 0:29:02.216
Where is the system most sure?

0:29:02.802 --> 0:29:12.467
So that's a bit of a challenge here, but the
advantage of course is that in the end you

0:29:12.467 --> 0:29:14.444
have good segments.

0:29:17.817 --> 0:29:23.716
Any more questions on this?

0:29:23.716 --> 0:29:36.693
Then the next thing is we also need to evaluate
in this scenario.

0:29:37.097 --> 0:29:44.349
So you know how machine translation is evaluated;
that is quite a while ago

0:29:44.349 --> 0:29:55.303
now, at the beginning of the semester,
but I hope you can remember.

0:29:55.675 --> 0:30:09.214
It might be with a BLEU score, might be with COMET
or similar, but you need a reference.

0:30:10.310 --> 0:30:22.335
But this assumes that you have this one-to-one
match, so you always have an output and a reference in machine

0:30:22.335 --> 0:30:26.132
translation, which is nicely aligned.

0:30:26.506 --> 0:30:34.845
So then it might be that our output has four
segments, while our reference output has only

0:30:34.845 --> 0:30:35.487
three.

0:30:36.756 --> 0:30:40.649
And now it is, of course, questionable what
we should compare in our metric.

0:30:44.704 --> 0:30:53.087
So it's no longer possible to directly
do that, because what should you compare?

0:30:53.413 --> 0:31:00.214
You just have four segments there and three segments
there, and of course they don't correspond one to one.

0:31:00.920 --> 0:31:06.373
The first one aligns roughly to the first one; as
you see (I can't speak Spanish), the content is

0:31:06.373 --> 0:31:09.099
already shifted across segments.

0:31:09.099 --> 0:31:14.491
So a naive segment-by-segment BLEU comparison
wouldn't work, so you need to do something

0:31:14.491 --> 0:31:17.157
about that to make this type of evaluation work.

0:31:19.019 --> 0:31:21.727
Still any suggestions what you could do.

0:31:25.925 --> 0:31:44.702
How can you calculate a BLEU score if
you don't have this one-to-one match?

0:31:45.925 --> 0:31:49.365
You could put another layer which tries to
align the segments.

0:31:51.491 --> 0:31:56.979
It's not only aligning, but that's one
solution: you need to align and re-segment.

0:31:57.177 --> 0:32:06.886
Because even if you have an alignment, so this
to this and this to that, you see that it's

0:32:06.886 --> 0:32:12.341
not good, because the output would be compared
to the wrong reference.

0:32:13.453 --> 0:32:16.967
But there is an even simpler solution that we'll discuss.

0:32:16.967 --> 0:32:19.119
Yes, it's a simpler solution.

0:32:19.119 --> 0:32:23.135
It's called document-based BLEU or something
like that.

0:32:23.135 --> 0:32:25.717
So you just take the full document.
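
NOTE
A small sketch of this simple workaround: concatenate all segments on
both sides and score the whole talk as one long "document". Uses
sacrebleu; the segment lists are placeholders.
import sacrebleu
hyp_segments = ["first output segment .", "second output segment ."]
ref_segments = ["first reference .", "second .", "third ."]
doc_bleu = sacrebleu.corpus_bleu(
    [" ".join(hyp_segments)],      # one hypothesis "document"
    [[" ".join(ref_segments)]])    # one reference "document"
print(doc_bleu.score)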

0:32:26.566 --> 0:32:32.630
For some metrics it works well; for others it's not
clear how good it is, but there might

0:32:32.630 --> 0:32:32.900
be.

0:32:33.393 --> 0:32:36.454
Think of more simple metrics like BLEU.

0:32:36.454 --> 0:32:40.356
Do you have any idea what could be a disadvantage?

0:32:49.249 --> 0:32:56.616
BLEU is matching n-grams, so you start with
the output.

0:32:56.616 --> 0:33:01.270
You check how many n-grams are in the reference.

0:33:01.901 --> 0:33:11.233
If you're now doing that on the full document,
you can also match n-grams from here to there.

0:33:11.751 --> 0:33:15.680
So you can match things very far away.

0:33:15.680 --> 0:33:21.321
A translation could just be randomly
reordered and still match.

0:33:22.142 --> 0:33:27.938
And that, of course, could be a bit of a disadvantage,
or is a problem, and therefore people

0:33:27.938 --> 0:33:29.910
also look into the segmentation.

0:33:29.910 --> 0:33:34.690
But I've recently seen some results: document-level
scores are also normally fine.

0:33:34.690 --> 0:33:39.949
If you have a relatively high quality system
or state of the art, then they also have a

0:33:39.949 --> 0:33:41.801
good correlation with human judgments.

0:33:46.546 --> 0:33:59.241
So how are we doing that? We are putting
end-of-sentence boundaries in there and then doing an

0:33:59.179 --> 0:34:07.486
alignment based on the Levenshtein distance,
so an edit distance between our output and the

0:34:07.486 --> 0:34:09.077
reference output.

0:34:09.449 --> 0:34:13.061
And here is our boundary.

0:34:13.061 --> 0:34:23.482
We map the boundary based on the alignment;
from the Levenshtein alignment you know where it belongs.

0:34:23.803 --> 0:34:36.036
And then all the words that come before the boundary
belong to that segment, though it might not be perfect.

0:34:36.336 --> 0:34:44.890
I mean, it should be, but things like that can
happen, and then it's not clear where the boundary goes.

0:34:44.965 --> 0:34:49.727
At the break, however, they are typically
not that bad because they are words which are

0:34:49.727 --> 0:34:52.270
not matching between reference and hypothesis.

0:34:52.270 --> 0:34:56.870
So normally it doesn't really matter that
much because they are anyway not matching.

0:34:57.657 --> 0:35:05.888
And then you take the re-segmented MT output and
use that to calculate your metric.

0:35:05.888 --> 0:35:12.575
Then you again have a perfect alignment for which
you can calculate it.
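
NOTE
A rough sketch of this resegmentation idea, in the spirit of the
mwerSegmenter tool: align hypothesis words to reference words with the
Levenshtein distance, then copy the reference's segment boundaries onto
the hypothesis. A simplified illustration, not the actual tool.
def levenshtein_align(hyp, ref):
    # standard edit-distance table between the two word sequences
    n, m = len(hyp), len(ref)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1): d[i][0] = i
    for j in range(m + 1): d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + c)
    align, i, j = {}, n, m          # backtrack: ref index -> hyp index
    while i > 0 and j > 0:
        if d[i][j] == d[i-1][j-1] + (0 if hyp[i-1] == ref[j-1] else 1):
            align[j - 1] = i - 1; i -= 1; j -= 1
        elif d[i][j] == d[i-1][j] + 1:
            i -= 1
        else:
            align[j - 1] = i - 1; j -= 1
    return align
def resegment(hyp_words, ref_segments):
    ref_words = [w for seg in ref_segments for w in seg]
    align = levenshtein_align(hyp_words, ref_words)
    cuts, pos = [], 0
    for seg in ref_segments:        # boundary after each ref segment end
        pos += len(seg)
        cuts.append(align.get(pos - 1, pos - 1) + 1)
    cuts[-1] = len(hyp_words)       # last segment takes any leftover words
    return [hyp_words[a:b] for a, b in zip([0] + cuts[:-1], cuts)]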

0:35:14.714 --> 0:35:19.229
Any idea? You could do it the other way around:

0:35:19.229 --> 0:35:23.359
you could re-segment your reference to the hypothesis.

0:35:29.309 --> 0:35:30.368
Which one would you select?

0:35:34.214 --> 0:35:43.979
I think segmenting the output is much
more natural, because the reference sentence

0:35:43.979 --> 0:35:46.474
is the fixed solution.

0:35:47.007 --> 0:35:52.947
Yes, that's the right motivation if you
think about BLEU or so.

0:35:52.947 --> 0:35:57.646
It's additionally important: if you change your
reference,

0:35:57.857 --> 0:36:07.175
you might have a different number of bigrams
or trigrams because the sentences have different

0:36:07.175 --> 0:36:08.067
lengths.

0:36:08.068 --> 0:36:15.347
This way, you're always comparing
to the same reference, and you don't compare

0:36:15.347 --> 0:36:16.455
to different ones.

0:36:16.736 --> 0:36:22.317
It only differs by the segmentation, but
still it could make some difference.

0:36:25.645 --> 0:36:38.974
Good, that's all about sentence segmentation,
then a bit about disfluencies and what they

0:36:38.974 --> 0:36:40.146
really are.

0:36:42.182 --> 0:36:51.138
So as said, in daily life you're not speaking
in very nice full sentences all the time.

0:36:51.471 --> 0:36:53.420
Nobody speaks in perfect full sentences.

0:36:53.420 --> 0:36:54.448
We do repetitions.

0:36:54.834 --> 0:37:00.915
It's especially if it's more interactive,
so in meetings, phone calls and so on.

0:37:00.915 --> 0:37:04.519
If you have multiple speakers, they also interrupt

0:37:04.724 --> 0:37:16.651
each other, and then if you keep the disfluencies, they
are harder to translate, because most of your

0:37:16.651 --> 0:37:17.991
training data is clean text.

0:37:18.278 --> 0:37:30.449
It's also very difficult to read, and we'll
see some examples, if you transcribe everything

0:37:30.449 --> 0:37:32.543
as it was said.

0:37:33.473 --> 0:37:36.555
What type of things are there?

0:37:37.717 --> 0:37:42.942
So you have all these filler words.

0:37:42.942 --> 0:37:47.442
These are very easy to remove.

0:37:47.442 --> 0:37:52.957
You can just use regular expressions.
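
NOTE
A minimal sketch of this easy case with regular expressions; the filler
list is an illustrative assumption and would be language-specific in
practice.
import re
FILLER_RE = re.compile(r"\b(?:uh|um|uhm|er)\b,?\s*", re.IGNORECASE)
def remove_simple_fillers(text):
    return FILLER_RE.sub("", text)
print(remove_simple_fillers("I want, uh, a ticket to Houston"))
# -> "I want, a ticket to Houston"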

0:37:53.433 --> 0:38:00.139
It's getting more difficult with some other
types of filler words.

0:38:00.139 --> 0:38:03.387
In German you have this "ja" or similar.

0:38:04.024 --> 0:38:08.473
And these ones you cannot just remove by regular
expression.

0:38:08.473 --> 0:38:15.039
You shouldn't remove every "ja" from a text
because it might be very important information

0:38:15.039 --> 0:38:15.768
for well.

0:38:15.715 --> 0:38:19.995
It may not be as important as other words, but
still it might be very important.

0:38:20.300 --> 0:38:24.215
So just removing them is there already more
difficult.

0:38:26.586 --> 0:38:29.162
Then you have these repetitions.

0:38:29.162 --> 0:38:32.596
You have something like "I mean, I saw him there."

0:38:32.596 --> 0:38:33.611
Or "there is, there was a…".

0:38:34.334 --> 0:38:41.001
And while for the first one that might be
very easy to remove because you just look for

0:38:41.001 --> 0:38:47.821
doubled words; the thing is that the repetition might
not be exactly the same, so "there is, there

0:38:47.821 --> 0:38:48.199
was".

0:38:48.199 --> 0:38:54.109
So there is already getting a bit more complicated,
of course still possible.

0:38:54.614 --> 0:39:01.929
You can remove the "to Denver", so the real sentence would
be like "I want a ticket to Houston".

0:39:02.882 --> 0:39:13.327
But there the detection, of course, is getting
more challenging if you want to get rid of such corrections.

0:39:13.893 --> 0:39:21.699
You don't have the data, of course, which
makes all the tasks harder, but you probably

0:39:21.699 --> 0:39:22.507
want to.

0:39:22.507 --> 0:39:24.840
That's really meaningful.

0:39:24.840 --> 0:39:26.185
Current isn't.

0:39:26.185 --> 0:39:31.120
That is now a really good point, and it's really
important:

0:39:31.051 --> 0:39:34.785
Think about what your final task is.

0:39:35.155 --> 0:39:45.526
If you want to have a transcript for reading it,
I'm not sure if we have another example.

0:39:45.845 --> 0:39:54.171
So there it's nicer if you have a clean transcript,
and if you look at subtitles, they're also not

0:39:54.171 --> 0:39:56.625
having all the repetitions.

0:39:56.625 --> 0:40:03.811
It's a nice way to shorten it, but also to get
a structure that you otherwise cannot convey.

0:40:04.064 --> 0:40:11.407
So in some situations, of course, they might
give you information.

0:40:11.407 --> 0:40:14.745
There is a lot of stuttering.

0:40:15.015 --> 0:40:22.835
So in this case, I agree, it might be helpful
in some way, but reading all the disfluencies

0:40:22.835 --> 0:40:25.198
is getting really difficult.

0:40:25.198 --> 0:40:28.049
If you have the next one, we have.

0:40:28.308 --> 0:40:31.630
That's a very long text.

0:40:31.630 --> 0:40:35.883
You need a bit of time to parse it.

0:40:35.883 --> 0:40:39.472
This one is not important.

0:40:40.480 --> 0:40:48.461
It might be nice if you can start reading
from here.

0:40:48.461 --> 0:40:52.074
Let's have a look here.

0:40:52.074 --> 0:40:54.785
Try to read this.

0:40:57.297 --> 0:41:02.725
You can understand it, but I think you need
a bit of time to really understand what was said.

0:41:11.711 --> 0:41:21.480
And now we have the same text, but you have
highlighted in bold, and not only read the

0:41:21.480 --> 0:41:22.154
bold.

0:41:23.984 --> 0:41:25.995
And ignore everything which is not bold.

0:41:30.250 --> 0:41:49.121
I would assume it's easier to read just the
bold part, faster and with less effort.

0:41:50.750 --> 0:41:57.626
Yeah, it might be; we have a master's
thesis on that.

0:41:57.626 --> 0:41:59.619
If you've seen my videos,

0:42:00.000 --> 0:42:09.875
in the recordings I also make it more like
fluent speech, and I'm not

0:42:09.875 --> 0:42:12.318
doing the hesitations.

0:42:12.652 --> 0:42:23.764
I don't know if somebody else has looked into
the Coursera videos, but you'll notice that.

0:42:25.005 --> 0:42:31.879
For these videos I spoke every minute three
times or something, and then people were there

0:42:31.879 --> 0:42:35.011
cutting things and hopefully making it fluent.

0:42:35.635 --> 0:42:42.445
And therefore, if you want to achieve
that, it's of course no longer exactly what was

0:42:42.445 --> 0:42:50.206
happening, but if it more looks like a professional
video, then you would have to do that and cut

0:42:50.206 --> 0:42:50.998
that out.

0:42:50.998 --> 0:42:53.532
But yeah, there are definitely.

0:42:55.996 --> 0:42:59.008
We're also going to do this thing again.

0:42:59.008 --> 0:43:02.315
First turn is like I'm going to have a very.

0:43:02.422 --> 0:43:07.449
Which in the end they start to slow down just
without feeling as though they're.

0:43:07.407 --> 0:43:10.212
It's a good point for the next part.

0:43:10.212 --> 0:43:13.631
There is not the one perfect solution.

0:43:13.631 --> 0:43:20.732
There's some work on disfluency removal,
but of course disfluency

0:43:20.732 --> 0:43:27.394
removal is not that easy, so do you just remove
things, and is that in order everywhere?

0:43:27.607 --> 0:43:29.708
But how much like cleaning do you do?

0:43:29.708 --> 0:43:31.366
It's more a continuous thing.

0:43:31.811 --> 0:43:38.211
Is it really that you only remove stuff, or
are you also into rephrasing? Here it is only

0:43:38.211 --> 0:43:38.930
removing.

0:43:39.279 --> 0:43:41.664
But maybe you want to rephrase it.

0:43:41.664 --> 0:43:43.231
That sounds better.

0:43:43.503 --> 0:43:49.185
So then it's going into what people are doing
in style transfer.

0:43:49.185 --> 0:43:52.419
We are going from a speech style to a written style.

0:43:52.872 --> 0:44:07.632
So there is more of a continuum, and of course
there is not one perfect solution;

0:44:07.632 --> 0:44:10.722
it depends on exactly what you want.

0:44:15.615 --> 0:44:19.005
Yeah, it's challenging.

0:44:19.005 --> 0:44:30.258
You have examples where the repetition is
not a direct copy, not exactly the same.

0:44:30.258 --> 0:44:35.410
That is, of course, more challenging.

0:44:41.861 --> 0:44:49.889
I mean, that's why it's so challenging: if it's
really spontaneous, even for the speaker,

0:44:49.889 --> 0:44:55.634
you maybe need even the video to really get
that, and at least the audio.

0:45:01.841 --> 0:45:06.025
Yeah what it also depends on.

0:45:06.626 --> 0:45:15.253
The purpose, of course; and a very important
thing is that the easiest task is just the removing.

0:45:15.675 --> 0:45:25.841
Of course you have to be very careful, because
if you remove a bit too little, it's normally

0:45:25.841 --> 0:45:26.958
not much of a problem.

0:45:27.227 --> 0:45:33.176
But if you remove too much, of course, that's
very, very bad, because you're losing important information.

0:45:33.653 --> 0:45:46.176
And this might be even more challenging if
you think about rarer and unseen words.

0:45:46.226 --> 0:45:56.532
So when doing this removal, it's important
to be careful and normally more conservative.

0:46:03.083 --> 0:46:15.096
Of course, you also have to again see how you're
doing that, now in a two-step approach, not

0:46:15.096 --> 0:46:17.076
an end-to-end one.

0:46:17.076 --> 0:46:20.772
So first you need the removal.

0:46:21.501 --> 0:46:30.230
But you have to somehow see it in the whole
pipeline.

0:46:30.230 --> 0:46:36.932
If you learn on clean text to remove disfluencies,

0:46:36.796 --> 0:46:44.070
it might be that the ASR system is outputting
something else, or that it's more of an ASR

0:46:44.070 --> 0:46:44.623
error.

0:46:44.864 --> 0:46:46.756
So um.

0:46:46.506 --> 0:46:52.248
Just for example, if you do it based on language
modeling scores, it might be that you're just

0:46:52.248 --> 0:46:57.568
following the language modeling score because the ASR has
made some errors, so you really have to see

0:46:57.568 --> 0:46:59.079
the combination of that.

0:46:59.419 --> 0:47:04.285
And for example, we had like partial words.

0:47:04.285 --> 0:47:06.496
They are like some.

0:47:06.496 --> 0:47:08.819
We didn't have that.

0:47:08.908 --> 0:47:18.248
So these disfluencies can be that you stop
in the middle of a word and then you switch

0:47:18.248 --> 0:47:19.182
because you restart.

0:47:19.499 --> 0:47:23.214
And of course, in a perfect text transcript
that's very easy to recognize:

0:47:23.214 --> 0:47:24.372
That's not a real word.

0:47:24.904 --> 0:47:37.198
However, when you really run it on ASR output,
the ASR will normally output some real word, because

0:47:37.198 --> 0:47:40.747
it can only output the words it knows.

0:47:50.050 --> 0:48:03.450
For example: if you have this partial word
in the transcript, it's easy to detect as a

0:48:03.450 --> 0:48:05.277
disfluency.

0:48:05.986 --> 0:48:11.619
And then, of course, it's more challenging
in a real-world example where you have ASR output.

0:48:12.492 --> 0:48:29.840
Now to the approaches: one thing is to really
put a component in between your ASR system and your MT system.

0:48:31.391 --> 0:48:45.139
So your task is: you have the disfluent text
as input, and the output is the clean text.

0:48:45.565 --> 0:48:49.605
There are different formulations of that.

0:48:49.605 --> 0:48:54.533
You might not be able to do everything like
that.

0:48:55.195 --> 0:49:10.852
Or do you also allow, for example, rephrasing
or reordering, so that in the clean text you might have the

0:49:10.852 --> 0:49:13.605
words in a corrected order.

0:49:13.513 --> 0:49:24.201
But the easiest thing is that you only do
removing, so some words can be removed.

0:49:29.049 --> 0:49:34.508
Any ideas how to do that? This is the input, this is the output.

0:49:34.508 --> 0:49:41.034
You have training data; so assume we have training
data for this.

0:49:47.507 --> 0:49:55.869
Maybe you could treat it like machine translation,
with the disfluent text as input and the clean text as output?

0:50:00.000 --> 0:50:05.511
Exactly, so you have not just the words you
remove, but the whole utterance as input,

0:50:05.511 --> 0:50:07.578
as disfluent text, and as output

0:50:07.578 --> 0:50:09.207
it should be fluent text.

0:50:09.207 --> 0:50:15.219
It can be done before or after the segmentation, as you
said, but you have this type of task. So technically,

0:50:15.219 --> 0:50:20.042
how would you address this type of task when
you have to solve it?

0:50:24.364 --> 0:50:26.181
That's exactly right.

0:50:26.181 --> 0:50:28.859
That's one way of doing it.

0:50:28.859 --> 0:50:33.068
It's a translation task, and you train your model on it.

0:50:33.913 --> 0:50:34.683
You can do that.

0:50:34.683 --> 0:50:42.865
Then, of course, a bit of the challenge
is that you automatically allow rephrasing

0:50:42.865 --> 0:50:43.539
and such.

0:50:43.943 --> 0:50:52.240
Which on the one hand is good, so you have more
opportunities, but it might also be a bad thing,

0:50:52.240 --> 0:50:58.307
because if you have more freedom, you
have more opportunities to make errors.

0:51:01.041 --> 0:51:08.300
If you want to prevent that, you can also do a
simpler labeling, so for each word you predict a

0:51:08.300 --> 0:51:10.693
label: remove or keep.
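
To make this labeling formulation concrete, here is a minimal sketch. The KEEP/REMOVE label set and the toy rules (filler words, immediate repetitions) are illustrative assumptions, not the exact setup from the lecture; in practice the labels would come from a trained tagger.

```python
# Minimal sketch of disfluency removal as per-token labeling.
FILLERS = {"uh", "um", "uhm", "ah"}

def label_tokens(tokens):
    """Toy labeler: flag filler words and immediate word repetitions."""
    labels = []
    for i, tok in enumerate(tokens):
        if tok.lower() in FILLERS:
            labels.append("REMOVE")
        elif i + 1 < len(tokens) and tok.lower() == tokens[i + 1].lower():
            labels.append("REMOVE")  # drop the first copy of "to to"
        else:
            labels.append("KEEP")
    return labels

def remove_disfluencies(tokens):
    return [t for t, lab in zip(tokens, label_tokens(tokens)) if lab == "KEEP"]

print(remove_disfluencies("I want to to uh register".split()))
# -> ['I', 'want', 'to', 'register']
```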

0:51:12.132 --> 0:51:17.658
People have also looked into parsing.

0:51:17.658 --> 0:51:29.097
You remember maybe the parse trees from the beginning
of the lecture, which give the structure; the idea is that disfluencies show up there.

0:51:29.649 --> 0:51:45.779
There are also more unsupervised approaches,
where you then phrase it as a style transfer

0:51:45.779 --> 0:51:46.892
task.

0:51:50.310 --> 0:51:58.601
As the last point, since we have that: yes,
it has also been done in an end-to-end fashion

0:51:58.601 --> 0:52:06.519
so that you really have as input the audio
signal, and as output you have the

0:52:06.446 --> 0:52:10.750
text without disfluencies, a really clean text.

0:52:11.131 --> 0:52:19.069
You model everything in one single model, which of course
has a big advantage:

0:52:19.069 --> 0:52:25.704
you can use these paralinguistic features,
pauses, and

0:52:25.705 --> 0:52:34.091
hesitations where you switch: you start something,
then, oh, it doesn't work, and you continue differently.

0:52:34.374 --> 0:52:42.689
So you can easily use those in an end-to-end
fashion, while in a cascaded approach,

0:52:42.689 --> 0:52:47.497
as we saw, there you only have text input.

0:52:49.990 --> 0:53:02.389
But on the other hand we have again, in an even
more extreme form, the problem from before.

0:53:02.389 --> 0:53:06.957
Of course there is even less data.

0:53:11.611 --> 0:53:12.837
Good.

0:53:12.837 --> 0:53:30.814
This was all about transforming the input into a more
formal version; or maybe, if you think about YouTube

0:53:32.752 --> 0:53:34.989
talks, this could be very useful.

0:53:36.296 --> 0:53:42.016
It is more viewed as style transfer.

0:53:42.016 --> 0:53:53.147
You can use ideas from machine translation,
where you treat the two styles like two languages.

0:53:53.713 --> 0:53:57.193
So there is ways of trying to do this type
of style transfer.

0:53:57.637 --> 0:54:02.478
I think it is definitely also very promising to
make the output more and more fluent.

0:54:03.223 --> 0:54:17.974
Because one major issue with all the previous
approaches is that you need training data, and then

0:54:17.974 --> 0:54:21.021
you need training data for each language.

0:54:21.381 --> 0:54:32.966
So I mean, I think we only really have this kind
of data for English.

0:54:32.966 --> 0:54:39.453
Maybe there is very little data for German.

0:54:42.382 --> 0:54:49.722
Okay, then let's talk about low-latency speech translation.
0:54:50.270 --> 0:55:05.158
So the idea is: if we are doing live translation
of a talk, we want to start outputting early.

0:55:05.325 --> 0:55:23.010
This is possible because there is typically
some kind of monotonicity in many languages.

0:55:24.504 --> 0:55:29.765
And this is also what, for example, human
interpreters are doing to have a really low

0:55:29.765 --> 0:55:30.071
lag.

0:55:30.750 --> 0:55:34.393
They are even going further.

0:55:34.393 --> 0:55:40.926
They guess what will be the ending of the
sentence.

0:55:41.421 --> 0:55:51.120
Then they can already continue, although it has
not been said yet; but that is even

0:55:51.120 --> 0:55:53.039
more challenging.

0:55:54.714 --> 0:55:58.014
Why is it so difficult?

0:55:58.014 --> 0:56:09.837
There is this trade-off: on the one hand,
you want to have more context, because

0:56:09.837 --> 0:56:14.511
we learned translation gets better if we have more context.

0:56:15.015 --> 0:56:24.033
And therefore, to have more context, you have
to wait as long as possible.

0:56:24.033 --> 0:56:27.689
The best is to have the full sentence.

0:56:28.168 --> 0:56:35.244
On the other hand, you want to have a low
latency; the user should not wait, so you generate as

0:56:35.244 --> 0:56:35.737
soon as possible.

0:56:36.356 --> 0:56:47.149
So in this situation you have to
find the best point to start in order to have

0:56:47.149 --> 0:56:48.130
a good trade-off.

0:56:48.728 --> 0:56:52.296
There is not one single perfect solution.

0:56:52.296 --> 0:56:56.845
People also evaluate what the translation quality is at a given latency.

0:56:57.657 --> 0:57:09.942
Why it's challenging from German to English:
German has this very nice thing where the prefix

0:57:09.942 --> 0:57:16.607
of the verb can be put at the end of the sentence.

0:57:17.137 --> 0:57:24.201
And you only know whether the person registers
or cancels his registration at the end of the sentence.

0:57:24.985 --> 0:57:33.690
So if you want to start the translation in
English, you need to know at this point which verb it is.

0:57:35.275 --> 0:57:39.993
So you would have to wait until the end of
the sentence.

0:57:39.993 --> 0:57:42.931
That's not really what you want.

0:57:43.843 --> 0:57:45.795
So what can we do?

0:57:47.207 --> 0:58:12.550
Other cases that have been motivating this are
word-order differences, like subject-object-verb

0:58:12.550 --> 0:58:15.957
versus subject-verb-object order.

0:58:16.496 --> 0:58:24.582
In German the main clause is not always verb-final,
but there are relative clauses where you have that,

0:58:24.582 --> 0:58:25.777
so it needs long-range reordering.

0:58:28.808 --> 0:58:41.858
How can we do that? We'll look today into
three ways of doing that.

0:58:41.858 --> 0:58:46.269
The first is to optimize the segmentation.

0:58:46.766 --> 0:58:54.824
And then the second idea is to do retranslation,
and there you can revise the text output.

0:58:54.934 --> 0:59:02.302
So the idea is you translate, and if you later
notice it was wrong then you can retranslate

0:59:02.302 --> 0:59:03.343
and correct it.

0:59:03.803 --> 0:59:14.383
Or you can do what is called streaming decoding,
where you generate the output incrementally and commit to it.

0:59:17.237 --> 0:59:30.382
Let's start with the optimization: you choose
how to segment the incoming speech

0:59:30.382 --> 0:59:33.040
and when to start translating.

0:59:32.993 --> 0:59:39.592
So you have a good translation quality while
still having low latency.

0:59:39.699 --> 0:59:50.513
You have an extra model which does your segmentation
beforehand, but your aim is not just any segmentation.

0:59:50.470 --> 0:59:53.624
But you can somehow measure on training data:

0:59:53.624 --> 0:59:59.863
if I use these types of segment lengths, that's
my latency and that's my translation quality,

0:59:59.863 --> 1:00:02.811
and then you can try to search for a good trade-off.

1:00:03.443 --> 1:00:20.188
If you're doing that one, it's an extra component,
so you can use your MT system as it was.

1:00:22.002 --> 1:00:28.373
The other idea is to always directly output the first
hypothesis: always when you have

1:00:28.373 --> 1:00:34.201
text or audio we translate, and if we then
have more context available we can update.

1:00:35.015 --> 1:00:50.195
So imagine the example from before: we first get and
output 'I register', and then the sentence continues differently.

1:00:50.670 --> 1:00:54.298
So you change the output.

1:00:54.298 --> 1:01:07.414
Of course, that might also lead to a bad
user experience if you always flicker and change

1:01:07.414 --> 1:01:09.228
your output.

1:01:09.669 --> 1:01:15.329
It's a bit like human interpreters, who are also
able to correct themselves.

1:01:15.329 --> 1:01:20.867
If they are guessing how the speaker will continue,
and then he says something different, they

1:01:20.867 --> 1:01:22.510
also have to correct themselves.

1:01:22.510 --> 1:01:26.831
So here, since it's not audio output, we can even
change what we have said.

1:01:26.831 --> 1:01:29.630
Yes, that's exactly what we have implemented.

1:01:31.431 --> 1:01:49.217
So how that works is: we get some partial input, and then
we translate it, and if we get more input,

1:01:49.217 --> 1:01:51.344
then we retranslate.

1:01:51.711 --> 1:02:00.223
And so we can always continue to do that and
improve the transcript that we have.
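
As a rough sketch of this loop, with a hypothetical `translate` callable standing in for the full MT system:

```python
# Sketch of the retranslation strategy: on every new input chunk we
# re-translate everything seen so far and redraw the whole output.
def retranslate_stream(chunks, translate):
    seen = []
    for chunk in chunks:
        seen.append(chunk)
        yield translate(" ".join(seen))  # may revise earlier output

# Toy demo: the "MT system" just uppercases its input.
for hypothesis in retranslate_stream(["ich melde", "mich an"], str.upper):
    print(hypothesis)
# ICH MELDE
# ICH MELDE MICH AN
```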

1:02:00.480 --> 1:02:07.729
So in the end we have the lowest possible
latency because we always output what is possible.

1:02:07.729 --> 1:02:14.784
On the other hand, it introduces a bit of a
new problem. There was another challenge when

1:02:14.784 --> 1:02:20.061
we first used this: it was first used with the
older statistical models, and there it worked fine.

1:02:20.061 --> 1:02:21.380
You switch to NMT.

1:02:21.380 --> 1:02:25.615
You saw one problem: it generates even more flickering.

1:02:25.615 --> 1:02:28.878
The problem is how normal machine translation is trained:

1:02:29.669 --> 1:02:35.414
it implicitly learned that the output always
ends with a dot and is always a full sentence.

1:02:36.696 --> 1:02:42.466
And somewhere in the model this prior was even
more important than what is really in the input.

1:02:42.983 --> 1:02:55.910
So if you give it a partial sentence, it
will still generate a full sentence.

1:02:55.910 --> 1:02:58.201
It is encouraged to do so:

1:02:58.298 --> 1:03:05.821
it's trying to just continue it somehow
to a full sentence, and since it's doing that by

1:03:05.821 --> 1:03:10.555
guessing stuff, you then have even more
changes.

1:03:10.890 --> 1:03:23.944
So here we have a train-test mismatch, and that's
maybe a more generally important point: the

1:03:23.944 --> 1:03:28.910
model might learn something a bit different from what you intended.

1:03:29.289 --> 1:03:32.636
In training the output always ends with a dot, so
the model does not stop but guesses something in general.

1:03:33.053 --> 1:03:35.415
So we have this train-test mismatch.

1:03:38.918 --> 1:03:41.248
We have a train-test mismatch.

1:03:41.248 --> 1:03:43.708
What is the best way to address that?

1:03:46.526 --> 1:03:51.934
That's exactly right, so we also have
to train on that kind of data.

1:03:52.692 --> 1:03:55.503
The problem is: for partial sentences

1:03:55.503 --> 1:03:59.611
there's no training data, so it's hard to
find parallel data for that.

1:04:00.580 --> 1:04:06.531
However, it's quite easy to generate artificial
partial sentences, at least for the source side.

1:04:06.926 --> 1:04:15.367
So you just take all the prefixes
of the source sentences.

1:04:17.017 --> 1:04:22.794
The problem, of course, is a bit: what
is the right target?

1:04:22.794 --> 1:04:30.845
If you have a partial sentence like 'I encourage all of',
what should be the right target for that?

1:04:31.491 --> 1:04:45.381
And the constraints: on the one hand, it should
be as long as possible, so you don't always have

1:04:45.381 --> 1:04:47.541
a long delay.

1:04:47.687 --> 1:04:55.556
On the other hand, it should also be consistent
with the full translation, and it should not do

1:04:55.556 --> 1:04:57.304
too much inventing.

1:04:58.758 --> 1:05:02.170
A very easy solution works fine.

1:05:02.170 --> 1:05:05.478
You can just do it length-based.

1:05:05.478 --> 1:05:09.612
If you take two-thirds of the source, you also take two-thirds of the target.

1:05:10.070 --> 1:05:19.626
It then implicitly learns to guess a bit;
think about the example from the beginning.

1:05:20.000 --> 1:05:30.287
For this one, if you take something like half of the source,
in this case the target would be 'I register'.

1:05:30.510 --> 1:05:39.289
So you're doing a bit of implicit guessing,
and if it turns out wrong you have rewriting,

1:05:39.289 --> 1:05:43.581
but you're doing a good amount of guessing.
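
A minimal sketch of this length-based heuristic, generalizing the two-thirds example to any prefix fraction; the example sentences are made up:

```python
# Build artificial prefix pairs for retranslation training: a source
# prefix covering a fraction f of the source keeps roughly the same
# fraction f of the target.
import math

def prefix_pairs(src_tokens, tgt_tokens):
    pairs = []
    for i in range(1, len(src_tokens) + 1):
        f = i / len(src_tokens)
        j = max(1, math.ceil(f * len(tgt_tokens)))
        pairs.append((src_tokens[:i], tgt_tokens[:j]))
    return pairs

src = "Ich melde mich für den Kurs an".split()
tgt = "I register for the course".split()
for s, t in prefix_pairs(src, tgt):
    print(" ".join(s), "->", " ".join(t))
```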

1:05:49.849 --> 1:05:53.950
In addition, this is how it would look.

1:05:53.950 --> 1:05:58.300
If it wasn't a guessing game, then the target
could be something like this.

1:05:58.979 --> 1:06:02.513
One problem arises if you just do it this
way.

1:06:02.513 --> 1:06:04.619
Prefixes then make up most of your training data.

1:06:05.245 --> 1:06:11.983
And in the end you're interested in the overall
translation quality, so in full sentences.

1:06:11.983 --> 1:06:19.017
So if you train on that, it will mainly learn
how to translate prefixes because ninety percent

1:06:19.017 --> 1:06:21.535
or more of your data are prefixes.

1:06:22.202 --> 1:06:31.636
That's why we'll see that it's better to
use a ratio.

1:06:31.636 --> 1:06:39.281
So half your training data are full sentences.

1:06:39.759 --> 1:06:47.693
Because if you do it naively, you see
that you get one prefix for every word but only one full sentence.

1:06:48.048 --> 1:06:52.252
You also see that nicely here; here are both.

1:06:52.252 --> 1:06:56.549
These are the BLEU scores, and you see the baseline

1:06:58.518 --> 1:06:59.618
is this one.

1:06:59.618 --> 1:07:03.343
It has a good quality because it's trained on full sentences.

1:07:03.343 --> 1:07:11.385
If you now train with all the partial sentences,
it focuses more on how to translate partial

1:07:11.385 --> 1:07:12.316
sentences.

1:07:12.752 --> 1:07:17.840
Because all the partial sentences will at
some point be removed, because at the end you

1:07:17.840 --> 1:07:18.996
translate the full sentence.

1:07:20.520 --> 1:07:24.079
With the mixed data you have the
same full-sentence performance.

1:07:24.504 --> 1:07:26.938
On the other hand, you see here the other
problem.

1:07:26.938 --> 1:07:28.656
This is how many words got updated.

1:07:29.009 --> 1:07:31.579
You want to have as few updates as possible.

1:07:31.579 --> 1:07:34.891
Updates mean removing things which have already
been shown.
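
A simplified sketch of counting such updates: it measures how many already-shown words a new hypothesis erases, which is only a stand-in for the exact metric used on the slides:

```python
# Count "flicker": words of the previous hypothesis that the new
# hypothesis does not keep as a prefix.
def erased_words(prev, curr):
    i = 0
    while i < min(len(prev), len(curr)) and prev[i] == curr[i]:
        i += 1
    return len(prev) - i  # everything after the common prefix was redrawn

print(erased_words(["I", "register"], ["I", "cancel", "my", "registration"]))
# -> 1  ("register" was erased and replaced)
```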

1:07:35.255 --> 1:07:40.538
This is quite high for the baseline.

1:07:40.538 --> 1:07:50.533
If you now train on the partial sentences,
the updates go down, as they should.

1:07:51.151 --> 1:07:58.648
And then for the mixed setup you have a bit like
the best of both.

1:08:02.722 --> 1:08:05.296
Any more questions on this type of approach?

1:08:09.309 --> 1:08:20.760
The last thing is what you do if you want a stable output: streaming decoding.

1:08:21.541 --> 1:08:23.345
Again, it depends a bit on the application

1:08:23.345 --> 1:08:25.323
scenario what you really want.

1:08:25.323 --> 1:08:30.211
As you said, we sometimes use this updating,
and for text output it'd be very nice.

1:08:30.211 --> 1:08:35.273
But imagine you want audio output: of
course you can't change it anymore because

1:08:35.273 --> 1:08:37.891
on one side you cannot change what was said.

1:08:37.891 --> 1:08:40.858
So in this case you need more like a fixed
output.

1:08:41.121 --> 1:08:47.440
And then this style of streaming decoding is interesting,

1:08:47.440 --> 1:08:55.631
where you, for example, get the source tokens
as they are streaming in.

1:08:55.631 --> 1:09:00.897
Then you decide oh, now it's better to wait.

1:09:01.041 --> 1:09:14.643
So you somehow need to have this type of additional
information.

1:09:15.295 --> 1:09:23.074
Here you have to decide: should I now output
a token, or should I wait for more input?

1:09:26.546 --> 1:09:32.649
So you have to produce these additional labels like
wait, wait, output, output, wait and so

1:09:32.649 --> 1:09:32.920
on.

1:09:33.453 --> 1:09:38.481
There are different ways of doing that.

1:09:38.481 --> 1:09:45.771
You can have an additional model that does
this decision.

1:09:46.166 --> 1:09:53.669
It can then wait and have a higher quality, or it's better to
continue and then have a lower latency, in these

1:09:53.669 --> 1:09:54.576
different situations.

1:09:55.215 --> 1:09:59.241
Surprisingly, a very easy strategy also works
sometimes quite well.

1:10:03.043 --> 1:10:10.981
And that is the so-called wait-k policy,
and the idea is, at least for text-to-

1:10:10.981 --> 1:10:14.623
text translation, that it is working well.

1:10:14.623 --> 1:10:22.375
You wait for k words, and then you
always output one word for each new input word.

1:10:22.682 --> 1:10:28.908
So you wait only at the beginning
of the sentence, and every time a new word

1:10:28.908 --> 1:10:29.981
comes in, you output one.

1:10:31.091 --> 1:10:39.459
So you proceed at the same speed as the input,
so you're not lagging more and more, but you

1:10:39.459 --> 1:10:41.456
have enough context.
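
A minimal sketch of the wait-k read/write schedule. The `step` function is a hypothetical one-token decoder, and the toy flush criterion assumes source and target have equal length:

```python
# Wait-k: read k tokens first, then alternate one write per read.
# step(source_read_so_far, target_so_far) -> next target word.
def wait_k(source_tokens, k, step):
    target = []
    for n in range(k, len(source_tokens) + 1):
        target.append(step(source_tokens[:n], target))
    # source exhausted: flush the tail (toy stop criterion: equal lengths)
    while len(target) < len(source_tokens):
        target.append(step(source_tokens, target))
    return target

# Toy "decoder" that copies source words one-to-one:
print(wait_k("das ist ein test".split(), k=2, step=lambda s, t: s[len(t)]))
# -> ['das', 'ist', 'ein', 'test']
```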

1:10:43.103 --> 1:10:49.283
Of course, for example for the German verb prefix,
this will not solve it perfectly, but if you have

1:10:49.283 --> 1:10:55.395
a bit of local reordering inside your k-word window,
that you can manage very well, and then it's

1:10:55.395 --> 1:10:57.687
a very simple solution that works.

1:10:57.877 --> 1:11:00.481
The other one was dynamic.

1:11:00.481 --> 1:11:06.943
Depending on the context you can decide how
long you want to wait.

1:11:07.687 --> 1:11:21.506
It also only works if source and target have a similar
number of tokens; if your target is much shorter,

1:11:21.506 --> 1:11:22.113
it breaks down.

1:11:22.722 --> 1:11:28.791
That's why it's also more challenging for
audio input because the speaking rate is changing

1:11:28.791 --> 1:11:29.517
and so on.

1:11:29.517 --> 1:11:35.586
You would have to do something like outputting
a word for every second of audio, or something

1:11:35.586 --> 1:11:35.981
like that.

1:11:36.636 --> 1:11:45.459
The problem is that the speaking speed in audio
is not fixed but varies quite a lot, and therefore this fails.

1:11:50.170 --> 1:11:58.278
Therefore, what you can also do is use a
similar solution to the one we had before with

1:11:58.278 --> 1:11:59.809
the retranslation.

1:12:00.080 --> 1:12:02.904
You remember, we re-decoded all the time.

1:12:03.423 --> 1:12:12.253
And you can do something similar in this case,
except that you add a constraint: you're

1:12:12.253 --> 1:12:16.813
saying, oh, when I re-decode, I'm not completely free.

1:12:16.736 --> 1:12:22.065
I cannot decode as I want; instead you can do this
target-prefix decoding. So what you say is,

1:12:22.065 --> 1:12:23.883
in your beam search:

1:12:23.883 --> 1:12:26.829
you can easily say, generate a translation, but

1:12:27.007 --> 1:12:29.810
the translation has to start with this prefix.

1:12:31.251 --> 1:12:35.350
How can you do that?

1:12:39.839 --> 1:12:49.105
Exactly, in the decoder: if you do beam search,
you normally always select the most probable token.

1:12:49.349 --> 1:12:57.867
And now you say oh, I'm not selecting the
most probable, but this forced one, so in

1:12:57.867 --> 1:13:04.603
the first steps I have to take these tokens,
and afterwards I start free decoding.

1:13:04.884 --> 1:13:09.387
And then you make sure that your new translation
always starts with this prefix.
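
A sketch of such forced-prefix decoding with greedy search; `next_dist` is a hypothetical model interface that returns a probability for each possible next token:

```python
# Forced-prefix greedy decoding: the committed prefix is kept verbatim,
# then decoding continues freely from it.
def decode_with_prefix(src, prefix, next_dist, max_len=50, eos="</s>"):
    tgt = list(prefix)            # forced steps: keep the committed tokens
    while len(tgt) < max_len:     # free steps: normal greedy choice
        dist = next_dist(src, tgt)
        tok = max(dist, key=dist.get)
        if tok == eos:
            break
        tgt.append(tok)
    return tgt

# Toy demo: the "model" deterministically continues a canned sentence.
canned = "I register for the course </s>".split()
print(decode_with_prefix("src", ["I", "register"],
                         lambda s, t: {canned[len(t)]: 1.0}))
# -> ['I', 'register', 'for', 'the', 'course']
```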

1:13:10.350 --> 1:13:18.627
And then you can use your immediate retranslation,
but you're no longer changing the output.

1:13:19.099 --> 1:13:31.595
How it works: you get a speech signal as input,
and at first it does not output anything.

1:13:32.212 --> 1:13:45.980
Then you get a translation, and then
you decide: yes, output this part.

1:13:46.766 --> 1:13:54.250
And then you're translating segments one, two,
three, four, but now you say: generate

1:13:54.250 --> 1:13:55.483
only outputs that continue this prefix.

1:13:55.935 --> 1:14:07.163
And then you're translating and maybe you're
deciding: now this is a good translation.

1:14:07.163 --> 1:14:08.880
Then you output the next part.

1:14:09.749 --> 1:14:29.984
Yes, but what exactly is the
effect of that?

1:14:30.050 --> 1:14:31.842
We're generating the target text.

1:14:32.892 --> 1:14:36.930
But we're not always outputting the full target
text now.

1:14:36.930 --> 1:14:43.729
What we have is some strategy here
to decide: oh, is the system already sure enough

1:14:43.729 --> 1:14:44.437
about it?

1:14:44.437 --> 1:14:49.395
If it's sure enough and it has all the information,
we can output it.

1:14:49.395 --> 1:14:50.741
And then the next.

1:14:51.291 --> 1:14:55.931
If we say here it's sometimes better not to
output yet, we won't output it already.

1:14:57.777 --> 1:15:06.369
And thereby the hope is that in our example
the model does not yet output 'register', because it

1:15:06.369 --> 1:15:10.568
doesn't know yet which case it is.

1:15:13.193 --> 1:15:18.056
So what we have to discuss is what is a good
output strategy.

1:15:18.658 --> 1:15:20.070
So what could you do?

1:15:20.070 --> 1:15:23.806
The output strategy could be something like this:

1:15:23.743 --> 1:15:39.871
If you think of wait-k, that is an output
strategy: you always output one word per input word.

1:15:40.220 --> 1:15:44.990
Good, and you can view wait-k in a similar
way as such an output strategy.

1:15:45.265 --> 1:15:55.194
But now, of course, we can also look at other
output strategies that are more generic and

1:15:55.194 --> 1:15:59.727
decide dynamically depending on the situation.

1:16:01.121 --> 1:16:12.739
And one thing that works quite well is referred
to as local agreement, and that means you're

1:16:12.739 --> 1:16:13.738
always translating the full input received so far.

1:16:14.234 --> 1:16:26.978
Then you look at what is the same
between my current translation and the one

1:16:26.978 --> 1:16:28.756
I did before.

1:16:29.349 --> 1:16:31.201
So let's go through that with an example.

1:16:31.891 --> 1:16:45.900
So your input is the first audio segment, and
your target text is 'all model trains'.

1:16:46.346 --> 1:16:53.231
Then you're getting audio segments one and
two, and this time the output is 'all models'.

1:16:54.694 --> 1:17:08.407
You see the continuations are different, but both of
them agree that it starts with 'all'.

1:17:09.209 --> 1:17:13.806
So we can hopefully be quite sure that the translation
really starts with 'all'.

1:17:15.155 --> 1:17:22.604
So now we say we output 'all'; at this
time step we'll output 'all', although before we output nothing.

1:17:23.543 --> 1:17:27.422
We are getting one, two, three as input.

1:17:27.422 --> 1:17:35.747
This time we have a prefix, so now we are
only allowing translations that start with 'all'.

1:17:35.747 --> 1:17:42.937
We cannot change that anymore, so we now need
to generate some translation.

1:17:43.363 --> 1:17:46.323
And then it can be that it's now 'all models
are run'.

1:17:47.927 --> 1:18:01.908
Then we compare here and see this agrees on
'all models', so we can output 'all models'.
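
A minimal sketch of this local-agreement rule, reusing the hypotheses from the example:

```python
# Local agreement: commit only the longest common prefix of the
# current and the previous hypothesis.
def agreed_prefix(prev_hyp, curr_hyp):
    out = []
    for a, b in zip(prev_hyp, curr_hyp):
        if a != b:
            break
        out.append(a)
    return out

h1 = "all model trains".split()   # hypothesis after segment 1
h2 = "all models are".split()     # hypothesis after segments 1+2
print(agreed_prefix(h1, h2))      # -> ['all']: only "all" is committed
```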

1:18:02.882 --> 1:18:07.356
So thereby we can dynamically decide: if the
model is very unsure,

1:18:07.356 --> 1:18:10.178
it always outputs something different, and we wait.

1:18:11.231 --> 1:18:24.872
Then we'll wait longer; if it keeps outputting
the same thing, we don't need to wait long.

1:18:30.430 --> 1:18:40.238
Is it clear that the model would
be able to detect the ambiguous cases?

1:18:43.203 --> 1:18:50.553
The hope is that it does, because if it's not sure,
of course, it would have to switch

1:18:50.553 --> 1:18:51.671
all the time.

1:18:56.176 --> 1:19:01.375
So if it would output 'register' in the first step,
'cancel' the second time, and then maybe

1:19:01.375 --> 1:19:03.561
'register' again, we wouldn't output it.

1:19:03.561 --> 1:19:08.347
Of course, if it sticks with 'register' for a
long time, then this can't help.

1:19:08.568 --> 1:19:23.410
That's why there are two parameters that you
can use and which might be important: how often you re-decode, and how many hypotheses have to agree.

1:19:23.763 --> 1:19:27.920
So you do it like every one second, every
five seconds or something like that.

1:19:28.648 --> 1:19:37.695
The more often you do it, the lower your latency will be,
because you wait less long, but also

1:19:37.695 --> 1:19:39.185
you might do more updates.

1:19:40.400 --> 1:19:50.004
So that is the one thing; the other thing
is, for text you might do it after every word, but if

1:19:50.004 --> 1:19:52.779
you think about audio it's less clear when.

1:19:53.493 --> 1:20:04.287
And the other parameter is the
agreement, so that the model is sure.

1:20:04.287 --> 1:20:10.252
If you say two have to agree, then hopefully it's stable.

1:20:10.650 --> 1:20:21.369
What we saw is, I think, that an agreement of two
normally gives a really good performance; otherwise your

1:20:21.369 --> 1:20:22.441
latency goes up.

1:20:22.963 --> 1:20:42.085
Couldn't you just use the model's confidence
to decide when to output?

1:20:44.884 --> 1:20:47.596
I have to completely agree with that.

1:20:47.596 --> 1:20:53.018
So when this was done, that was our first
idea of using the confidence.

1:20:53.018 --> 1:21:00.248
The problem, and that's my assumption,
is that modeling the model confidence is

1:21:00.248 --> 1:21:03.939
not that easy, and they are often overconfident.

1:21:04.324 --> 1:21:17.121
In the paper this is also tried, where
you use the confidence in some way to

1:21:17.121 --> 1:21:20.465
decide when to output.

1:21:21.701 --> 1:21:26.825
But that gave worse results, and that's why
we looked into the agreement instead.

1:21:27.087 --> 1:21:38.067
So I think it's a very good idea, but it seems
not to work, at least how it was implemented.

1:21:38.959 --> 1:21:55.670
There is one way that maybe goes more in this direction,
which is very new.

1:21:55.455 --> 1:22:02.743
In this one, you check whether the last word is attending mainly
to the end of the audio.

1:22:02.942 --> 1:22:04.934
Then you should not output it yet.

1:22:05.485 --> 1:22:15.539
Because there might be something
more coming that you need to know; so they

1:22:15.539 --> 1:22:24.678
look at the attention and only output words
which do not attend to the end of the audio signal.

1:22:25.045 --> 1:22:40.175
So there are, of course, a lot of ways how
you can do this better or easier.

1:22:41.901 --> 1:22:53.388
Another approach instead tries to predict the next words
with a large language model; for text translation

1:22:53.388 --> 1:22:54.911
you predict possible continuations of the source.

1:22:55.215 --> 1:23:01.177
Then you translate all of them and check
whether the translation changes, so you can make

1:23:01.177 --> 1:23:02.410
your decision even earlier.

1:23:02.362 --> 1:23:08.714
The idea is that if we continue and this
leads to a change in the translation, then

1:23:08.714 --> 1:23:10.320
we should not output yet.

1:23:10.890 --> 1:23:18.302
So it's more making an estimate about possible
continuations of the source instead of looking

1:23:18.302 --> 1:23:19.317
at previous hypotheses.

1:23:23.783 --> 1:23:31.388
How well that works: here is one example.

1:23:31.388 --> 1:23:39.641
It shows the latency against baselines.

1:23:40.040 --> 1:23:47.041
And you see in this case you have worse BLEU
scores here.

1:23:47.041 --> 1:23:51.670
For equal quality you have better latency.

1:23:52.032 --> 1:24:01.123
Does anybody have an idea
of what could be challenging there, or when this fails?

1:24:05.825 --> 1:24:20.132
One problem of these models is hallucination,
and an often very long hallucination has a negative impact on this.

1:24:24.884 --> 1:24:30.869
If you only remove the last four words, but
your model now starts to hallucinate and invent

1:24:30.869 --> 1:24:37.438
just a lot of new stuff then yeah you're removing
the last four words of that but if it has invented

1:24:37.438 --> 1:24:41.406
ten words, then you're still outputting six of
these invented words.

1:24:41.982 --> 1:24:48.672
Typically, once it starts hallucinating and generating
some output, it's quite long, so then it's

1:24:48.672 --> 1:24:50.902
no longer enough to just hold back a few words.

1:24:51.511 --> 1:24:57.695
And then, of course, the agreement approach is a bit better if you compare
to the previous ones.

1:24:57.695 --> 1:25:01.528
Two hallucinations are typically different, so they don't agree.

1:25:07.567 --> 1:25:25.939
Yes, so we won't talk about the details, but
for the output, for presentation as subtitles, there are different

1:25:25.939 --> 1:25:27.100
ways.

1:25:27.347 --> 1:25:36.047
So you want to have maximum two lines, maximum
forty-two characters per line, and the reading

1:25:36.047 --> 1:25:40.212
speed is a maximum of twenty-one characters per second.
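
These constraints are easy to check mechanically; here is a minimal sketch using exactly the values just mentioned:

```python
# Check a subtitle block against the constraints: max 2 lines,
# max 42 characters per line, max 21 characters per second.
def subtitle_ok(lines, duration_s, max_lines=2, max_chars=42, max_cps=21):
    if len(lines) > max_lines or any(len(l) > max_chars for l in lines):
        return False
    reading_speed = sum(len(l) for l in lines) / duration_s
    return reading_speed <= max_cps

print(subtitle_ok(["Okay, then let's talk about",
                   "low-latency speech translation."], duration_s=3.0))
# -> True
```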

1:25:40.981 --> 1:25:43.513
How to do that we can skip.

1:25:43.463 --> 1:25:46.804
Then you can generate something like that.

1:25:46.886 --> 1:25:53.250
Another challenge is, of course, that you
not only need to generate the translation,

1:25:53.250 --> 1:25:59.614
but for subtitling you also want to generate
when to put breaks and what to display.

1:25:59.619 --> 1:26:06.234
Because it cannot be full sentences, as said
here, if you have like maximum forty-two

1:26:06.234 --> 1:26:10.443
characters per line, that's not always a full
sentence.

1:26:10.443 --> 1:26:12.247
So how can you make it fit?

1:26:13.093 --> 1:26:16.253
And then for speech input there's not even punctuation
as a hint.

1:26:18.398 --> 1:26:27.711
So what we have done today: we looked
into three challenges. We have this segmentation,

1:26:27.711 --> 1:26:33.013
which is a challenge both in evaluation and
in the decoder.

1:26:33.013 --> 1:26:40.613
We talked about disfluencies and we talked
about simultaneous translation and how to

1:26:40.613 --> 1:26:42.911
address these challenges.

1:26:43.463 --> 1:26:45.507
Any more questions?

1:26:48.408 --> 1:26:52.578
Good, then regarding new content:

1:26:52.578 --> 1:26:58.198
We are done for this semester.

1:26:58.198 --> 1:27:04.905
Next time you can refresh your knowledge: there will be a

1:27:04.744 --> 1:27:09.405
repetition, where we can try to repeat a bit
what we've done over the semester.

1:27:10.010 --> 1:27:13.776
I will prepare a bit of repetition on what I think
is important.

1:27:14.634 --> 1:27:21.441
But of course it's also the chance for you to
ask specific questions.

1:27:21.441 --> 1:27:25.445
Like: it's not clear to me how these things relate.

1:27:25.745 --> 1:27:34.906
So if you have any specific questions, please
come to me or send me an email or so, then

1:27:34.906 --> 1:27:36.038
I'm happy to cover it.

1:27:36.396 --> 1:27:46.665
If I should focus on something really in depth, it
might be good to not just send me an email

1:27:46.665 --> 1:27:49.204
on Wednesday evening, but earlier.