Okay. So, um.

Welcome to today's presentation, the second class on machine translation, where we'll today do a bit of a special topic and talk about linguistic background.

We'll cover three different parts in this lecture. We'll first do a very, very brief introduction to the linguistic background, in the sense of: what is language, what are ways of describing language, what are the theories behind it; very, very short. I don't know, I think some of you have listened to the NLP lecture in the last semester or so. There we gave a much longer explanation; here it's brief, because we are talking about machine translation, so it's really focused on the parts which are important when we talk about machine translation. For everybody who has listened to that already, it's a bit of a repetition, maybe. But it's really trying to look at: what are the properties of languages, and how can they influence translation?

We'll use that in the second part to discuss why machine translation is hard, given what we know about language. We will see that there are two main issues. One is that languages might express ideas and information differently, and if they are expressed differently in different languages, we have to somehow do the transfer. And it's not purely that we know which words are used for it; it's not that simple, and it can be very different. The other problem, which we mentioned last time in connection with biases, is that there's not always the same amount of information on both sides. So it can be that there's more information in the one language, or that you can't express the same amount of information on the target side. We had that, for example, with the rice plant: in German or in English we would just say "rice", while in other languages you have to distinguish between rice as a plant and rice as a dish. And then it's not always possible to directly infer this from the surface.

And if we make it to the last point, otherwise we'll do that next Tuesday, or we'll partly do it only here: we'll describe briefly the three main approaches to rule-based, so linguistically motivated, machine translation. We mentioned them last time: the direct translation, the translation by transfer, and the interlingua-based approach. We'll do that a bit more in detail today, but very briefly, because this is not the focus of this class.

Why do we think this is important?
On the one hand, of course, we are dealing with natural language, so it might be good to spend a bit of time understanding what we are really dealing with, because this is where the challenges and problems come from. And on the other hand, this was the first way of doing machine translation. Therefore it's interesting to understand what the idea behind it was, and also to later see what is done differently, and to understand when some models work.

When we're talking about linguistics, we can of course do that on different levels, and there's different ways. On the right side here you are seeing the basic levels of linguistics. So we have at the bottom the phonetics and phonology. Phonetics we will not cover this year, because we are mainly focusing on text input, where we directly have characters and then words.

Then, what we touch today, at least mention what it is, is morphology, which is the first level. I already mentioned it a bit on Tuesday: of course there are some languages where this is very, very basic and there are not really a lot of rules for how you can build words. But since I assume you all have some basic knowledge of German, you know there are a lot more challenges than that. Maybe if you're a native speaker that's quite easy and everything is clear, but if you have to learn it, there are the endings of a word, and German is famous for building Komposita, putting words together. So this is the first level.

Then we have the syntax, which is both on the word and on the sentence level, and that's about the structure of the sentence: what are the functions of the words? You might remember part-of-speech tags from your high school time: there is noun and adjective and things like that, and this is something helpful. Just imagine: in the beginning it was used not only for rule-based but also for statistical machine translation, where, for example, the reordering between languages was quite a challenging task. Especially if you have long-range reorderings, part-of-speech information is very helpful. You know, in German you have to move the verb to the second position; in Spanish you have to swap the noun and the adjective; so information from part of speech could be very useful.

Then you have the syntax-based structures, where you have a full syntax tree. That was there in the beginning, and then it came into statistical machine translation.
And it got more and more important for statistical machine translation that you really try to model the whole syntax tree of a sentence, in order to better match how to realize it in the target language. Syntax-based statistical machine translation had a bit of a problem, though: it got better and better and was just on the way to becoming better, for some languages, than the traditional statistical models. But then the neural models came up, and they were just so much better at modelling all of that implicitly, so the syntax-based models were never used in practice that much.

And then we'll talk about the semantics: what is the meaning of the words? We saw last time that words can have different meanings. And yeah, how you represent meaning is of course very challenging. Formalizing this is typically done in quite limited domains, because doing it for all possible words has not really been achieved yet; it is very challenging.

Then there is pragmatics: pragmatics is the meaning in the context of the current situation. One famous example is the sentence "the light is red". If the traffic light is red, then typically you don't want to tell the other person sitting in the car the surprising news that the light is red; typically you mean: okay, you should stop, and you shouldn't pass the light. So the meaning of the sentence "the light is red", in the context of sitting in the car, goes beyond the literal words.

So let's start with the morphology; that is where we begin. One easy first thing is: of course we have to split the sentence into words, or join characters, so that we have words. Because in most of our work in machine translation we'll deal with some type of words. In neural machine translation people are also working on character-based models and subwords, but a basic unit, the words of the sentence, is a very important first step.

And for many languages that is quite simple; in German it's not that hard to determine the words. In tokenization, the main challenge is that, since we are doing corpus-based methods, we also have to deal with everything that is not a normal word, and there of course it's getting a bit more challenging.

So that is maybe the main thing: for example, if you think of German tokenization, it's easy to get every word; you split at the spaces. But then you would have the dot at the end joined to the last word, and of course you don't want that, because it's a different word: the last word would not be "go" but "go.". So what you can do is always split off the dots.
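To make this whitespace-plus-punctuation idea concrete, here is a minimal Python sketch; it is my own illustration, not the tool used in the lecture, and the tiny punctuation set is an assumption:

```python
# A minimal sketch: whitespace tokenization that detaches trailing
# sentence punctuation from each word.
import re

def naive_tokenize(sentence: str) -> list[str]:
    tokens = []
    for word in sentence.split():
        # Split off trailing punctuation like '.', ',', '!', '?'
        m = re.match(r"^(.*?)([.,!?]*)$", word)
        tokens.append(m.group(1))
        tokens.extend(m.group(2))  # one token per punctuation character
    return [t for t in tokens if t]

print(naive_tokenize("I go."))          # ['I', 'go', '.']
print(naive_tokenize("Dr. Smith came.")) # ['Dr', '.', 'Smith', ...] -- wrong!
```

The second call already shows the problem discussed next: blindly splitting off every dot breaks abbreviations.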
But can you really always do that, or might it sometimes be better to keep the dot attached? For example, email addresses or abbreviations: for "Dr.", maybe it doesn't make sense to split off the dot, because then you would assume a new sentence starts there, but it's just the "Dr." of "doctor". Or if you have ordinal numbers, like "he's the seventh person", in German "der siebte" written with a dot, then you don't want to split either. So there are some cases where it can be a bit more difficult, but it's not really challenging.

In other languages it's getting a lot more challenging, especially in Asian languages, where often there are no spaces between words. So you just have a sequence of characters: "thequickbrownfoxjumpsoverthelazydog". And then it still might be helpful to work on something like words, and then you need a bit more complex segmentation.

And here you see we again have our typical problem: there is ambiguity. So you're seeing here: we have exactly the same sequence of characters, but depending on how we split it, it means "he is your servant" or "he is the one who used your things". Or here: "round eyes" versus "take the air". So this type of tokenization gets more important, because you could already introduce errors here, and you can imagine: if you once make a wrong decision, it's quite difficult to recover from it. And so in these cases, looking at how we're doing tokenization is an important issue. Then it might be helpful to do things like character-based models, where we treat each character as a symbol and, for example, make this decision later, or never really make it explicitly.
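As a minimal sketch of dictionary-based segmentation, here is the classic greedy "maximum matching" heuristic; the lexicon entries below are made up for illustration, and real segmenters are of course much more sophisticated:

```python
# A minimal sketch of greedy "maximum matching" word segmentation for
# scripts written without spaces. The tiny lexicon here is hypothetical.
def max_match(text: str, lexicon: set[str]) -> list[str]:
    words, i = [], 0
    while i < len(text):
        # Try the longest lexicon entry starting at position i;
        # fall back to a single character so we always make progress.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

lexicon = {"他", "的", "仆人", "仆", "人"}  # hypothetical entries
print(max_match("他的仆人", lexicon))  # ['他', '的', '仆人']
```

The greedy longest-match choice is exactly where the ambiguity above bites: a different split of the same character sequence can yield a different meaning, and the heuristic has no way to know which one was intended.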
The other thing is that, even once we have words, they might not be the optimal unit to work with, because it can be that we should look into the internal structure of words. If we have a morphologically rich language, that means we have a lot of different types of words, and having many different word types means, on the other hand, that each of these words has been seen very infrequently. If you only have ten words and a large corpus, each word occurs very often; if you have three million different words, then each of them occurs less often. And hopefully you know from machine learning that it's helpful to have seen each example very often.

So why does this happen? In some languages we have quite complex information inside a word. Here's a word from Finnish, something like "talossanikinko", and it means roughly "in my house, too?", as a question. So you have all this information attached to the word. In the extreme case, that's why Finnish, for example, is typically a language where machine translation quality is less good: generating all these different morphological variants is a challenge. And the additional challenge is that Finnish is not really low-resource, but in low-resource languages you quite often have a more difficult morphology. I mean, English is an example of a relatively easy one.

So in general we can say that words are composed of morphemes, and morphemes are the smallest meaning-carrying units. All morphemes should have some type of meaning; a single letter, for example, does not really have one by itself. Take "unhappiness": the "un" has some type of meaning, it changes the meaning; the "ness" has the meaning that it makes a noun out of an adjective; and "happy" is the stem. So each of these parts conveys some meaning, but you cannot split them up further and still have something meaningful. You see that a little bit more is happening, of course: typically the "y" turns into an "i", so there can be some variation, but these are typical examples of what we have as morphemes.

[In response to a question:] That is, of course, a problem, and that's the question of how you do your splitting. But we have that problem anyway, always, because even full words can have different meanings depending on the context they're used in. So we always need a model which can somehow infer or represent the meaning of the word in its context. But you are right that this problem might get even more severe if you're splitting up. Therefore it might not be best to go to the very extreme and represent each letter, and have a model which works only on letters, because a letter can of course have a lot of different meanings depending on where it's used. And yeah, there is no right solution for what the right splitting is; it depends on the language, the application, and the amount of data you have. Typically: the less data you have, the more splitting you should do; if you have more data, then you can better distinguish the full forms.

Then there are different types of morphemes. We typically have one stem morpheme, like "Haus" or "Tisch", carrying the main meaning.
And then you can have functional or bound morphemes, which can be prefixes, suffixes, infixes or circumfixes; so they can be before, after, inside, or around the stem. Something like "gekauft": there you would typically say it's not two separate morphemes "ge" and "t", because together they describe the function; "ge" and "t" together mark the participle of "kauf".

What are people using them for? You can use them for inflection, to describe things like tense, count, person, case. If you know German, this is commonly used in German. But of course there are more complicated things in some languages. I mean, in German the verb agreement only depends on the person and number of the subject; in other languages it can also be determined by the first or the second object. So if you buy an apple or a house, it's not only that "kauft" depends on me, like in German; it can also depend on whether it's an apple or a house. And then, of course, you get an exploding number of word forms.

Furthermore, morphology can be used for derivation, so you can make other types of words from a word. And then there is compounding: creating new words by joining words, like "rainbow" or "waterproof", or in German "Einkaufswagen", "eiskalt" and so on, where you can do that with nouns and, in German, with adjectives. Then you might have additional challenges like the Fugen-element, where you have to add an extra letter when joining.

Yeah, and then there are additional special things. You sometimes have to insert extra material because of phonology: the plural of "dish" is "dishes", not just with an "s"; the third person singular in English is normally an "s", but for "go", for example, it's "goes", with "es". In German you can also have other changes: "Mutter" becomes "Mütter", so you're changing the vowel to an umlaut in order to express the plural; and in other languages there is, for example, vowel harmony, where the vowels inside the word change depending on which form you have. This makes things more difficult, because splitting a word into its parts doesn't really work anymore; for "Mutter" and "Mütter", for example, that is not really possible. The nice thing, more as a general observation, is that irregular things often happen in words which occur frequently, so that you have enough examples, while the regular things you can handle by some type of rules.

Yeah, this can be done; there are tasks on this, like automatic inflection and automatic morphological analysis: you give such a tool a word, and it tells you what the possible forms are, how they are built, and so on.
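Here is a minimal sketch of what such an analyzer could look like for English, with assumed suffix rules and a toy exception list; real analyzers use full finite-state morphologies rather than simple stripping:

```python
# A minimal sketch (assumed rules, not a real analyzer): analyze English
# word forms by stripping suffixes, with an exception list for irregular
# forms that cannot be decomposed.
RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ed", ""), ("ing", "")]
IRREGULAR = {"went": ("go", "past")}  # irregular forms stored whole

def analyze(word: str) -> tuple[str, str]:
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement, f"-{suffix}"
    return word, "stem"

print(analyze("flies"))  # ('fly', '-ies')
print(analyze("went"))   # ('go', 'past')
# Umlaut plurals like German "Mutter" -> "Mütter" cannot be handled by
# suffix stripping at all; the change happens inside the stem.
```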
For at least the high-resource languages, there are a lot of tools for that. Of course, if you now want to do that for some language which is very low-resourced, it might be very difficult, and there might be no tool for it.

Good; before we go on to the next part, about part of speech, are there any questions about morphology?

[Question from the audience.] Yeah, we'll come to that in a bit. So it's a very good question, and a difficult one. Especially, as we'll see later, if you just put in full words, it would be very bad, because words go into neural networks just as numbers: each word is mapped to an integer, and so the network doesn't really know anything more about the structure. What we will see, therefore, is that the most successful approach, which is mostly done, is subword units, where we split words. We'll cover this on Tuesday. There is an algorithm called byte pair encoding, which is about splitting words into parts. It does split words, but not morphologically motivated; it's based on frequency. However, it performs very well, and that's why it's used; and there is a bit of correlation, sometimes the frequency-based and the morphological splits agree. So we're splitting words, and we're especially splitting words which are infrequent, and that's maybe a good motivation for why this is good for neural networks: if you have seen a word very often, you don't need to split it, and it's easier to just process it as a whole; while if you have seen a word infrequently, it is good to split it into parts, so the network can still do something with it. So there is some way of doing it, but linguists would say this is not a morphological analysis. That is true; but we are splitting words into parts if they have not been seen often.
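Since byte pair encoding keeps coming up, here is a minimal sketch of the merge-learning step, in the spirit of the original algorithm; real toolkits add end-of-word markers, vocabulary thresholds, and many other details:

```python
# A minimal sketch of byte pair encoding (BPE) merge learning: repeatedly
# merge the most frequent adjacent symbol pair in the corpus vocabulary.
from collections import Counter

def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe(["low", "low", "lower", "newest", "newest"], 3))
```

Frequent words end up as single symbols after enough merges, while rare words stay split into smaller, more frequently seen pieces, which is exactly the behaviour described above.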
Yes, so another important thing about words are the part-of-speech tags. These are the common ones: noun, verb, adjective, adverb, determiner, pronoun, preposition, and conjunction; there are some more. They are not the same in all languages, but there are, for example, universal part-of-speech tag sets which try to define one set of tags for many languages. And then, of course, this helps you with generalization: many language rules deal with verbs and nouns, especially if you look at sentence structure.

So if you know the part-of-speech tag, you can easily generalize and apply these rules; as you know, the verb in German has to be at the second position. So you know how to deal with verbs independently of which word you are actually looking at.

And that again is ambiguous. There are some words which can have several part-of-speech tags. An example is the word "can", which can be the can of beans or "can do something". This often also happens in English with related words: "access" can be the noun ("the access") or the verb ("to access something"). Most words have only one single part-of-speech tag, but there are some where it's a bit more challenging. The thing is, the ambiguous ones are often words which occur more often, while for really rare words it's not that common.

If you look at these classes, you can distinguish open classes, where new words can appear, so we can invent new nouns. But then there are the closed classes, like determiners or pronouns. For example, it's not that you can easily invent a new pronoun; there is a fixed list of pronouns and we are using that. It's not like something happens tomorrow and then people start using a new pronoun or a new conjunction like "and"; you normally don't invent a new one.

In addition to the part-of-speech tags, some of these classes have different properties. So, for example, for nouns and adjectives we can have singular and plural. In other languages there is also a dual, so that a word is not only singular or plural, but can also be dual if it refers to exactly two things. You have the gender: masculine, feminine, neuter we know; in other languages there is animate and inanimate. And you have the cases: in German you have nominative, genitive, dative, accusative. In other languages, for example Latin, you also have the ablative. So there is more, and there you have no one-to-one correspondence; it can be that there are some cases which exist only in the one language and do not exist in the other language.

For verbs we have tenses, of course: walk, is walking, walked, has walked, had walked, will walk and so on. Interestingly, in Japanese, for example, this can also happen for adjectives: there is a difference between "something is white" and "something was white".
There is this continuous aspect, which we don't really have that commonly in German; I guess if you're German and learning English, that's something like "she sings" versus "she is singing". Of course we can express that in German, but it's not commonly used, and normally we don't mark this aspect. Also about tenses: if you use the past tense in English, you will also use a past tense in German; so we have similar tenses, but the use might be different.

There is the mood, like indicative and subjunctive: "if he were here". There are voices, active and passive. That, as you know, exists both in German and English, but there is also something like the middle voice in Greek: "I get myself taught". So there are other phenomena which might only exist in one language. These are the different syntactic structures you can have in a language, and there are two issues: it might be that some exist only in one language and not in the other; and on the other hand there is the matching, so it might be that in some situations the two languages use different structures.

The next part would be about semantics; do you have any questions before that? I'll just continue, but ask if something is unclear.

Beside the structure, we typically have more ambiguities: it can be that words themselves have different meanings. We typically talk about polysemy and homonymy, where polysemy means that a word has different but related meanings. So if you take the English word "interest", it can be that you are interested in something, or it can be the financial interest rate; but the meanings are somehow related. Then there is homonymy, where the meanings really are not related: the two senses of "can" don't have anything in common, so they are really very different. And of course this is not completely clear-cut; there is no sharp definition. For the word "bank", for example, you can argue that the meanings are related, or you can argue that they aren't; so there are some clear cases like "interest", some which are vague, and some where it's again very clear that the meanings are different.

And in order to translate them, of course, we might need the context to disambiguate. That's typically how we can disambiguate, and that's not only for lexical semantics; it's generally very often the case that if you want to disambiguate, context can be very helpful: in which sentence does it occur, what is the general knowledge, who is speaking?
You can do that externally, by some dedicated disambiguation component, or a machine translation system can also do it internally.

And sometimes you're lucky and you don't need to do it at all, because you just have the same ambiguity in the source and the target language; then it doesn't matter. Think about the mouse: as I said, you don't really need to know whether it's a computer mouse or the living mouse when you translate from German to English, because both languages have exactly the same ambiguity.

There are also relations between words, like synonyms, antonyms, and hyponyms. Hyponymy is the "is-a" relation; there is the "part-of" relation, like door and house; "big" and "small" are antonyms; and synonyms are words which mean something similar. There are resources which try to express all this linguistic information, like WordNet or GermaNet, where you have a graph with words and how they are related to each other.

This can be helpful. Typically these resources were used more in tasks where there is less data; there are a lot of tasks in NLP where you have very limited data, because you really need hand-annotated material. Machine translation has a big advantage: there's naturally a lot of translated text out there. So in machine translation we typically have, compared to other tasks, a significant amount of data. People have looked into integrating WordNet and things like that, but it is rarely used in commercial systems or the like.
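If you want to play with these relations, NLTK ships a WordNet interface; here is a minimal sketch (it needs a one-time `nltk.download("wordnet")`, and the printed senses are whatever the installed WordNet version contains):

```python
# A minimal sketch of querying the lexical relations mentioned above.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())  # several unrelated senses

dog = wn.synsets("dog")[0]
print(dog.hypernyms())        # "is-a" relation, e.g. canine

door = wn.synsets("door")[0]
print(door.part_holonyms())   # "part-of" relation, like door and house
```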
So much for the words: there we have morphology, syntax, and semantics, and then of course it makes sense to also look at the bigger structure, that means information about the whole sentence. There we don't really have a morphology, because morphology is about the structure of words, but we have syntax on the sentence level and semantic representations.

When we are thinking about the sentence structure: a sentence is, of course, first a sequence of words terminated by a dot. "Jane bought the house." And we can say something about the structure: it's typically subject, verb, and then one or several objects. And the number of objects, for example, is determined by the verb; that is called the valency. So there are intransitive verbs, which don't take any object; "to sleep" is one: there is no object of sleep, you cannot say "Jane sleeps the house". And there are transitive verbs, where you have to supply one or more objects, and you always have to; the sentence is not correct if you don't. So if you buy something, you have to say you bought *something*; or you give *someone* *something*.

Here you see a bit of the interesting relation between word order and morphology. Of course it's not that strong, but for example in English you always have to first say whom you gave it to and then what you gave; the structure is very fixed and cannot be changed. German, for example, has the possibility of reordering what you gave and whom you gave it to, because there is morphology: what you gave gets a different case form than the person you gave it to. And that is a general tendency: if you have rich morphology, then typically the word order is more free, while in English you cannot express this information through the morphology; you typically have to express it through the word order, so the order is not as free but more restricted.

Yeah, the first part is typically the noun phrase, the subject, and that cannot only be a single noun; it can be a longer phrase. It can be "Jane", it can be "the woman", "a woman", "the young woman", or "the young woman who lives across the street". All of these are subjects, so this can already be very, very long. And this makes the verb-second rule a bit more complicated: if you have "the young woman who lives across the street runs" to somewhere, then "runs" is at the second position in the tree, but the first constituent is quite long. So it's not just counting words: the second word is not always the verb.

In addition to these simple things, there's more complex stuff: "Jane bought the house from Jim without hesitation", or "Jane bought the house in the posh neighborhood across the river". And these often lead to additional ambiguities, because it's not always completely clear to which part this prepositional phrase attaches. We'll see that, and you have, of course, subclauses and so on.

And then there is a theory behind this which was very important for rule-based machine translation, because that's exactly what you're doing there: you take the sentence and do the syntactic analysis, so that you have these constituents which describe the basic parts of the sentence. And we can describe the sentence structure with a context-free grammar, which you hopefully remember from basic computer science: a tuple of non-terminals, terminal symbols, production rules, and a start symbol.
You can then describe a sentence with this phrase structure grammar. A simple example would be something like this: you have a lexicon; "Jane" is a noun, "house" is a noun, "telescope" is a noun. And then you have the production rules: a sentence is a noun phrase and a verb phrase; a noun phrase can either be a determiner and a noun, or it can be a noun phrase and a prepositional phrase; and a prepositional phrase is a preposition and a noun phrase.

Looking at this: what is the valency of the verb we're describing here? How many objects would the verb have in this grammar? We're looking at the verb phrase: the verb phrase is a verb and a noun phrase, so one object here; so this would be for a valency of one. If you had intransitive verbs, the verb phrase would be just a verb, and if you had two objects, it would be verb, noun phrase, noun phrase.

And yeah, then the challenge, what you have to do, is this: given a natural language sentence, you want to parse it to get this type of parse tree. You know that from programming languages, where you also need to parse the code in order to get the representation. However, there is one challenge in parsing natural language compared to a computer language: there are different ways of expressing things, and there can be different parse trees belonging to the same input.

So, "Jane buys a house"; that's an easy example. You do the lexicon lookup: "Jane" can be a noun phrase, "buys" is a verb, "a" is a determiner, and "house" is a noun. And then you can use the grammar rules: here we have a rule that a determiner and a noun form a noun phrase, so we map that to a noun phrase. Then we can map verb plus noun phrase to a verb phrase, and then we can map noun phrase plus verb phrase to a sentence, representing the whole thing.

We can have that even more complex: "The woman who won the lottery yesterday bought the house across the street." The structure gets more complicated. You now see that the verb phrase is at the second position, but the noun phrase before it is quite big. And the PP phrases: it's sometimes difficult to decide where to put them, because here they attach to the noun phrase, but in other sentences they attach to the verb phrase.

[A question from the audience.] Yes, so then either "Jane" can have two tags, noun and noun phrase, or you add the extra rule that a noun phrase can not only be a determiner and a noun, but also just a noun.
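Here is the toy grammar written out in NLTK notation, as a minimal sketch; the exact rules and words are my reconstruction of the slide, and the chart parser enumerates every parse, which makes the attachment ambiguity visible:

```python
# A minimal sketch of a phrase structure grammar; the chart parser
# returns every possible parse tree for the input.
import nltk

grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> 'Jane' | Det N | NP PP
VP  -> V NP | VP PP
PP  -> P NP
Det -> 'the' | 'a'
N   -> 'man' | 'telescope' | 'house'
V   -> 'saw' | 'bought'
P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "Jane saw the man with the telescope".split()
for tree in parser.parse(sentence):
    print(tree)  # two trees: the PP attaches to the NP or to the VP
```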
Then, of course, either you introduce additional rules about what is possible, or you have the problem that you produce parse trees which are not correct, and then you have to add some type of probability to decide which parse is more probable.

But some things also can't really be modelled easily with this type of tree. For example, the agreement is not straightforward: for subject and verb you want to check that person and number agree, so if it's a singular subject it has to be a singular verb, and if it's a plural subject it's a plural verb. Similar things hold for the agreement between determiner and noun; they also have to agree. Things like that cannot easily be done with this type of grammar. Or the subcategorization: checking whether the verb is transitive or intransitive, so that "Jane sleeps" is okay, but "Jane sleeps the house" is not; and "Jane bought the house" is okay, but "Jane bought" alone is not.

Furthermore, long-range dependencies are difficult, and also which word orders are allowed and which are not. That is not direct either: you can say "Maria gibt dem Mann das Buch", "dem Mann gibt Maria das Buch", "das Buch gibt Maria dem Mann", but not every permutation; which of these is possible and which not is sometimes not simple to model.

Therefore people have built more complex formalisms, like unification grammars, and tried to model, in addition to the category of the verb, the agreement features: that it's, say, third person and singular. You annotate the symbols with more information, and then you have more complex syntactic structures in order to model these phenomena too.

Yeah, why is this difficult? We have different ambiguities, and that makes it hard. Words have different part-of-speech tags: if you take "time flies like an arrow", it can mean that the "time flies", as animals, are fond of an arrow, or it can mean that time is flying by very fast, passing like an arrow. And if you want to build a parse tree, these two readings have different part-of-speech tags: in the second one, "flies" is the verb. And of course that is a different semantics, so the readings are very different.
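As a small illustration, an off-the-shelf statistical tagger has to commit to one of these readings; here is a minimal sketch with NLTK (it needs a one-time `nltk.download("averaged_perceptron_tagger")`, and the exact output depends on the model version):

```python
# A minimal sketch: tagging the classic ambiguous sentence.
import nltk

tokens = "Time flies like an arrow".split()
print(nltk.pos_tag(tokens))
# Likely something like [('Time', 'NN'), ('flies', 'VBZ'), ('like', 'IN'), ...]
# i.e. "flies" read as a verb; the reading with "flies" as a noun and
# "like" as a verb is grammatical too, and a parser has to choose.
```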
And otherwise there is structural ambiguity, where some part of the sentence can play different roles; the famous case is the attachment. So, "the cop saw the burglar with the binoculars": "with the binoculars" can be attached to "saw", or it can be attached to "the burglar". And here it's more probable that the cop used the binoculars to see the burglar, not that the burglar has them. This, of course, makes things difficult, because in parsing, the structure you choose implicitly defines the semantics.

We would then go on directly to semantics; but maybe first, any questions about syntax and how that works?

Then we'll do a bit more about semantics. So far we only described the structure of the sentence. For the meaning of a sentence we typically have the compositionality of meaning: the meaning of the full sentence is determined by the meaning of the individual words, and together they form the meaning of the whole. For words that is only partly true: the meaning of "rainbow" is not just "rain" joined with "bow". For sentences it typically does hold, because you can't directly determine the full meaning; you split it into parts. But sometimes it only holds for some parts: take the expression "kick the bucket". Of course, you cannot get its meaning by looking at the individual words; or in German, "ins Gras beißen": you cannot get that somebody died by looking at the individual words of "ins Gras beißen", but they have a joint meaning.

And there are different ways of describing meaning; some have been tried, and some are more commonly used for certain tasks. The first would be something like first-order logic: if you have "Peter loves Jane", then you have a representation with a "loves" predicate between Peter and Jane, and you try to construct that from the sentence. You see this is a lot more complex than only doing syntax: on top of the syntax you also build this type of representation.

The other approach is frame semantics. That means you try to represent knowledge about the world, and you have these frames. For example, you might have a frame "to buy", and the meaning is that there is a commercial transaction: you have a person who is selling, you have a person who is buying, you have the thing that is bought, you might have a price, and so on. And then what you are doing in semantic parsing with frame semantics is: you first try to determine which frames occur in the sentence.
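As a minimal sketch of the idea, here is an invented "buy" frame (my own toy inventory, not the real FrameNet one) together with a naive trigger-based identification step:

```python
# A minimal sketch: a frame as a named set of roles, plus a naive
# trigger-word frame identification step. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Frame:
    name: str
    roles: list[str]
    triggers: set[str] = field(default_factory=set)

COMMERCE = Frame(
    name="commercial_transaction",
    roles=["buyer", "seller", "goods", "price"],
    triggers={"buy", "buys", "bought", "sell", "sells", "sold"},
)

def identify_frames(tokens: list[str]) -> list[Frame]:
    # Fire the frame if any trigger word occurs in the sentence.
    return [COMMERCE] if COMMERCE.triggers & set(tokens) else []

print(identify_frames("Jane bought the house".split()))
# Next step (not shown): align 'Jane' -> buyer, 'the house' -> goods.
```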
So if it's something with buying, you would first try to identify: oh, here we have the frame "buy", which does not always have to be indicated by the verb "buy" itself; it can be indicated by "sell" or in other ways. And then you try to find out which elements of this frame are in the sentence, and try to align them. Yeah, and you see, for example, with "to buy" and "to sell": if you have a model with frames, they share the same frame elements.

In addition to the sentence level, there are also phenomena beyond the sentence level. We're coming back to this later, because it's a special challenge for machine translation. There is, for example, coreference. That means you first mention something, like "the President of the United States", and later you refer to him, maybe just as "he". And that is especially challenging in machine translation, because you're not always using the same expression. For the president it's "he" in English and "er" in German, but for other things it might be different, depending on the gender the referring expression has in each language.

So much for the background; next, we want to look, based on the knowledge we have now, at why machine translation is difficult, before we go any further.

The first type of problem is what we refer to as translation divergences. That means we have the same information in source and target, but it is expressed differently. So it is not done the same way, and we cannot translate these things simply with a word-by-word mapping; we need something a bit more complex. An example, if it's only about structure: in English, say, "the delicious soup", the adjective is before the noun, while in Spanish you have to put it after the noun, and so you have to change the word order.

So there are different kinds of divergence. There can be structural divergence, which is about the word order. In German we have that especially in the subordinate clause: in English, the verb in a subordinate clause is also at the second position, while in German it's at the end, so you have to move it all the way over.

It can also be a completely different grammatical role. Take "you like her": in Spanish it's "ella te gusta", which means she is no longer the object but the subject, you are now the object, and the verb is something like "pleases". So you really use a different sentence structure, and you have to change it.

It can also be head switching. In English you say "the baby just ate",
while in Spanish you literally say "the baby finishes eating": the eating is no longer the main verb, the finishing is. So you cannot always have the same structures in your input and output; you have to learn that.

There are lexical divergences, like "to swim across" versus, literally, "to cross swimming". And you have categorial divergences, where, for example, a verb or an adjective gets turned into a noun, as in "to decide" versus "to make a decision".

That is the one challenge, and the even bigger challenge is what is referred to as translation mismatches. That can be a lexical mismatch; that's the fish we talked about: the fish you eat and the fish which is living are two different words in Spanish (pescado versus pez). And sometimes this is even not known; even a human might not be able to infer it. You maybe need to see the context, you maybe need the surrounding sentences. One problem is that at least traditional machine translation works on the sentence level: we take each sentence and translate it independently of everything else, and that's of course not correct. We will look into some ways of doing document-based machine translation later.

Then gender information might be a problem. In English it's "player", and you don't know if it's "Spieler" or "Spielerin", or if the gender is simply not known. But if you now generate German, you should know: does the writer know the gender or not, and then generate the right form. Just imagine a commentator talking about a player: he can see whether it's a male or a female player. So in general the problem is: if you have less information in the source and you need more information in the target, the translation doesn't directly work.

Another problem is what we just talked about: the coreference. If you refer to an object, and that can be across sentence boundaries, then you have to use the right pronoun, and you cannot just translate the pronoun literally. "If the baby does not thrive on raw milk, boil it." And if you now just take the typical translation of "it" into German, it will be "es". [Discussion with the audience.] In a way "es" would even be right grammatically, because it matches "das Baby"; but that is exactly the problem: you have to determine what "it" refers to, and the naive choice can be wrong.
Because in English both the baby and the milk can be referred to with "it". If you translate it as "es", the German pronoun agrees with "das Baby", so you would be saying the baby gets boiled. You have to use "sie", because "Milch" is feminine; although, admittedly, the baby reading is really very uncommon, since boiling a baby is not a common thing. Of course, I agree this situation is a bit constructed, but you can see that these things are not that easy.

Another example is this: "Dr. McLean often brings his dog Champion to visit with his patients. He loves to give big wet sloppy kisses." And there, of course, it's also important whether "he" refers to the dog or to the doctor.

Another challenge is that we don't have a fixed language; that refers back to morphology, and we can build new words. We can, in all languages, build new words by just concatenating parts, like "Brexit" and things like that. And of course words don't exist in isolation either: in German you can now use the word "download" as "downloaden", and you can also apply morphological operations to it; I guess there is not even agreement on what the correct form is. So you have to deal with these things, and the same in social media. Or this word, which most of you have probably forgotten already; this was ten years ago or so: there was this volcano in Iceland, Eyjafjallajökull, which stopped Europeans from flying around. So there are always new words coming up, and you have to deal with them.

Yeah, one last thing: some of the examples we have seen are a bit artificial. One example which is very commonly used to show that machine translation doesn't really work is "the box was in the pen". And maybe you would be surprised, at least when you read it: how can a box be inside a pen? Does anybody have a solution for that, such that the sentence is still correct? Maybe it's directly clear for you. [Answer from the audience.] Yes, like at a farm, or for small children: that enclosure is also called a pen. And so inferring which of these two meanings is meant is quite difficult. But at least when I saw it, I wasn't completely convinced, because it's maybe not a sentence you're using in your daily life; some of these constructions are very good at showing where the problem is, but the question is: does it really occur in real life?
1:01:35.996 --> 1:01:43.605 And therefore, here are some examples that really occurred with our lecture translator. 1:01:43.605 --> 1:01:49.663 They may look simple, but you will see that some of them are still happening. 1:01:50.050 --> 1:01:53.948 They are partly about splitting words, and that is where they happen. 1:01:56.596 --> 1:02:07.041 We had a text about the numeral system in German, the "Zahlensystem", which got split into subword parts, because otherwise we cannot translate it. 1:02:07.367 --> 1:02:23.270 And then the system only did an approximate match and was talking about the binary payment system, because the payment system, the "Zahlungssystem", was a lot more common in the training data than the "Zahlensystem". 1:02:23.823 --> 1:02:41.250 So there you see that rare words, which do not occur that often, are very challenging to deal with; sometimes we are good at inferring them, but in other cases it is very difficult. 1:02:44.344 --> 1:02:49.605 Another challenge is that the context is very important. 1:02:50.010 --> 1:03:01.813 This is also an example, a bit older, from the lecture translator: we were translating a maths lecture, and it was always talking about the "omens" of the numbers. 1:03:02.322 --> 1:03:12.408 Which does not make any sense at all, but the German word "Vorzeichen" can of course mean both the sign of a number and the omen. 1:03:12.732 --> 1:03:23.869 And if you do not have the right domain knowledge encoded in there, it might use the wrong domain. 1:03:25.705 --> 1:03:39.583 A more recent version of that is from a paper about pivot-based translation, where you translate through English into another language because you do not have enough direct training data. 1:03:40.880 --> 1:03:48.710 And we did that from Dutch to German; you can guess the meaning even if you do not understand Dutch, as long as you speak German. 1:03:48.908 --> 1:04:16.740 So we have the Dutch phrase "een voorbeeld geven", literally "to give an example", which is correctly translated into English as "to set an example". However, when we then translate to German, the system does not get the full context, and in German you normally do not "set" an example, you "give" an example. So yes, by going through another language you introduce additional errors.
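Here is a minimal sketch of that pivot setup; the two translate functions stand in for full MT systems, and the single-idiom lookup tables are illustrative only:

    # Pivot translation: Dutch -> English -> German, composed from two
    # independent systems that never see each other's source text.
    def mt_nl_en(text):
        return text.replace("een voorbeeld geven", "to set an example")

    def mt_en_de(text):
        # This system sees only the English idiom and renders it literally.
        return text.replace("to set an example", "ein Beispiel setzen")

    def pivot_nl_de(text):
        return mt_en_de(mt_nl_en(text))

    # German says "ein Beispiel geben"; the error comes from the pivot step,
    # because the direct Dutch-German correspondence is lost on the way.
    print(pivot_nl_de("een voorbeeld geven"))  # ein Beispiel setzen (wrong)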
1:04:19.919 --> 1:04:27.568 Good, so much for this; are there more questions about why this is difficult? 1:04:30.730 --> 1:04:44.596 Then we will start with the next part; I have to leave a bit early today, in a quarter of an hour. 1:04:44.904 --> 1:05:03.599 If you look at linguistic approaches to machine translation, they are typically described by this triangle. So we can do a direct translation: you take the source language 1:05:03.599 --> 1:05:11.096 and do not apply much of the analysis we were discussing today, no syntactic or semantic representation. 1:05:11.551 --> 1:05:16.241 You directly translate into your target text; that is the "direct" corner here. 1:05:16.516 --> 1:05:19.285 Then there is the transfer-based approach. 1:05:19.285 --> 1:05:23.811 There you analyze the source, transfer the representation over, and then generate the target text. 1:05:24.064 --> 1:05:37.848 And you can do that at two levels: more at the syntax level, which means you only do a syntactic analysis, running a parser or so; or at the semantic level, where you also do semantic parsing. 1:05:38.638 --> 1:05:55.099 Then there is the interlingua-based approach, where you do not do any transfer anymore, but only an analysis and a generation. 1:05:57.437 --> 1:06:02.790 So what does the direct translation look like? 1:06:03.043 --> 1:06:07.031 It is one of the earliest approaches. 1:06:07.327 --> 1:06:20.202 You do maybe some morphological analysis, but not a lot, and then you do this bilingual word mapping. 1:06:20.540 --> 1:06:32.148 You might do some local generation at the end; these two steps are not really big, so you are essentially working on the words directly. 1:06:32.672 --> 1:06:47.638 And of course this might be a first, easy solution, but all the challenges we have seen, that the structure is different, that you have to reorder, that you have to handle agreement, it cannot address; that is why it was only the first approach. 1:06:47.827 --> 1:06:55.208 So if we have different word order, structural shifts, or idiomatic expressions, that does not really work.
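A minimal sketch of such a direct system, with a three-word illustrative dictionary, shows both the idea and its limits:

    # Direct translation: a bilingual word mapping and nothing else.
    DICT_EN_ES = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}

    def direct_translate(sentence):
        return " ".join(DICT_EN_ES.get(w, w) for w in sentence.lower().split())

    print(direct_translate("a delicious soup"))
    # -> "una deliciosa sopa": every word is translated, but the adjective
    #    stays before the noun, while Spanish wants "una sopa deliciosa".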
1:06:57.797 --> 1:07:19.254 Then there are the rule-based transfer approaches, which were more commonly used. They might still be used somewhere; most systems are now neural networks, but I would not be sure that no such system is out there anymore. 1:07:19.719 --> 1:07:25.936 In this transfer-based approach we have the steps that are nicely visualized in the triangle. 1:07:26.406 --> 1:07:33.416 So we have the analysis of the source sentence, from which we get some type of abstract representation. 1:07:33.693 --> 1:07:40.263 Then we do the transfer of the representation of the source sentence into the representation of the target sentence. 1:07:40.580 --> 1:07:56.524 And then we have the generation, where we take this abstract representation and produce the surface forms. For example, it might be that there are no morphological variants in the abstract representation, and we have to handle the agreement there. 1:07:56.656 --> 1:08:00.077 Which components do we need for that? 1:08:01.061 --> 1:08:12.318 You need monolingual source and target lexicons and the corresponding grammars in order to do both the analysis and the generation. 1:08:12.412 --> 1:08:28.920 Then you need a bilingual dictionary in order to do the lexical translation, and bilingual transfer rules in order to transfer the grammar, for example the grammar of German into the grammar of English; that enables you to do the transfer. 1:08:29.269 --> 1:08:43.014 So an example is something like this: if you are doing a syntactic transfer, you start with "John eats apples", you do the analysis, and then you have this type of tree. 1:08:43.014 --> 1:08:48.340 For that you need your monolingual lexicon and your monolingual grammar. 1:08:48.748 --> 1:09:01.020 Then you do the transfer, where you transfer this source representation into the target representation. 1:09:01.681 --> 1:09:05.965 So how could this type of translation then look in detail? 1:09:07.607 --> 1:09:14.389 We have the example of "a delicious soup" and "una sopa deliciosa". 1:09:14.894 --> 1:09:31.211 This is your source-language tree and this is your target-language tree, and the rules you need for the transfer are these ones: a noun phrase again maps to a noun phrase. 1:09:31.691 --> 1:09:46.094 You see here that a switch is happening: the adjective and the noun swap their positions. 1:09:46.146 --> 1:09:52.669 Then you have the translation of the determiner and of the words, so the dictionary entries. 1:09:53.053 --> 1:10:11.056 And with these types of rules you can then do these mappings and do the transfer between the representations. 1:10:25.705 --> 1:10:35.480 [In answer to a student question:] I think it depends more on the amount of expertise you have in representing them; the rules will get more difficult. 1:10:36.136 --> 1:10:52.579 I think it depends more on how difficult the structure is. For generating German, for example, these rule-based systems were quite successful for quite a long time, because modeling all the German phenomena in there is difficult. 1:10:52.953 --> 1:10:56.786 That can be done with rules, and it was not easy to learn it just from data. 1:10:59.019 --> 1:11:10.172 And even if you think about Chinese and English or so, if you have the trees, there is quite some structure you can handle with rules. 1:11:15.775 --> 1:11:24.905 Another thing is that you can also try to do the same on the semantic level, which means the analysis gets more complex. 1:11:25.645 --> 1:11:36.198 The transfer maybe gets a bit easier, because the semantic representations of different languages are more similar; the generation, in turn, gets more difficult again. 1:11:36.496 --> 1:11:45.869 So typically, the higher you go in the triangle, the more work goes into analysis and generation, and the less work into the transfer. 1:11:49.729 --> 1:12:11.232 It can then be, for example, as with "gustar", that the order of the arguments changes. The transfer rule for "like" says that the first argument is here and the second there, while on the "gustar" side the second argument is in the first position and the first argument in the second position. 1:12:11.511 --> 1:12:14.061 So there you also do the reordering. 1:12:14.354 --> 1:12:27.038 In principle it is more that you have a different type of formalism for representing your sentence, and therefore you need to do more on one side and less on the other side.
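Here is a minimal sketch of such a syntactic transfer rule over toy trees, mirroring the soup example above; the tuple encoding of trees and the helper names are my own illustrative choices, not a real formalism:

    # Transfer rule NP(DET ADJ N) -> NP(DET N ADJ): translate the leaves
    # with the bilingual dictionary and swap adjective and noun.
    LEX = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}

    def transfer_np(tree):
        label, det, adj, noun = tree
        assert label == "NP"
        t = lambda leaf: (leaf[0], LEX[leaf[1]])  # leaf = (tag, word)
        return ("NP", t(det), t(noun), t(adj))    # the reordering step

    src = ("NP", ("DET", "a"), ("ADJ", "delicious"), ("N", "soup"))
    print(transfer_np(src))
    # ('NP', ('DET', 'una'), ('N', 'sopa'), ('ADJ', 'deliciosa'))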
1:12:32.852 --> 1:12:44.769 So in general, for transfer-based approaches you first have to select how to represent the syntactic structure. 1:12:45.165 --> 1:13:08.371 There are these various abstraction levels, and then you have the three components: analysis, transfer, and generation. The disadvantage is that on the one hand you normally need a lot of experts: monolingual experts for the analysis and generation, and experts who define how to do the transfer. 1:13:08.868 --> 1:13:19.970 And if you are adding a new language, you have to build the analysis, the generation, and the transfer. 1:13:20.400 --> 1:13:29.624 So if you add one language to an existing system, you of course have to build transfer components to all the other languages. 1:13:32.752 --> 1:13:40.232 Therefore, the other idea people were interested in is interlingua-based machine translation. 1:13:40.560 --> 1:13:59.188 There the idea is that we have an intermediate language with an abstract, language-independent representation. The important thing is that it is language independent, so it is really the same for all languages; it is pure meaning, and there is no ambiguity in it. 1:14:00.100 --> 1:14:11.695 That allows this nice translation without transfer: you just do an analysis into the representation, and afterwards you do the generation into the target language. 1:14:13.293 --> 1:14:25.959 And that makes especially multilingual translation somehow a dream: if you want to add a language, you just need to add one analysis tool and one generation tool. 1:14:29.249 --> 1:14:32.279 Which is not the case in the transfer scenario. 1:14:33.193 --> 1:14:47.651 However, the big challenge in this case is the interlingua representation itself, because you need to represent all the different types of knowledge in there. 1:14:47.807 --> 1:14:57.993 That includes world knowledge, so something like: an apple is a fruit, and fruits are edible, and so on. 1:14:58.578 --> 1:15:18.348 That is why this was typically only done for small domains. For special applications like hotel reservation, people have looked into it, but they have typically not done it for open, general translation. 1:15:18.718 --> 1:15:31.640 So the disadvantage is that you need to represent all the world knowledge in your interlingua. 1:15:32.092 --> 1:15:47.364 And that is not possible at the moment, and never was possible so far; typically these systems were built for small domains like hotel reservation.
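The architectural appeal is easy to show in a sketch; every function here is a placeholder, and the toy meaning dictionary is purely illustrative of the shared representation:

    # Interlingua setup: per language one analyzer into a shared meaning
    # representation and one generator out of it, so n languages need 2n
    # components instead of n*(n-1) transfer directions.
    ANALYZERS, GENERATORS = {}, {}

    def add_language(lang, analyze, generate):
        # adding a language = one analysis tool + one generation tool
        ANALYZERS[lang], GENERATORS[lang] = analyze, generate

    def translate(text, src, tgt):
        meaning = ANALYZERS[src](text)   # language-independent representation
        return GENERATORS[tgt](meaning)  # no source/target-specific transfer

    add_language("en", lambda s: {"pred": "greet"}, lambda m: "hello")
    add_language("de", lambda s: {"pred": "greet"}, lambda m: "hallo")
    print(translate("hello", "en", "de"))  # hallo

The hard part pointed out above is hidden entirely in the analyzers: they would have to resolve every ambiguity and encode the world knowledge.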
1:15:51.431 --> 1:16:07.442 But of course this idea is why some people are still interested in the question: if you now build a neural system, where you learn a representation inside your neural network, is that some type of artificial interlingua? 1:16:08.848 --> 1:16:15.975 However, what we have at least found out until now is that there is often very language-specific information in it. 1:16:16.196 --> 1:16:19.648 And that information might be important and essential. 1:16:19.648 --> 1:16:32.412 You do not have all the information in your input, so you typically cannot resolve all ambiguities inside this representation, because you might not have all the information. 1:16:32.652 --> 1:16:52.089 So in English you do not know if it is the living fish or the fish which you are eating, and if you are translating to German you also do not have to resolve this problem, because you have the same ambiguity in your target language. So why would you put in the effort of finding out which fish it is, if it is not necessary at all? 1:16:54.774 --> 1:16:59.509 Yeah? [Inaudible student question.] 1:17:05.585 --> 1:17:17.127 With semantic transfer the representation is not the same for both languages; you still have a language-specific semantic representation. 1:17:17.377 --> 1:17:28.134 So you have a semantic representation, as in the "gustar" example, but it is not the same semantic representation for both languages, and that is the main difference. 1:17:35.515 --> 1:17:46.205 Okay, then these are the most important things for today: what language is, and how rule-based systems work. 1:17:46.926 --> 1:18:00.578 And if there are no more questions: thank you for joining; we have a bit of a shorter lecture today.