Okay. So, um.

Welcome to today's presentation, the second class on machine translation, where we'll today do a bit of a special topic and talk about linguistic background.

We'll cover three different parts in this lecture. We'll first do a very, very brief introduction to the linguistic background, in the sense of: what is language, what are ways of describing language, what are the theories behind it; very, very short. I don't know, I think some of you have listened to the NLP lecture in the last semester or so. There we gave a much longer explanation; here it's brief, because we are talking about machine translation, so it's really focused on the parts which are important when we talk about machine translation. For everybody who has listened to that already, it's a bit of a repetition, maybe. But it's really trying to look at: what are the properties of languages, and how can they influence translation?

We'll use that in the second part to discuss why machine translation is hard, given what we know about language. We will see that there are two main issues. One is that languages might express ideas and information differently, and if they are expressed differently in different languages, we have to somehow do the transfer. And it's not purely that we know which words are used for it; it's not that simple, and it can be very different. The other problem, which we mentioned last time in connection with biases, is that there's not always the same amount of information on both sides. So it can be that there's more information in the one language, or that you can't express the same amount of information on the target side. We had that, for example, with the rice plant: in German or in English we would just say "rice", while in other languages you have to distinguish between rice as a plant and rice as a dish. And then it's not always possible to directly infer this from the surface.

And if we make it to the last point, otherwise we'll do that next Tuesday, or we'll partly do it only here: we'll describe briefly the three main approaches to rule-based, so linguistically motivated, machine translation. We mentioned them last time: the direct translation, the translation by transfer, and the interlingua-based approach. We'll do that a bit more in detail today, but very briefly, because this is not the focus of this class.

Why do we think this is important?
On the one hand, of course, we are dealing with natural language, so it might be good to spend a bit of time understanding what we are really dealing with, because this is where the challenges and problems come from. And on the other hand, this was the first way of doing machine translation. Therefore it's interesting to understand what the idea behind it was, and also to later see what is done differently, and to understand when some models work.

When we're talking about linguistics, we can of course do that on different levels, and there's different ways. On the right side here you are seeing the basic levels of linguistics. So we have at the bottom the phonetics and phonology. Phonetics we will not cover this year, because we are mainly focusing on text input, where we directly have characters and then words.

Then, what we touch today, at least mention what it is, is morphology, which is the first level. I already mentioned it a bit on Tuesday: of course there are some languages where this is very, very basic and there are not really a lot of rules for how you can build words. But since I assume you all have some basic knowledge of German, you know there are a lot more challenges than that. Maybe if you're a native speaker that's quite easy and everything is clear, but if you have to learn it, there are the endings of a word, and German is famous for building Komposita, putting words together. So this is the first level.

Then we have the syntax, which is both on the word and on the sentence level, and that's about the structure of the sentence: what are the functions of the words? You might remember part-of-speech tags from your high school time: there is noun and adjective and things like that, and this is something helpful. Just imagine: in the beginning it was used not only for rule-based but also for statistical machine translation, where, for example, the reordering between languages was quite a challenging task. Especially if you have long-range reorderings, part-of-speech information is very helpful. You know, in German you have to move the verb to the second position; in Spanish you have to swap the noun and the adjective; so information from part of speech could be very useful.

Then you have the syntax-based structures, where you have a full syntax tree. That was there in the beginning, and then it came into statistical machine translation.
And it got more and more important for statistical machine translation that you really try to model the whole syntax tree of a sentence, in order to better match how to realize it in the target language. Syntax-based statistical machine translation had a bit of a problem, though: it got better and better and was just on the way to becoming better, for some languages, than the traditional statistical models. But then the neural models came up, and they were just so much better at modelling all of that implicitly, so the syntax-based models were never used in practice that much.

And then we'll talk about the semantics: what is the meaning of the words? We saw last time that words can have different meanings. And yeah, how you represent meaning is of course very challenging. Formalizing this is typically done in quite limited domains, because doing it for all possible words has not really been achieved yet; it is very challenging.

Then there is pragmatics: pragmatics is the meaning in the context of the current situation. One famous example is the sentence "the light is red". If the traffic light is red, then typically you don't want to tell the other person sitting in the car the surprising news that the light is red; typically you mean: okay, you should stop, and you shouldn't pass the light. So the meaning of the sentence "the light is red", in the context of sitting in the car, goes beyond the literal words.

So let's start with the morphology; that is where we begin. One easy first thing is: of course we have to split the sentence into words, or join characters, so that we have words. Because in most of our work in machine translation we'll deal with some type of words. In neural machine translation people are also working on character-based models and subwords, but a basic unit, the words of the sentence, is a very important first step.

And for many languages that is quite simple; in German it's not that hard to determine the words. In tokenization, the main challenge is that, since we are doing corpus-based methods, we also have to deal with everything that is not a normal word, and there of course it's getting a bit more challenging.

So that is maybe the main thing: for example, if you think of German tokenization, it's easy to get every word; you split at the spaces. But then you would have the dot at the end joined to the last word, and of course you don't want that, because it's a different word: the last word would not be "go" but "go.". So what you can do is always split off the dots.
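To make this whitespace-plus-punctuation idea concrete, here is a minimal Python sketch; it is my own illustration, not the tool used in the lecture, and the tiny punctuation set is an assumption:

```python
# A minimal sketch: whitespace tokenization that detaches trailing
# sentence punctuation from each word.
import re

def naive_tokenize(sentence: str) -> list[str]:
    tokens = []
    for word in sentence.split():
        # Split off trailing punctuation like '.', ',', '!', '?'
        m = re.match(r"^(.*?)([.,!?]*)$", word)
        tokens.append(m.group(1))
        tokens.extend(m.group(2))  # one token per punctuation character
    return [t for t in tokens if t]

print(naive_tokenize("I go."))          # ['I', 'go', '.']
print(naive_tokenize("Dr. Smith came.")) # ['Dr', '.', 'Smith', ...] -- wrong!
```

The second call already shows the problem discussed next: blindly splitting off every dot breaks abbreviations.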
But can you really always do that, or might it sometimes be better to keep the dot attached? For example, email addresses or abbreviations: for "Dr.", maybe it doesn't make sense to split off the dot, because then you would assume a new sentence starts there, but it's just the "Dr." of "doctor". Or if you have ordinal numbers, like "he's the seventh person", in German "der siebte" written with a dot, then you don't want to split either. So there are some cases where it can be a bit more difficult, but it's not really challenging.

In other languages it's getting a lot more challenging, especially in Asian languages, where often there are no spaces between words. So you just have a sequence of characters: "thequickbrownfoxjumpsoverthelazydog". And then it still might be helpful to work on something like words, and then you need a bit more complex segmentation.

And here you see we again have our typical problem: there is ambiguity. So you're seeing here: we have exactly the same sequence of characters, but depending on how we split it, it means "he is your servant" or "he is the one who used your things". Or here: "round eyes" versus "take the air". So this type of tokenization gets more important, because you could already introduce errors here, and you can imagine: if you once make a wrong decision, it's quite difficult to recover from it. And so in these cases, looking at how we're doing tokenization is an important issue. Then it might be helpful to do things like character-based models, where we treat each character as a symbol and, for example, make this decision later, or never really make it explicitly.
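As a minimal sketch of dictionary-based segmentation, here is the classic greedy "maximum matching" heuristic; the lexicon entries below are made up for illustration, and real segmenters are of course much more sophisticated:

```python
# A minimal sketch of greedy "maximum matching" word segmentation for
# scripts written without spaces. The tiny lexicon here is hypothetical.
def max_match(text: str, lexicon: set[str]) -> list[str]:
    words, i = [], 0
    while i < len(text):
        # Try the longest lexicon entry starting at position i;
        # fall back to a single character so we always make progress.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

lexicon = {"他", "的", "仆人", "仆", "人"}  # hypothetical entries
print(max_match("他的仆人", lexicon))  # ['他', '的', '仆人']
```

The greedy longest-match choice is exactly where the ambiguity above bites: a different split of the same character sequence can yield a different meaning, and the heuristic has no way to know which one was intended.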
The other thing is that, even once we have words, they might not be the optimal unit to work with, because it can be that we should look into the internal structure of words. If we have a morphologically rich language, that means we have a lot of different types of words, and having many different word types means, on the other hand, that each of these words has been seen very infrequently. If you only have ten words and a large corpus, each word occurs very often; if you have three million different words, then each of them occurs less often. And hopefully you know from machine learning that it's helpful to have seen each example very often.

So why does this happen? In some languages we have quite complex information inside a word. Here's a word from Finnish, something like "talossanikinko", and it means roughly "in my house, too?", as a question. So you have all this information attached to the word. In the extreme case, that's why Finnish, for example, is typically a language where machine translation quality is less good: generating all these different morphological variants is a challenge. And the additional challenge is that Finnish is not really low-resource, but in low-resource languages you quite often have a more difficult morphology. I mean, English is an example of a relatively easy one.

So in general we can say that words are composed of morphemes, and morphemes are the smallest meaning-carrying units. All morphemes should have some type of meaning; a single letter, for example, does not really have one by itself. Take "unhappiness": the "un" has some type of meaning, it changes the meaning; the "ness" has the meaning that it makes a noun out of an adjective; and "happy" is the stem. So each of these parts conveys some meaning, but you cannot split them up further and still have something meaningful. You see that a little bit more is happening, of course: typically the "y" turns into an "i", so there can be some variation, but these are typical examples of what we have as morphemes.

[In response to a question:] That is, of course, a problem, and that's the question of how you do your splitting. But we have that problem anyway, always, because even full words can have different meanings depending on the context they're used in. So we always need a model which can somehow infer or represent the meaning of the word in its context. But you are right that this problem might get even more severe if you're splitting up. Therefore it might not be best to go to the very extreme and represent each letter, and have a model which works only on letters, because a letter can of course have a lot of different meanings depending on where it's used. And yeah, there is no right solution for what the right splitting is; it depends on the language, the application, and the amount of data you have. Typically: the less data you have, the more splitting you should do; if you have more data, then you can better distinguish the full forms.

Then there are different types of morphemes. We typically have one stem morpheme, like "Haus" or "Tisch", carrying the main meaning.
And then you can have functional or bound morphemes, which can be prefixes, suffixes, infixes or circumfixes; so they can be before, after, inside, or around the stem. Something like "gekauft": there you would typically say it's not two separate morphemes "ge" and "t", because together they describe the function; "ge" and "t" together mark the participle of "kauf".

What are people using them for? You can use them for inflection, to describe things like tense, count, person, case. If you know German, this is commonly used in German. But of course there are more complicated things in some languages. I mean, in German the verb agreement only depends on the person and number of the subject; in other languages it can also be determined by the first or the second object. So if you buy an apple or a house, it's not only that "kauft" depends on me, like in German; it can also depend on whether it's an apple or a house. And then, of course, you get an exploding number of word forms.

Furthermore, morphology can be used for derivation, so you can make other types of words from a word. And then there is compounding: creating new words by joining words, like "rainbow" or "waterproof", or in German "Einkaufswagen", "eiskalt" and so on, where you can do that with nouns and, in German, with adjectives. Then you might have additional challenges like the Fugen-element, where you have to add an extra letter when joining.

Yeah, and then there are additional special things. You sometimes have to insert extra material because of phonology: the plural of "dish" is "dishes", not just with an "s"; the third person singular in English is normally an "s", but for "go", for example, it's "goes", with "es". In German you can also have other changes: "Mutter" becomes "Mütter", so you're changing the vowel to an umlaut in order to express the plural; and in other languages there is, for example, vowel harmony, where the vowels inside the word change depending on which form you have. This makes things more difficult, because splitting a word into its parts doesn't really work anymore; for "Mutter" and "Mütter", for example, that is not really possible. The nice thing, more as a general observation, is that irregular things often happen in words which occur frequently, so that you have enough examples, while the regular things you can handle by some type of rules.

Yeah, this can be done; there are tasks on this, like automatic inflection and automatic morphological analysis: you give such a tool a word, and it tells you what the possible forms are, how they are built, and so on.
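Here is a minimal sketch of what such an analyzer could look like for English, with assumed suffix rules and a toy exception list; real analyzers use full finite-state morphologies rather than simple stripping:

```python
# A minimal sketch (assumed rules, not a real analyzer): analyze English
# word forms by stripping suffixes, with an exception list for irregular
# forms that cannot be decomposed.
RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ed", ""), ("ing", "")]
IRREGULAR = {"went": ("go", "past")}  # irregular forms stored whole

def analyze(word: str) -> tuple[str, str]:
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement, f"-{suffix}"
    return word, "stem"

print(analyze("flies"))  # ('fly', '-ies')
print(analyze("went"))   # ('go', 'past')
# Umlaut plurals like German "Mutter" -> "Mütter" cannot be handled by
# suffix stripping at all; the change happens inside the stem.
```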
For at least the high-resource languages, there are a lot of tools for that. Of course, if you now want to do that for some language which is very low-resourced, it might be very difficult, and there might be no tool for it.

Good; before we go on to the next part, about part of speech, are there any questions about morphology?

[Question from the audience.] Yeah, we'll come to that in a bit. So it's a very good question, and a difficult one. Especially, as we'll see later, if you just put in full words, it would be very bad, because words go into neural networks just as numbers: each word is mapped to an integer, and so the network doesn't really know anything more about the structure. What we will see, therefore, is that the most successful approach, which is mostly done, is subword units, where we split words. We'll cover this on Tuesday. There is an algorithm called byte pair encoding, which is about splitting words into parts. It does split words, but not morphologically motivated; it's based on frequency. However, it performs very well, and that's why it's used; and there is a bit of correlation, sometimes the frequency-based and the morphological splits agree. So we're splitting words, and we're especially splitting words which are infrequent, and that's maybe a good motivation for why this is good for neural networks: if you have seen a word very often, you don't need to split it, and it's easier to just process it as a whole; while if you have seen a word infrequently, it is good to split it into parts, so the network can still do something with it. So there is some way of doing it, but linguists would say this is not a morphological analysis. That is true; but we are splitting words into parts if they have not been seen often.
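Since byte pair encoding keeps coming up, here is a minimal sketch of the merge-learning step, in the spirit of the original algorithm; real toolkits add end-of-word markers, vocabulary thresholds, and many other details:

```python
# A minimal sketch of byte pair encoding (BPE) merge learning: repeatedly
# merge the most frequent adjacent symbol pair in the corpus vocabulary.
from collections import Counter

def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe(["low", "low", "lower", "newest", "newest"], 3))
```

Frequent words end up as single symbols after enough merges, while rare words stay split into smaller, more frequently seen pieces, which is exactly the behaviour described above.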
Yes, so another important thing about words are the part-of-speech tags. These are the common ones: noun, verb, adjective, adverb, determiner, pronoun, preposition, and conjunction; there are some more. They are not the same in all languages, but there are, for example, universal part-of-speech tag sets which try to define one set of tags for many languages. And then, of course, this helps you with generalization: many language rules deal with verbs and nouns, especially if you look at sentence structure.

So if you know the part-of-speech tag, you can easily generalize and apply these rules; as you know, the verb in German has to be at the second position. So you know how to deal with verbs independently of which word you are actually looking at.

And that again is ambiguous. There are some words which can have several part-of-speech tags. An example is the word "can", which can be the can of beans or "can do something". This often also happens in English with related words: "access" can be the noun ("the access") or the verb ("to access something"). Most words have only one single part-of-speech tag, but there are some where it's a bit more challenging. The thing is, the ambiguous ones are often words which occur more often, while for really rare words it's not that common.

If you look at these classes, you can distinguish open classes, where new words can appear, so we can invent new nouns. But then there are the closed classes, like determiners or pronouns. For example, it's not that you can easily invent a new pronoun; there is a fixed list of pronouns and we are using that. It's not like something happens tomorrow and then people start using a new pronoun or a new conjunction like "and"; you normally don't invent a new one.

In addition to the part-of-speech tags, some of these classes have different properties. So, for example, for nouns and adjectives we can have singular and plural. In other languages there is also a dual, so that a word is not only singular or plural, but can also be dual if it refers to exactly two things. You have the gender: masculine, feminine, neuter we know; in other languages there is animate and inanimate. And you have the cases: in German you have nominative, genitive, dative, accusative. In other languages, for example Latin, you also have the ablative. So there is more, and there you have no one-to-one correspondence; it can be that there are some cases which exist only in the one language and do not exist in the other language.

For verbs we have tenses, of course: walk, is walking, walked, has walked, had walked, will walk and so on. Interestingly, in Japanese, for example, this can also happen for adjectives: there is a difference between "something is white" and "something was white".
There is this continuous aspect, which we don't really have that commonly in German; I guess if you're German and learning English, that's something like "she sings" versus "she is singing". Of course we can express that in German, but it's not commonly used, and normally we don't mark this aspect. Also about tenses: if you use the past tense in English, you will also use a past tense in German; so we have similar tenses, but the use might be different.

There is the mood, like indicative and subjunctive: "if he were here". There are voices, active and passive. That, as you know, exists both in German and English, but there is also something like the middle voice in Greek: "I get myself taught". So there are other phenomena which might only exist in one language. These are the different syntactic structures you can have in a language, and there are two issues: it might be that some exist only in one language and not in the other; and on the other hand there is the matching, so it might be that in some situations the two languages use different structures.

The next part would be about semantics; do you have any questions before that? I'll just continue, but ask if something is unclear.

Beside the structure, we typically have more ambiguities: it can be that words themselves have different meanings. We typically talk about polysemy and homonymy, where polysemy means that a word has different but related meanings. So if you take the English word "interest", it can be that you are interested in something, or it can be the financial interest rate; but the meanings are somehow related. Then there is homonymy, where the meanings really are not related: the two senses of "can" don't have anything in common, so they are really very different. And of course this is not completely clear-cut; there is no sharp definition. For the word "bank", for example, you can argue that the meanings are related, or you can argue that they aren't; so there are some clear cases like "interest", some which are vague, and some where it's again very clear that the meanings are different.

And in order to translate them, of course, we might need the context to disambiguate. That's typically how we can disambiguate, and that's not only for lexical semantics; it's generally very often the case that if you want to disambiguate, context can be very helpful: in which sentence does it occur, what is the general knowledge, who is speaking?
You can do that externally, by some dedicated disambiguation component, or a machine translation system can also do it internally.

And sometimes you're lucky and you don't need to do it at all, because you just have the same ambiguity in the source and the target language; then it doesn't matter. Think about the mouse: as I said, you don't really need to know whether it's a computer mouse or the living mouse when you translate from German to English, because both languages have exactly the same ambiguity.

There are also relations between words, like synonyms, antonyms, and hyponyms. Hyponymy is the "is-a" relation; there is the "part-of" relation, like door and house; "big" and "small" are antonyms; and synonyms are words which mean something similar. There are resources which try to express all this linguistic information, like WordNet or GermaNet, where you have a graph with words and how they are related to each other.

This can be helpful. Typically these resources were used more in tasks where there is less data; there are a lot of tasks in NLP where you have very limited data, because you really need hand-annotated material. Machine translation has a big advantage: there's naturally a lot of translated text out there. So in machine translation we typically have, compared to other tasks, a significant amount of data. People have looked into integrating WordNet and things like that, but it is rarely used in commercial systems or the like.
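If you want to play with these relations, NLTK ships a WordNet interface; here is a minimal sketch (it needs a one-time `nltk.download("wordnet")`, and the printed senses are whatever the installed WordNet version contains):

```python
# A minimal sketch of querying the lexical relations mentioned above.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())  # several unrelated senses

dog = wn.synsets("dog")[0]
print(dog.hypernyms())        # "is-a" relation, e.g. canine

door = wn.synsets("door")[0]
print(door.part_holonyms())   # "part-of" relation, like door and house
```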
So much for the words: there we have morphology, syntax, and semantics, and then of course it makes sense to also look at the bigger structure, that means information about the whole sentence. There we don't really have a morphology, because morphology is about the structure of words, but we have syntax on the sentence level and semantic representations.

When we are thinking about the sentence structure: a sentence is, of course, first a sequence of words terminated by a dot. "Jane bought the house." And we can say something about the structure: it's typically subject, verb, and then one or several objects. And the number of objects, for example, is determined by the verb; that is called the valency. So there are intransitive verbs, which don't take any object; "to sleep" is one: there is no object of sleep, you cannot say "Jane sleeps the house". And there are transitive verbs, where you have to supply one or more objects, and you always have to; the sentence is not correct if you don't. So if you buy something, you have to say you bought *something*; or you give *someone* *something*.

Here you see a bit of the interesting relation between word order and morphology. Of course it's not that strong, but for example in English you always have to first say whom you gave it to and then what you gave; the structure is very fixed and cannot be changed. German, for example, has the possibility of reordering what you gave and whom you gave it to, because there is morphology: what you gave gets a different case form than the person you gave it to. And that is a general tendency: if you have rich morphology, then typically the word order is more free, while in English you cannot express this information through the morphology; you typically have to express it through the word order, so the order is not as free but more restricted.

Yeah, the first part is typically the noun phrase, the subject, and that cannot only be a single noun; it can be a longer phrase. It can be "Jane", it can be "the woman", "a woman", "the young woman", or "the young woman who lives across the street". All of these are subjects, so this can already be very, very long. And this makes the verb-second rule a bit more complicated: if you have "the young woman who lives across the street runs" to somewhere, then "runs" is at the second position in the tree, but the first constituent is quite long. So it's not just counting words: the second word is not always the verb.

In addition to these simple things, there's more complex stuff: "Jane bought the house from Jim without hesitation", or "Jane bought the house in the posh neighborhood across the river". And these often lead to additional ambiguities, because it's not always completely clear to which part this prepositional phrase attaches. We'll see that, and you have, of course, subclauses and so on.

And then there is a theory behind this which was very important for rule-based machine translation, because that's exactly what you're doing there: you take the sentence and do the syntactic analysis, so that you have these constituents which describe the basic parts of the sentence. And we can describe the sentence structure with a context-free grammar, which you hopefully remember from basic computer science: a tuple of non-terminals, terminal symbols, production rules, and a start symbol.
You can then describe a sentence with this phrase structure grammar. A simple example would be something like this: you have a lexicon; "Jane" is a noun, "house" is a noun, "telescope" is a noun. And then you have the production rules: a sentence is a noun phrase and a verb phrase; a noun phrase can either be a determiner and a noun, or it can be a noun phrase and a prepositional phrase; and a prepositional phrase is a preposition and a noun phrase.

Looking at this: what is the valency of the verb we're describing here? How many objects would the verb have in this grammar? We're looking at the verb phrase: the verb phrase is a verb and a noun phrase, so one object here; so this would be for a valency of one. If you had intransitive verbs, the verb phrase would be just a verb, and if you had two objects, it would be verb, noun phrase, noun phrase.

And yeah, then the challenge, what you have to do, is this: given a natural language sentence, you want to parse it to get this type of parse tree. You know that from programming languages, where you also need to parse the code in order to get the representation. However, there is one challenge in parsing natural language compared to a computer language: there are different ways of expressing things, and there can be different parse trees belonging to the same input.

So, "Jane buys a house"; that's an easy example. You do the lexicon lookup: "Jane" can be a noun phrase, "buys" is a verb, "a" is a determiner, and "house" is a noun. And then you can use the grammar rules: here we have a rule that a determiner and a noun form a noun phrase, so we map that to a noun phrase. Then we can map verb plus noun phrase to a verb phrase, and then we can map noun phrase plus verb phrase to a sentence, representing the whole thing.

We can have that even more complex: "The woman who won the lottery yesterday bought the house across the street." The structure gets more complicated. You now see that the verb phrase is at the second position, but the noun phrase before it is quite big. And the PP phrases: it's sometimes difficult to decide where to put them, because here they attach to the noun phrase, but in other sentences they attach to the verb phrase.

[A question from the audience.] Yes, so then either "Jane" can have two tags, noun and noun phrase, or you add the extra rule that a noun phrase can not only be a determiner and a noun, but also just a noun.
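Here is the toy grammar written out in NLTK notation, as a minimal sketch; the exact rules and words are my reconstruction of the slide, and the chart parser enumerates every parse, which makes the attachment ambiguity visible:

```python
# A minimal sketch of a phrase structure grammar; the chart parser
# returns every possible parse tree for the input.
import nltk

grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> 'Jane' | Det N | NP PP
VP  -> V NP | VP PP
PP  -> P NP
Det -> 'the' | 'a'
N   -> 'man' | 'telescope' | 'house'
V   -> 'saw' | 'bought'
P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "Jane saw the man with the telescope".split()
for tree in parser.parse(sentence):
    print(tree)  # two trees: the PP attaches to the NP or to the VP
```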
Then, of course, either you introduce additional rules about what is possible, or you have the problem that you produce parse trees which are not correct, and then you have to add some type of probability to decide which parse is more probable.

But some things also can't really be modelled easily with this type of tree. For example, the agreement is not straightforward: for subject and verb you want to check that person and number agree, so if it's a singular subject it has to be a singular verb, and if it's a plural subject it's a plural verb. Similar things hold for the agreement between determiner and noun; they also have to agree. Things like that cannot easily be done with this type of grammar. Or the subcategorization: checking whether the verb is transitive or intransitive, so that "Jane sleeps" is okay, but "Jane sleeps the house" is not; and "Jane bought the house" is okay, but "Jane bought" alone is not.

Furthermore, long-range dependencies are difficult, and also which word orders are allowed and which are not. That is not direct either: you can say "Maria gibt dem Mann das Buch", "dem Mann gibt Maria das Buch", "das Buch gibt Maria dem Mann", but not every permutation; which of these is possible and which not is sometimes not simple to model.

Therefore people have built more complex formalisms, like unification grammars, and tried to model, in addition to the category of the verb, the agreement features: that it's, say, third person and singular. You annotate the symbols with more information, and then you have more complex syntactic structures in order to model these phenomena too.

Yeah, why is this difficult? We have different ambiguities, and that makes it hard. Words have different part-of-speech tags: if you take "time flies like an arrow", it can mean that the "time flies", as animals, are fond of an arrow, or it can mean that time is flying by very fast, passing like an arrow. And if you want to build a parse tree, these two readings have different part-of-speech tags: in the second one, "flies" is the verb. And of course that is a different semantics, so the readings are very different.
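As a small illustration, an off-the-shelf statistical tagger has to commit to one of these readings; here is a minimal sketch with NLTK (it needs a one-time `nltk.download("averaged_perceptron_tagger")`, and the exact output depends on the model version):

```python
# A minimal sketch: tagging the classic ambiguous sentence.
import nltk

tokens = "Time flies like an arrow".split()
print(nltk.pos_tag(tokens))
# Likely something like [('Time', 'NN'), ('flies', 'VBZ'), ('like', 'IN'), ...]
# i.e. "flies" read as a verb; the reading with "flies" as a noun and
# "like" as a verb is grammatical too, and a parser has to choose.
```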
And otherwise there is structural ambiguity, where some part of the sentence can play different roles; the famous case is the attachment. So, "the cop saw the burglar with the binoculars": "with the binoculars" can be attached to "saw", or it can be attached to "the burglar". And here it's more probable that the cop used the binoculars to see the burglar, not that the burglar has them. This, of course, makes things difficult, because in parsing, the structure you choose implicitly defines the semantics.

We would then go on directly to semantics; but maybe first, any questions about syntax and how that works?

Then we'll do a bit more about semantics. So far we only described the structure of the sentence. For the meaning of a sentence we typically have the compositionality of meaning: the meaning of the full sentence is determined by the meaning of the individual words, and together they form the meaning of the whole. For words that is only partly true: the meaning of "rainbow" is not just "rain" joined with "bow". For sentences it typically does hold, because you can't directly determine the full meaning; you split it into parts. But sometimes it only holds for some parts: take the expression "kick the bucket". Of course, you cannot get its meaning by looking at the individual words; or in German, "ins Gras beißen": you cannot get that somebody died by looking at the individual words of "ins Gras beißen", but they have a joint meaning.

And there are different ways of describing meaning; some have been tried, and some are more commonly used for certain tasks. The first would be something like first-order logic: if you have "Peter loves Jane", then you have a representation with a "loves" predicate between Peter and Jane, and you try to construct that from the sentence. You see this is a lot more complex than only doing syntax: on top of the syntax you also build this type of representation.

The other approach is frame semantics. That means you try to represent knowledge about the world, and you have these frames. For example, you might have a frame "to buy", and the meaning is that there is a commercial transaction: you have a person who is selling, you have a person who is buying, you have the thing that is bought, you might have a price, and so on. And then what you are doing in semantic parsing with frame semantics is: you first try to determine which frames occur in the sentence.
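As a minimal sketch of the idea, here is an invented "buy" frame (my own toy inventory, not the real FrameNet one) together with a naive trigger-based identification step:

```python
# A minimal sketch: a frame as a named set of roles, plus a naive
# trigger-word frame identification step. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Frame:
    name: str
    roles: list[str]
    triggers: set[str] = field(default_factory=set)

COMMERCE = Frame(
    name="commercial_transaction",
    roles=["buyer", "seller", "goods", "price"],
    triggers={"buy", "buys", "bought", "sell", "sells", "sold"},
)

def identify_frames(tokens: list[str]) -> list[Frame]:
    # Fire the frame if any trigger word occurs in the sentence.
    return [COMMERCE] if COMMERCE.triggers & set(tokens) else []

print(identify_frames("Jane bought the house".split()))
# Next step (not shown): align 'Jane' -> buyer, 'the house' -> goods.
```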
So if it's something with buying, you would first try to identify: oh, here we have the frame "buy", which does not always have to be indicated by the verb "buy" itself; it can be indicated by "sell" or in other ways. And then you try to find out which elements of this frame are in the sentence, and try to align them. Yeah, and you see, for example, with "to buy" and "to sell": if you have a model with frames, they share the same frame elements.

In addition to the sentence level, there are also phenomena beyond the sentence level. We're coming back to this later, because it's a special challenge for machine translation. There is, for example, coreference. That means you first mention something, like "the President of the United States", and later you refer to him, maybe just as "he". And that is especially challenging in machine translation, because you're not always using the same expression. For the president it's "he" in English and "er" in German, but for other things it might be different, depending on the gender the referring expression has in each language.

So much for the background; next, we want to look, based on the knowledge we have now, at why machine translation is difficult, before we go any further.

The first type of problem is what we refer to as translation divergences. That means we have the same information in source and target, but it is expressed differently. So it is not done the same way, and we cannot translate these things simply with a word-by-word mapping; we need something a bit more complex. An example, if it's only about structure: in English, say, "the delicious soup", the adjective is before the noun, while in Spanish you have to put it after the noun, and so you have to change the word order.

So there are different kinds of divergence. There can be structural divergence, which is about the word order. In German we have that especially in the subordinate clause: in English, the verb in a subordinate clause is also at the second position, while in German it's at the end, so you have to move it all the way over.

It can also be a completely different grammatical role. Take "you like her": in Spanish it's "ella te gusta", which means she is no longer the object but the subject, you are now the object, and the verb is something like "pleases". So you really use a different sentence structure, and you have to change it.

It can also be head switching. In English you say "the baby just ate",
while in Spanish you literally say "the baby finishes eating": the eating is no longer the main verb, the finishing is. So you cannot always have the same structures in your input and output; you have to learn that.

There are lexical divergences, like "to swim across" versus, literally, "to cross swimming". And you have categorial divergences, where, for example, a verb or an adjective gets turned into a noun, as in "to decide" versus "to make a decision".

That is the one challenge, and the even bigger challenge is what is referred to as translation mismatches. That can be a lexical mismatch; that's the fish we talked about: the fish you eat and the fish which is living are two different words in Spanish (pescado versus pez). And sometimes this is even not known; even a human might not be able to infer it. You maybe need to see the context, you maybe need the surrounding sentences. One problem is that at least traditional machine translation works on the sentence level: we take each sentence and translate it independently of everything else, and that's of course not correct. We will look into some ways of doing document-based machine translation later.

Then gender information might be a problem. In English it's "player", and you don't know if it's "Spieler" or "Spielerin", or if the gender is simply not known. But if you now generate German, you should know: does the writer know the gender or not, and then generate the right form. Just imagine a commentator talking about a player: he can see whether it's a male or a female player. So in general the problem is: if you have less information in the source and you need more information in the target, the translation doesn't directly work.

Another problem is what we just talked about: the coreference. If you refer to an object, and that can be across sentence boundaries, then you have to use the right pronoun, and you cannot just translate the pronoun literally. "If the baby does not thrive on raw milk, boil it." And if you now just take the typical translation of "it" into German, it will be "es". [Discussion with the audience.] In a way "es" would even be right grammatically, because it matches "das Baby"; but that is exactly the problem: you have to determine what "it" refers to, and the naive choice can be wrong.
Because in English both the baby and the milk can be referred to with "it". If you translate it as "es", the German pronoun agrees with "das Baby", so you would be saying the baby gets boiled. You have to use "sie", because "Milch" is feminine; although, admittedly, the baby reading is really very uncommon, since boiling a baby is not a common thing. Of course, I agree this situation is a bit constructed, but you can see that these things are not that easy.

Another example is this: "Dr. McLean often brings his dog Champion to visit with his patients. He loves to give big wet sloppy kisses." And there, of course, it's also important whether "he" refers to the dog or to the doctor.

Another challenge is that we don't have a fixed language; that refers back to morphology, and we can build new words. We can, in all languages, build new words by just concatenating parts, like "Brexit" and things like that. And of course words don't exist in isolation either: in German you can now use the word "download" as "downloaden", and you can also apply morphological operations to it; I guess there is not even agreement on what the correct form is. So you have to deal with these things, and the same in social media. Or this word, which most of you have probably forgotten already; this was ten years ago or so: there was this volcano in Iceland, Eyjafjallajökull, which stopped Europeans from flying around. So there are always new words coming up, and you have to deal with them.

Yeah, one last thing: some of the examples we have seen are a bit artificial. One example which is very commonly used to show that machine translation doesn't really work is "the box was in the pen". And maybe you would be surprised, at least when you read it: how can a box be inside a pen? Does anybody have a solution for that, such that the sentence is still correct? Maybe it's directly clear for you. [Answer from the audience.] Yes, like at a farm, or for small children: that enclosure is also called a pen. And so inferring which of these two meanings is meant is quite difficult. But at least when I saw it, I wasn't completely convinced, because it's maybe not a sentence you're using in your daily life; some of these constructions are very good at showing where the problem is, but the question is: does it really occur in real life?
1:01:35.996 --> 1:01:43.605 And therefore, here are some examples that really occurred with our lecture translator. 1:01:43.605 --> 1:01:49.663 They may look simple, but you will see that some of them are still happening. 1:01:50.050 --> 1:01:53.948 They are partly about splitting words, and that is where they happen. 1:01:56.596 --> 1:02:07.041 We had a text about the numeral system in German, the "Zahlensystem", which got split into subword parts, because otherwise we cannot translate it. 1:02:07.367 --> 1:02:23.270 And then the system only did an approximate match and was talking about the binary payment system, because the payment system, the "Zahlungssystem", was a lot more common in the training data than the "Zahlensystem". 1:02:23.823 --> 1:02:41.250 So there you see that rare words, which do not occur that often, are very challenging to deal with; sometimes we are good at inferring them, but in other cases it is very difficult. 1:02:44.344 --> 1:02:49.605 Another challenge is that the context is very important. 1:02:50.010 --> 1:03:01.813 This is also an example, a bit older, from the lecture translator: we were translating a maths lecture, and it was always talking about the "omens" of the numbers. 1:03:02.322 --> 1:03:12.408 Which does not make any sense at all, but the German word "Vorzeichen" can of course mean both the sign of a number and the omen. 1:03:12.732 --> 1:03:23.869 And if you do not have the right domain knowledge encoded in there, it might use the wrong domain. 1:03:25.705 --> 1:03:39.583 A more recent version of that is from a paper about pivot-based translation, where you translate through English into another language because you do not have enough direct training data. 1:03:40.880 --> 1:03:48.710 And we did that from Dutch to German; you can guess the meaning even if you do not understand Dutch, as long as you speak German. 1:03:48.908 --> 1:04:16.740 So we have the Dutch phrase "een voorbeeld geven", literally "to give an example", which is correctly translated into English as "to set an example". However, when we then translate to German, the system does not get the full context, and in German you normally do not "set" an example, you "give" an example. So yes, by going through another language you introduce additional errors.
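Here is a minimal sketch of that pivot setup; the two translate functions stand in for full MT systems, and the single-idiom lookup tables are illustrative only:

    # Pivot translation: Dutch -> English -> German, composed from two
    # independent systems that never see each other's source text.
    def mt_nl_en(text):
        return text.replace("een voorbeeld geven", "to set an example")

    def mt_en_de(text):
        # This system sees only the English idiom and renders it literally.
        return text.replace("to set an example", "ein Beispiel setzen")

    def pivot_nl_de(text):
        return mt_en_de(mt_nl_en(text))

    # German says "ein Beispiel geben"; the error comes from the pivot step,
    # because the direct Dutch-German correspondence is lost on the way.
    print(pivot_nl_de("een voorbeeld geven"))  # ein Beispiel setzen (wrong)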
1:04:19.919 --> 1:04:27.568 Good, so much for this; are there more questions about why this is difficult? 1:04:30.730 --> 1:04:44.596 Then we will start with the next part; I have to leave a bit early today, in a quarter of an hour. 1:04:44.904 --> 1:05:03.599 If you look at linguistic approaches to machine translation, they are typically described by this triangle. So we can do a direct translation: you take the source language 1:05:03.599 --> 1:05:11.096 and do not apply much of the analysis we were discussing today, no syntactic or semantic representation. 1:05:11.551 --> 1:05:16.241 You directly translate into your target text; that is the "direct" corner here. 1:05:16.516 --> 1:05:19.285 Then there is the transfer-based approach. 1:05:19.285 --> 1:05:23.811 There you analyze the source, transfer the representation over, and then generate the target text. 1:05:24.064 --> 1:05:37.848 And you can do that at two levels: more at the syntax level, which means you only do a syntactic analysis, running a parser or so; or at the semantic level, where you also do semantic parsing. 1:05:38.638 --> 1:05:55.099 Then there is the interlingua-based approach, where you do not do any transfer anymore, but only an analysis and a generation. 1:05:57.437 --> 1:06:02.790 So what does the direct translation look like? 1:06:03.043 --> 1:06:07.031 It is one of the earliest approaches. 1:06:07.327 --> 1:06:20.202 You do maybe some morphological analysis, but not a lot, and then you do this bilingual word mapping. 1:06:20.540 --> 1:06:32.148 You might do some local generation at the end; these two steps are not really big, so you are essentially working on the words directly. 1:06:32.672 --> 1:06:47.638 And of course this might be a first, easy solution, but all the challenges we have seen, that the structure is different, that you have to reorder, that you have to handle agreement, it cannot address; that is why it was only the first approach. 1:06:47.827 --> 1:06:55.208 So if we have different word order, structural shifts, or idiomatic expressions, that does not really work.
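A minimal sketch of such a direct system, with a three-word illustrative dictionary, shows both the idea and its limits:

    # Direct translation: a bilingual word mapping and nothing else.
    DICT_EN_ES = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}

    def direct_translate(sentence):
        return " ".join(DICT_EN_ES.get(w, w) for w in sentence.lower().split())

    print(direct_translate("a delicious soup"))
    # -> "una deliciosa sopa": every word is translated, but the adjective
    #    stays before the noun, while Spanish wants "una sopa deliciosa".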
1:06:57.797 --> 1:07:19.254 Then there are the rule-based transfer approaches, which were more commonly used. They might still be used somewhere; most systems are now neural networks, but I would not be sure that no such system is out there anymore. 1:07:19.719 --> 1:07:25.936 In this transfer-based approach we have the steps that are nicely visualized in the triangle. 1:07:26.406 --> 1:07:33.416 So we have the analysis of the source sentence, from which we get some type of abstract representation. 1:07:33.693 --> 1:07:40.263 Then we do the transfer of the representation of the source sentence into the representation of the target sentence. 1:07:40.580 --> 1:07:56.524 And then we have the generation, where we take this abstract representation and produce the surface forms. For example, it might be that there are no morphological variants in the abstract representation, and we have to handle the agreement there. 1:07:56.656 --> 1:08:00.077 Which components do we need for that? 1:08:01.061 --> 1:08:12.318 You need monolingual source and target lexicons and the corresponding grammars in order to do both the analysis and the generation. 1:08:12.412 --> 1:08:28.920 Then you need a bilingual dictionary in order to do the lexical translation, and bilingual transfer rules in order to transfer the grammar, for example the grammar of German into the grammar of English; that enables you to do the transfer. 1:08:29.269 --> 1:08:43.014 So an example is something like this: if you are doing a syntactic transfer, you start with "John eats apples", you do the analysis, and then you have this type of tree. 1:08:43.014 --> 1:08:48.340 For that you need your monolingual lexicon and your monolingual grammar. 1:08:48.748 --> 1:09:01.020 Then you do the transfer, where you transfer this source representation into the target representation. 1:09:01.681 --> 1:09:05.965 So how could this type of translation then look in detail? 1:09:07.607 --> 1:09:14.389 We have the example of "a delicious soup" and "una sopa deliciosa". 1:09:14.894 --> 1:09:31.211 This is your source-language tree and this is your target-language tree, and the rules you need for the transfer are these ones: a noun phrase again maps to a noun phrase. 1:09:31.691 --> 1:09:46.094 You see here that a switch is happening: the adjective and the noun swap their positions. 1:09:46.146 --> 1:09:52.669 Then you have the translation of the determiner and of the words, so the dictionary entries. 1:09:53.053 --> 1:10:11.056 And with these types of rules you can then do these mappings and do the transfer between the representations. 1:10:25.705 --> 1:10:35.480 [In answer to a student question:] I think it depends more on the amount of expertise you have in representing them; the rules will get more difficult. 1:10:36.136 --> 1:10:52.579 I think it depends more on how difficult the structure is. For generating German, for example, these rule-based systems were quite successful for quite a long time, because modeling all the German phenomena in there is difficult. 1:10:52.953 --> 1:10:56.786 That can be done with rules, and it was not easy to learn it just from data. 1:10:59.019 --> 1:11:10.172 And even if you think about Chinese and English or so, if you have the trees, there is quite some structure you can handle with rules. 1:11:15.775 --> 1:11:24.905 Another thing is that you can also try to do the same on the semantic level, which means the analysis gets more complex. 1:11:25.645 --> 1:11:36.198 The transfer maybe gets a bit easier, because the semantic representations of different languages are more similar; the generation, in turn, gets more difficult again. 1:11:36.496 --> 1:11:45.869 So typically, the higher you go in the triangle, the more work goes into analysis and generation, and the less work into the transfer. 1:11:49.729 --> 1:12:11.232 It can then be, for example, as with "gustar", that the order of the arguments changes. The transfer rule for "like" says that the first argument is here and the second there, while on the "gustar" side the second argument is in the first position and the first argument in the second position. 1:12:11.511 --> 1:12:14.061 So there you also do the reordering. 1:12:14.354 --> 1:12:27.038 In principle it is more that you have a different type of formalism for representing your sentence, and therefore you need to do more on one side and less on the other side.
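Here is a minimal sketch of such a syntactic transfer rule over toy trees, mirroring the soup example above; the tuple encoding of trees and the helper names are my own illustrative choices, not a real formalism:

    # Transfer rule NP(DET ADJ N) -> NP(DET N ADJ): translate the leaves
    # with the bilingual dictionary and swap adjective and noun.
    LEX = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}

    def transfer_np(tree):
        label, det, adj, noun = tree
        assert label == "NP"
        t = lambda leaf: (leaf[0], LEX[leaf[1]])  # leaf = (tag, word)
        return ("NP", t(det), t(noun), t(adj))    # the reordering step

    src = ("NP", ("DET", "a"), ("ADJ", "delicious"), ("N", "soup"))
    print(transfer_np(src))
    # ('NP', ('DET', 'una'), ('N', 'sopa'), ('ADJ', 'deliciosa'))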
1:12:32.852 --> 1:12:44.769 So in general, for transfer-based approaches you first have to select how to represent the syntactic structure. 1:12:45.165 --> 1:13:08.371 There are these various abstraction levels, and then you have the three components: analysis, transfer, and generation. The disadvantage is that on the one hand you normally need a lot of experts: monolingual experts for the analysis and generation, and experts who define how to do the transfer. 1:13:08.868 --> 1:13:19.970 And if you are adding a new language, you have to build the analysis, the generation, and the transfer. 1:13:20.400 --> 1:13:29.624 So if you add one language to an existing system, you of course have to build transfer components to all the other languages. 1:13:32.752 --> 1:13:40.232 Therefore, the other idea people were interested in is interlingua-based machine translation. 1:13:40.560 --> 1:13:59.188 There the idea is that we have an intermediate language with an abstract, language-independent representation. The important thing is that it is language independent, so it is really the same for all languages; it is pure meaning, and there is no ambiguity in it. 1:14:00.100 --> 1:14:11.695 That allows this nice translation without transfer: you just do an analysis into the representation, and afterwards you do the generation into the target language. 1:14:13.293 --> 1:14:25.959 And that makes especially multilingual translation somehow a dream: if you want to add a language, you just need to add one analysis tool and one generation tool. 1:14:29.249 --> 1:14:32.279 Which is not the case in the transfer scenario. 1:14:33.193 --> 1:14:47.651 However, the big challenge in this case is the interlingua representation itself, because you need to represent all the different types of knowledge in there. 1:14:47.807 --> 1:14:57.993 That includes world knowledge, so something like: an apple is a fruit, and fruits are edible, and so on. 1:14:58.578 --> 1:15:18.348 That is why this was typically only done for small domains. For special applications like hotel reservation, people have looked into it, but they have typically not done it for open, general translation. 1:15:18.718 --> 1:15:31.640 So the disadvantage is that you need to represent all the world knowledge in your interlingua. 1:15:32.092 --> 1:15:47.364 And that is not possible at the moment, and never was possible so far; typically these systems were built for small domains like hotel reservation.
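The architectural appeal is easy to show in a sketch; every function here is a placeholder, and the toy meaning dictionary is purely illustrative of the shared representation:

    # Interlingua setup: per language one analyzer into a shared meaning
    # representation and one generator out of it, so n languages need 2n
    # components instead of n*(n-1) transfer directions.
    ANALYZERS, GENERATORS = {}, {}

    def add_language(lang, analyze, generate):
        # adding a language = one analysis tool + one generation tool
        ANALYZERS[lang], GENERATORS[lang] = analyze, generate

    def translate(text, src, tgt):
        meaning = ANALYZERS[src](text)   # language-independent representation
        return GENERATORS[tgt](meaning)  # no source/target-specific transfer

    add_language("en", lambda s: {"pred": "greet"}, lambda m: "hello")
    add_language("de", lambda s: {"pred": "greet"}, lambda m: "hallo")
    print(translate("hello", "en", "de"))  # hallo

The hard part pointed out above is hidden entirely in the analyzers: they would have to resolve every ambiguity and encode the world knowledge.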
1:15:51.431 --> 1:16:07.442 But of course this idea is why some people are still interested in the question: if you now build a neural system, where you learn a representation inside your neural network, is that some type of artificial interlingua? 1:16:08.848 --> 1:16:15.975 However, what we have at least found out until now is that there is often very language-specific information in it. 1:16:16.196 --> 1:16:19.648 And that information might be important and essential. 1:16:19.648 --> 1:16:32.412 You do not have all the information in your input, so you typically cannot resolve all ambiguities inside this representation, because you might not have all the information. 1:16:32.652 --> 1:16:52.089 So in English you do not know if it is the living fish or the fish which you are eating, and if you are translating to German you also do not have to resolve this problem, because you have the same ambiguity in your target language. So why would you put in the effort of finding out which fish it is, if it is not necessary at all? 1:16:54.774 --> 1:16:59.509 Yeah? [Inaudible student question.] 1:17:05.585 --> 1:17:17.127 With semantic transfer the representation is not the same for both languages; you still have a language-specific semantic representation. 1:17:17.377 --> 1:17:28.134 So you have a semantic representation, as in the "gustar" example, but it is not the same semantic representation for both languages, and that is the main difference. 1:17:35.515 --> 1:17:46.205 Okay, then these are the most important things for today: what language is, and how rule-based systems work. 1:17:46.926 --> 1:18:00.578 And if there are no more questions: thank you for joining; we have a bit of a shorter lecture today.