+ 0:00
+ The following content is
+ provided under a Creative
+
+
0:02Commons license.
+
0:04Your support will help
+ MIT OpenCourseWare
+
0:06continue to offer high quality
+ educational resources for free.
+
0:10To make a donation or
+ view additional materials
+
0:13from hundreds of MIT courses,
+ visit MIT OpenCourseWare
+
0:17at ocw.mit.edu.
+
0:21ERIK DEMAINE: Welcome to 6.851
+ Advanced Data Structures.
+
0:25I am Erik Demaine.
+
0:26You can call me Erik.
+
0:28We have two TAs, Tom
+ Morgan and Justin Zhang.
+
0:31Tom's back there.
+
0:32Justin is late.
+
0:36And this class is
+ about all kinds
+
0:38of very cool data structures.
+
0:40You should have already
+ seen basic data structures
+
0:42like balanced binary
+ search trees and things
+
0:45like that, log n
+ time to do whatever
+
0:48you want in one dimension.
+
0:50And here we're going to turn
+ all those data structures
+
0:52on their head and
+ consider them in all sorts
+
0:54of different models and
+ additional cool problems.
+
0:56Today we're going to talk about
+ time travel or temporal data
+
0:59structures, where
+ you're manipulating time
+
1:02as any good time
+ traveler should.
+
1:05Then we'll do geometry where
+ we have higher dimensional
+
1:07data, more than one dimension.
+
1:09Then we'll look at
+ a problem called
+
1:11dynamic optimality, which is,
+ is there one best binary search
+
1:14tree that rules them all?
+
1:16Then we'll look at something
+ called memory hierarchy, which
+
1:19is a way to model more realistic
+ computers which have cache
+
1:22and then more cache
+ and then main memory
+
+ and then disk and all
+ these different levels.
+
1:26How do you optimize for that?
+
1:28Hashing is probably the most
+ famous, and most popular,
+
1:31most used data structure
+ in computer science.
+
1:33We'll do a little bit on that.
+
1:35Integers, when you know that
+ your data is integers and not
+
1:39just arbitrary black boxes that
+ you can compare or do whatever,
+
1:42you can do a lot
+ better with integers.
+
1:43You usually beat log n time.
+
1:45Often you can get constant time.
+
1:47For example, if you want
+ to do priority queues,
+
1:49you can do square
+ root of log log n time.
+
1:51That's the best
+ known randomized.
+
1:55Dynamic graphs, you have
+ a graph you want to store,
+
1:58and the edges are being added
+ and maybe deleted, like you're
+
2:01representing a social network.
+
2:02And people are friending
+ and de-friending.
+
2:04You want to maintain some
+ interesting information
+
2:06about that graph.
+
2:07Strings, you have
+ a piece of text,
+
2:10such as the entire
+ worldwide web.
+
2:12And you want to search
+ for a substring.
+
2:14How do you do that efficiently?
+
2:15It's sort of the Google problem.
+
2:17Or you're searching through
+ DNA for patterns, whatever.
+
2:20And finally succinct
+ data structures,
+
2:22which is all about
+ taking what we normally
+
2:24consider optimal
+ space or n space
+
2:26and reducing it down to the very
+ bare minimum of bits of space.
+
2:30Usually if you want
+ to store something
+
2:32where there's 2 to
+ the n possibilities,
+
2:34you want to get away
+ with n bits of space,
+
2:36maybe plus square root of
+ n or something very tiny.
+
2:40So that's the succinct
+ data structures.
+
2:42So that's an overview
+ of the entire class.
+
2:44And these are sort of the
+ sections we'll be following.
+
2:47Let me give you a quick
+ administrative overview
+
2:50of what we're doing.
+
2:54Requirements for the class--
+
2:55I guess, first,
+ attending lecture.
+
2:58Obviously if you
+ don't attend lecture,
+
2:59there'll be videos online.
+
3:00So that's resolvable.
+
3:02But let me know if you're
+ not going to make it.
+
3:05We're going to have problem
+ sets roughly every week.
+
3:07If you're taking the
+ class for credit,
+
3:09they have a very simple rule
+ of one page in, one page out.
+
3:12This is more constraint on
+ us to write problems that
+
3:15have easy or short answers.
+
3:16You probably need
+ to think about them
+
3:18a little bit before they're
+ transparent, but then easy
+
3:21to write up.
+
3:23And then scribing lectures--
+ so we have a scribe for today,
+
3:26I hope.
+
3:27Here?
+
3:28Yes, good.
+
3:30So most of the
+ lectures have already
+
3:32been scribed in some
+ version, and your goal
+
+ is to revise those scribe
+ notes so that, if you don't like
+
3:38handwritten notes, which
+ are also online, they're easier
+
3:42for people to read.
+
3:44Let's see.
+
3:45Listeners welcome.
+
3:46We're going to have an
+ open problem session.
+
3:48I really like open problems.
+
3:49I really like solving
+ open problems.
+
3:50So we've done this every time
+ this class has been offered.
+
3:53So if you're interested in
+ also solving open problems,
+
3:55it's optional.
+
3:56I will organize-- in
+ a couple of weeks,
+
3:59we'll have a weekly
+ open problem session
+
4:03and try to solve
+ all the things that
+
4:06push the frontier of
+ advanced data structures.
+
4:09So in classes, we'll see
+ the state of the art.
+
4:11And then we'll change the state
+ of the art in those sessions.
+
4:15I think that's it.
+
4:16Any questions about
+ the class before we
+
4:18get into the fun stuff?
+
4:22All right.
+
4:23Let's do some time traveling.
+
4:27Before I get to time
+ traveling, though, I
+
4:28need to define our
+ model of computation.
+
4:32A theme in this class is that
+ the model of computation you're
+
4:35working with matters.
+
4:36Models matter.
+
4:38And there's lots of different
+ models of computation.
+
4:40We'll see a few of the
+ main ones in this class.
+
4:45And the starting
+ point, and the one
+
4:48we'll be using throughout today,
+ is called a pointer machine.
+
4:53It's an old one from the '80s.
+
4:57And it corresponds to
+ what you might think about
+
4:59if you've done a lot of
+ object-oriented programming,
+
5:02and before that,
+ structure-oriented programming,
+
5:04I guess.
+
5:05So you have a bunch of nodes.
+
5:08They have some fields in them,
+ a constant number of fields.
+
5:13You can think of these
+ as objects or structs
+
5:16in C. It used to be records
+ back in Pascal days,
+
5:20so a lot of the papers
+ call them records.
+
5:22You could just have a
+ constant number of fields.
+
5:24You could think of
+ those numbered, labeled.
+
5:25It doesn't really matter
+ because there's only
+
5:27a constant number of them.
+
5:28Each of the fields could be
+ a pointer to another node,
+
5:32could be a null pointer, or
+ could have some data in it.
+
5:35So I'll just assume that
+ all my data is integers.
+
5:43You can have a
+ pointer to yourself.
+
5:45You can have a pointer over
+ here, whatever you want.
+
5:48A pointer machine would
+ look something like this.
+
5:52In any moment, this is the
+ state of the pointer machine.
+
5:55So you think this as the memory
+ of your computer storing.
+
5:59And then you have
+ some operations
+
6:02that you're allowed to do.
+
6:03That's the computation
+ part of the model.
+
6:07You can think of this
+ as the memory model.
+
6:10What you're allowed to
+ do are create nodes.
+
6:13You can say something
+ like, x equals new node.
+
6:19You can, I don't
+ know, look at fields.
+
6:24You can do x equals y.field.
+
6:28You can set fields,
+ x.field equals y.
+
6:33You can compute on these
+ data, so you can add 5 and 7,
+
6:37do things like that.
+
6:39I'm not going to worry about--
+
6:40I'll just write et cetera.
+
6:43This is more a model
+ about how everything's
+
6:45organized in memory, not so
+ much about what you're allowed
+
6:48to do to the data items.
+
6:49In this lecture, it
+ won't matter what
+
6:50you're doing to the data items.
+
6:52We never touch them.
+
6:53We just copy them around.
+
6:56So am I missing anything?
+
6:59Probably.
+
7:01I guess you could destroy
+ nodes if you felt like it.
+
7:03But we won't have
+ to today, because we
+
7:06don't want to
+ throw anything away
+
7:07when you're time traveling.
+
7:09It's too dangerous.
+
7:12And then the one catch
+ here is, what are x and y?
+
7:18There's going to be one
+ node in this data structure
+
7:21or in your memory
+ called the root node.
+
7:24And you could think of that
+ as that's the thing you always
+
7:26have in your head.
+
7:27This is like your
+ cache, if you will.
+
7:29It's just got a constant
+ number of things, just
+
7:31like any other node.
+
7:32And x and y are
+ fields of the root.
+
7:40So that sort of
+ ties things down.
+
7:42You're always working
+ relative to the root.
+
7:45But you can look at the data,
+ basically follow this pointer,
+
7:50by looking at the field.
+
7:53You could set one
+ of these pointers--
+
7:55I think I probably need
+ another operation here,
+
7:58like x equals y.field1,
+ field2, that sort of thing,
+
8:03and maybe the reverse.
+
8:06But you can manipulate
+ all nodes sort
+
8:09of via the root is the idea.
+
8:10You follow pointers,
+ do whatever.
+
8:12So pretty obvious,
+ slightly annoying
+
8:14to write down formally.
+
8:16But that is pointer machine.
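
To make the model concrete, here is a rough sketch in Python. This is only an illustration, not part of the formal model: the field names are invented, and the real constraints are just a constant number of fields per node and that x and y range over fields of the root.

```python
# A minimal sketch of a pointer machine node, assuming three fields
# per node (the names "left", "right", "data" are illustrative).
class Node:
    FIELDS = ("left", "right", "data")   # constant number of fields

    def __init__(self):
        for f in self.FIELDS:
            setattr(self, f, None)       # pointer to a Node, null, or an integer

root = Node()            # the distinguished root node ("your cache")
root.left = Node()       # x = new node
x = root.left            # x = y.field
x.data = 5 + 7           # compute on data items, store the result in a field
root.right = root.left   # x.field = y; pointers may alias or even self-point
```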
+
8:23And what we're going to
+ be talking about today
+
8:26in time travel is
+ suppose someone
+
8:28gives me a pointer machine
+ data structure, for example,
+
8:31balanced binary search
+ tree, linked list.
+
8:33A lot of data structures,
+ especially classic data
+
8:36structures, follow
+ pointer machine model.
+
8:39What we'd like to do is
+ transform that data structure
+
8:41or make a new
+ pointer machine data
+
8:42structure that does
+ extra cool things,
+
8:45namely travel through time.
+
8:47So that's what
+ we're going to do.
+
8:53There's two senses of time
+ travel or temporal data
+
8:57structures that we're going
+ to cover in this class.
+
9:02The one for today is
+ called persistence,
+
9:05where you don't forget
+ anything, like an elephant.
+
9:08And the other one
+ is retroactivity.
+
9:15Persistence will be today.
+
9:16Retroactivity is next class.
+
9:19Basically, these correspond
+ to two models of time travel.
+
9:21Persistence is the branching
+ universe time travel model,
+
9:24where if you make a
+ change in the past,
+
9:25you get a new universe.
+
9:27You never destroy old universes.
+
9:29Retroactivity is more
+ like Back to the Future,
+
9:33when you go back, make
+ a change, and then you
+
9:35can return to the present
+ and see what happened.
+
9:37This is a lot harder to do.
+
9:39And we'll work on
+ that next class.
+
9:42Persistence is what
+ we will do today.
+
9:46So persistence.
+
9:57The general idea of persistence
+ is to remember everything--
+
10:01the general goal, I would say.
+
10:05And by everything, I
+ mean different versions
+
10:07of the data structure.
+
10:08So you're doing data
+ structures in general.
+
10:11We have update operations
+ and query operations.
+
10:14We're mainly concerned
+ about updates here.
+
10:16Every time you do an
+ update, you think of it
+
10:18as taking a version of the data
+ structure and making a new one.
+
10:21And you never want to
+ destroy old versions.
+
10:23So even though an update
+ like an insert or something
+
10:26changes the data structure, we
+ want to remember that past data
+
10:29as well.
+
10:30And then let's make
+ this reasonable.
+
10:34All data structure operations
+ are relative to a specified
+
10:37version.
+
10:47So an update makes and
+ returns a new version.
+
11:05So when you do an
+ insert, you specify
+
11:08a version of your data
+ structure and the thing
+
11:10you want to insert.
+
11:11And the output is a new version.
+
11:13So then you could insert into
+ that new version, keep going,
+
11:16or maybe go back to the
+ old version, modify that.
+
11:20I haven't said exactly
+ what's allowed here,
+
11:22but this is sort of
+ the general goal.
+
11:25And then there are four
+ levels of persistence
+
11:30that you might want to get.
+
11:33First level is called
+ partial persistence.
+
11:37This is the easiest to obtain.
+
11:44And in partial
+ persistence, you're
+
11:47only allowed to update
+ the latest version, which
+
11:55means the versions
+ are linearly ordered.
+
12:01This is the easiest
+ to think about.
+
12:03And time travel can easily get
+ confusing, so start simple.
+
12:09We have a timeline of
+ various versions on it.
+
12:14This is the latest.
+
12:17And what we can do is
+ update that version.
+
12:19We'll get a new version, and
+ then our latest is this one.
+
12:23What this allows is looking back
+ at the past to an old version
+
12:27and querying that version.
+
12:29So you can still ask questions
+ about the old version,
+
12:31if you want to be able to do
+ a search on any of these data
+
12:33structures.
+
12:34But you can't change them.
+
12:35You can only change the
+ most recent version.
+
12:38So that's nice.
+
12:39It's kind of like time
+ machine on Mac, I guess.
+
12:44If you've ever seen the
+ movie Deja Vu, which is not
+
12:46very common, but it's
+ a good time travel
+
12:48movie, in the first half of
+ the movie, all they can do
+
12:51is look back at the past.
+
12:52Later they discover
+ that actually they
+
12:54have a full persistence model.
+
12:57It takes a while
+ for dramatic effect.
+
13:04In full persistence, you can
+ update anything you want--
+
13:08so update any version.
+
13:18And so then the
+ versions form a tree.
+
13:27OK.
+
13:28So in this model,
+ maybe you initially
+
13:30have a nice line of versions.
+
13:32But now if I go back to
+ this version and update it,
+
13:34I branch, get a
+ new version here.
+
13:37And then I might keep modifying
+ that version sometimes.
+
13:40Any of these guys can branch.
+
13:44So this is why I call it the
+ branching universe model, when
+
13:47you update your branch.
+
13:52So no version ever
+ gets destroyed here.
+
13:54Again, you can
+ query all versions.
+
13:56But now you can also
+ update any version.
+
13:59But you just make a new version.
+
14:00It's a totally new world.
+
14:02When I update this
+ version, this version
+
14:04knows nothing about all the--
+
14:06this doesn't know
+ about this future.
+
14:07It's created its own future.
+
14:10There's no way to sort of
+ merge those universes together.
+
14:14It's kind of sad.
+
14:16That's why we have the
+ third level of persistence,
+
14:22which lets us merge timelines.
+
14:24It's great for lots
+ of fiction out there.
+
14:35If you've seen the
+ old TV show Sliders,
+
14:38that would be
+ confluent persistence.
+
14:50So confluent persistence,
+ you can combine two versions
+
15:01to create a new version.
+
15:09And in this case, again, you
+ can't destroy old versions.
+
15:13In persistence, you
+ never destroy versions.
+
15:16So now the versions form a
+ DAG, directed acyclic graph.
+
15:22So now we're allowing--
+
15:24OK, you make some
+ changes, whatever.
+
15:25You branch your universe,
+ make some changes.
+
15:30And now I can say, OK, take
+ this version of the data
+
15:32structure and this version
+ and recombine them.
+
15:35Get a new version, and then
+ maybe make some more changes.
+
15:38OK, what does combine mean?
+
15:40Well, it depends on
+ your data structure.
+
15:42A lot of data structures
+ have combine operations
+
15:44like if you have linked lists,
+ you have two linked lists,
+
15:48you can concatenate them.
+
15:49That's an easy operation.
+
15:50Even if you have
+ binary search trees,
+
15:51you can concatenate
+ them reasonably easy
+
15:53and combine it into one
+ big binary search tree.
+
15:56So if your data structure
+ has an operation that
+
15:59takes as input two
+ data structures,
+
16:01then what we're saying is now
+ it can take two versions, which
+
16:05is more general.
+
16:06So I could take the
+ same data structure,
+
16:08make some changes in
+ one way, separately make
+
16:10some changes in a
+ different way, and then
+
16:12try to concatenate them
+ or do something crazy.
+
16:14This is hard to
+ do, and most of it
+
16:16is an open problem
+ whether it can be done.
+
16:19But I'll tell you about it.
+
16:21Then there's another level
+ even more than confluent
+
16:24persistence.
+
16:26This is hard to interpret
+ in the time travel world,
+
16:30but it would be functional
+ data structures.
+
16:32If you've ever programmed
+ in a functional programming
+
16:34language, it's a little bit
+ annoying from an algorithm's
+
16:36perspective, because it
+ constrains you to work
+
16:39in a purely functional world.
+
16:43You can never modify anything.
+
16:45OK.
+
16:46Now, we don't want
+ to modify versions.
+
16:49That's fine.
+
16:49But in a functional
+ data structure,
+
16:51you're not allowed to
+ modify any nodes ever.
+
16:54All you can do is
+ make new nodes.
+
17:03This is constraining,
+ and you can't always
+
17:07get optimal running times
+ in the functional world.
+
17:09But if you can get a
+ functional data structure,
+
17:11you have all these
+ things, because you
+
17:13can't destroy anything.
+
17:14If you can't destroy
+ nodes, then in particular,
+
17:16you can't destroy versions.
+
17:17And all of these things
+ just work for free.
+
17:19And so a bunch of
+ special cases are known,
+
17:22interesting special
+ cases, like search trees
+
17:24you can do in the
+ functional world.
+
17:26And that makes all
+ of these things easy.
+
17:29So the rest of this
+ lecture is going
+
17:30to be general techniques for
+ doing partial full persistence,
+
17:34what we know about
+ confluent, and what
+
17:36we know about functional,
+ brief overview.
+
17:41Any questions about those
+ goals, problem definitions?
+
17:47Yeah.
+
17:48AUDIENCE: I'm still confused
+ about functional, because--
+
17:50ERIK DEMAINE: What
+ does functional mean?
+
17:52AUDIENCE: [INAUDIBLE]
+
17:55ERIK DEMAINE: Yeah, I
+ guess you'll see what--
+
17:57functional looks like all
+ the other things, I agree.
+
18:00You'll see in a moment
+ how we actually implement
+
18:02partial and persistence.
+
18:03We're going to be
+ changing nodes a lot.
+
18:07As long as we still
+ represent the same data
+
18:10in the old versions, we
+ don't have to represent it
+
18:13in the same way.
+
18:14That lets us do things
+ more efficiently.
+
18:15Whereas in functional,
+ you have to represent
+
18:17all the old versions
+ in exactly the way
+
18:19you used to represent them.
+
18:20Here we can kind of
+ mangle things around
+
18:22and it makes things faster.
+
18:23Yeah, good question.
+
18:25So it seems almost the same,
+ but it's nodes versus versions.
+
18:29I haven't really
+ defined a version.
+
18:31But it's just that all the
+ queries answer the same way.
+
18:34That's what you need
+ for persistence.
+
18:38Other questions?
+
18:40All right.
+
18:43Well, let's do some
+ real data structures.
+
18:49We start with
+ partial persistence.
+
18:55This is the easiest.
+
18:59For both partial and
+ full persistence,
+
19:02there is the following result.
+ Any pointer machine data
+
19:06structure, one catch with a
+ constant number of pointers
+
19:19to any node--
+
19:22so this is constant n degree.
+
19:27In a pointer machine, you
+ always have a constant number
+
19:30of pointers out
+ of a node at most.
+
19:32But for this result
+ to hold, we also
+
19:34need a constant number of
+ pointers into any node.
+
19:37So this is an extra constraint.
+
19:42Can be transformed into
+ another data structure that
+
19:47is partially persistent and does
+ all the things it used to do--
+
19:53so I'll just say, can be
+ made partially persistent.
+
20:00You have to pay something, but
+ you have to pay very little--
+
20:03constant amortized
+ factor overhead,
+
20:12multiplicative overhead
+ and constant amount
+
20:21of additive space per change
+ in the data structure.
+
20:30So every time you do a
+ modification in your pointer
+
20:33machine-- you set one of
+ the fields to something--
+
20:36you have to store that forever.
+
20:37So, I mean, this is the
+ best you could hope to do.
+
20:39You've got to store
+ everything that happened.
+
20:43You pay a constant
+ factor overhead, eh.
+
20:45We're theoreticians.
+
20:46That doesn't matter.
+
20:48Then you get any data
+ structure in this world
+
20:50can be made
+ partially persistent.
+
20:53That's nice.
+
20:54Let's prove it.
+
21:00OK, the idea is pretty simple.
+
21:04Pointer machines are all
+ about nodes and fields.
+
21:06So we just need to simulate
+ whatever the data structure is
+
21:09doing to those nodes
+ and fields in a way
+
21:11that we don't lose
+ all the information
+
21:13and we can still
+ search it very quickly.
+
21:17First idea is to
+ store back pointers.
+
21:21And this is why we need the
+ constant n degree constraint.
+
21:27So if we have a node--
+
21:31how do I want to
+ draw a node here?
+
21:35So maybe these are the
+ three fields of the node.
+
21:38I want to also store
+ some back pointers.
+
21:42Whenever there is a node
+ that points to this node,
+
21:48I want to have a
+ back pointer that
+
21:50points back so I know where
+ all the pointers came from.
+
21:54If there's only p pointers,
+ then this is fine.
+
21:57There'll be p fields here.
+
22:00So still constant, still in
+ the pointer machine model.
+
22:03OK, I'm going to need
+ some other stuff too.
+
22:08So this is a simple thing,
+ definitely want this.
+
22:11Because if my nodes
+ ever move around,
+
22:13I've got to update
+ the pointers to them.
+
22:15And where are those pointers?
+
22:16Well, the back pointers
+ tell you where they are.
+
22:20Nodes will still
+ be constant size,
+
22:22so we remain a pointer
+ machine data structure.
+
22:25OK.
+
22:26That's idea one.
+
22:28Idea two is this part.
+
22:35This is going to store
+ something called mods.
+
22:39It could stand for something,
+ but I'll leave it as mods.
+
22:44So these are two of the
+ fields of the data structure.
+
22:56Ah, one convenience here
+ is for back pointers,
+
23:01I'm only going to store it for
+ the latest version of the data
+
23:04structure.
+
23:16Sorry.
+
23:16I forgot about that.
+
23:19We'll come back to that later.
+
23:21And then the idea is to
+ store these modifications.
+
23:23How many modifications?
+
23:25Let's say up to p, twice p.
+
23:34p was the bound on the
+ n degree of a node.
+
23:38So I'm going to allow 2p
+ modifications over here.
+
23:43And what's a
+ modification look like?
+
23:48It's going to consist
+ of three things--
+
23:50get them in the right order--
+
23:53the version in which
+ something was changed,
+
23:56the field that got changed,
+ and the value it got changed to.
+
24:02So the idea is that these
+ are the fields here.
+
24:07We're not going to touch those.
+
24:09Once they're set to something--
+
24:11or, I mean, whatever
+ they are initially,
+
24:13they will stay that way.
+
24:15And so instead of actually
+ changing things like the data
+
24:18structure normally
+ would, we're just
+
24:19going to add modifications
+ here to say, oh,
+
24:21well at this time, this field
+ changed to the value of 5.
+
24:25And then later on, it
+ changed to the value 7.
+
24:27And then later on, this one
+ changed to the value 23,
+
24:31whatever.
+
24:32So that's what
+ they'll look like.
+
24:36There's a limit to how many--
+
24:37we can only store a constant
+ number of mods to each node.
+
24:40And our constant will be 2p.
+
24:44OK.
+
24:45Those are the
+ ideas, and now it's
+
24:46just a matter of
+ making this all work
+
24:47and analyzing that it's
+ constant amortized overhead.
+
24:53So first thing is if you
+ want to read a field,
+
25:06how would I read a field?
+
25:08This is really easy.
+
25:11First you look at what the
+ field is in the node itself.
+
25:15But then it might
+ have been changed.
+
25:17And so remember when
+ I say read the field,
+
25:19I actually mean while
+ I'm given some version,
+
25:21v, I want to know what is the
+ value of this field at version
+
25:24v, because I want to be able
+ to look at any of the old data
+
25:28structures too.
+
25:29So this would be at
+ version v. I just
+
25:37look through all
+ the modifications.
+
25:38There's constantly many,
+ so it takes constant time
+
25:40to just flip through them
+ and say, well, what changes
+
25:43have happened up to version v?
+
25:46So I look at mods
+ with version less than
+
25:56or equal to v. That will be all
+ the changes that happened up
+
25:59to this point.
+
26:00I see, did this field change?
+
26:02I look at the latest one.
+
26:03That will be how I read
+ the field of the node, so
+
26:07constant time.
+
26:08There's lots of ways to make
+ this efficient in practice.
+
26:10But for our purposes,
+ it doesn't matter.
+
26:13It's constant.
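
As a rough sketch of what we have so far (Python, with assumed names; the paper doesn't prescribe code), a partially persistent node and its read path might look like this, with versions as integers and p the in-degree bound:

```python
# Sketch of a partially persistent node: original fields are frozen,
# and changes accumulate in a bounded mod list of (version, field, value).
P = 2   # assumed in-degree bound, so each node holds at most 2*P mods

class PNode:
    def __init__(self, fields):
        self.fields = dict(fields)   # original field values, never rewritten
        self.back = []               # (node, field) back pointers, latest version only
        self.mods = []               # (version, field, value), in version order

    def read(self, field, v):
        """Field value as of version v: start from the original value,
        then apply the latest mod with version <= v."""
        value = self.fields[field]
        for version, f, val in self.mods:   # constantly many, so O(1)
            if f == field and version <= v:
                value = val
        return value
```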
+
26:16The hard part is how
+ do you change a field?
+
26:18Because there might not be
+ any room in the mod structure.
+
26:35So to modify, say we want to
+ set node.field equal to x.
+
26:47What we do is first
+ we check, is there
+
26:52any space in the mod structure?
+
26:54If there's any blank mods,
+ so if the node is not full,
+
27:03we just add a mod.
+
27:06So a mod will look
+ like now field x.
+
27:13Just throw that in there.
+
27:15Because right at this moment--
+
27:17so we maintain a time
+ counter, just increment
+
27:19it every time we do a change.
+
27:21This field changed that value.
+
27:23So that's the easy case.
+
27:24The trouble, of course,
+ is if the node is full--
+
27:29the moment you've
+ all been waiting for.
+
27:31So what we're going to do
+ here is make a new node.
+
27:33We've run out of space.
+
27:34So we need to make a new node.
+
27:36We're not going to touch the old
+ node, just going to let it sit.
+
27:38It still maintains all
+ those old versions.
+
27:40Now we want a new node
+ that represents the latest
+
27:43and greatest of this node.
+
27:44OK.
+
27:45So make a new node.
+
27:51I'll call it node prime to
+ distinguish it from node, where
+
27:57all the mods, and this
+ modification in particular,
+
28:04are applied.
+
28:07OK, so we make a new
+ version of this node.
+
28:11It's going to have some
+ different fields, whatever
+
28:15was the latest version
+ represented by those mods.
+
28:18It's still going to
+ have back pointers,
+
28:20so we have to maintain
+ all those back pointers.
+
28:24And now the mod,
+ initially, is going
+
28:26to be empty, because we
+ just applied them all.
+
28:29So this new node doesn't
+ have any recent mods.
+
28:32Old node represents
+ the old versions.
+
28:34This node is going to
+ represent the new versions.
+
28:37What's wrong with this picture?
+
28:39AUDIENCE: Update pointers.
+
28:40ERIK DEMAINE: Update pointers.
+
28:42Yeah, there's pointers
+ to the old version
+
28:43of the node, which are fine for
+ the old versions of the data
+
28:47structure.
+
28:48But for the latest version
+ of the data structure,
+
28:50this node has moved
+ to this new location.
+
28:53So if there are any old
+ pointers to that node,
+
28:56we've got to update them
+ in the current version.
+
28:58We have to update them to
+ point to this node instead.
+
29:00The old versions are fine, but
+ the new version is in trouble.
+
29:04Other questions or
+ all the same answer?
+
29:06Yeah.
+
29:06AUDIENCE: So if you wanted
+ to read an old version
+
29:10but you just have the
+ new version, [INAUDIBLE]?
+
29:15ERIK DEMAINE: OK--
+
29:16AUDIENCE: [INAUDIBLE]
+
29:17ERIK DEMAINE: The
+ question is essentially,
+
29:19how do we hold on to versions?
+
29:22Essentially, you can think of
+ a version of the data structure
+
29:24as where the root node is.
+
29:26That's probably the easiest.
+
29:27I mean, in general, we're
+ representing versions
+
29:29by a number, v. But we
+ always start at the root.
+
29:33And so you've given
+ the data structure,
+
29:35which is represented
+ by the root node.
+
29:36And you say, search
+ for the value 5.
+
29:40Is it in this binary
+ search tree or whatever?
+
29:43And then you just start
+ navigating from the root,
+
29:45but you know I'm in version
+ a million or whatever.
+
29:49I know what version
+ I'm looking for.
+
29:51So you start with the root,
+ which never changes, let's say.
+
29:56And then you follow
+ pointers that
+
29:58essentially tell
+ you for that version
+
30:00where you should be going.
+
30:01I guess at the root version,
+ it's a little trickier.
+
30:03You probably want a little array
+ that says for this version,
+
30:07here's the root node.
+
30:08But that's a special case.
+
30:11Yeah.
+
30:11Another question?
+
30:13AUDIENCE: So on the
+ new node that you
+
30:15created, the fields that you
+ copied, you also have to have
+
30:19a version for them, right?
+
30:20Because [INAUDIBLE]?
+
30:26ERIK DEMAINE: These--
+
30:27AUDIENCE: Or do you
+ version the whole node?
+
30:30ERIK DEMAINE: Here we're
+ versioning the whole node.
+
30:32The original field
+ values represent
+
30:34what was originally there,
+ whenever this node was created.
+
30:37Then the mods specify what
+ time the fields change.
+
30:40So I don't think
+ we need times here.
+
30:44All right.
+
30:45So we've got to update
+ two kinds of pointers.
+
30:47There's regular
+ pointers, which live
+
30:48in the fields, which are
+ things pointing to the node.
+
30:52But then there's
+ also back pointers.
+
30:53Because if this is
+ a pointer to a node,
+
30:55then there'll be a back
+ pointer back to the node.
+
30:57And all of those have to change.
+
31:00Conveniently, the back
+ pointers are easy.
+
31:11So if they're back
+ pointers to the node,
+
31:13we change them to
+ the node prime.
+
31:14How do we find
+ the back pointers?
+
31:16Well, we just follow
+ all the pointers
+
31:17and then there will be
+ back pointers there.
+
31:21Because I said we're
+ only maintaining
+
31:23back pointers for
+ the latest version,
+
31:25I don't need to preserve
+ the old versions
+
31:28of those back pointers.
+
31:29So I just go in
+ and I change them.
+
31:31It takes constant time,
+ because the constant number
+
31:33of things I point to, each
+ one as a back pointer.
+
31:35So this is cheap.
+
31:37There's no persistence here.
+
31:39That's an advantage of
+ partial persistence.
+
31:41The hard part is
+ updating the pointers
+
31:44because those live in fields.
+
31:45I need to remember the old
+ versions of those fields.
+
31:47And that we do recursively.
+
31:58Because to change
+ those pointers,
+
32:00that's a field update.
+
32:01That's something
+ exactly of this form.
+
32:02So that's the same operation
+ but on a different node.
+
32:05So I just do that.
+
32:07I claim this is good.
+
32:08That's the end of the algorithm.
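
Putting the two cases together, here is a sketch of the write path, continuing the PNode sketch from above (again illustrative, not the paper's code):

```python
def write(node, field, value, now):
    """Set node.field = value at time `now`, where `now` is a global
    time counter; return the node representing the latest version."""
    if len(node.mods) < 2 * P:
        node.mods.append((now, field, value))   # cheap case: add a mod
        return node
    # Full node: build node' with all mods (and this change) applied.
    latest = {f: node.read(f, now) for f in node.fields}
    latest[field] = value
    node2 = PNode(latest)                       # fresh node, empty mod list
    node2.back = list(node.back)
    # Back pointers exist only in the latest version, so fix them in place:
    for f, val in node2.fields.items():
        if isinstance(val, PNode):
            val.back = [(node2, g) if n is node else (n, g)
                        for (n, g) in val.back]
    # Forward pointers to the old node live in fields, so they must be
    # updated persistently -- that is, recursively, via write itself:
    for nbr, f in node2.back:
        write(nbr, f, node2, now)
    return node2
```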
+
32:11Now we need to analyze it.
+
32:24How do we analyze it?
+
32:25Any guesses?
+
32:29AUDIENCE: Amortize it.
+
32:30ERIK DEMAINE: Amortized
+ analysis, exactly
+
32:32the answer I was looking for.
+
32:33OK.
+
32:34[INAUDIBLE] amortization.
+
32:36The most powerful
+ technique in amortization
+
32:38is probably the
+ potential method.
+
32:40So we're going to use that.
+
32:42There's a sort of more--
+
32:44you'll see a charging
+ argument in a moment.
+
32:50We want the potential function
+ to represent when this data
+
32:53structure is in a bad state.
+
32:55Intuitively, it's in a bad state
+ when a lot of nodes are full.
+
32:58Because then as soon as
+ you make a change in them,
+
33:00they will burst, and you have
+ to do all this crazy recursion
+
33:03and stuff.
+
33:04This case is nice and cheap.
+
33:05We just add a modification,
+ constant time.
+
33:08This case, not so nice
+ because we recurse.
+
33:10And then that's going
+ to cause more recursions
+
33:12and all sorts of
+ chaos could happen.
+
33:16So there's probably a few
+ different potential functions
+
33:20that would work here.
+
33:21And an old version
+ of these notes said it
+
33:23should be the number
+ of full nodes.
+
33:25But I think we can make
+ life a little bit easier
+
33:27by the following.
+
33:32Basically, the total
+ number of modifications--
+
33:36not quite the total,
+ almost the total.
+
33:39So I'm going to do c times
+ the sum of the number of mods
+
33:49in latest version nodes.
+
33:59OK.
+
34:00So because we sort
+ of really only
+
34:02care about-- we're only
+ changing the latest version,
+
34:05so I really only
+ care about nodes that
+
34:07live in the latest version.
+
34:08What do I mean by this?
+
34:09Well, when I made
+ this new node prime,
+
34:11this becomes the new
+ representation of that node.
+
34:14The old version is dead.
+
34:15We will never change it again.
+
34:18If we're modifying, we will
+ never even look at it again.
+
34:21Because now everything
+ points to here.
+
34:24So I don't really
+ care about that node.
+
34:26It's got a ton of mods.
+
34:27But what's nice is that when
+ I create this new node, now
+
34:30the mod list is empty.
+
34:31So I start from scratch,
+ just like reinstalling
+
34:33your operating system.
+
34:34It's a good feeling.
+
34:38And so the potential goes down
+ by, I guess, c times 2 times p.
+
34:45When I do this change, potential
+ goes down by basically p.
+
34:49AUDIENCE: Is c any constant or--
+
34:52ERIK DEMAINE: c will be a
+ constant to be determined.
+
34:55I mean, it could be 1.
+
34:57It depends how you
+ want to define it.
+
34:58I'm going to use the CLRS
+ notion of amortized cost, which
+
35:02is actual cost plus
+ change in potential.
+
35:06And then I need a
+ constant here, because I'm
+
35:08measuring a running time versus
+ some combinatorial quantity.
+
35:12So this will be to match the
+ running time that we'll get to.
+
35:17OK.
+
35:17So what is amortized cost?
+
35:22There's sort of two
+ cases of modification.
+
35:24There's the cheap case
+ and the not so cheap case.
+
35:28In general, amortized cost--
+
35:34in both cases, it's
+ going to be at most--
+
35:37well, first of all, we
+ do some constant work
+
35:39just to figure out all this
+ stuff, make copies, whatever.
+
35:44So that's some constant time.
+
35:49That's the part that I don't
+ want to try to measure.
+
35:52Then potentially,
+ we add a new mod.
+
35:55If we add a mod, that
+ increases the potential by c.
+
35:59Because we're just counting
+ mods, multiplying by c.
+
36:02So we might get plus 1 mod.
+
36:04This is going to
+ be an upper bound.
+
36:06We don't always add 1, but
+ worst case, we always had 1,
+
36:09let's say.
+
36:11And then there's
+ this annoying part.
+
36:14And this might happen,
+ might not happen.
+
36:16So then there's a plus maybe.
+
36:20If this happens, we
+ decrease the potential
+
36:23because we empty out the
+ mods for that node in terms
+
36:26of the latest version.
+
36:27So then we get a negative
+ 2cp, change in potential.
+
36:34And then we'd have to pay
+ I guess up to p recursions.
+
36:49Because we have to--
+
36:51how many pointers
+ are there to me?
+
36:53Well, at most p of them, because
+ there are at most p pointers
+
36:58to any node.
+
37:02OK.
+
37:03This is kind of a weird--
+
37:05it's not exactly algebra here.
+
37:06I have this thing, recursions.
+
37:09But if you think about
+ how this would expand,
+
37:11all right, this
+ is constant time.
+
37:13That's good.
+
37:14And then if we do this--
+
37:15I'll put a question mark here.
+
37:16It might be here.
+
37:16It might not.
+
37:18If it's not here, find constant.
+
37:19If it is here, then this gets
+ expanded into this thing.
+
37:24It's a weird way to
+ write a recurrence.
+
37:26But we get p times whatever
+ is in this right hand side.
+
37:30OK.
+
37:31But then there's this minus 2cp.
+
37:33So we're going to
+ get p times 2c here.
+
37:36That's the initial cost.
+
37:37So that will cancel with this.
+
37:40And then we might get
+ another recursion.
+
37:41But every time we get a
+ recursion, all the terms
+
37:43cancel.
+
37:44So it doesn't matter
+ whether this is here or not.
+
37:46You get 0, which is great.
+
37:49And you're left with
+ the original 2c.
+
37:53Constant.
+
37:55OK.
+
37:56[INAUDIBLE] potential functions
+ are always a little crazy.
+
37:59What's happening here is
+ that, OK, maybe you add a mod.
+
38:03That's cheap.
+
38:05But when we have to do this
+ work and we have to do this
+
38:08recursion-- this is up to
+ 2p updates or recursions--
+
38:14we are charging it to the
+ emptying of this node.
+
38:17The number of mods
+ went from 2p down to 0.
+
38:21And so we're just
+ charging this update cost
+
38:22to that modification.
+
38:24So if you like charging schemes,
+ this is much more intuitive.
+
38:26But with charging schemes,
+ it's always a little careful.
+
38:28You have to make sure
+ you're not double charging.
+
38:30Here it's obvious that
+ you're not double charging.
+
38:34Kind of cool and magical.
+
38:37This is a paper by
+ Driscoll, Sarnak, Sleator,
+
38:42Tarjan from 1989.
+
38:43So it's very early
+ days of amortization.
+
38:45But they knew how to do it.
+
38:47Question?
+
38:48AUDIENCE: [INAUDIBLE]
+
38:50ERIK DEMAINE: What happens
+ if you overflow the root?
+
38:52Yeah, I never thought about
+ the root before today.
+
38:54But I think the way
+ to fix the root is
+
38:57just you have one big table
+ that says, for a given version--
+
39:02I guess a simple
+ way would be to say,
+
39:04not only is a version
+ a number, but it's also
+
39:06a pointer to the root.
+
39:07There we go.
+
39:07Pointer machine.
+
39:09So that way you're just
+ always explicitly maintaining
+
39:11the root copy or the pointer.
+
39:15Because otherwise,
+ you're in trouble.
+
39:18AUDIENCE: Then can you
+ go back to [INAUDIBLE].
+
39:21ERIK DEMAINE: So in order
+ to refer to an old version,
+
39:24you have to have the
+ pointer to that root node.
+
39:26If you want to do it just
+ from a version number,
+
39:29look at the data structure.
+
39:30Just from a version
+ number, you would
+
39:31need some kind of
+ lookup table, which
+
39:33is outside the pointer machine.
+
39:34So you could do it
+ in a real computer,
+
39:36but a pointer machine is
+ not technically allowed.
+
39:39So it's slightly awkward.
+
39:40No arrays are allowed
+ in pointer machines,
+
39:42in case that wasn't clear.
+
39:43Another question?
+
39:44AUDIENCE: [INAUDIBLE] constant
+ space to store for [INAUDIBLE].
+
39:48And also, what if we have
+ really big numbers [INAUDIBLE]?
+
39:54ERIK DEMAINE: In this model,
+ in the pointer machine model,
+
39:56we're assuming that whatever
+ the data is in the items
+
39:58take constant space each.
+
40:01If you want to know about
+ bigger things in here,
+
40:03then refer to future lectures.
+
40:05This is time travel, after all.
+
40:06Just go to a future
+ class and then come back.
+
40:09[LAUGHS] So we'll get
+ there, but right now,
+
40:11we're not thinking
+ about what's in here.
+
40:15Whatever big thing
+ you're trying to store,
+
40:16you reduce it down to
+ constant size things.
+
40:19And then you spread them around
+ nodes of a pointer machine.
+
40:22How you do that, that's
+ up to the data structure.
+
40:25We're just transforming the
+ data structure to be persistent.
+
40:28OK, you could ask about other
+ models than pointer machines,
+
40:30but we're going to stick
+ to pointer machines here.
+
40:34All right.
+
40:36That was partial persistence.
+
40:38Let's do full persistence.
+
40:41That was too easy.
+
40:46Same paper does
+ full persistence.
+
40:48That was just a warm up.
+
40:50Full persistence is actually
+ not that much harder.
+
40:55So let me tell you
+ basically what changes.
+
41:04There are two issues.
+
41:05One is that everything here
+ has to change and not by much.
+
41:09We're still going to
+ use back pointers.
+
41:11We're still going
+ to have the mods.
+
41:12The number of mods is going
+ to be slightly different
+
41:15but basically the same.
+
41:16Back pointers no longer just
+ refer to the latest version.
+
41:19We have to maintain back
+ pointers in all versions.
+
41:21So that's annoying.
+
41:22But hey, that's life.
+
41:24The amortization, the
+ potential function
+
41:25will change slightly
+ but basically not much.
+
41:30Sort of the bigger issue you
+ might first wonder about,
+
41:33and it's actually the most
+ challenging technically,
+
41:35is versions are
+ no longer numbers.
+
41:37Because it's not a line.
+
41:39Versions are nodes in a tree.
+
41:41You should probably
+ call them vertices
+
41:42in a tree to distinguish them
+ from nodes in the pointer
+
41:45machine.
+
41:46OK, so you've got
+ this tree of versions.
+
41:48And then versions are just
+ some point on that tree.
+
41:53This is annoying
+ because we like lines.
+
41:57We don't like trees as much.
+
41:58So what we're going to
+ do is linearize the tree.
+
42:04Like, when in doubt, cheat.
+
42:12How do we do this?
+
42:13With tree traversal.
+
42:15Imagine I'm going to draw
+ a super complicated tree
+
42:18of versions.
+
42:19Say there are three versions.
+
42:21OK.
+
42:22I don't want to number
+ them, because that would be
+
42:24kind of begging the question.
+
42:26So let's just call
+ them x, y, and z.
+
42:33All right.
+
42:34I mean, it's a directed
+ tree, because we
+
42:36have the older versions.
+
42:37This is like the
+ original version.
+
42:38And we made a change.
+
42:39We made a different change
+ on the same version.
+
42:42What I'd like to do is a
+ traversal of that tree,
+
42:45like a regular, as if you're
+ going to sort those nodes.
+
42:48Actually, let me use
+ color, high def here.
+
42:53So here's our
+ traversal of the tree.
+
42:59And I want to look at the
+ first and the last time I
+
43:01visit each node.
+
43:02So here's the first
+ time I visit x.
+
43:05So I'll write this is
+ the beginning of x.
+
43:09Capital X. Then this is
+ the first time I visit y,
+
43:13so it's beginning of y.
+
43:15And then this is the last time
+ I visit y, so it's the end of y.
+
43:19And then, don't care.
+
43:20Then this is the beginning of z.
+
43:24And this is the end of z.
+
43:27And then this is the end x.
+
43:29If I write those sequentially,
+ I get bxbyeybzez,
+
43:38and then, at the end, ex.
+
43:42OK, you can think of these
+ as parentheses, right?
+
43:45For whatever reason I chose b
+ and e for beginning and ending,
+
43:48but this is like open
+ parens, close parens.
+
43:50This is easy to
+ do in linear time.
+
43:52I think you all know how.
+
43:53Except it's not
+ a static problem.
+
43:55Versions are changing
+ all the time.
+
43:56We're adding versions.
+
43:57We're never deleting
+ versions, but we're always
+
43:59adding stuff to here.
+
44:00It's a little
+ awkward, but the idea
+
44:01is I want to
+ maintain this order,
+
44:05maintain the begin and
+ the end of each, you
+
44:16might say, subtree of versions.
+
44:23This string, from
+ bx to ex, represents
+
44:25all of the stuff in x's
+ subtree, in the rooted tree
+
44:29starting at x.
+
44:33How do I maintain that?
+
44:40Using a data structure.
+
44:56So we're going to use something,
+ a data structure we haven't yet
+
45:00seen.
+
45:02It will be in lecture 8.
+
45:04This is a time travel
+ data structure,
+
45:06so I'm allowed to do that.
+
45:10So order maintenance
+ data structure.
+
45:14You can think of this as
+ a magical linked list.
+
45:16Let me tell you what the
+ magical linked list can do.
+
45:19You can insert--
+
45:22I'm going to call it
+ an item, because node
+
45:24would be kind of confusing
+ given where we are right now.
+
45:28You can insert a new item in
+ the list immediately before
+
45:32or after a given item.
+
45:37OK.
+
45:37This is like a
+ regular linked list.
+
45:41Here's a regular linked list.
+
45:44And if I'm given a particular
+ item like this one,
+
45:48I can say, well, insert
+ a new item right here.
+
45:51You say, OK.
+
45:51Fine.
+
45:52I'll just make a new node and
+ relink here, relink there.
+
45:57Constant time, right?
+
45:58So in an order maintenance
+ data structure,
+
46:00you can do this
+ in constant time.
+
46:01Wow!
+
46:02So amazing.
+
46:05OK, catch is the second
+ operation you can do.
+
46:08Maybe I'll number these.
+
46:09This is the update.
+
46:10Then there's the query.
+
46:13The query is, what
+ is the relative order
+
46:17of two notes, of two items?
+
46:24x and y.
+
46:27So now I give you this
+ node and this node.
+
46:29And I say, which is to the left?
+
46:32Which is earlier in the order?
+
46:34I want to know, is x
+ basically less than y
+
46:36in terms of the
+ order in the list?
+
46:37Or is y less than x?
+
46:41And an order maintenance
+ data structure
+
46:42can do this in constant time.
+
46:45Now it doesn't look like your
+ mother's linked list, I guess.
+
46:50It's not the link list
+ you learned in school.
+
46:52It's a magical linked
+ list that can somehow
+
46:54answer these queries.
+
46:55How?
+
46:56Go to lecture 7.
+
46:58OK.
+
46:59Forward reference,
+ lecture 8, sorry.
+
47:03For now, we're just going to
+ assume that this magical data
+
47:05structure exists.
+
47:06So in constant
+ time, this is great.
+
47:09Because if we're maintaining
+ these b's and e's, we
+
47:11want to maintain the order
+ that these things appear in.
+
47:16If we want to create
+ a new version,
+
47:17like suppose we were
+ just creating version z,
+
47:20well, it used to be everything
+ without this bz, ez.
+
47:23And we'd just insert two
+ items in here, bz and ez.
+
47:27They're right next
+ to each other.
+
47:28And if we were given version
+ x, we could just say,
+
47:30oh, we'll look at ex and insert
+ two items right before it.
+
47:34Or you can put them
+ right after bx.
+
47:36I mean, there's no
+ actual order here.
+
47:37So it could have been y first
+ and then z or z first and then
+
47:40y.
+
47:42So it's really easy to add a
+ new version in constant time.
+
47:44You just do two of
+ these insert operations.
+
47:47And now you have this magical
+ order operation, which
+
47:50if I'm given two versions--
+
47:54I don't know, v and w--
+
47:56and I want to know is
+ v an ancestor of w,
+
48:00now I can do it
+ in constant time.
+
48:02So this lets me do a third
+ operation, which is, is version
+
48:09v an ancestor of version w?
+
48:21Because that's going to
+ be true if and only if bv
+
48:26is an ev nest around bw and ew.
+
48:39OK.
+
48:40So that's just three tests.
+
48:41They're probably not
+ all even necessary.
+
48:43This one always holds.
+
48:45But if these guys fit in between
+ these guys, then you know--
+
48:50now, what this tells us,
+ what we care about here,
+
48:54is reading fields.
+
48:58When we read a field,
+ we said, oh, we'll
+
49:00apply all the modifications
+ that apply to version
+
49:02v. Before that, that
+ was a linear order.
+
49:04So it's just all versions
+ less than or equal to v. Now
+
49:06it's all versions that are
+ ancestors of v. Given a mod,
+
49:10we need to know, does this
+ mod apply to my version?
+
49:13And now I tell you, I can
+ do that in constant time
+
49:16through magic.
+
49:17I just test these
+ order relations.
+
49:20If they hold, then that
+ mod applies to my version.
+
49:24So w's the version
+ we're testing.
+
49:27v is some version in the mod.
+
49:29And I want to know, am
+ I a descendant of that version?
+
49:32If so, the mod applies.
+
49:34And I update what the field is.
+
49:36I can do all pairwise ancestor
+ checks and figure out,
+
49:39what is the most recent
+ version in my ancestor history
+
49:43that modified a given field?
+
49:44That lets me read a
+ field in constant time.
+
49:47Constants are getting
+ kind of big at this point,
+
49:49but it can be done.
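
As a sketch (same hedges as before: invented names, naive scans), the ancestor test and the full-persistence read might look like this, assuming each version object carries its begin/end markers b and e as items in an order-maintenance structure om:

```python
def is_ancestor(v, w, om):
    """v is an ancestor of w iff (v.b, v.e) nests around (w.b, w.e)."""
    return om.order(v.b, w.b) and om.order(w.e, v.e)

def read_full(node, field, w, om):
    """Field value at version w: apply the mod on this field whose
    version is the nearest ancestor of w (naive scan over O(1) mods)."""
    best, value = None, node.fields[field]
    for v, f, val in node.mods:
        if f == field and (v is w or is_ancestor(v, w, om)):
            # ancestors of w form a chain, so is_ancestor totally orders
            # the candidates; keep the deepest (most recent) one
            if best is None or v is w or is_ancestor(best, v, om):
                best, value = v, val
    return value
```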
+
49:53Clear?
+
49:54A little bit of
+ a black box here.
+
49:56But now we've gotten
+ as far as reading.
+
50:01And we don't need
+ to change much else.
+
50:04So this is good news.
+
50:11Maybe I'll give you
+ a bit of a diff.
+
50:15So full persistence,
+ fully persistent theorem--
+
50:26done.
+
50:27OK.
+
50:27Same theorem just
+ with full persistence.
+
50:30How do we do it?
+
50:31We store back pointers
+ now for all versions.
+
50:35It's a little bit annoying.
+
50:36But how many mods do we use?
+
50:40There's lots of ways
+ to get this to work,
+
50:42but I'm going to
+ change this number
+
50:44to 2 times d plus p plus 1.
+
50:51Wait, what's d? d is the
+ number of fields here.
+
50:56OK.
+
50:57We said it was
+ a constant number of fields.
+
50:59I never said what that constant
+ is. d for out degree, I guess.
+
51:03So p is in degree, max in
+ degree. d is max out degree.
+
51:09So just slightly more--
+ that main reason for this
+
51:11is because back pointers now
+ are treated like everyone else.
+
51:14We have to treat both the out
+ pointers and the in pointers
+
51:17as basically the same.
+
51:18So instead of p,
+ we have d plus p.
+
51:19And there's a plus
+ 1 just for safety.
+
51:23It gets my amortization
+ to work, hopefully.
+
51:28OK.
+
51:29Not much else-- this
+ page is all the same.
+
51:32Mods are still, you give
+ versions, fields, values,
+
51:35reading.
+
51:36OK, well, this is no longer
+ less than or equal to v. But
+
51:41this is now with a version, sort
+ of the nearest version, that's
+
51:47an ancestor of v.
+
51:50That's what we were
+ just talking about.
+
51:52So that can be done
+ in constant time.
+
51:54Check it for all of
+ them, constant work.
+
51:57OK.
+
51:58That was the first part.
+
52:04Now we get to the hard
+ part, which is modification.
+
52:07This is going to be different.
+
52:08Maybe I should just erase--
+
52:10yeah, I think I'll
+ erase everything,
+
52:13except the first clause.
+
52:24OK.
+
52:24If a node is not
+ full, we'll just
+
52:26add a mod, just like before.
+
52:28What changes is
+ when a node is full.
+
52:36Here we have to do something
+ completely different.
+
52:38Why?
+
52:38Because if we just
+ make a new version
+
52:41of this node that has empty
+ mods, this one's still full.
+
52:45And I can keep modifying
+ the same version.
+
52:48This new node that I just erased
+ represents some new version.
+
52:52But if I keep modifying
+ an old version, which
+
52:54I can do in full persistence,
+ this node keeps being full.
+
52:57And I keep paying
+ potentially huge cost.
+
53:00If all the nodes were full,
+ and when I make this change
+
53:02every node gets
+ copied, and then I
+
53:04make a change to
+ the same version,
+
53:06every node gets copied again.
+
53:07This is going to take
+ linear time per operation.
+
53:09So I can't do the old strategy.
+
53:11I need to somehow make
+ this node less full.
+
53:15This is where we're
+ definitely not functional.
+
53:17None of this was
+ functional, but now I'm
+
53:19going to change an old node, not
+ just make a new one in a more
+
53:24drastic way.
+
53:25Before I was adding a mod.
+
53:27That's not a
+ functional operation.
+
53:28Now I'm actually going to remove
+ mods from a node to rebalance.
+
53:33So what I'd like to do is
+ split the node into two halves.
+
53:43OK.
+
53:43So I had some big
+ node that was--
+
53:46I'll draw it-- completely full.
+
53:50Now I'm going to make two nodes.
+
53:52Here we go.
+
53:59This one is going
+ to be half full.
+
54:01This one's going to
+ be half full of mods.
+
54:04OK.
+
54:05The only question left is, what
+ do I do with all these things?
+
54:12Basically what I'd like
+ to do is have the--
+
54:14on the one hand, I want
+ to have the old node.
+
54:18It's just where it used to be.
+
54:20I've just removed half of
+ the mods, the second half,
+
54:23the later half.
+
54:25What does that mean?
+
54:26I don't know.
+
54:27Figure it out.
+
54:29It's linearized.
+
54:31I haven't thought
+ deeply about that.
+
54:32Now we're going to make a
+ new node with the second half
+
54:36of the mods.
+
54:40It's more painful
+ than I thought.
+
54:41In reality, these mods represent
+ a tree of modifications.
+
54:45And what you need to do is
+ find a partition of that tree
+
54:48into two roughly equal halves.
+
54:51You can actually do a
+ one third, 2/3 split.
+
54:52That's also in a future lecture,
+ which whose number I forget.
+
54:57So really, you're
+ splitting this tree
+
54:58into two roughly
+ balanced halves.
+
55:01And so this 2 might actually
+ need to change to a 3,
+
55:03but it's a constant.
+
55:06OK.
+
55:07What I want is for
+ this to represent
+
55:09a subtree of versions.
+
55:10Let me draw the picture.
+
55:11So here's a tree of versions
+ represented by the old mods.
+
55:15I'd like to cut out a
+ subtree rooted at some node.
+
55:18So let's just assume
+ for now this has exactly
+
55:21half the nodes.
+
55:22And this has half the nodes.
+
55:25In reality, I think it
+ can be one third, 2/3.
+
55:29OK.
+
55:29But let's keep it convenient.
+
55:32So I want the new
+ node to represent
+
55:34this subtree and this node
+ to represent everything else.
+
55:37This node is as if this
+ stuff hasn't happened yet.
+
55:41I mean, so it represents all
+ these old versions that do not,
+
55:44that are not in the subtree.
+
55:45This represents all
+ the latest stuff.
+
55:47So what I'm going to
+ do is like before, I
+
55:49want to apply some
+ mods to these fields.
+
55:54And whatever mods were
+ relevant at this point, whatever
+
55:58had been applied, I apply
+ those to the fields here.
+
56:02And so that means I can
+ remove all of these mods.
+
56:06I only cared about these ones.
+
56:09Update these fields accordingly.
+
56:11I still have the other mods to
+ represent all the other changes
+
56:14that could be in that subtree.
+
56:16OK.
+
56:16So we actually split the tree,
+ and we apply mods to new nodes.
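
A rough sketch of that split, reusing the pieces from the earlier sketches (sub is the version vertex chosen as the root of the subtree being cut off; choosing sub so the two sides are balanced is the part deferred to a later lecture, and the pointer updates discussed next are omitted):

```python
def split_full(node, sub, om):
    """Split a full node: the old node keeps the mods outside sub's
    version subtree; a new node gets that subtree's mods, with the
    state as of version sub pre-applied to its fields."""
    inside = [m for m in node.mods
              if m[0] is sub or is_ancestor(sub, m[0], om)]
    outside = [m for m in node.mods if m not in inside]
    latest = {f: read_full(node, f, sub, om) for f in node.fields}
    node2 = PNode(latest)    # new node: fields as of version sub...
    node2.mods = inside      # ...plus the subtree's own mods
    node.mods = outside      # old node: everything outside the subtree
    return node2
```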
+
56:38Anything else I need to say?
+
56:42Oh, now we need to
+ update pointers.
+
56:44That's always the fun part.
+
56:49Let's go over here.
+
57:05So old node hasn't moved.
+
57:07But this new node has moved.
+
57:09So for all of these
+ versions, I want
+
57:13to change the pointer that
+ used to point to old node
+
57:18should now point to new node.
+
57:20In this version, it's fine.
+
57:21It should still
+ point to old node,
+
57:23because this represents
+ all those old versions.
+
57:25But for the new version,
+ that version in the subtree,
+
57:28I've got to point here instead.
+
57:30OK.
+
57:31So how many pointers could
+ there be to this node
+
57:38that need to change?
+
57:38That's a tricky part
+ in this analysis.
+
57:41Think about it for a while.
+
57:45I mean, in this
+ new node, whatever
+
57:47is pointed to by either here or
+ here in the new node also has
+
57:50a return pointer.
+
57:50All pointers are bidirectional.
+
57:52So we don't really care
+ about whether they're forward
+
57:54or backward.
+
57:54How many pointers
+ are there here?
+
57:56Well, there's d here
+ and there's p here.
+
57:59But then there's also
+ some additional pointers
+
58:01represented over here.
+
58:02How many?
+
58:04Well, if we assume this
+ magical 50/50 split,
+
58:06there's right now d plus p plus
+ 1 mods over here, half of them.
+
58:12Each of them might be a pointer
+ to some other place, which
+
58:16has a return pointer
+ in that version.
+
58:18So number of back pointers
+ that we need to update
+
58:23is going to be this, 2
+ times d plus 2 times p plus 1.
+
58:30So recursively update at
+ most 2 times d plus 2 times p
+
58:41plus 1 pointers to the node.
+
58:50The good news is this is
+ really only half of them
+
58:52or some fraction of them.
+
58:54It used to be--
+
58:57well, there were
+ more pointers before.
+
58:59We don't have to
+ deal with these ones.
+
59:00That's where we're
+ saving, and that's
+
59:01why this amortization works.
+
59:03Let me give you a potential
+ function that makes this work--
+
59:12is minus c times sum of the
+ number of empty mod slots.
+
59:23It's kind of the same
+ potential but before
+
59:26we had this notion of
+ dead and alive nodes.
+
59:28Now everything's alive
+ because everything
+
59:30could change at any moment.
+
59:31So instead, I'm going to
+ measure how much room I have
+
59:36in each node.
+
59:37Before I had no
+ room in this node.
+
59:38Now I have half the
+ space in both nodes.
+
59:41So that's good news.
+
59:44Whenever we have
+ this recursion, we
+
59:48can charge it to a
+ potential decrease.
+
59:56Phi goes down by--
+
1:00:01because I have a
+ negative sign here--
+
1:00:03c times, oh man, 2 times
+ d plus p plus 1, I think.
+
1:00:13Because there's d plus
+ p plus 1 space here,
+
1:00:15d plus p plus 1 space here.
+
1:00:17I mean, we added
+ one whole new node.
+
1:00:18And total capacity
+ of a node in mods
+
1:00:20is 2 times d plus p plus 1.
+
1:00:23So we get that times c.
+
1:00:26And this is basically
+ just enough,
+
1:00:28because this is 2 times
+ d plus 2 times p plus 2.
+
1:00:32And here we have a plus 1.
+
1:00:34And so the recursion gets
+ annihilated by 2 times d plus
+
1:00:392 times p plus 1.
+
1:00:41And then there's one
+ c left over to absorb
+
1:00:43whatever constant cost there
+ was to do all this other work.
+
1:00:47So I got the constants
+ just to work,
+
+ 1:00:51except that I cheated and it's
+ really a 1/3-2/3 split.
+
1:00:54So probably all of these
+ constants have to change,
+
1:00:57such is life.
+
1:00:58But I think you get the idea.
+
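+ Spelled out, with Phi = -c * (number of empty mod slots): the split
+ frees 2(d + p + 1) mod slots, so Phi drops by c(2d + 2p + 2). The
+ recursion updates at most 2d + 2p + 1 pointers at amortized cost c
+ each, plus at most c of local work, giving amortized cost at most
+ c(2d + 2p + 1) + c - c(2d + 2p + 2) = 0.
+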
1:01:01Any questions about
+ full persistence?
+
1:01:07This is fun stuff, time travel.
+
1:01:10Yeah?
+
1:01:11AUDIENCE: So in the first
+ half of the thing where
+
1:01:14the if, there's room
+ you can put it in.
+
1:01:16ERIK DEMAINE: Right.
+
1:01:17AUDIENCE: I have a
+ question about how
+
1:01:17we represent the version.
+
1:01:19Because before when we said
+ restore now [INAUDIBLE].
+
1:01:23It made more sense if now was
+ like a timestamp or something.
+
1:01:25ERIK DEMAINE: OK.
+
1:01:26Right, so how do we represent a
+ version even here or anywhere?
+
1:01:31When we do a modification, an
+ update, in the data structure,
+
1:01:34we want to return
+ the new version.
+
1:01:36Basically, we're going
+ to actually store
+
1:01:39the DAG of versions.
+
1:01:41And a version is going to
+ be represented by a pointer
+
1:01:43into this DAG.
+
1:01:44One of the nodes in this
+ DAG becomes a version.
+
1:01:47Every node in this DAG is
+ going to store a pointer
+
1:01:50to the corresponding b character
+ and a corresponding e character
+
1:01:53in this data
+ structure, which then
+
1:01:56lets you do anything you want.
+
1:01:57Then you can query
+ against that version,
+
1:01:59whether it's an ancestor
+ of another version.
+
1:02:01So yeah, I didn't mention that.
+
1:02:02Versions are nodes in here.
+
1:02:04Nodes in here have pointers
+ to the b's and e's.
+
1:02:06And vice versa, the b's
+ and e's have pointers back
+
1:02:08to the corresponding
+ version node.
+
1:02:10And then you can keep
+ track of everything.
+
1:02:12Good question.
+
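+ A minimal sketch of that ancestry test (illustration only; here the
+ order-maintenance structure is faked with fixed integer positions):
+
+     def is_ancestor(order, v, w):
+         """v is an ancestor of w iff v's b/e markers bracket w's."""
+         return (order[('b', v)] < order[('b', w)]
+                 and order[('e', w)] < order[('e', v)])
+
+     order = {('b', 'r'): 0, ('b', 'v'): 1, ('e', 'v'): 2, ('e', 'r'): 3}
+     assert is_ancestor(order, 'r', 'v')
+     assert not is_ancestor(order, 'v', 'r')
+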
1:02:14Yeah?
+
1:02:15AUDIENCE: [INAUDIBLE] question.
+
1:02:16Remind me what d is in this.
+
1:02:17ERIK DEMAINE: Oh, d was
+ the maximum out degree.
+
1:02:19It's the number of fields in
+ a node, as defined right here.
+
1:02:26Other questions?
+
1:02:29Whew.
+
1:02:30OK, a little breather.
+
1:02:31That was partial persistence,
+ full persistence.
+
1:02:33This is, unfortunately, the
+ end of the really good results.
+
1:02:36As long as we have
+ constant degree nodes,
+
+ 1:02:38in and out degree,
+ we can do it all.
+
+ 1:02:41We can do full
+ persistence for free.
+
1:02:44Obviously there are practical
+ constants involved here.
+
1:02:47But in theory, you
+ can do this perfectly.
+
1:02:53Before we go on to
+ confluence, there
+
1:02:54is one positive result,
+ which is what if you
+
+ 1:02:58don't like amortized bounds.
+
+ 1:03:00There are various reasons
+ amortized bounds might not
+
1:03:02be good.
+
1:03:03Maybe you really care
+ about every operation
+
1:03:04being no slower than it was
+ except by a constant factor.
+
1:03:08We're amortizing here, so some
+ operations get really slow.
+
1:03:11But the others are all
+ fast to compensate.
+
1:03:14You can deamortize, it's called.
+
1:03:22You can get constant
+ worst case slowdown
+
1:03:30for partial persistence.
+
+ 1:03:36This is a result of Gerth
+ Brodal from the late '90s, '96.
+
+ 1:03:44For full persistence,
+ it's an open problem.
+
1:03:47I don't know if people
+ have worked on that.
+
1:03:55All right.
+
1:03:56So some, mostly good results.
+
1:03:59Let's move on to confluent
+ persistence where things
+
1:04:01get a lot more challenging.
+
1:04:17Lots of things go out the window
+ with confluent persistence.
+
1:04:20In particular, your
+ versions are now a DAG.
+
1:04:23It's a lot harder
+ to linearize a DAG.
+
+ 1:04:25Trees are not that
+ far from paths.
+
+ 1:04:28But DAGs are quite far
+ from paths, unfortunately.
+
1:04:33But that's not all
+ that goes wrong.
+
1:04:44Let me first tell you the
+ kind of end effect as a user.
+
1:04:50Imagine you have
+ a data structure.
+
1:04:54Think of it as a
+ list, I guess, which
+
1:04:57is a list of characters
+ in your document.
+
1:04:59You're using vi or Word,
+ your favorite, whatever.
+
1:05:03It's a text editor.
+
1:05:05You've got a string of words.
+
1:05:06And now you like to do
+ things like copy and paste.
+
1:05:09It's a nice operation.
+
1:05:11So you select an interval of
+ the string and you copy it.
+
1:05:16And then you paste
+ it somewhere else.
+
1:05:18So now you've got two
+ copies of that string.
+
1:05:21This is, in some
+ sense, what you might
+
1:05:24call a confluent
+ operation, because--
+
1:05:27yeah, maybe a cleaner way to
+ think of it is the following.
+
1:05:30You have your string.
+
1:05:31Now I have an operation,
+ which is split it.
+
1:05:33So now I have two strings.
+
1:05:35OK.
+
1:05:36And now I have an operation,
+ which is split it.
+
1:05:38Now I have three strings.
+
1:05:40OK.
+
1:05:41Now I have an operation
+ which is concatenate.
+
1:05:44So I can, for
+ example, reconstruct
+
1:05:47the original string-- actually,
+ I have the original string.
+
1:05:49No biggie.
+
1:05:51Let's say-- because
+ I have all versions.
+
1:05:54I never lose them.
+
1:05:55So now instead, I'm going to
+ cut the string here, let's say.
+
1:05:59So now I have this and this.
+
1:06:03And now I can do
+ things like concatenate
+
1:06:06from here to here to here.
+
1:06:10And I will get this
+ plus this plus this.
+
1:06:16OK.
+
1:06:17This guy moved here.
+
1:06:18So that's a copy/paste
+ operation with a constant number
+
1:06:20of splits and concatenates.
+
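+ As a toy illustration in Python (tuples stand in for a real persistent
+ list, and every intermediate version survives):
+
+     def copy_paste(s, i, j, k):
+         """Copy s[i:j] and paste it at position k: splits + concats."""
+         piece = s[i:j]               # two splits isolate the piece
+         head, tail = s[:k], s[k:]    # one split at the paste point
+         return head + piece + tail   # two concatenates
+
+     v1 = tuple("hello world")
+     v2 = copy_paste(v1, 0, 5, 11)    # paste "hello" at the end
+     assert "".join(v2) == "hello worldhello"
+     assert "".join(v1) == "hello world"   # old version unchanged
+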
1:06:22I could also do cut and paste.
+
1:06:23With confluence, I can
+ do crazy cuts and pastes
+
1:06:26in all sorts of ways.
+
1:06:28So what?
+
1:06:29Well, the so what
+ is I can actually
+
1:06:32double the size of
+ my data structure
+
1:06:33in a constant number
+ of operations.
+
1:06:36I can take, for example,
+ the entire string
+
1:06:38and concatenate it to itself.
+
1:06:40That will double the
+ number of characters,
+
1:06:41number of elements in there.
+
1:06:43I can do that again
+ and again and again.
+
1:06:45So in u updates,
+ I can potentially
+
1:06:51get a data structure
+ size 2 to the u.
+
1:06:57Kind of nifty.
+
1:06:58I think this is why
+ confluence is cool.
+
1:07:00It's also why it's hard.
+
1:07:02So not a big surprise.
+
1:07:03But, here we go.
+
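+ A toy version of the doubling trick (Python tuples copy eagerly, so
+ this takes 2^u work; the point of confluent persistence is to do each
+ concatenate in polylogarithmic time by sharing structure):
+
+     v = ('x',)
+     for _ in range(20):   # u = 20 updates
+         v = v + v         # concatenate a version with itself
+     print(len(v))         # 1048576 == 2**20 elements
+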
1:07:08In that case, the version DAG,
+ for reference, looks like this.
+
1:07:13You're taking the same
+ version, combining it.
+
1:07:16So here I'm assuming I have
+ a concatenate operation.
+
1:07:20And so the effect here,
+ every time I do this,
+
1:07:24I double the size.
+
1:07:44All right.
+
1:07:44What do I want to say about
+ confluent persistence?
+
1:07:46All right.
+
1:07:47Let me start with the
+ most general result, which
+
1:07:53is by Fiat and Kaplan in 2003.
+
1:08:04They define a notion called
+ effective depth of a version.
+
1:08:08Let me just write it down.
+
1:08:21It's kind of like
+ if you took this DAG
+
1:08:24and expanded it out to be a
+ tree of all possible paths.
+
1:08:30Instead of point
+ to the same node,
+
1:08:31you could just
+ duplicate that node
+
1:08:33and then have pointers
+ left and right.
+
1:08:35OK.
+
1:08:35So if I did that, of course,
+ this size grows exponentially.
+
1:08:38It explicitly represents the
+ size of my data structure.
+
1:08:41At the bottom, if
+ I have u things,
+
1:08:42I'm going to have 2 to the
+ u leaves at the bottom.
+
1:08:45But then I can easily
+ measure the number of paths
+
1:08:49from the root to
+ the same version.
+
1:08:50At the bottom, I still
+ label it, oh, those
+
1:08:52are all v. They're all the
+ same version down there.
+
1:08:54So exponential number
+ of paths, if I take log,
+
1:08:56I get what I call
+ effective depth.
+
1:08:58It's like if you somehow
+ could rebalance that tree,
+
1:09:02this is the best you
+ could hope to do.
+
1:09:05It's not really a lower bound.
+
1:09:07But it's a number.
+
1:09:08It's a thing.
+
1:09:09OK.
+
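+ Written out: e(v) is, up to an additive 1, the log of the number of
+ root-to-v paths in that expanded tree. In the doubling DAG drawn
+ above, version v_u has 2^u root-to-v_u paths, so e(v_u) is about u.
+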
1:09:10Then the result they achieve
+ is that the overhead is
+
1:09:17log the number of
+ updates plus-- this
+
1:09:19is a multiplicative overhead,
+ so you take your running time.
+
1:09:22You multiply it by this.
+
1:09:25And this is a time
+ and a space overhead.
+
1:09:31So maximum effective depth
+ of all versions, maybe even
+
1:09:34sum of effective depths, but
+ we'll just say max to be safe.
+
1:09:39Sorry-- sum over
+ all the operations.
+
1:09:41This is per operation.
+
1:09:43You pay basically
+ the effective depth
+
1:09:44of that operation as a factor.
+
1:09:48Now, the annoying thing is if
+ you have this kind of set up
+
1:09:51where the size
+ grew exponentially,
+
1:09:54then number of paths
+ is exponential.
+
1:09:56Log of the number of
+ paths is linear in u.
+
1:09:59And so this factor could be
+ as much as u, linear slowdown.
+
1:10:06Now, Fiat and Kaplan argue
+ linear slowdown is not
+
1:10:08that bad, because if you weren't
+ even persistent, if you did
+
1:10:13this in the naive way of
+ just recopying the data,
+
1:10:18you were actually spending
+ exponential time to build
+
1:10:21the final data structure.
+
1:10:22It has exponential size.
+
1:10:23Just to represent it explicitly
+ requires exponential time,
+
1:10:26so losing a linear
+ factor to do u operations
+
1:10:29and now u squared time
+ instead of 2 to the u.
+
1:10:31So it's a big
+ improvement to do this.
+
1:10:35The downside of this approach is
+ that even if you have a version
+
1:10:40DAG that looks like this,
+ even if the size of the data
+
1:10:43structure is staying
+ normal, staying linear, so
+
1:10:46this potential, you could
+ be doubling the size.
+
1:10:48But we don't know what
+ this merge operation is.
+
1:10:49Maybe it just throws
+ away one of the versions
+
1:10:51or does something--
+
1:10:53somehow takes half
+ the nodes from one
+
1:10:55side, half the nodes from
+ the other side maybe.
+
1:10:57These operations
+ do preserve size.
+
1:10:58Then there's no great reason why
+ it should be a linear slowdown,
+
1:11:02but it is.
+
1:11:03OK?
+
1:11:04So it's all right but not great.
+
1:11:10And it's the best
+ general result we know.
+
1:11:13They also prove a lower bound.
+
+ 1:11:21So the lower bound is the sum of
+ effective depths, total bits of space.
+
1:11:37OK.
+
1:11:37What does this mean?
+
1:11:40So even if this
+ is not happening,
+
1:11:42the number of bits
+ of space you need
+
1:11:44in the worst case--
+ this does not
+
1:11:45apply to every data structure.
+
1:11:47That's one catch.
+
1:11:49They give a specific
+ data structure
+
1:11:52where you need this much space.
+
1:11:53So it's similar to
+ this kind of picture.
+
1:11:57We'll go into the details.
+
1:11:58And you need this much space.
+
1:12:00Now, this is kind of
+ bad, because if there's
+
1:12:02u operations, and each of these
+ is u, that's u squared space.
+
1:12:06So we actually need a
+ factor u blow up in space.
+
1:12:09It looks like.
+
1:12:11But to be more precise,
+ what this means is
+
1:12:14that you need omega e of
+ v space, and therefore
+
1:12:17time overhead per update, if--
+
1:12:27this is not written
+ in the paper--
+
1:12:29queries are free.
+
1:12:35Implicit here, they just want
+ to slow down and increase space
+
1:12:40for the updates you do,
+ which is pretty natural.
+
1:12:43Normally you think of queries
+ as not increasing space.
+
1:12:46But in order to construct
+ this lower bound,
+
1:12:49they actually do
+ this many queries.
+
1:12:52So they do e of v queries
+ and then one update.
+
1:12:55And they say, oh well, space
+ had to go up by an extra e of v.
+
1:12:59So if you only charge
+ updates for the space,
+
1:13:02then yes, you have
+ to lose potentially
+
+ 1:13:04a linear factor, this effective
+ depth, potentially u.
+
1:13:07But if you also
+ charge the queries,
+
1:13:09it's still constant
+ in their example.
+
1:13:13So open question, for
+ confluent persistence,
+
1:13:18can you achieve
+ constant everything?
+
1:13:21Constant time and
+ space overheads,
+
1:13:27multiplicative
+ factor per operation,
+
1:13:33both updates and queries.
+
1:13:35So if you charge the
+ queries, potentially you
+
1:13:37could get constant everything.
+
1:13:38This is a relatively
+ new realization.
+
1:13:43And no one knows
+ how to do this yet.
+
1:13:47Nice challenge.
+
1:13:47I think maybe we'll work on that
+ in our first problem session.
+
1:13:50I would like to.
+
1:13:53Questions about that result?
+
1:13:54I'm not going to
+ prove the result.
+
1:13:56But it is a fancy
+ rebalancing of those kinds
+
1:13:59of pictures to get this log.
+
1:14:10There are other results
+ I'd like to tell you about.
+
1:14:32So brand new result--
+
1:14:34that was from 2003.
+
1:14:35This is from 2012--
+
1:14:38no, '11, '11, sorry.
+
+ 1:14:42It's SODA, which is in January,
+ so it's a little confusing.
+
1:14:47Is it '11?
+
1:14:49Maybe '12.
+
1:14:50Actually now I'm not sure.
+
1:14:51It's February already, right?
+
1:14:54A January, either this
+ year or last year.
+
1:15:00It's not as general
+ a transformation.
+
1:15:02It's only going to hold in
+ what's called a disjoint case.
+
1:15:05But it gets a very good bound--
+
1:15:07not quite constant,
+ but logarithmic.
+
1:15:09OK, logarithmic
+ would also be nice.
+
1:15:12Or log, log n, whatever n is.
+
1:15:17Pick your favorite n,
+ number of operations, say.
+
1:15:22OK.
+
1:15:25If you assume that confluent
+ operations are performed only
+
1:15:39on two versions with
+ no shared nodes--
+
1:15:50OK, this would be a way to
+ forbid this kind of behavior
+
1:15:53where I concatenate the
+ data structure with itself.
+
1:15:56All the nodes are common.
+
1:15:58If I guarantee that maybe I, you
+ know, slice this up, slice it,
+
+ 1:16:01dice it, whatever, and
+ then re-merge them
+
1:16:03in some other order, but
+ I never use two copies
+
1:16:06of the same piece, that
+ would be a valid confluent
+
1:16:10operation over here.
+
1:16:12This is quite a
+ strong restriction
+
1:16:13that you're not allowed.
+
1:16:16If you try to, who
+ knows what happens.
+
1:16:19Behavior's undefined.
+
1:16:19So won't tell you,
+ oh, those two versions
+
1:16:21have this node in common.
+
1:16:22You've got to make
+ a second copy of it.
+
+ 1:16:24So somehow you have to guarantee
+ that confluent operations
+
+ 1:16:27never overlap.
+
1:16:29But they can be reordered.
+
1:16:33Then you can get
+ order log n overhead.
+
1:16:39n is the number of operations.
+
1:16:45I have a sketch
+ of a proof of this
+
1:16:46but not very much
+ time to talk about it.
+
1:16:48All right.
+
1:16:49Let me give you a quick picture.
+
1:16:51In general, the
+ versions form a DAG.
+
1:16:55But if you make this assumption,
+ and you look at a single node,
+
1:17:00and look at all the versions
+ where that node appears,
+
1:17:03that is a tree.
+
1:17:05Because you're not allowed
+ to remerge versions
+
1:17:07that have the same node.
+
1:17:08So while the big
+ picture is a DAG,
+
1:17:11the small picture of a
+ single guy is some tree.
+
1:17:17I'm drawing all
+ these wiggly lines
+
1:17:18because there are all
+ these versions where
+
1:17:20the node isn't changing.
+
1:17:21This is the entire version DAG.
+
1:17:23And then some of these nodes--
+
1:17:26some of these versions,
+ I should say--
+
1:17:29that node that we're
+ thinking about changes.
+
1:17:31OK, whenever it
+ branches, it's probably
+
1:17:33because the actual
+ node changed, maybe.
+
1:17:36I don't know.
+
1:17:37Anyway there are some dots
+ here where the version changed,
+
1:17:40some of the leaves,
+ maybe, that changed.
+
1:17:41Maybe some of them haven't yet.
+
1:17:44In fact, let's see.
+
+ 1:17:48Here where it's changed, it could
+ be that we destroyed the node.
+
1:17:51Maybe it's gone from the
+ actual data structure.
+
1:17:54But there still may
+ be versions down here.
+
1:17:56It's not really a tree.
+
1:17:57It's a whole DAG of
+ stuff down there.
+
1:17:59So that's kind of ugly.
+
+ 1:18:01Wherever the
+ node still exists,
+
1:18:03I guess that is an
+ actual leaf of the DAG.
+
1:18:05So those are OK.
+
1:18:06But as soon as I maybe
+ delete that node,
+
1:18:08then there can be a
+ whole subtree down there.
+
1:18:11OK.
+
1:18:12So now if you look at
+ an arbitrary version,
+
1:18:15so what we're thinking about
+ is how to implement reading,
+
1:18:17let's say.
+
1:18:18Reading and writing are
+ more or less the same.
+
1:18:21I give you a version.
+
1:18:22I give you a node, and
+ I give you a field.
+
1:18:23I want to know, what is the
+ value of that field, that node,
+
1:18:26that version?
+
1:18:27So now where could
+ a version fall?
+
1:18:30Well it has to be
+ in this subtree.
+
1:18:31Because the node has to exist.
+
1:18:36And then it's maybe a pointer.
+
1:18:38A pointer could be to
+ another node, which
+
1:18:42also has this kind of picture.
+
1:18:44They could be overlapping trees.
+
1:18:46In general, there
+ are three cases.
+
1:18:48Either you're lucky, and the
+ version you're talking about
+
1:18:51is a version where
+ the node was changed.
+
1:18:53In that case, the data is
+ just stored right there.
+
1:18:58That's easy.
+
1:18:59So you could just say, oh,
+ how did the node change?
+
1:19:01Oh, that's what the field is.
+
1:19:02OK, follow the pointer.
+
1:19:05A slightly harder
+ case it's a version
+
1:19:08in between two such changes.
+
1:19:09And maybe these are not updates.
+
1:19:11So I sort of want to know, what
+ was the previous version where
+
1:19:17this node changed
+ in constant time?
+
1:19:21It can be done.
+
1:19:22Not constant time,
+ actually, logarithmic time,
+
1:19:25using a data structure
+ called link-cut trees,
+
1:19:28another fun black
+ box for now, which
+
1:19:31we will cover in lecture
+ 19, far in the future.
+
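+ A naive stand-in for that query (illustration only; this linear walk
+ is exactly what link-cut trees accelerate to logarithmic time):
+
+     def last_change(version, parent, changed):
+         """Nearest ancestor version (inclusive) where the node changed.
+         parent: dict version -> parent version (None at the root);
+         changed: set of versions where this node recorded a change."""
+         v = version
+         while v is not None and v not in changed:
+             v = parent[v]
+         return v  # None: no change above; use the node's base fields
+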
1:19:36OK.
+
1:19:39Well, that's one case.
+
1:19:40There's also the version
+ where maybe a version
+
1:19:43is down here in a subtree.
+
1:19:45I guess then the
+ node didn't exist.
+
1:19:48Well, all these
+ things can happen.
+
1:19:50And that's even harder.
+
1:19:51It's messy.
+
1:19:53They use another trick, which
+ is called fractional cascading,
+
1:19:59which I'm not even going to
+ try to describe what it means.
+
1:20:02But it's got a very cool name.
+
1:20:04Because we'll be
+ covering it in lecture 3.
+
1:20:06So stay tuned for that.
+
1:20:07I'm not going to say how
+ it applies to this setting,
+
1:20:09but it's a necessary
+ step in here.
+
1:20:13In the remaining
+ zero minutes, let
+
1:20:15me tell you a little bit about
+ functional data structures.
+
1:20:17[LAUGHTER]
+
1:20:20Beauty of time travel.
+
1:20:24Functional-- I just
+ want to give you
+
1:20:31some examples of things that
+ can be done functionally.
+
1:20:33There's a whole book about
+ functional data structures
+
1:20:35by Okasaki.
+
1:20:36It's pretty cool.
+
1:20:38A simple example
+ is balanced BSTs.
+
1:20:42So if you just want to get
+ log n time for everything,
+
1:20:44you can do that functionally.
+
1:20:45It's actually really easy.
+
1:20:46You pick your favorite balance
+ BST, like red black trees.
+
1:20:48You implement it top down so you
+ never follow parent pointers.
+
1:20:51So you don't need
+ parent pointers.
+
1:20:52So then as you make changes
+ down the tree, you just copy.
+
1:20:57It's called path copying.
+
1:20:58Whenever you're about
+ to make a change,
+
1:21:00make a copy of that node.
+
1:21:02So you end up copying all
+ the change nodes and all
+
1:21:05their ancestors.
+
1:21:06There's only log n of them,
+ so it takes log n time.
+
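+ A minimal path-copying sketch (illustration only, on an unbalanced BST
+ for brevity; the same copying works in any top-down balanced BST):
+
+     from dataclasses import dataclass
+     from typing import Optional
+
+     @dataclass(frozen=True)   # immutable nodes, so old versions survive
+     class T:
+         key: int
+         left: 'Optional[T]' = None
+         right: 'Optional[T]' = None
+
+     def insert(root, key):
+         """Return a new version's root; untouched subtrees are shared."""
+         if root is None:
+             return T(key)
+         if key < root.key:
+             return T(root.key, insert(root.left, key), root.right)
+         return T(root.key, root.left, insert(root.right, key))
+
+     v1 = insert(insert(insert(None, 2), 1), 3)
+     v2 = insert(v1, 0)   # copies only the path to 0; v1 stays intact
+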
1:21:09Clear?
+
1:21:10Easy.
+
1:21:11It's a nice technique.
+
1:21:12Sometimes path copying
+ is very useful.
+
1:21:14Like link-cut
+ trees, for example,
+
1:21:16can be made functional.
+
1:21:17We don't know what they are,
+ but they're basically a BST.
+
1:21:19And you can make
+ them functional.
+
1:21:21We use that in a paper.
+
1:21:23All right.
+
1:21:23Deques.
+
1:21:25These are doubly ended queues.
+
1:21:27So it's like a stack and
+ a queue and everything.
+
1:21:29You can insert and delete from
+ the beginning and the end.
+
1:21:32People start to know
+ what these are now,
+
+ because Python calls them that.
+
1:21:35But you can also
+ do concatenation
+
1:21:41with deques in constant
+ time per operation.
+
1:21:43This is cool.
+
1:21:44Deques are not very
+ hard to make functional.
+
1:21:46But you can do deques and
+ you can concatenate them
+
1:21:48like we were doing in the figure
+ that's right behind this board.
+
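+ The easy amortized version, as a hedged sketch (two cons-list stacks;
+ this is not the constant-worst-case, concatenable structure meant
+ here-- that takes Kaplan-Tarjan / Okasaki machinery):
+
+     EMPTY = (None, None)   # (front stack, back stack)
+
+     def push_front(dq, x):
+         front, back = dq
+         return ((x, front), back)   # old versions remain valid
+
+     def push_back(dq, x):
+         front, back = dq
+         return (front, (x, back))
+
+     def pop_front(dq):
+         front, back = dq
+         if front is None:           # reverse the back stack into front
+             while back is not None:
+                 front = (back[0], front)
+                 back = back[1]
+         if front is None:
+             raise IndexError("pop from empty deque")
+         return front[0], (front[1], back)
+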
1:21:51Constant time split
+ is a little harder.
+
1:21:53That's actually one
+ of my open problems.
+
1:21:56Can you do lists with split and
+ concatenate in constant time--
+
1:22:01functionally or confluently,
+ persistently, or whatever?
+
1:22:05Another example-- oh, you
+ can do a mix of the two.
+
+ 1:22:08You can get log n search and
+ constant-time deque operations.
+
+ 1:22:12And you can do tries.
+
+ 1:22:14So a trie is a tree
+ with a fixed topology.
+
1:22:17Think of it as a directory tree.
+
1:22:20So maybe you're
+ using Subversion.
+
1:22:21Subversion has time
+ travel operations.
+
1:22:23You can copy an entire
+ subtree from one version
+
1:22:26and stick it into a new
+ version, another version.
+
1:22:30So you get a version DAG.
+
1:22:32It's a confluently
+ persistent data structure--
+
1:22:34not implemented optimally,
+ because we don't necessarily
+
1:22:37know how.
+
1:22:38But there is one paper.
+
1:22:40This actually came from the open
+ problem section of this class
+
1:22:43four years ago, I think.
+
1:22:45It's with Eric Price
+ and Stefan Langerman.
+
1:22:49You can get very good results.
+
1:22:50I won't write them down
+ because it takes a while.
+
1:22:52Basically log the degree
+ of the nodes factor
+
1:22:56and get functional, and
+ you can be even fancier
+
1:22:59and get slightly better
+ bounds like log log the degree
+
1:23:02and get confluently persistent
+ with various tricks,
+
1:23:05including using all of
+ these data structures.
+
+ 1:23:07So if you want to implement
+ Subversion optimally,
+
+ 1:23:09we know how that can be done, but it
+ hasn't actually been done yet.
+
1:23:14Because there are those
+ pesky constant factors.
+
1:23:18I think that's all.
+
1:23:19What is known about functional
+ is there's a log n separation.
+
1:23:23You can be log n
+ away from the best.
+
1:23:26That's the worst
+ separation known,
+
1:23:30between functional and just
+ a regular old data structure.
+
1:23:33It'd be nice to improve that.
+
1:23:34Lots of open problems here.
+
1:23:35Maybe we'll work
+ on them next time.
+
\ No newline at end of file