diff --git "a/data/session_01.html" "b/data/session_01.html" new file mode 100644--- /dev/null +++ "b/data/session_01.html" @@ -0,0 +1,4497 @@ +
+
+ 0:00 + The following content is + provided under a Creative +
+
0:02Commons license.
+
0:04Your support will help + MIT OpenCourseWare
+
0:06continue to offer high quality + educational resources for free.
+
0:10To make a donation or + view additional materials
+
0:13from hundreds of MIT courses, + visit MIT OpenCourseWare
+
0:17at ocw.mit.edu.
+
0:21ERIK DEMAINE: Welcome to 6.851 + Advanced Data Structures.
+
0:25I am Erik Demaine.
+
0:26You can call me Erik.
+
0:28We have two TAs, Tom + Morgan and Justin Zhang.
+
0:31Tom's back there.
+
0:32Justin is late.
+
0:36And this class is + about all kinds
+
0:38of very cool data structures.
+
0:40You should have already + seen basic data structures
+
0:42 like balanced binary + search trees and things
+
0:45 like that, log n + time to do whatever
+
0:48 you want in one dimension.
+
0:50And here we're going to turn + all those data structures
+
0:52on their head and + consider them in all sorts
+
0:54of different models and + additional cool problems.
+
0:56Today we're going to talk about + time travel or temporal data
+
0:59structures, where + you're manipulating time
+
1:02as any good time + traveler should.
+
1:05Then we'll do geometry where + we have higher dimensional
+
1:07data, more than one dimension.
+
1:09Then we'll look at + a problem called
+
1:11dynamic optimality, which is, + is there one best binary search
+
1:14tree that rules them all?
+
1:16Then we'll look at something + called memory hierarchy, which
+
1:19is a way to model more realistic + computers which have cache
+
1:22and then more cache + and then main memory
+
1:24 and then disk and all + these different levels.
+
1:26How do you optimize for that?
+
1:28Hashing is probably the most + famous, and most popular,
+
1:31most used data structure + in computer science.
+
1:33We'll do a little bit on that.
+
1:35Integers, when you know that + your data is integers and not
+
1:39just arbitrary black boxes that + you can compare or do whatever,
+
1:42you can do a lot + better with integers.
+
1:43You usually beat log n time.
+
1:45Often you can get constant time.
+
1:47For example, if you want + to do priority queues,
+
1:49 you can do square + root of log log n time.
+
1:51That's the best + known randomized.
+
1:55Dynamic graphs, you have + a graph you want to store,
+
1:58and the edges are being added + and maybe deleted, like you're
+
2:01representing a social network.
+
2:02And people are friending + and de-friending.
+
2:04You want to maintain some + interesting information
+
2:06about that graph.
+
2:07Strings, you have + a piece of text,
+
2:10such as the entire + worldwide web.
+
2:12And you want to search + for a substring.
+
2:14How do you do that efficiently?
+
2:15It's sort of the Google problem.
+
2:17 Or you're searching through + DNA for patterns, whatever.
+
2:20And finally succinct + data structures,
+
2:22which is all about + taking what we normally
+
2:24consider optimal + space or n space
+
2:26and reducing it down to the very + bare minimum of bits of space.
+
2:30Usually if you want + to store something
+
2:32where there's 2 to + the n possibilities,
+
2:34you want to get away + with n bits of space,
+
2:36maybe plus square root of + n or something very tiny.
+
2:40 So that's the succinct + data structures.
+
2:42So that's an overview + of the entire class.
+
2:44And these are sort of the + sections we'll be following.
+
2:47Let me give you a quick + administrative overview
+
2:50of what we're doing.
+
2:54Requirements for the class--
+
2:55I guess, first, + attending lecture.
+
2:58Obviously if you + don't attend lecture,
+
2:59there'll be videos online.
+
3:00So that's resolvable.
+
3:02But let me know if you're + not going to make it.
+
3:05 We're going to have problem + sets roughly every week.
+
3:07If you're taking the + class for credit,
+
3:09they have a very simple rule + of one page in, one page out.
+
3:12This is more constraint on + us to write problems that
+
3:15have easy or short answers.
+
3:16You probably need + to think about them
+
3:18a little bit before they're + transparent, but then easy
+
3:21to write up.
+
3:23And then scribing lectures-- + so we have a scribe for today,
+
3:26I hope.
+
3:27Here?
+
3:28Yes, good.
+
3:30So most of the + lectures have already
+
3:32been scribed in some + version, and your goal
+
3:34 is to revise those scribe + notes so that, if you don't like
+
3:38 handwritten notes, which + are also online, they're easier
+
3:42 for people to read.
+
3:44Let's see.
+
3:45Listeners welcome.
+
3:46We're going to have an + open problem session.
+
3:48I really like open problems.
+
3:49I really like solving + open problems.
+
3:50So we've done this every time + this class has been offered.
+
3:53So if you're interested in + also solving open problems,
+
3:55it's optional.
+
3:56I will organize-- in + a couple of weeks,
+
3:59we'll have a weekly + open problem session
+
4:03and try to solve + all the things that
+
4:06push the frontier of + advanced data structures.
+
4:09So in classes, we'll see + the state of the art.
+
4:11And then we'll change the state + of the art in those sessions.
+
4:15I think that's it.
+
4:16Any questions about + the class before we
+
4:18get into the fun stuff?
+
4:22All right.
+
4:23Let's do some time traveling.
+
4:27Before I get to time + traveling, though, I
+
4:28need to define our + model of computation.
+
4:32A theme in this class is that + the model of computation you're
+
4:35working with matters.
+
4:36Models matter.
+
4:38And there's lots of different + models of computation.
+
4:40We'll see a few of the + main ones in this class.
+
4:45And the starting + point, and the one
+
4:48we'll be using throughout today, + is called a pointer machine.
+
4:53It's an old one from the '80s.
+
4:57And it corresponds to + what you might think about
+
4:59if you've done a lot of + object-oriented programming,
+
5:02and before that, + structure-oriented programming,
+
5:04I guess.
+
5:05So you have a bunch of nodes.
+
5:08They have some fields in them, + a constant number of fields.
+
5:13 You can think of these + as objects or structs
+
5:16 in C. It used to be records + back in Pascal days,
+
5:20so a lot of the papers + call them records.
+
5:22You could just have a + constant number of fields.
+
5:24You could think of + those numbered, labeled.
+
5:25It doesn't really matter + because there's only
+
5:27a constant number of them.
+
5:28Each of the fields could be + a pointer to another node,
+
5:32could be a null pointer, or + could have some data in it.
+
5:35So I'll just assume that + all my data is integers.
+
5:43You can have a + pointer to yourself.
+
5:45You can have a pointer over + here, whatever you want.
+
5:48A pointer machine would + look something like this.
+
5:52In any moment, this is the + state of the pointer machine.
+
5:55 So you can think of this as what the memory + of your computer is storing.
+
5:59And then you have + some operations
+
6:02that you're allowed to do.
+
6:03That's the computation + part of the model.
+
6:07You can think of this + as the memory model.
+
6:10What you're allowed to + do are create nodes.
+
6:13You can say something + like, x equals new node.
+
6:19You can, I don't + know, look at fields.
+
6:24You can do x equals y.field.
+
6:28You can set fields, + x.field equals y.
+
6:33You can compute on these + data, so you can add 5 and 7,
+
6:37do things like that.
+
6:39I'm not going to worry about--
+
6:40I'll just write et cetera.
+
6:43This is more a model + about how everything's
+
6:45organized in memory, not so + much about what you're allowed
+
6:48to do to the data items.
+
6:49In this lecture, it + won't matter what
+
6:50you're doing to the data items.
+
6:52We never touch them.
+
6:53We just copy them around.
+
6:56So am I missing anything?
+
6:59Probably.
+
7:01I guess you could destroy + nodes if you felt like it.
+
7:03But we won't have + to today, because we
+
7:06don't want to + throw anything away
+
7:07when you're time traveling.
+
7:09It's too dangerous.
+
7:12And then the one catch + here is, what are x and y?
+
7:18There's going to be one + node in this data structure
+
7:21or in your memory + called the root node.
+
7:24And you could think of that + as that's the thing you always
+
7:26have in your head.
+
7:27This is like your + cache, if you will.
+
7:29It's just got a constant + number of things, just
+
7:31like any other node.
+
7:32And x and y are + fields of the root.
+
7:40So that sort of + ties things down.
+
7:42You're always working + relative to the root.
+
7:45But you can look at the data, + basically follow this pointer,
+
7:50by looking at the field.
+
7:53You could set one + of these pointers--
+
7:55I think I probably need + another operation here,
+
7:58like x equals y.field1, + field2, that sort of thing,
+
8:03and maybe the reverse.
+
8:06But you can manipulate + all nodes sort
+
8:09of via the root is the idea.
+
8:10You follow pointers, + do whatever.
+
8:12So pretty obvious, + slightly annoying
+
8:14to write down formally.
+
8:16But that is pointer machine.
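To make the model concrete, here is a minimal pointer machine sketch in Python; the class, the field names, and the root convention are illustrative choices, not anything from the lecture:

```python
# A toy pointer machine: nodes with a constant number of fields,
# each holding a pointer (another Node), None, or an integer.
class Node:
    FIELDS = ("left", "right", "data")   # constant number of fields

    def __init__(self):
        for f in self.FIELDS:
            setattr(self, f, None)

root = Node()           # all access starts at the root node
root.left = Node()      # x = new node; root.field = x
root.left.data = 5      # set a field: x.field = y
x = root.left.data      # read a field: x = y.field
```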
+
8:23And what we're going to + be talking about today
+
8:26in time travel is + suppose someone
+
8:28gives me a pointer machine + data structure, for example,
+
8:31balanced binary search + tree, linked list.
+
8:33A lot of data structures, + especially classic data
+
8:36structures, follow + pointer machine model.
+
8:39What we'd like to do is + transform that data structure
+
8:41or make a new + pointer machine data
+
8:42structure that does + extra cool things,
+
8:45namely travel through time.
+
8:47So that's what + we're going to do.
+
8:53There's two senses of time + travel or temporal data
+
8:57structures that we're going + to cover in this class.
+
9:02The one for today is + called persistence,
+
9:05where you don't forget + anything, like an elephant.
+
9:08And the other one + is retroactivity.
+
9:15Persistence will be today.
+
9:16Retroactivity is next class.
+
9:19Basically, these correspond + to two models of time travel.
+
9:21Persistence is the branching + universe time travel model,
+
9:24where if you make a + change in the past,
+
9:25you get a new universe.
+
9:27You never destroy old universes.
+
9:29Retroactivity is more + like Back to the Future,
+
9:33when you go back, make + a change, and then you
+
9:35can return to the present + and see what happened.
+
9:37This is a lot harder to do.
+
9:39And we'll work on + that next class.
+
9:42Persistence is what + we will do today.
+
9:46So persistence.
+
9:57The general idea of persistence + is to remember everything--
+
10:01the general goal, I would say.
+
10:05And by everything, I + mean different versions
+
10:07of the data structure.
+
10:08So you're doing data + structures in general.
+
10:11We have update operations + and query operations.
+
10:14We're mainly concerned + about updates here.
+
10:16Every time you do an + update, you think of it
+
10:18as taking a version of the data + structure and making a new one.
+
10:21And you never want to + destroy old versions.
+
10:23So even though an update + like an insert or something
+
10:26changes the data structure, we + want to remember that past data
+
10:29as well.
+
10:30And then let's make + this reasonable.
+
10:34All data structure operations + are relative to a specified
+
10:37version.
+
10:47So an update makes and + returns a new version.
+
11:05So when you do an + insert, you specify
+
11:08a version of your data + structure and the thing
+
11:10you want to insert.
+
11:11And the output is a new version.
+
11:13So then you could insert into + that new version, keep going,
+
11:16or maybe go back to the + old version, modify that.
+
11:20I haven't said exactly + what's allowed here,
+
11:22but this is sort of + the general goal.
+
11:25And then there are four + levels of persistence
+
11:30that you might want to get.
+
11:33First level is called + partial persistence.
+
11:37This is the easiest to obtain.
+
11:44And in partial + persistence, you're
+
11:47only allowed to update + the latest version, which
+
11:55means the versions + are linearly ordered.
+
12:01This is the easiest + to think about.
+
12:03And time travel can easily get + confusing, so start simple.
+
12:09We have a timeline of + various versions on it.
+
12:14This is the latest.
+
12:17And what we can do is + update that version.
+
12:19We'll get a new version, and + then our latest is this one.
+
12:23What this allows is looking back + at the past to an old version
+
12:27and querying that version.
+
12:29So you can still ask questions + about the old version,
+
12:31if you want to be able to do + a search on any of these data
+
12:33structures.
+
12:34But you can't change them.
+
12:35You can only change the + most recent version.
+
12:38So that's nice.
+
12:39 It's kind of like Time + Machine on the Mac, I guess.
+
12:44If you've ever seen the + movie Deja Vu, which is not
+
12:46very common, but it's + a good time travel
+
12:48movie, in the first half of + the movie, all they can do
+
12:51is look back at the past.
+
12:52Later they discover + that actually they
+
12:54have a full persistence model.
+
12:57It takes a while + for dramatic effect.
+
13:04In full persistence, you can + update anything you want--
+
13:08so update any version.
+
13:18and so then the + versions form a tree.
+
13:27OK.
+
13:28So in this model, + maybe you initially
+
13:30have a nice line of versions.
+
13:32But now if I go back to + this version and update it,
+
13:34I branch, get a + new version here.
+
13:37And then I might keep modifying + that version sometimes.
+
13:40Any of these guys can branch.
+
13:44So this is why I call it the + branching universe model, when
+
13:47you update your branch.
+
13:52So no version ever + gets destroyed here.
+
13:54Again, you can + query all versions.
+
13:56But now you can also + update any version.
+
13:59But you just make a new version.
+
14:00It's a totally new world.
+
14:02When I update this + version, this version
+
14:04knows nothing about all the--
+
14:06this doesn't know + about this future.
+
14:07It's created its own future.
+
14:10There's no way to sort of + merge those universes together.
+
14:14It's kind of sad.
+
14:16That's why we have the + third level of persistence,
+
14:22which lets us merge timelines.
+
14:24It's great for lots + of fiction out there.
+
14:35If you've seen the + old TV show Sliders,
+
14:38that would be + confluent persistence.
+
14:50So confluent persistence, + you can combine two versions
+
15:01to create a new version.
+
15:09And in this case, again, you + can't destroy old versions.
+
15:13In persistence, you + never destroy versions.
+
15:16So now the versions form a + DAG, directed acyclic graph.
+
15:22So now we're allowing--
+
15:24OK, you make some + changes, whatever.
+
15:25You branch your universe, + make some changes.
+
15:30And now I can say, OK, take + this version of the data
+
15:32structure and this version + and recombine them.
+
15:35Get a new version, and then + maybe make some more changes.
+
15:38OK, what does combine mean?
+
15:40Well, it depends on + your data structure.
+
15:42A lot of data structures + have combine operations
+
15:44like if you have linked lists, + you have two linked lists,
+
15:48you can concatenate them.
+
15:49That's an easy operation.
+
15:50Even if you have + binary search trees,
+
15:51you can concatenate + them reasonably easy
+
15:53and combine it into one + big binary search tree.
+
15:56So if your data structure + has an operation that
+
15:59takes as input two + data structures,
+
16:01then what we're saying is now + it can take two versions, which
+
16:05is more general.
+
16:06So I could take the + same data structure,
+
16:08make some changes in + one way, separately make
+
16:10some changes in a + different way, and then
+
16:12try to concatenate them + or do something crazy.
+
16:14This is hard to + do, and most of it
+
16:16is an open problem + whether it can be done.
+
16:19But I'll tell you about it.
+
16:21Then there's another level + even more than confluent
+
16:24persistence.
+
16:26This is hard to interpret + in the time travel world,
+
16:30but it would be functional + data structures.
+
16:32If you've ever programmed + in a functional programming
+
16:34language, it's a little bit + annoying from an algorithm's
+
16:36perspective, because it + constrains you to work
+
16:39in a purely functional world.
+
16:43You can never modify anything.
+
16:45OK.
+
16:46Now, we don't want + to modify versions.
+
16:49That's fine.
+
16:49But in a functional + data structure,
+
16:51you're not allowed to + modify any nodes ever.
+
16:54 All you can do is + make new nodes.
+
17:03This is constraining, + and you can't always
+
17:07get optimal running times + in the functional world.
+
17:09But if you can get a + functional data structure,
+
17:11you have all these + things, because you
+
17:13can't destroy anything.
+
17:14If you can't destroy + nodes, then in particular,
+
17:16you can't destroy versions.
+
17:17And all of these things + just work for free.
+
17:19And so a bunch of + special cases are known,
+
17:22interesting special + cases, like search trees
+
17:24you can do in the + functional world.
+
17:26And that makes all + of these things easy.
+
17:29So the rest of this + lecture is going
+
17:30to be general techniques for + doing partial full persistence,
+
17:34what we know about + confluent, and what
+
17:36we know about functional, + brief overview.
+
17:41Any questions about those + goals, problem definitions?
+
17:47Yeah.
+
17:48AUDIENCE: I'm still confused + about functional, because--
+
17:50ERIK DEMAINE: What + does functional mean?
+
17:52AUDIENCE: [INAUDIBLE]
+
17:55ERIK DEMAINE: Yeah, I + guess you'll see what--
+
17:57functional looks like all + the other things, I agree.
+
18:00You'll see in a moment + how we actually implement
+
18:02partial and persistence.
+
18:03We're going to be + changing nodes a lot.
+
18:07As long as we still + represent the same data
+
18:10in the old versions, we + don't have to represent it
+
18:13in the same way.
+
18:14That lets us do things + more efficiently.
+
18:15Whereas in functional, + you have to represent
+
18:17all the old versions + in exactly the way
+
18:19you used to represent them.
+
18:20Here we can kind of + mangle things around
+
18:22and it makes things faster.
+
18:23Yeah, good question.
+
18:25So it seems almost the same, + but it's nodes versus versions.
+
18:29I haven't really + defined a version.
+
18:31But it's just that all the + queries answer the same way.
+
18:34That's what you need + for persistence.
+
18:38Other questions?
+
18:40All right.
+
18:43Well, let's do some + real data structures.
+
18:49We start with + partial persistence.
+
18:55This is the easiest.
+
18:59For both partial and + full persistence,
+
19:02there is the following result. + Any pointer machine data
+
19:06structure, one catch with a + constant number of pointers
+
19:19to any node--
+
19:22so this is constant n degree.
+
19:27In a pointer machine, you + always have a constant number
+
19:30of pointers out + of a node at most.
+
19:32But for this result + to hold, we also
+
19:34need a constant number of + pointers into any node.
+
19:37So this is an extra constraint.
+
19:42Can be transformed into + another data structure that
+
19:47is partially persistent and does + all the things it used to do--
+
19:53so I'll just say, can be + made partially persistent.
+
20:00You have to pay something, but + you have to pay very little--
+
20:03constant amortized + factor overhead,
+
20:12multiplicative overhead + and constant amount
+
20:21of additive space per change + in the data structure.
+
20:30So every time you do a + modification in your pointer
+
20:33machine-- you set one of + the fields to something--
+
20:36you have to store that forever.
+
20:37So, I mean, this is the + best you could hope to do.
+
20:39You've got to store + everything that happened.
+
20:43You pay a constant + factor overhead, eh.
+
20:45We're theoreticians.
+
20:46That doesn't matter.
+
20:48Then you get any data + structure in this world
+
20:50can be made + partially persistent.
+
20:53That's nice.
+
20:54Let's prove it.
+
21:00OK, the idea is pretty simple.
+
21:04Pointer machines are all + about nodes and fields.
+
21:06So we just need to simulate + whatever the data structure is
+
21:09doing to those nodes + and fields in a way
+
21:11that we don't lose + all the information
+
21:13and we can still + search it very quickly.
+
21:17First idea is to + store back pointers.
+
21:21And this is why we need the + constant n degree constraint.
+
21:27So if we have a node--
+
21:31how do I want to + draw a node here?
+
21:35So maybe these are the + three fields of the node.
+
21:38I want to also store + some back pointers.
+
21:42Whenever there is a node + that points to this node,
+
21:48I want to have a + back pointer that
+
21:50points back so I know where + all the pointers came from.
+
21:54If there's only p pointers, + then this is fine.
+
21:57There'll be p fields here.
+
22:00 So still constant, still in + the pointer machine model.
+
22:03OK, I'm going to need + some other stuff too.
+
22:08So this is a simple thing, + definitely want this.
+
22:11Because if my nodes + ever move around,
+
22:13I've got to update + the pointers to them.
+
22:15And where are those pointers?
+
22:16Well, the back pointers + tell you where they are.
+
22:20Nodes will still + be constant size,
+
22:22remain in pointer + machine data structure.
+
22:25OK.
+
22:26That's idea one.
+
22:28Idea two is this part.
+
22:35This is going to store + something called mods.
+
22:39It could stand for something, + but I'll leave it as mods.
+
22:44So these are two of the + fields of the data structure.
+
22:56Ah, one convenience here + is for back pointers,
+
23:01I'm only going to store it for + the latest version of the data
+
23:04structure.
+
23:16Sorry.
+
23:16I forgot about that.
+
23:19We'll come back to that later.
+
23:21And then the idea is to + store these modifications.
+
23:23How many modifications?
+
23:25Let's say up to p, twice p.
+
23:34p was the bound on the + n degree of a node.
+
23:38So I'm going to allow 2p + modifications over here.
+
23:43And what's a + modification look like?
+
23:48It's going to consist + of three things--
+
23:50get them in the right order--
+
23:53the version in which + something was changed,
+
23:56 the field that got changed, + and the value it got changed to.
+
24:02So the idea is that these + are the fields here.
+
24:07We're not going to touch those.
+
24:09Once they're set to something--
+
24:11or, I mean, whatever + they are initially,
+
24:13they will stay that way.
+
24:15And so instead of actually + changing things like the data
+
24:18structure normally + would, we're just
+
24:19going to add modifications + here to say, oh,
+
24:21well at this time, this field + changed to the value of 5.
+
24:25And then later on, it + changed to the value 7.
+
24:27And then later on, this one + changed to the value 23,
+
24:31whatever.
+
24:32So that's what + they'll look like.
+
24:36There's a limit to how many--
+
24:37we can only store a constant + number of mods to each node.
+
24:40And our constant will be 2p.
+
24:44OK.
+
24:45Those are the + ideas, and now it's
+
24:46just a matter of + making this all work
+
24:47and analyzing that it's + constant amortized overhead.
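Before the algorithms, a quick sketch of what the augmented node could look like in Python; the 2p mod capacity is from the board, but the layout and the (node, fieldname) encoding of back pointers are my own illustration:

```python
P = 1   # p, the max in-degree; each node gets room for 2*P mods

class PNode:
    def __init__(self, fields):
        self.fields = dict(fields)  # original field values, never overwritten
        self.back = []              # (node, fieldname) back pointers,
                                    # kept only for the latest version
        self.mods = []              # up to 2*P entries: (version, field, value)
```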
+
24:53So first thing is if you + want to read a field,
+
25:06how would I read a field?
+
25:08This is really easy.
+
25:11First you look at what the + field is in the node itself.
+
25:15But then it might + have been changed.
+
25:17And so remember when + I say read the field,
+
25:19I actually mean while + I'm given some version,
+
25:21v, I want to know what is the + value of this field at version
+
25:24v, because I want to be able + to look at any of the old data
+
25:28structures too.
+
25:29So this would be at + version v. I just
+
25:37look through all + the modifications.
+
25:38There's constantly many, + so it takes constant time
+
25:40to just flip through them + and say, well, what changes
+
25:43have happened up to version v?
+
25:46So I look at mods + with version less than
+
25:56or equal to v. That will be all + the changes that happened up
+
25:59to this point.
+
26:00I see, did this field change?
+
26:02I look at the latest one.
+
26:03That will be how I read + the field of the node, so
+
26:07constant time.
+
26:08There's lots of ways to make + this efficient in practice.
+
26:10But for our purposes, + it doesn't matter.
+
26:13It's constant.
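In code, the read might look like this, a sketch on top of the PNode layout above; I'm assuming mods are appended in increasing version order, so the last applicable one wins:

```python
def read_field(node, field, v):
    """Value of node.field as of version v (versions are ints here)."""
    value = node.fields[field]           # what the field started as
    for version, f, val in node.mods:    # constantly many mods to scan
        if f == field and version <= v:  # this mod applies at or before v
            value = val                  # later mods overwrite earlier ones
    return value
```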
+
26:16The hard part is how + do you change a field?
+
26:18Because there might not be + any room in the mod structure.
+
26:35So to modify, say we want to + set node.field equal to x.
+
26:47What we do is first + we check, is there
+
26:52any space in the mod structure?
+
26:54If there's any blank mods, + so if the node is not full,
+
27:03we just add a mod.
+
27:06So a mod will look + like now field x.
+
27:13Just throw that in there.
+
27:15Because right at this moment--
+
27:17so we maintain a time + counter, just increment
+
27:19it ever time we do a change.
+
27:21This field changed that value.
+
27:23So that's the easy case.
+
27:24The trouble, of course, + is if the node is full--
+
27:29the moment you've + all been waiting for.
+
27:31So what we're going to do + here is make a new node.
+
27:33We've ran out of space.
+
27:34So we need to make a new node.
+
27:36We're not going to touch the old + node, just going to let it sit.
+
27:38It still maintains all + those old versions.
+
27:40Now we want a new node + that represents the latest
+
27:43and greatest of this node.
+
27:44OK.
+
27:45So make a new node.
+
27:51I'll call it node prime to + distinguish from node, where
+
27:57with all the mods, and this + modification in particular,
+
28:04applied.
+
28:07OK, so we make a new + version of this node.
+
28:11It's going to have some + different fields, whatever
+
28:15was the latest version + represented by those mods.
+
28:18It's still going to + have back pointers,
+
28:20so we have to maintain + all those back pointers.
+
28:24And now the mod, + initially, is going
+
28:26to be empty, because we + just applied them all.
+
28:29So this new node doesn't + have any recent mods.
+
28:32Old node represents + the old versions.
+
28:34This node is going to + represent the new versions.
+
28:37What's wrong with this picture?
+
28:39AUDIENCE: Update pointers.
+
28:40ERIK DEMAINE: Update pointers.
+
28:42Yeah, there's pointers + to the old version
+
28:43of the node, which are fine for + the old versions of the data
+
28:47structure.
+
28:48But for the latest version + of the data structure,
+
28:50this node has moved + to this new location.
+
28:53So if there are any old + pointers to that node,
+
28:56we've got to update them + in the current version.
+
28:58We have to update them to + point to this node instead.
+
29:00The old versions are fine, but + the new version is in trouble.
+
29:04Other questions or + all the same answer?
+
29:06Yeah.
+
29:06AUDIENCE: So if you wanted + to read an old version
+
29:10but you just have the + new version, [INAUDIBLE]?
+
29:15ERIK DEMAINE: OK--
+
29:16AUDIENCE: [INAUDIBLE]
+
29:17ERIK DEMAINE: The + question is essentially,
+
29:19how do we hold on to versions?
+
29:22Essentially, you can think of + a version of the data structure
+
29:24as where the root node is.
+
29:26That's probably the easiest.
+
29:27I mean, in general, we're + representing versions
+
29:29by a number, v. But we + always start at the root.
+
29:33And so you've given + the data structure,
+
29:35which is represented + by the root node.
+
29:36And you say, search + for the value 5.
+
29:40Is it in this binary + search tree or whatever?
+
29:43And then you just start + navigating from the root,
+
29:45 but you know, I'm in version + a million or whatever.
+
29:49I know what version + I'm looking for.
+
29:51So you start with the root, + which never changes, let's say.
+
29:56And then you follow + pointers that
+
29:58essentially tell + you for that version
+
30:00where you should be going.
+
30:01I guess at the root version, + it's a little trickier.
+
30:03You probably want a little array + that says for this version,
+
30:07here's the root node.
+
30:08But that's a special case.
+
30:11Yeah.
+
30:11Another question?
+
30:13AUDIENCE: So on the + new node that you
+
30:15created, the fields that you + copied, you also have to have
+
30:19a version for them, right?
+
30:20Because [INAUDIBLE]?
+
30:26ERIK DEMAINE: These--
+
30:27AUDIENCE: Or do you + version the whole node?
+
30:30ERIK DEMAINE: Here we're + versioning the whole node.
+
30:32The original field + values represent
+
30:34what was originally there, + whenever this node was created.
+
30:37Then the mods specify what + time the fields change.
+
30:40So I don't think + we need times here.
+
30:44All right.
+
30:45So we've got to update + two kinds of pointers.
+
30:47There's regular + pointers, which live
+
30:48in the fields, which are + things pointing to the node.
+
30:52But then there's + also back pointers.
+
30:53Because if this is + a pointer to a node,
+
30:55then there'll be a back + pointer back to the node.
+
30:57And all of those have to change.
+
31:00Conveniently, the back + pointers are easy.
+
31:11So if they're back + pointers to the node,
+
31:13we change them to + the node prime.
+
31:14How do we find + the back pointers?
+
31:16Well, we just follow + all the pointers
+
31:17and then there will be + back pointers there.
+
31:21Because I said we're + only maintaining
+
31:23 back pointers for + the latest version,
+
31:25 I don't need to preserve + the old versions
+
31:28 of those back pointers.
+
31:29So I just go in + and I change them.
+
31:31It takes constant time, + because the constant number
+
31:33of things I point to, each + one as a back pointer.
+
31:35So this is cheap.
+
31:37There's no persistence here.
+
31:39That's an advantage of + partial persistence.
+
31:41The hard part is + updating the pointers
+
31:44because those live in fields.
+
31:45I need to remember the old + versions of those fields.
+
31:47And that we do recursively.
+
31:58Because to change + those pointers,
+
32:00that's a field update.
+
32:01That's something + exactly of this form.
+
32:02So that's the same operation + but on a different node.
+
32:05So I just do that.
+
32:07I claim this is good.
+
32:08That's the end of the algorithm.
+
32:11Now we need to analyze it.
+
32:24How do we analyze it?
+
32:25Any guesses?
+
32:29AUDIENCE: Amortize it.
+
32:30ERIK DEMAINE: Amortized + analysis, exactly
+
32:32the answer I was looking for.
+
32:33OK.
+
32:34[INAUDIBLE] amortization.
+
32:36The most powerful + technique in amortization
+
32:38is probably the + potential method.
+
32:40So we're going to use that.
+
32:42There's a sort of more--
+
32:44you'll see a charging + argument in a moment.
+
32:50We want the potential function + to represent when this data
+
32:53structure is in a bad state.
+
32:55Intuitively, it's in a bad state + when a lot of nodes are full.
+
32:58Because then as soon as + you make a change in them,
+
33:00they will burst, and you have + to do all this crazy recursion
+
33:03and stuff.
+
33:04This case is nice and cheap.
+
33:05We just add a modification, + constant time.
+
33:08This case, not so nice + because we recurse.
+
33:10And then that's going + to cause more recursions
+
33:12and all sorts of + chaos could happen.
+
33:16So there's probably a few + different potential functions
+
33:20that would work here.
+
33:21 An obvious version + of this, I said,
+
33:23 would be the number + of full nodes.
+
33:25But I think we can make + life a little bit easier
+
33:27by the following.
+
33:32Basically, the total + number of modifications--
+
33:36not quite the total, + almost the total.
+
33:39So I'm going to do c times + the sum of the number of mods
+
33:49in latest version nodes.
+
33:59OK.
+
34:00So because we sort + of really only
+
34:02care about-- we're only + changing the latest version,
+
34:05so I really only + care about nodes that
+
34:07live in the latest version.
+
34:08What do I mean by this?
+
34:09Well, when I made + this new node prime,
+
34:11this becomes the new + representation of that node.
+
34:14The old version is dead.
+
34:15We will never change it again.
+
34:18If we're modifying, we will + never even look at it again.
+
34:21Because now everything + points to here.
+
34:24So I don't really + care about that node.
+
34:26It's got a ton of mods.
+
34:27But what's nice is that when + I create this new node, now
+
34:30the mod list is empty.
+
34:31So I start from scratch, + just like reinstalling
+
34:33your operating system.
+
34:34It's a good feeling.
+
34:38And so the potential goes down + by, I guess, c times 2 times p.
+
34:45When I do this change, potential + goes down by basically p.
+
34:49AUDIENCE: Is c any constant or--
+
34:52ERIK DEMAINE: c will be a + constant to be determined.
+
34:55I mean, it could be 1.
+
34:57It depends how you + want to define it.
+
34:58I'm going to use the CLRS + notion of amortized cost, which
+
35:02is actual cost plus + change in potential.
+
35:06And then I need a + constant here, because I'm
+
35:08measuring a running time versus + some combinatorial quantity.
+
35:12So this will be to match the + running time that we'll get to.
+
35:17OK.
+
35:17So what is amortized cost?
+
35:22There's sort of two + cases modification.
+
35:24There's the cheap case + and the not so cheap case.
+
35:28In general, amortized cost--
+
35:34in both cases, it's + going to be at most--
+
35:37well, first of all, we + do some constant work
+
35:39just to figure out all this + stuff, make copies, whatever.
+
35:44So that's some constant time.
+
35:49That's the part that I don't + want to try to measure.
+
35:52Then potentially, + we add a new mod.
+
35:55If we add a mod, that + increases the potential by c.
+
35:59Because we're just counting + mods, multiplying by c.
+
36:02So we might get plus 1 mod.
+
36:04This is going to + be an upper bound.
+
36:06We don't always add 1, but + worst case, we always had 1,
+
36:09let's say.
+
36:11And then there's + this annoying part.
+
36:14And this might happen, + might not happen.
+
36:16So then there's a plus maybe.
+
36:20If this happens, we + decrease the potential
+
36:23because we empty out the + mods for that node in terms
+
36:26of the latest version.
+
36:27So then we get a negative + 2cp, change in potential.
+
36:34And then we'd have to pay + I guess up to p recursions.
+
36:49Because we have to--
+
36:51how many pointers + are there to me?
+
36:53Well, at most p of them, because + there are at most p pointers
+
36:58to any node.
+
37:02OK.
+
37:03This is kind of a weird--
+
37:05it's not exactly algebra here.
+
37:06I have this thing, recursions.
+
37:09But if you think about + how this would expand,
+
37:11all right, this + is constant time.
+
37:13That's good.
+
37:14And then if we do this--
+
37:15I'll put a question mark here.
+
37:16It might be here.
+
37:16It might not.
+
37:18 If it's not here, fine, constant.
+
37:19If it is here, then this gets + expanded into this thing.
+
37:24It's a weird way to + write a recurrence.
+
37:26But we get p times whatever + is in this right hand side.
+
37:30OK.
+
37:31But then there's this minus 2cp.
+
37:33So we're going to + get p times 2c here.
+
37:36That's the initial cost.
+
37:37So that will cancel with this.
+
37:40And then we might get + another recursion.
+
37:41But every time we get a + recursion, all the terms
+
37:43cancel.
+
37:44So it doesn't matter + whether this is here or not.
+
37:46You get 0, which is great.
+
37:49And you're left with + the original 2c.
+
37:53Constant.
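Written out as algebra (my reconstruction of the board, with $c_0$ the constant bookkeeping work and $\hat{c}$ the amortized cost of one modify):

```latex
\hat{c} \;\le\; c_0 + c
        + \max\bigl(0,\; \underbrace{-2cp}_{\Delta\Phi\text{ on overflow}}
        + \underbrace{p\,\hat{c}}_{\le p\text{ recursive modifies}}\bigr)
```

Guessing $\hat{c} \le 2c$ makes the bracket nonpositive, and then $\hat{c} \le c_0 + c \le 2c$ as long as we choose $c \ge c_0$. So the amortized cost is constant.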
+
37:55OK.
+
37:56[INAUDIBLE] potential functions + are always a little crazy.
+
37:59What's happening here is + that, OK, maybe you add a mod.
+
38:03That's cheap.
+
38:05But when we have to do this + work and we have to do this
+
38:08recursion-- this is up to + 2p updates or recursions--
+
38:14we are charging it to the + emptying of this node.
+
38:17The number of mods + went from 2p down to 0.
+
38:21And so we're just + charging this update cost
+
38:22to that modification.
+
38:24So if you like charging schemes, + this is much more intuitive.
+
38:26But with charging schemes, + it's always a little careful.
+
38:28You have to make sure + you're not double charging.
+
38:30Here it's obvious that + you're not double charging.
+
38:34Kind of a cool and magical.
+
38:37This is a paper by + Driscoll, Sarnak, Sleator,
+
38:42Tarjan from 1989.
+
38:43So it's very early + days of amortization.
+
38:45But they knew how to do it.
+
38:47Question?
+
38:48AUDIENCE: [INAUDIBLE]
+
38:50ERIK DEMAINE: What happens + if you overflow the root?
+
38:52Yeah, I never thought about + the root before today.
+
38:54But I think the way + to fix the root is
+
38:57just you have one big table + that says, for a given version--
+
39:02I guess a simple + way would be to say,
+
39:04not only is a version + a number, but it's also
+
39:06a pointer to the root.
+
39:07There we go.
+
39:07Pointer machine.
+
39:09So that way you're just + always explicitly maintaining
+
39:11the root copy or the pointer.
+
39:15Because otherwise, + you're in trouble.
+
39:18AUDIENCE: Then can you + go back to [INAUDIBLE].
+
39:21ERIK DEMAINE: So in order + to refer to an old version,
+
39:24you have to have the + pointer to that root node.
+
39:26If you want to do it just + from a version number,
+
39:29look at the data structure.
+
39:30Just from a version + number, you would
+
39:31need some kind of + lookup table, which
+
39:33is outside the pointer machine.
+
39:34So you could do it + in a real computer,
+
39:36but a pointer machine is + not technically allowed.
+
39:39So it's slightly awkward.
+
39:40No arrays are allowed + in pointer machines,
+
39:42in case that wasn't clear.
+
39:43Another question?
+
39:44AUDIENCE: [INAUDIBLE] constant + space to store for [INAUDIBLE].
+
39:48And also, what if we have + really big numbers [INAUDIBLE]?
+
39:54ERIK DEMAINE: In this model, + in the pointer machine model,
+
39:56we're assuming that whatever + the data is in the items
+
39:58take constant space each.
+
40:01If you want to know about + bigger things in here,
+
40:03then refer to future lectures.
+
40:05This is time travel, after all.
+
40:06Just go to a future + class and then come back.
+
40:09[LAUGHS] So we'll get + there, but right now,
+
40:11we're not thinking + about what's in here.
+
40:15Whatever big thing + you're trying to store,
+
40:16you reduce it down to + constant size things.
+
40:19And then you spread them around + nodes of a pointer machine.
+
40:22How you do that, that's + up to the data structure.
+
40:25We're just transforming the + data structure to be persistent.
+
40:28OK, you could ask about other + models than pointer machines,
+
40:30but we're going to stick + to pointer machines here.
+
40:34All right.
+
40:36That was partial persistence.
+
40:38Let's do full persistence.
+
40:41That was too easy.
+
40:46Same paper does + full persistence.
+
40:48 That was just a warm-up.
+
40:50Full persistence is actually + not that much harder.
+
40:55So let me tell you + basically what changes.
+
41:04There are two issues.
+
41:05One is that everything here + has to change and not by much.
+
41:09We're still going to + use back pointers.
+
41:11We're still going + to have my mods.
+
41:12The number of mods is going + to be slightly different
+
41:15but basically the same.
+
41:16Back pointers no longer just + refer to the latest version.
+
41:19We have to maintain back + pointers in all versions.
+
41:21So that's annoying.
+
41:22But hey, that's life.
+
41:24The amortization, the + potential function
+
41:25will change slightly + but basically not much.
+
41:30Sort of the bigger issue you + might first wonder about,
+
41:33and it's actually the most + challenging technically,
+
41:35is versions are + no longer numbers.
+
41:37Because it's not a line.
+
41:39Versions are nodes in a tree.
+
41:41You should probably + call them vertices
+
41:42in a tree to distinguish them + from nodes in the pointer
+
41:45machine.
+
41:46OK, so you've got + this tree of versions.
+
41:48And then versions are just + some point on that tree.
+
41:53This is annoying + because we like lines.
+
41:57We don't like trees as much.
+
41:58So what we're going to + do is linearize the tree.
+
42:04Like, when in doubt, cheat.
+
42:12How do we do this?
+
42:13With tree traversal.
+
42:15Imagine I'm going to draw + a super complicated tree
+
42:18of versions.
+
42:19Say there are three versions.
+
42:21OK.
+
42:22I don't want to number + them, because that would be
+
42:24kind of begging the question.
+
42:26So let's just call + them x, y, and z.
+
42:33All right.
+
42:34I mean, it's a directed + tree, because we
+
42:36have the older versions.
+
42:37This is like the + original version.
+
42:38And we made a change.
+
42:39We made a different change + on the same version.
+
42:42What I'd like to do is a + traversal of that tree,
+
42:45like a regular, as if you're + going to sort those nodes.
+
42:48Actually, let me use + color, high def here.
+
42:53So here's our + traversal of the tree.
+
42:59And I want to look at the + first and the last time I
+
43:01visit each node.
+
43:02So here's the first + time I visit x.
+
43:05So I'll write this is + the beginning of x.
+
43:09Capital X. Then this is + the first time I visit y,
+
43:13so it's beginning of y.
+
43:15And then this is the last time + I visit y, so it's the end of y.
+
43:19And then, don't care.
+
43:20Then this is the beginning of z.
+
43:24And this is the end of z.
+
43:27And then this is the end x.
+
43:29If I write those sequentially, + I get bxbyeybzez,
+
43:38because this is so easy, ex.
+
43:42OK, you can think of these + as parentheses, right?
+
43:45For whatever reason I chose b + and e for beginning and ending,
+
43:48but this is like open + parens, close parens.
+
43:50This is easy to + do in linear time.
+
43:52I think you all know how.
+
43:53Except it's not + a static problem.
+
43:55Versions are changing + all the time.
+
43:56We're adding versions.
+
43:57We're never deleting + versions, but we're always
+
43:59adding stuff to here.
+
44:00It's a little + awkward, but the idea
+
44:01is I want to + maintain this order,
+
44:05maintain the begin and + the end of each you
+
44:16might say subtree of versions.
+
44:23This string, from + bx to ex, represents
+
44:25all of the stuff in x's + subtree, in the rooted tree
+
44:29starting at x.
+
44:33How do I maintain that?
+
44:40Using a data structure.
+
44:56So we're going to use something, + a data structure we haven't yet
+
45:00seen.
+
45:02It will be in lecture 8.
+
45:04This is a time travel + data structure,
+
45:06so I'm allowed to do that.
+
45:10So order maintenance + data structure.
+
45:14You can think of this as + a magical linked list.
+
45:16Let me tell you what the + magical linked list can do.
+
45:19You can insert--
+
45:22I'm going to call it + an item, because node
+
45:24would be kind of confusing + given where we are right now.
+
45:28You can insert a new item in + the list immediately before
+
45:32or after a given item.
+
45:37OK.
+
45:37This is like a + regular linked list.
+
45:41Here's a regular linked list.
+
45:44And if I'm given a particular + item like this one,
+
45:48I can say, well, insert + a new item right here.
+
45:51You say, OK.
+
45:51Fine.
+
45:52I'll just make a new node and + relink here, relink there.
+
45:57Constant time, right?
+
45:58So in an order maintenance + data structure,
+
46:00you can do this + in constant time.
+
46:01Wow!
+
46:02So amazing.
+
46:05OK, catch is the second + operation you can do.
+
46:08Maybe I'll number these.
+
46:09This is the update.
+
46:10Then there's the query.
+
46:13The query is, what + is the relative order
+
46:17 of two nodes, of two items?
+
46:24x and y.
+
46:27So now I give you this + node and this node.
+
46:29And I say, which is to the left?
+
46:32Which is earlier in the order?
+
46:34I want to know, is x + basically less than y
+
46:36in terms of the + order in the list?
+
46:37Or is y less than x?
+
46:41And an order maintenance + data structure
+
46:42can do this in constant time.
+
46:45Now it doesn't look like your + mother's linked list, I guess.
+
46:50It's not the link list + you learned in school.
+
46:52It's a magical linked + list that can somehow
+
46:54answer these queries.
+
46:55How?
+
46:56Go to lecture 7.
+
46:58OK.
+
46:59Forward reference, + lecture 8, sorry.
+
47:03For now, we're just going to + assume that this magical data
+
47:05structure exists.
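Until Lecture 8, a deliberately naive stand-in at least pins down the interface; this version pays O(n) per operation where the magical structure pays O(1), and the class and method names are mine:

```python
class NaiveOrderMaintenance:
    def __init__(self):
        self.items = []     # the list, left to right

    def insert_after(self, item, new_item):
        self.items.insert(self.items.index(item) + 1, new_item)

    def insert_before(self, item, new_item):
        self.items.insert(self.items.index(item), new_item)

    def order(self, x, y):
        """Is x strictly before y in the list?"""
        return self.items.index(x) < self.items.index(y)
```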
+
47:06So in constant + time, this is great.
+
47:09Because if we're maintaining + these b's and e's, we
+
47:11want to maintain the order + that these things appear in.
+
47:16If we want to create + a new version,
+
47:17like suppose we were + just creating version z,
+
47:20well, it used to be everything + without this bz, ez.
+
47:23And we'd just insert two + items in here, bz and ez.
+
47:27They're right next + to each other.
+
47:28And if we were given version + x, we could just say,
+
47:30oh, we'll look at ex and insert + two items right before it.
+
47:34Or you can put them + right after bx.
+
47:36I mean, there's no + actual order here.
+
47:37So it could have been y first + and then z or z first and then
+
47:40y.
+
47:42So it's really easy to add a + new version in constant time.
+
47:44You just do two of + these insert operations.
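Using the naive stand-in above, branching a version off a parent really is two inserts; the ("b", v)/("e", v) tuple encoding is my own:

```python
om = NaiveOrderMaintenance()
om.items = [("b", 0), ("e", 0)]     # version 0 is the initial version
next_id = 1

def new_version(parent):
    """Branch a child version off `parent` in the version tree."""
    global next_id
    v, next_id = next_id, next_id + 1
    om.insert_after(("b", parent), ("e", v))  # nest b_v, e_v immediately
    om.insert_after(("b", parent), ("b", v))  # inside parent's brackets
    return v
```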
+
47:47And now you have this magical + order operation, which
+
47:50if I'm given two versions--
+
47:54I don't know, v and w--
+
47:56and I want to know is + v an ancestor of w,
+
48:00now I can do it + in constant time.
+
48:02So this lets me do a third + operation, which is, is version
+
48:09v an ancestor of version w?
+
48:21Because that's going to + be true if and only if bv
+
48:26is an ev nest around bw and ew.
+
48:39OK.
+
48:40So that's just three tests.
+
48:41They're probably not + all even necessary.
+
48:43This one always holds.
+
48:45But if these guys fit in between + these guys, then you know--
+
48:50now, what this tells us, + what we care about here,
+
48:54is reading fields.
+
48:58When we read a field, + we said, oh, we'll
+
49:00apply all the modifications + that apply to version
+
49:02v. Before that, that + was a linear order.
+
49:04So it's just all versions + less than or equal to v. Now
+
49:06it's all versions that are + ancestors of v. Given a mod,
+
49:10we need to know, does this + mod apply to my version?
+
49:13And now I tell you, I can + do that in constant time
+
49:16through magic.
+
49:17I just test these + order relations.
+
49:20If they hold, then that + mod applies to my version.
+
49:24So w's the version + we're testing.
+
49:27v is some version in the mod.
+
49:29And I want to know, am + descendant of that version?
+
49:32If so, the mod applies.
+
49:34And I update what the field is.
+
49:36I can do all pairwise ancestor + checks and figure out,
+
49:39what is the most recent + version in my ancestor history
+
49:43that modified a given field?
+
49:44That lets me read a + field in constant time.
+
49:47Constants are getting + kind of big at this point,
+
49:49but it can be done.
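As a sketch on the toy pieces above, the ancestor test and the full-persistence read could look like this; I treat a version as its own ancestor, which is what mods need:

```python
def is_ancestor(v, w):
    """Does version v lie on the root-to-w path of the version tree?"""
    if v == w:
        return True
    return om.order(("b", v), ("b", w)) and om.order(("e", w), ("e", v))

def read_field_full(node, field, w):
    """node.field as seen from version w (full persistence)."""
    value, best = node.fields[field], None
    for v, f, val in node.mods:               # constantly many mods
        if f == field and is_ancestor(v, w):  # mod is on w's root path
            if best is None or is_ancestor(best, v):
                best, value = v, val          # keep the nearest ancestor
    return value
```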
+
49:53Clear?
+
49:54A little bit of + a black box here.
+
49:56But now we've gotten + as far as reading.
+
50:01And we don't need + to change much else.
+
50:04 So this is good news.
+
50:11Maybe I'll give you + a bit of a diff.
+
50:15So full persistence, + fully persistent theorem--
+
50:26done.
+
50:27OK.
+
50:27Same theorem just + with full persistence.
+
50:30How do we do it?
+
50:31We store back pointers + now for all versions.
+
50:35It's a little bit annoying.
+
50:36But how many mods do we use?
+
50:40There's lots of ways + to get this to work,
+
50:42but I'm going to + change this number
+
50:44to 2 times d plus p plus 1.
+
50:51Wait, what's d? d is the + number of fields here.
+
50:56OK.
+
50:57We said it was + constant number fields.
+
50:59I never said what that constant + is. d for out degree, I guess.
+
51:03So p is in degree, max in + degree. d is max out degree.
+
51:09So just slightly more-- + that main reason for this
+
51:11is because back pointers now + are treated like everyone else.
+
51:14We have to treat both the out + pointers and the in pointers
+
51:17as basically the same.
+
51:18So instead of p, + we have d plus p.
+
51:19And there's a plus + 1 just for safety.
+
51:23It gets my amortization + to work, hopefully.
+
51:28OK.
+
51:29Not much else-- this + page is all the same.
+
51:32Mods are still, you give + versions, fields, values,
+
51:35reading.
+
51:36OK, well, this is no longer + less than or equal to v. But
+
51:41this is now with a version, sort + of the nearest version, that's
+
51:47an ancestor of v.
+
51:50That's what we were + just talking about.
+
51:52So that can be done + in constant time.
+
51:54Check it for all of + them, constant work.
+
51:57OK.
+
51:58That was the first part.
+
52:04Now we get to the hard + part, which is modification.
+
52:07This is going to be different.
+
52:08Maybe you I should just erase--
+
52:10yeah, I think I'll + erase everything,
+
52:13except the first clause.
+
52:24OK.
+
52:24If a node is not + full, we'll just
+
52:26add a mod, just like before.
+
52:28What changes is + when a node is full.
+
52:36Here we have to do something + completely different.
+
52:38Why?
+
52:38Because if we just + make a new version
+
52:41of this node that has empty + mods, this one's still full.
+
52:45And I can keep modifying + the same version.
+
52:48This new node that I just erased + represents some new version.
+
52:52But if I keep modifying + an old version, which
+
52:54I can do in full persistence, + this node keeps being full.
+
52:57And I keep paying + potentially huge cost.
+
53:00If all the nodes were full, + and when I make this change
+
53:02every node gets + copied, and then I
+
53:04make a change to + the same version,
+
53:06every node gets copied again.
+
53:07This is going to take + linear time per operation.
+
53:09So I can't do the old strategy.
+
53:11I need to somehow make + this node less full.
+
53:15This is where we're + definitely not functional.
+
53:17None of this was + functional, but now I'm
+
53:19going to change an old node, not + just make a new one in a more
+
53:24drastic way.
+
53:25Before I was adding a mod.
+
53:27That's not a + functional operation.
+
53:28Now I'm actually going to remove + mods from a node to rebalance.
+
53:33So what I'd like to do is + split the node into two halves.
+
53:43OK.
+
53:43So I had some big + node that was--
+
53:46I'll draw it-- completely full.
+
53:50Now I'm going to make two nodes.
+
53:52Here we go.
+
53:59This one is going + to be half full.
+
54:01This one's going to + be half full of mods.
+
54:04OK.
+
54:05The only question left is, what + do I do with all these things?
+
54:12Basically what I'd like + to do is have the--
+
54:14on the one hand, I want + to have the old node.
+
54:18It's just where it used to be.
+
54:20I've just removed half of + the mods, the second half,
+
54:23the later half.
+
54:25What does that mean?
+
54:26I don't know.
+
54:27Figure it out.
+
54:29It's linearized.
+
54:31I haven't thought + deeply about that.
+
54:32Now we're going to make a + new node with the second half
+
54:36of the mods.
+
54:40It's more painful + than I thought.
+
54:41In reality, these mods represent + a tree of modifications.
+
54:45And what you need to do is + find a partition of that tree
+
54:48into two roughly equal halves.
+
54:51You can actually do a + one third, 2/3 split.
+
54:52 That's also in a future lecture, + whose number I forget.
+
54:57So really, you're + splitting this tree
+
54:58into two roughly + balanced halves.
+
55:01And so this 2 might actually + need to change to a 3,
+
55:03but it's a constant.
+
55:06OK.
+
55:07What I want is for + this to represent
+
55:09a subtree of versions.
+
55:10Let me draw the picture.
+
55:11So here's a tree of versions + represented by the old mods.
+
55:15I'd like to cut out a + subtree rooted at some node.
+
55:18So let's just assume + for now this has exactly
+
55:21half the nodes.
+
55:22And this has half the nodes.
+
55:25In reality, I think it + can be one third, 2/3.
+
55:29OK.
+
55:29But let's keep it convenient.
+
55:32So I want the new + node to represent
+
55:34this subtree and this node + to represent everything else.
+
55:37This node is as if this + stuff hasn't happened yet.
+
55:41I mean, so it represents all + these old versions that do not,
+
55:44that are not in the subtree.
+
55:45This represents all + the latest stuff.
+
55:47So what I'm going to + do is like before, I
+
55:49want to apply some + mods to these fields.
+
55:54 And whatever mods were + relevant at this point, whatever
+
55:58had been applied, I apply + those to the fields here.
+
56:02And so that means I can + remove all of these mods.
+
56:06I only cared about these ones.
+
56:09Update these fields accordingly.
+
56:11I still have the other mods to + represent all the other changes
+
56:14that could be in that subtree.
+
56:16OK.
+
56:16So we actually split the tree, + and we apply mods to new nodes.
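A hedged sketch of the split, assuming a hypothetical helper split_version_subtree that chooses the subtree root r and partitions the mods; the balance guarantee is the future-lecture part:

```python
def split_full_node(node):
    """Split a full node: the old node keeps versions outside the chosen
    subtree, a new node takes the subtree rooted at version r."""
    r, inside, outside = split_version_subtree(node.mods)  # hypothetical
    new = PNode({f: read_field_full(node, f, r) for f in node.fields})
    new.mods = inside        # mods within r's subtree move to the new node
    node.mods = outside      # the old node keeps everything else
    return new               # pointer redirection is the next step
```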
+
56:38Anything else I need to say?
+
56:42Oh, now we need to + update pointers.
+
56:44That's always the fun part.
+
56:49Let's go over here.
+
57:05So old node hasn't moved.
+
57:07But this new node has moved.
+
57:09So for all of these + versions, I want
+
57:13to change the pointer that + used to point to old node
+
57:18should now point to new node.
+
57:20In this version, it's fine.
+
57:21It should still + point to old node,
+
57:23because this represents + all those old versions.
+
57:25But for the new version, + that version in the subtree,
+
57:28I've got to point here instead.
+
57:30OK.
+
57:31So how many pointers could + there be to this node
+
57:37that need to change.
+
57:38That's a tricky part + in this analysis.
+
57:41Think about it for a while.
+
57:45I mean, in this + new node, whatever
+
57:47is pointed to by either here or + here in the new node also has
+
57:50a return pointer.
+
57:50All pointers are bidirectional.
+
57:52So we don't really care + about whether they're forward
+
57:54or backward.
+
57:54How many pointers + are there here?
+
57:56Well, there's d here + and there's p here.
+
57:59But then there's also + some additional pointers
+
58:01represented over here.
+
58:02How many?
+
58:04Well, if we assume this + magical 50/50 split,
+
58:06there's right now d plus p plus + 1 mods over here, half of them.
+
58:12Each of them might be a pointer + to some other place, which
+
58:16has a return pointer + in that version.
+
58:18So number of back pointers + that we need to update
+
58:26 is going to be this, 2 + times d plus 2 times p plus 1.
+
58:30So recursively update at + most 2 times d plus 2 times p
+
58:41plus 1 pointers to the node.
+
58:50The good news is this is + really only half of them
+
58:52or some fraction of them.
+
58:54It used to be--
+
58:57well, there were + more pointers before.
+
58:59We don't have to + deal with these ones.
+
59:00That's where we're + saving, and that's
+
59:01why this amortization works.
+
59:03Let me give you a potential + function that makes this work--
+
59:12is minus c times sum of the + number of empty mod slots.
+
59:23It's kind of the same + potential but before
+
59:26we had this notion of + dead and alive nodes.
+
59:28Now everything's alive + because everything
+
59:30could change at any moment.
+
59:31So instead, I'm going to + measure how much room I have
+
59:36in each node.
+
59:37Before I had no + room in this node.
+
59:38Now I have half the + space in both nodes.
+
59:41So that's good news.
+
59:44Whenever we have + this recursion, we
+
59:48can charge it to a + potential decrease.
+
59:56Fee goes down by--
+
1:00:01because I have a + negative sign here--
+
1:00:03c times, oh man, 2 times + d plus p plus 1, I think.
+
1:00:13Because there's d plus + p plus 1 space here,
+
1:00:15d plus p plus 1 space here.
+
1:00:17I mean, we added + one whole new node.
+
1:00:18And total capacity + of a node in mods
+
1:00:20is 2 times d plus p plus 1.
+
1:00:23So we get that times c.
+
1:00:26And this is basically + just enough,
+
1:00:28because this is 2 times + d plus 2 times p plus 2.
+
1:00:32And here we have a plus 1.
+
1:00:34And so the recursion gets + annihilated by 2 times d plus
+
1:00:392 times p plus 1.
+
1:00:41And then there's one + c left over to absorb
+
1:00:43whatever constant cost there + was to do all this other work.
+
1:00:47So I got the constants + just to work,
+
1:00:51except that I cheated and it's really a one-third/two-thirds split.
+
1:00:54So probably all of these + constants have to change,
+
1:00:57such is life.
+
1:00:58But I think you get the idea.
+
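Here is the whole amortized argument in one line, under the idealized 50/50 split (the real one-third/two-thirds split only changes the constants):

\[
\Phi \;=\; -\,c \sum_{\text{nodes } x} \#\{\text{empty mod slots in } x\},
\qquad
\Delta\Phi \;=\; -\,c \cdot 2(d + p + 1),
\]
\[
\text{amortized split cost} \;\le\; O(1) + c\,(2d + 2p + 1) + \Delta\Phi
\;=\; O(1) - c \;\le\; 0
\quad \text{for } c \text{ chosen large enough.}
\]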
1:01:01Any questions about + full persistence?
+
1:01:07This is fun stuff, time travel.
+
1:01:10Yeah?
+
1:01:11AUDIENCE: So in the first + half of the thing where
+
1:01:14the if, there's room + you can put it in.
+
1:01:16ERIK DEMAINE: Right.
+
1:01:17AUDIENCE: I have a + question about how
+
1:01:17we represent the version.
+
1:01:19Because before when we said + restore now [INAUDIBLE].
+
1:01:23It made more sense if now was + like a timestamp or something.
+
1:01:25ERIK DEMAINE: OK.
+
1:01:26Right, so how do we represent a + version even here or anywhere?
+
1:01:31When we do a modification, an + update, in the data structure,
+
1:01:34we want to return + the new version.
+
1:01:36Basically, we're going + to actually store
+
1:01:39the DAG of versions.
+
1:01:41And a version is going to + be represented by a pointer
+
1:01:43into this DAG.
+
1:01:44One of the nodes in this + DAG becomes a version.
+
1:01:47Every node in this DAG is + going to store a pointer
+
1:01:50to the corresponding b character + and a corresponding e character
+
1:01:53in this data + structure, which then
+
1:01:56lets you do anything you want.
+
1:01:57Then you can query + against that version,
+
1:01:59whether it's an ancestor + of another version.
+
1:02:01So yeah, I didn't mention that.
+
1:02:02Versions are nodes in here.
+
1:02:04Nodes in here have pointers + to the b's and e's.
+
1:02:06And vice versa, the b's + and e's have pointers back
+
1:02:08to the corresponding + version node.
+
1:02:10And then you can keep + track of everything.
+
1:02:12Good question.
+
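A minimal sketch of that bookkeeping in Python; the class and field names here are illustrative assumptions, not the lecture's notation.

```python
# Hypothetical sketch: versions are explicit DAG nodes, each holding
# pointers to its b and e records in the version-order list, and the
# records point back, so you can go from either side to the other.

class Record:                     # a b or e character in the version list
    def __init__(self, kind):
        self.kind = kind          # "b" or "e"
        self.version = None       # back pointer to the DAG node

class Version:
    def __init__(self, parents=()):
        self.parents = list(parents)   # edges of the version DAG
        self.b, self.e = Record("b"), Record("e")
        self.b.version = self.e.version = self

# An update creates and returns a fresh Version; a query takes a
# Version and uses its b/e records to compare positions in the
# order-maintenance structure over the version list.
```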
1:02:14Yeah?
+
1:02:15AUDIENCE: [INAUDIBLE] question.
+
1:02:16Remind me what d is in this.
+
1:02:17ERIK DEMAINE: Oh, d was + the maximum out degree.
+
1:02:19It's the number of fields in + a node, as defined right here.
+
1:02:26Other questions?
+
1:02:29Whew.
+
1:02:30OK, a little breather.
+
1:02:31That was partial persistence, + full persistence.
+
1:02:33This is, unfortunately, the + end of the really good results.
+
1:02:36As long as we have + constant degree nodes,
+
1:02:38in and out degree, we can do it all.
+
1:02:41We can do full persistence for free.
+
1:02:44Obviously there are practical + constants involved here.
+
1:02:47But in theory, you + can do this perfectly.
+
1:02:53Before we go on to + confluence, there
+
1:02:54is one positive result, + which is what if you
+
1:02:58don't like amortize bounds.
+
1:03:00There are various reasons + amortize bounds might not
+
1:03:02be good.
+
1:03:03Maybe you really care + about every operation
+
1:03:04being no slower than it was + except by a constant factor.
+
1:03:08We're amortizing here, so some + operations get really slow.
+
1:03:11But the others are all + fast to compensate.
+
1:03:14You can deamortize, it's called.
+
1:03:22You can get constant + worst case slowdown
+
1:03:30for partial persistence.
+
1:03:36This is a result of Gerth Brodal from the late '90s, '97.
+
1:03:44For full persistence, it's an open problem.
+
1:03:47I don't know if people + have worked on that.
+
1:03:55All right.
+
1:03:56So some, mostly good results.
+
1:03:59Let's move on to confluent + persistence where things
+
1:04:01get a lot more challenging.
+
1:04:17Lots of things go out the window + with confluent persistence.
+
1:04:20In particular, your + versions are now a DAG.
+
1:04:23It's a lot harder + to linearize a DAG.
+
1:04:25Trees are not that far from paths.
+
1:04:28But DAGs are quite far from paths, unfortunately.
+
1:04:33But that's not all + that goes wrong.
+
1:04:44Let me first tell you the + kind of end effect as a user.
+
1:04:50Imagine you have + a data structure.
+
1:04:54Think of it as a + list, I guess, which
+
1:04:57is a list of characters + in your document.
+
1:04:59You're using vi or Word, + your favorite, whatever.
+
1:05:03It's a text editor.
+
1:05:05You've got a string of words.
+
1:05:06And now you like to do + things like copy and paste.
+
1:05:09It's a nice operation.
+
1:05:11So you select an interval of + the string and you copy it.
+
1:05:16And then you paste + it somewhere else.
+
1:05:18So now you've got two + copies of that string.
+
1:05:21This is, in some + sense, what you might
+
1:05:24call a confluent + operation, because--
+
1:05:27yeah, maybe a cleaner way to + think of it is the following.
+
1:05:30You have your string.
+
1:05:31Now I have an operation, + which is split it.
+
1:05:33So now I have two strings.
+
1:05:35OK.
+
1:05:36And now I have an operation, + which is split it.
+
1:05:38Now I have three strings.
+
1:05:40OK.
+
1:05:41Now I have an operation + which is concatenate.
+
1:05:44So I can, for + example, reconstruct
+
1:05:47the original string-- actually, + I have the original string.
+
1:05:49No biggie.
+
1:05:51Let's say-- because + I have all versions.
+
1:05:54I never lose them.
+
1:05:55So now instead, I'm going to + cut the string here, let's say.
+
1:05:59So now I have this and this.
+
1:06:03And now I can do + things like concatenate
+
1:06:06from here to here to here.
+
1:06:10And I will get this + plus this plus this.
+
1:06:16OK.
+
1:06:17This guy moved here.
+
1:06:18So that's a copy/paste + operation with a constant number
+
1:06:20of splits and concatenates.
+
1:06:22I could also do cut and paste.
+
1:06:23With confluence, I can + do crazy cuts and pastes
+
1:06:26in all sorts of ways.
+
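As a toy illustration, here is that copy/paste in Python, using immutable strings so every intermediate version survives (real persistent lists would make the splits O(1) rather than O(n) slices):

```python
# Copy/paste as splits and concatenates on immutable strings.
doc = "the quick brown fox"
a, rest = doc[:4], doc[4:]        # split once
b, c = rest[:6], rest[6:]         # split again: three pieces a, b, c
pasted = a + c + b                # concatenate in a new order
# doc, a, b, c, and pasted are all still valid versions.
```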
1:06:28So what?
+
1:06:29Well, the so what + is I can actually
+
1:06:32double the size of + my data structure
+
1:06:33in a constant number + of operations.
+
1:06:36I can take, for example, + the entire string
+
1:06:38and concatenate it to itself.
+
1:06:40That will double the + number of characters,
+
1:06:41number of elements in there.
+
1:06:43I can do that again + and again and again.
+
1:06:45So in u updates, + I can potentially
+
1:06:51get a data structure + size 2 to the u.
+
1:06:57Kind of nifty.
+
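And the doubling trick, in the same toy style:

```python
# u confluent updates can build a structure of size 2**u.
s = "x"
for u in range(20):
    s = s + s            # concatenate a version with itself
print(len(s))            # 1048576 == 2**20 after 20 updates
```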
1:06:58I think this is why + confluence is cool.
+
1:07:00It's also why it's hard.
+
1:07:02So not a big surprise.
+
1:07:03But, here we go.
+
1:07:08In that case, the version DAG, + for reference, looks like this.
+
1:07:13You're taking the same + version, combining it.
+
1:07:16So here I'm assuming I have + a concatenate operation.
+
1:07:20And so the effect here, + every time I do this,
+
1:07:24I double the size.
+
1:07:44All right.
+
1:07:44What do I want to say about + confluent persistence?
+
1:07:46All right.
+
1:07:47Let me start with the + most general result, which
+
1:07:53is by Fiat and Kaplan in 2003.
+
1:08:04They define a notion called + effective depth of a version.
+
1:08:08Let me just write it down.
+
1:08:21It's kind of like + if you took this DAG
+
1:08:24and expanded it out to be a + tree of all possible paths.
+
1:08:30Instead of point + to the same node,
+
1:08:31you could just + duplicate that node
+
1:08:33and then have pointers + left and right.
+
1:08:35OK.
+
1:08:35So if I did that, of course, + this size grows exponentially.
+
1:08:38It explicitly represents the + size of my data structure.
+
1:08:41At the bottom, if + I have u things,
+
1:08:42I'm going to have 2 to the + u leaves at the bottom.
+
1:08:45But then I can easily + measure the number of paths
+
1:08:49from the root to + the same version.
+
1:08:50At the bottom, I still + label it, oh, those
+
1:08:52are all v. They're all the + same version down there.
+
1:08:54So exponential number + of paths, if I take log,
+
1:08:56I get what I call + effective depth.
+
1:08:58It's like if you somehow + could rebalance that tree,
+
1:09:02this is the best you + could hope to do.
+
1:09:05It's not really a lower bound.
+
1:09:07But it's a number.
+
1:09:08It's a thing.
+
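In symbols, following the description above (up to rounding; see the Fiat-Kaplan paper for the exact definition):

\[
e(v) \;=\; \log \#\{\text{paths from the root of the version DAG to } v\},
\]
so the doubling DAG gives \(\#\text{paths} = 2^{u}\) and hence \(e(v) = u\).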
1:09:09OK.
+
1:09:10Then the result they achieve + is that the overhead is
+
1:09:17log the number of + updates plus-- this
+
1:09:19is a multiplicative overhead, + so you take your running time.
+
1:09:22You multiply it by this.
+
1:09:25And this is a time + and a space overhead.
+
1:09:31So maximum effective depth + of all versions, maybe even
+
1:09:34sum of effective depths, but + we'll just say max to be safe.
+
1:09:39Sorry-- sum over + all the operations.
+
1:09:41This is per operation.
+
1:09:43You pay basically + the effective depth
+
1:09:44of that operation as a factor.
+
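Written out, the guarantee per operation on a version \(v\) is a multiplicative

\[
O\!\bigl(e(v) + \log(\#\text{updates})\bigr)
\]

overhead in both time and space.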
1:09:48Now, the annoying thing is if + you have this kind of set up
+
1:09:51where the size + grew exponentially,
+
1:09:54then number of paths + is exponential.
+
1:09:56Log of the number of + paths is linear in u.
+
1:09:59And so this factor could be + as much as u, linear slowdown.
+
1:10:06Now, Fiat and Kaplan argue + linear slowdown is not
+
1:10:08that bad, because if you weren't + even persistent, if you did
+
1:10:13this in the naive way of + just recopying the data,
+
1:10:18you were actually spending + exponential time to build
+
1:10:21the final data structure.
+
1:10:22It has exponential size.
+
1:10:23Just to represent it explicitly + requires exponential time,
+
1:08:26so losing a linear factor to do u operations
+
1:08:29means u squared time instead of 2 to the u.
+
1:10:31So it's a big + improvement to do this.
+
1:10:35The downside of this approach is + that even if you have a version
+
1:10:40DAG that looks like this, + even if the size of the data
+
1:10:43structure is staying normal, staying linear, so
+
1:10:46potentially you could be doubling the size.
+
1:10:48But we don't know what + this merge operation is.
+
1:10:49Maybe it just throws + away one of the versions
+
1:10:51or does something--
+
1:10:53somehow takes half + the nodes from one
+
1:10:55side, half the nodes from + the other side maybe.
+
1:10:57These operations + do preserve size.
+
1:10:58Then there's no great reason why + it should be a linear slowdown,
+
1:11:02but it is.
+
1:11:03OK?
+
1:11:04So it's all right but not great.
+
1:11:10And it's the best + general result we know.
+
1:11:13They also prove a lower bound.
+
1:11:21So the lower bound is the sum of effective depths, in total bits of space.
+
1:11:37OK.
+
1:11:37What does this mean?
+
1:11:40So even if this + is not happening,
+
1:11:42the number of bits + of space you need
+
1:11:44in the worst case-- + this does not
+
1:11:45apply to every data structure.
+
1:11:47That's one catch.
+
1:11:49They give a specific + data structure
+
1:11:52where you need this much space.
+
1:11:53So it's similar to + this kind of picture.
+
1:11:57We'll go into the details.
+
1:11:58And you need this much space.
+
1:12:00Now, this is kind of + bad, because if there's
+
1:12:02u operations, and each of these + is u, that's u squared space.
+
1:12:06So we actually need a + factor u blow up in space.
+
1:12:09It looks like.
+
1:12:11But to be more precise, + what this means is
+
1:12:14that you need omega e of + v space, and therefore
+
1:12:17time overhead per update, if--
+
1:12:27this is not written + in the paper--
+
1:12:29queries are free.
+
1:12:35Implicit here, they only charge slowdown and increased space
+
1:12:40to the updates you do, which is pretty natural.
+
1:12:43Normally you think of queries + as not increasing space.
+
1:12:46But in order to construct + this lower bound,
+
1:12:49they actually do + this many queries.
+
1:12:52So they do e of v queries + and then one update.
+
1:12:55And they say, oh well, space + had to go up by an extra e of v.
+
1:12:59So if you only charge + updates for the space,
+
1:13:02then yes, you have + to lose potentially
+
1:13:04a linear factor, this effective depth, potentially u.
+
1:13:07But if you also + charge the queries,
+
1:13:09it's still constant + in their example.
+
1:13:13So open question, for + confluent persistence,
+
1:13:18can you achieve + constant everything?
+
1:13:21Constant time and + space overheads,
+
1:13:27multiplicative + factor per operation,
+
1:13:33both updates and queries.
+
1:13:35So if you charge the + queries, potentially you
+
1:13:37could get constant everything.
+
1:13:38This is a relatively + new realization.
+
1:13:43And no one knows + how to do this yet.
+
1:13:47Nice challenge.
+
1:13:47I think maybe we'll work on that + in our first problem session.
+
1:13:50I would like to.
+
1:13:53Questions about that result?
+
1:13:54I'm not going to + prove the result.
+
1:13:56But it is a fancy + rebalancing of those kinds
+
1:13:59of pictures to get this log.
+
1:14:10There are other results + I'd like to tell you about.
+
1:14:32So brand new result--
+
1:14:34that was from 2003.
+
1:14:35This is from 2012--
+
1:14:38no, '11, '11, sorry.
+
1:14:42It's SODA, which is in January, so it's a little confusing.
+
1:14:47Is it '11?
+
1:14:49Maybe '12.
+
1:14:50Actually now I'm not sure.
+
1:14:51It's February already, right?
+
1:14:54In January, either this year or last year.
+
1:15:00It's not as general + a transformation.
+
1:15:02It's only going to hold in + what's called a disjoint case.
+
1:15:05But it gets a very good bound--
+
1:15:07not quite constant, + but logarithmic.
+
1:15:09OK, logarithmic + would also be nice.
+
1:15:12Or log n, whatever n is.
+
1:15:17Pick your favorite n, + number of operations, say.
+
1:15:22OK.
+
1:15:25If you assume that confluent + operations are performed only
+
1:15:39on two versions with + no shared nodes--
+
1:15:50OK, this would be a way to + forbid this kind of behavior
+
1:15:53where I concatenate the + data structure with itself.
+
1:15:56All the nodes are common.
+
1:15:58If I guarantee that maybe I, you + know, slice this up, slice it,
+
1:16:01dice it, whatever, and then re-merge them
+
1:16:03in some other order, but + I never use two copies
+
1:16:06of the same piece, that + would be a valid confluent
+
1:16:10operation over here.
+
1:16:12This is quite a strong restriction
+
1:16:13that you're not allowed to violate.
+
1:16:16If you try to, who + knows what happens.
+
1:16:19Behavior's undefined.
+
1:16:19So it won't tell you, oh, those two versions
+
1:16:21have this node in common.
+
1:16:22You've got to make + a second copy of it.
+
1:16:24So somehow you have to guarantee that confluent operations
+
1:16:27never overlap.
+
1:16:29But they can be reordered.
+
1:16:33Then you can get + order log n overhead.
+
1:16:39n is the number of operations.
+
1:16:45I have a sketch + of a proof of this
+
1:16:46but not very much + time to talk about it.
+
1:16:48All right.
+
1:16:49Let me give you a quick picture.
+
1:16:51In general, the + versions form a DAG.
+
1:16:55But if you make this assumption, + and you look at a single node,
+
1:17:00and look at all the versions + where that node appears,
+
1:17:03that is a tree.
+
1:17:05Because you're not allowed + to remerge versions
+
1:17:07that have the same node.
+
1:17:08So while the big + picture is a DAG,
+
1:17:11the small picture of a + single guy is some tree.
+
1:17:17I'm drawing all + these wiggly lines
+
1:17:18because there are all + these versions where
+
1:17:20the node isn't changing.
+
1:17:21This is the entire version DAG.
+
1:17:23And then some of these nodes--
+
1:17:26some of these versions, + I should say--
+
1:17:29that node that we're + thinking about changes.
+
1:17:31OK, whenever it + branches, it's probably
+
1:17:33because the actual + node changed, maybe.
+
1:17:36I don't know.
+
1:17:37Anyway there are some dots + here where the version changed,
+
1:17:40some of the leaves, + maybe, that changed.
+
1:17:41Maybe some of them haven't yet.
+
1:17:44In fact, let's see.
+
1:17:48Here, where it's changed, it could be that we destroyed the node.
+
1:17:51Maybe it's gone from the + actual data structure.
+
1:17:54But there still may + be versions down here.
+
1:17:56It's not really a tree.
+
1:17:57It's a whole DAG of + stuff down there.
+
1:17:59So that's kind of ugly.
+
1:18:01Wherever the node still exists,
+
1:18:03I guess that is an + actual leaf of the DAG.
+
1:18:05So those are OK.
+
1:18:06But as soon as I maybe + delete that node,
+
1:18:08then there can be a + whole subtree down there.
+
1:18:11OK.
+
1:18:12So now if you look at + an arbitrary version,
+
1:18:15so what we're thinking about + is how to implement reading,
+
1:18:17let's say.
+
1:18:18Reading and writing are + more or less the same.
+
1:18:21I give you a version.
+
1:18:22I give you a node, and + I give you a field.
+
1:18:23I want to know, what is the + value of that field, that node,
+
1:18:26that version?
+
1:18:27So now where could + a version fall?
+
1:18:30Well it has to be + in this subtree.
+
1:18:31Because the node has to exist.
+
1:18:36And then it's maybe a pointer.
+
1:18:38A pointer could be to + another node, which
+
1:18:42also has this kind of picture.
+
1:18:44They could be overlapping trees.
+
1:18:46In general, there + are three cases.
+
1:18:48Either you're lucky, and the + version you're talking about
+
1:18:51is a version where + the node was changed.
+
1:18:53In that case, the data is + just stored right there.
+
1:18:58That's easy.
+
1:18:59So you could just say, oh, + how did the node change?
+
1:19:01Oh, that's what the field is.
+
1:19:02OK, follow the pointer.
+
1:19:05A slightly harder + case it's a version
+
1:19:08in between two such changes.
+
1:19:09And maybe these are not updates.
+
1:19:11So I sort of want to know, what + was the previous version where
+
1:19:17this node changed + in constant time?
+
1:19:21It can be done.
+
1:19:22Not constant time, + actually, logarithmic time,
+
1:19:25using a data structure + called link-cut trees,
+
1:19:28another fun black + box for now, which
+
1:19:31we will cover in lecture + 19, far in the future.
+
1:19:36OK.
+
1:19:39Well, that's one case.
+
1:19:40There's also the version + where maybe a version
+
1:19:43is down here in a subtree.
+
1:19:45I guess then the + node didn't exist.
+
1:19:48Well, all these + things can happen.
+
1:19:50And that's even harder.
+
1:19:51It's messy.
+
1:19:53They use another trick, which + is called fractional cascading,
+
1:19:59which I'm not even going to + try to describe what it means.
+
1:20:02But it's got a very cool name.
+
1:20:04Because we'll be + covering it in lecture 3.
+
1:20:06So stay tuned for that.
+
1:20:07I'm not going to say how + it applies to this setting,
+
1:20:09but it's a necessary + step in here.
+
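For flavor, here is a stripped-down Python sketch of the first two read cases, with a naive ancestor walk standing in for the link-cut trees (which make it O(log n)) and case 3 left out entirely; all the names are mine, not the paper's.

```python
# Hypothetical sketch: `changes` maps version -> field value at the
# versions where this node changed; `parent` encodes the version tree
# restricted to the versions containing the node.

def read_field(version, changes, parent):
    v = version
    while v is not None:
        if v in changes:           # case 1: data stored right there
            return changes[v]
        v = parent.get(v)          # case 2: walk to the previous change;
                                   # link-cut trees do this in O(log n)
    raise KeyError("node does not exist at this version")

# Example: value set at version "r", read back at descendant "r/a/b".
parent = {"r/a": "r", "r/a/b": "r/a"}
print(read_field("r/a/b", {"r": 42}, parent))   # 42
```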
1:20:13In the remaining + zero minutes, let
+
1:20:15me tell you a little bit about + functional data structures.
+
1:20:17[LAUGHTER]
+
1:20:20Beauty of time travel.
+
1:20:24Functional-- I just + want to give you
+
1:20:31some examples of things that + can be done functionally.
+
1:20:33There's a whole book about + functional data structures
+
1:20:35by Okasaki.
+
1:20:36It's pretty cool.
+
1:20:38A simple example + is balanced BSTs.
+
1:20:42So if you just want to get + log n time for everything,
+
1:20:44you can do that functionally.
+
1:20:45It's actually really easy.
+
1:20:46You pick your favorite balance + BST, like red black trees.
+
1:20:48You implement it top down so you + never follow parent pointers.
+
1:20:51So you don't need + parent pointers.
+
1:20:52So then as you make changes + down the tree, you just copy.
+
1:20:57It's called path copying.
+
1:20:58Whenever you're about + to make a change,
+
1:21:00make a copy of that node.
+
1:21:02So you end up copying all the changed nodes and all
+
1:21:05their ancestors.
+
1:21:06There's only log n of them, + so it takes log n time.
+
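Here is a minimal sketch of path copying on a plain BST (no rebalancing, so it is not the red-black version; assume distinct keys):

```python
# Path copying: a persistent insert copies only the root-to-leaf path,
# so every old version stays intact and each update adds O(height) nodes.

class BST:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    if root is None:
        return BST(key)
    if key < root.key:             # copy this node; share the other subtree
        return BST(root.key, insert(root.left, key), root.right)
    return BST(root.key, root.left, insert(root.right, key))

v1 = insert(insert(insert(None, 5), 2), 8)
v2 = insert(v1, 3)                 # v1 is untouched; v2 shares v1's nodes
```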
1:21:09Clear?
+
1:21:10Easy.
+
1:21:11It's a nice technique.
+
1:21:12Sometimes path copying + is very useful.
+
1:21:14Like link-cut + trees, for example,
+
1:21:16can be made functional.
+
1:21:17We don't know what they are, + but they're basically a BST.
+
1:21:19And you can make + them functional.
+
1:21:21We use that in a paper.
+
1:21:23All right.
+
1:21:23Deques.
+
1:21:25These are doubly ended queues.
+
1:21:27So it's like a stack and + a queue and everything.
+
1:21:29You can insert and delete from + the beginning and the end.
+
1:21:32People start to know + what these are now,
+
1:21:34because Python calls them that.
+
1:21:35But you can also + do concatenation
+
1:21:41with deques in constant + time per operation.
+
1:21:43This is cool.
+
1:21:44Deques are not very + hard to make functional.
+
1:21:46But you can do deques and + you can concatenate them
+
1:21:48like we were doing in the figure + that's right behind this board.
+
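Full concatenable deques take real work, but here is the flavor with a much simpler cousin, the classic purely functional two-list queue (my example, not the lecture's; the amortized bounds need Okasaki's lazy tricks to survive persistent reuse):

```python
# A purely functional FIFO queue as two immutable tuples (front, back).
def empty():
    return ((), ())

def push(q, x):                    # O(1); the old version q still works
    front, back = q
    return (front, (x,) + back)

def pop(q):                        # amortized O(1) reversal of the back
    front, back = q
    if not front:
        front, back = tuple(reversed(back)), ()
    return front[0], (front[1:], back)

q1 = push(push(empty(), 1), 2)
x, q2 = pop(q1)                    # x == 1, and q1 is unchanged
```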
1:21:51Constant time split + is a little harder.
+
1:21:53That's actually one + of my open problems.
+
1:21:56Can you do lists with split and + concatenate in constant time--
+
1:22:01functionally or confluently, + persistently, or whatever?
+
1:22:05Another example-- oh, you + can do a mix of the two.
+
1:22:08You can get log n search and constant-time deque operations.
+
1:22:12Also, you can do tries.
+
1:22:14So a trie is a tree with a fixed topology.
+
1:22:17Think of it as a directory tree.
+
1:22:20So maybe you're + using Subversion.
+
1:22:21Subversion has time + travel operations.
+
1:22:23You can copy an entire + subtree from one version
+
1:22:26and stick it into a new + version, another version.
+
1:22:30So you get a version DAG.
+
1:22:32It's a confluently + persistent data structure--
+
1:22:34not implemented optimally, + because we don't necessarily
+
1:22:37know how.
+
1:22:38But there is one paper.
+
1:22:40This actually came from the open + problem section of this class
+
1:22:43four years ago, I think.
+
1:22:45It's with Eric Price + and Stefan Langerman.
+
1:22:49You can get very good results.
+
1:22:50I won't write them down + because it takes a while.
+
1:22:52Basically log the degree + of the nodes factor
+
1:22:56and get functional, and + you can be even fancier
+
1:22:59and get slightly better + bounds like log log the degree
+
1:23:02and get confluently persistent + with various tricks,
+
1:23:05including using all of + these data structures.
+
1:23:07So if you want to implement Subversion optimally,
+
1:23:09that is known how to be done but + hasn't actually been done yet.
+
1:23:14Because there are those + pesky constant factors.
+
1:23:18I think that's all.
+
1:23:19What is known about functional + is there's a log n separation.
+
1:23:23You can be log n + away from the best.
+
1:23:26That's the worst + separation known,
+
1:23:30between functional and just + a regular old data structure.
+
1:23:33It'd be nice to improve that.
+
1:23:34Lots of open problems here.
+
1:23:35Maybe we'll work + on them next time.
+
\ No newline at end of file