<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description"
content="WOPR: Word Predictor. Memory-based language modeling">
<meta name="keywords" content="word prediction, wopr, memory-based learning, timbl, memory-based language modeling">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>WOPR: Memory-based language modeling</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/favicon.svg">
<!-- <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script> -->
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">WOPR: Memory-based language modeling</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://antalvandenbosch.nl/" target="_blank">Antal van den Bosch</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://www.humlab.lu.se/person/PeterBerck/" target="_blank">Peter Berck</a><sup>2</sup>,</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Utrecht University</span>
<span class="author-block"><sup>2</sup>University of Lund</span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://berck.se/thesis.pdf" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Thesis</span>
</a>
</span>
<!-- PDF Link. -->
<span class="link-block">
<a href="http://ufal.mff.cuni.cz/pbml/91/art-bosch.pdf" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/LanguageMachines/wopr" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">WOPR in brief</h2>
<div class="content has-text-justified">
<p>
WOPR, the Word Predictor, is a memory-based language model developed between 2006 and 2011.
It just woke up from its cryogenic sleep and is figuring out what the
fuss about LLMs is all about.
</p>
<p>
WOPR is an ecologically friendly alternative LLM with a staggeringly simple
core. Everyone who took "Machine Learning 101" knows that the <i>k</i>-nearest
neighbor classifier is among the simplest yet most robust ML classifiers out
there, perhaps only beaten by the Naive Bayes classifier. So what happens if
you train a <i>k</i>-NN classifier to predict words?
</p>
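<p>
As a toy illustration of the idea (a minimal Python sketch with a made-up mini-corpus,
not part of WOPR itself): each position in the corpus becomes a fixed-width
left-context window plus the word that follows it, and prediction is a
nearest-neighbour lookup over those stored instances.
</p>
<pre><code># Toy sketch of memory-based word prediction (not the WOPR implementation).
corpus = "the cat sat on the mat the cat lay on the rug".split()
WIDTH = 2  # number of left-context words used as features

# Turn the corpus into (context, next-word) training instances.
instances = []
for i in range(WIDTH, len(corpus)):
    instances.append((tuple(corpus[i - WIDTH:i]), corpus[i]))

def overlap(a, b):
    """Number of feature positions on which two contexts agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

def predict(context):
    """1-NN: return the target word of the most similar stored context."""
    best = max(instances, key=lambda inst: overlap(inst[0], context))
    return best[1]

print(predict(("the", "cat")))  # prints 'sat', the first stored match for this context
</code></pre>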
<p>
WOPR's engine is the
<a href="https://github.com/LanguageMachines/timbl">TiMBL</a> classifier,
which implements a number of fast approximations of <i>k</i>-NN classification,
all partly based on decision-tree classification. On
tasks like next-word prediction, plain <i>k</i>-NN is prohibitively slow, but the
<a href="https://github.com/LanguageMachines/timbl">TiMBL</a>
approximations can classify many orders of magnitude faster.
</p>
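<p>
To give a rough sense of why this is fast: the sketch below (again Python, a drastic
simplification of the IGTree idea rather than TiMBL's actual code) compresses the
training instances into a trie and classifies by greedy descent, so lookup cost grows
with the number of context features rather than with the number of stored instances.
Real IGTree additionally orders the features by information gain; a fixed feature
order is assumed here for brevity.
</p>
<pre><code># Simplified sketch of the decision-tree approximation idea (not TiMBL code).
from collections import Counter

def build_trie(instances):
    """Compress (context, target) instances into a trie in one linear pass."""
    root = {"children": {}, "counts": Counter(), "default": None}
    for context, target in instances:
        node = root
        node["counts"][target] += 1
        for feature in context:  # real IGTree orders features by information gain
            node = node["children"].setdefault(
                feature, {"children": {}, "counts": Counter(), "default": None})
            node["counts"][target] += 1
    # The default at each node is the most frequent target stored below it.
    def set_defaults(node):
        node["default"] = node["counts"].most_common(1)[0][0]
        for child in node["children"].values():
            set_defaults(child)
    set_defaults(root)
    return root

def classify(root, context):
    """Greedy descent: stop at the first mismatch, return the deepest default."""
    node = root
    for feature in context:
        if feature not in node["children"]:
            break
        node = node["children"][feature]
    return node["default"]
</code></pre>
<p>
With the toy instances from the previous sketch, <code>classify(build_trie(instances),
("the", "cat"))</code> returns the default (most frequent) word stored below that context.
</p>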
<p>
Compared to Transformer-based LLMs, on the plus side memory-based LLMs are
</p>
<ul>
<li>very efficient in training. Training is essentially reading the data (in linear time)
and compressing it into a decision-tree structure. This can be done on CPUs,
with sufficient RAM. In short, their <b>ecological footprint is dramatically lower</b>;</li>
<li>pretty efficient in generation when running with the fastest decision-tree
approximations of <i>k</i>-NN classification. <b>This can be done on CPUs as well</b>;</li>
<li>completely transparent in their functioning. There can also be no doubt that
<b>they memorize training data patterns</b>.</li>
</ul>
<p>On the downside,</p>
<ul>
<li><b>Their performance currently does not match that of Transformer-based LLMs</b>,
but we have not yet trained on data sets orders of magnitude larger than
100 million words. Watch this space!</li>
<li>They <b>do not have a delicate attention mechanism</b>, arguably the killer feature
of Transformer-based decoders;</li>
<li>Memory requirements during training are <b>heavy with large datasets</b>
(more than 32 GB of RAM for more than 100 million words).</li>
</ul>
</div>
</div>
</div>
<!--/ Abstract. -->
</div>
</section>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@article{VandenBosch+09,
author = {A. {Van den Bosch} and P. Berck},
journal = {The Prague Bulletin of Mathematical Linguistics},
pages = {17--26},
title = {Memory-based machine translation and language modeling},
volume = {91},
year = {2009},
bdsk-url-1 = {http://ufal.mff.cuni.cz/pbml/91/art-bosch.pdf}
}</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<a class="icon-link" href="https://github.com/LanguageMachines/wopr" target="_blank" class="external-link" disabled>
<i class="fab fa-github"></i>
</a>
</div>
<div class="columns is-centered">
<div class="column is-8">
<div class="content">
<p>
This website is licensed under a <a rel="license" target="_blank"
href="http://creativecommons.org/licenses/by-sa/4.0/">Creative
Commons Attribution-ShareAlike 4.0 International License</a>.
</p>
<p>
This website gladly made use of the <a target="_blank"
href="https://github.com/nerfies/nerfies.github.io">source code</a> of the Nerfies website. Thanks!
</p>
</div>
</div>
</div>
</div>
</footer>
</body>
</html>