|
<!DOCTYPE html> |
|
<html> |
|
<head> |
|
<meta charset="utf-8"> |
|
<meta name="description" |
|
content="WOPR: Word Predictor. Memory-based language modeling"> |
|
<meta name="keywords" content="word prediction, wopr, memory-based learning, timbl, memory-based language modeling"> |
|
<meta name="viewport" content="width=device-width, initial-scale=1"> |
|
<title>WOPR: Memory-based language modeling</title> |
|
|
|
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" |
|
rel="stylesheet"> |
|
|
|
<link rel="stylesheet" href="./static/css/bulma.min.css"> |
|
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css"> |
|
<link rel="stylesheet" href="./static/css/bulma-slider.min.css"> |
|
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css"> |
|
<link rel="stylesheet" |
|
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css"> |
|
<link rel="stylesheet" href="./static/css/index.css"> |
|
<link rel="icon" href="./static/images/favicon.svg"> |
|
|
|
|
|
<script defer src="./static/js/fontawesome.all.min.js"></script> |
|
<script src="./static/js/bulma-carousel.min.js"></script> |
|
<script src="./static/js/bulma-slider.min.js"></script> |
|
<script src="./static/js/index.js"></script> |
|
</head> |
|
<body> |
|
|
|
<section class="hero"> |
|
<div class="hero-body"> |
|
<div class="container is-max-desktop"> |
|
<div class="columns is-centered"> |
|
<div class="column has-text-centered"> |
|
<h1 class="title is-1 publication-title">WOPR: Memory-based language modeling</h1> |
|
<div class="is-size-5 publication-authors"> |
|
<span class="author-block"> |
|
<a href="https://antalvandenbosch.nl/" target="_blank">Antal van den Bosch</a><sup>1</sup>,</span> |
|
<span class="author-block"> |
|
<a href="https://www.humlab.lu.se/person/PeterBerck/" target="_blank">Peter Berck</a><sup>2</sup>,</span> |
|
</div> |
|
|
|
<div class="is-size-5 publication-authors"> |
|
<span class="author-block"><sup>1</sup>Utrecht University</span> |
|
<span class="author-block"><sup>2</sup>University of Lund</span> |
|
</div> |
|
|
|
<div class="column has-text-centered"> |
|
<div class="publication-links"> |
|
|
|
|
|
<span class="link-block"> |
|
<a href="https://berck.se/thesis.pdf" target="_blank" |
|
class="external-link button is-normal is-rounded is-dark"> |
|
<span class="icon"> |
|
<i class="fas fa-file-pdf"></i> |
|
</span> |
|
<span>Thesis</span> |
|
</a> |
|
</span> |
|
|
|
|
|
<span class="link-block"> |
|
<a href="http://ufal.mff.cuni.cz/pbml/91/art-bosch.pdf" target="_blank" |
|
class="external-link button is-normal is-rounded is-dark"> |
|
<span class="icon"> |
|
<i class="fas fa-file-pdf"></i> |
|
</span> |
|
<span>Paper</span> |
|
</a> |
|
</span> |
|
|
|
|
|
<span class="link-block"> |
|
<a href="https://github.com/LanguageMachines/wopr" target="_blank" |
|
class="external-link button is-normal is-rounded is-dark"> |
|
<span class="icon"> |
|
<i class="fab fa-github"></i> |
|
</span> |
|
<span>Code</span> |
|
</a>
</span>
|
|
|
</div> |
|
|
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</section> |
|
|
|
|
|
|
|
<section class="section"> |
|
<div class="container is-max-desktop"> |
|
|
|
<div class="columns is-centered has-text-centered"> |
|
<div class="column is-four-fifths"> |
|
<h2 class="title is-3">WOPR in brief</h2> |
|
<div class="content has-text-justified"> |
|
<p> |
|
WOPR, short for Word Predictor, is a memory-based language model developed between 2006 and 2011.
It has just woken up from its cryogenic sleep and is figuring out what
all the fuss about LLMs is about.
|
</p> |
|
<p> |
|
WOPR is an ecologically friendly alternative LLM with a staggeringly simple |
|
core. Everyone who took "Machine Learning 101" knows that the <i>k</i>-nearest |
|
neighbor classifier is among the simplest yet most robust ML classifiers out |
|
there, perhaps only beaten by the Naive Bayes classifier. So what happens if |
|
you train a <i>k</i>-NN classifier to predict words? |
|
</p> |
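<p>
Concretely, next-word prediction becomes an ordinary classification task: the features are the few words preceding a position, and the class label is the word that follows. Below is a minimal, illustrative sketch in plain Python of that framing, assuming a simple overlap metric and a toy corpus; it is not WOPR's actual code, which delegates the classification step to TiMBL.
</p>
<pre><code># Illustrative sketch only: next-word prediction as k-NN classification.
# WOPR itself delegates this classification step to TiMBL.
from collections import Counter

def make_instances(tokens, n=3):
    """Turn a token list into (left-context, next-word) training instances."""
    padded = ["_"] * n + tokens
    return [(tuple(padded[i:i + n]), padded[i + n])
            for i in range(len(tokens))]

def predict(instances, context, k=3):
    """Overlap metric: rank instances by the number of mismatching context
    positions, then take a majority vote over the k nearest."""
    nearest = sorted(instances,
                     key=lambda inst: sum(a != b for a, b in zip(inst[0], context)))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
instances = make_instances(corpus)
print(predict(instances, ("dog", "sat", "on")))   # prints: the
</code></pre>
<p>
In actual memory-based language modeling the context positions are weighted (for instance by information gain) rather than counted as raw mismatches, but the store-instances-and-vote structure is the same.
</p>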
|
<p> |
|
WOPR's engine is the |
|
<a href="https://github.com/LanguageMachines/timbl">TiMBL</a> classifier, |
|
which implements a number of fast approximations of <i>k</i>-NN classification, |
|
all partly based on decision-tree classification. On |
|
tasks like next-word prediction, exact <i>k</i>-NN is prohibitively slow, but the
<a href="https://github.com/LanguageMachines/timbl">TiMBL</a>
approximations classify many orders of magnitude faster.
|
</p> |
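<p>
To give an impression of where that speed-up comes from, here is a much-simplified, hypothetical sketch of a decision-tree approximation of <i>k</i>-NN: features are tested in a fixed order of importance, every node stores the most frequent class seen in training, and lookup falls back to that default as soon as it meets a feature value that never occurred in training. TiMBL's actual IGTree algorithm derives the feature ordering from information gain and contains many refinements not shown here.
</p>
<pre><code># Much-simplified sketch of an IGTree-style approximation of k-NN.
# Assumes features are already ordered from most to least informative
# (for word prediction: the context word nearest to the gap first).
from collections import Counter

def build_tree(instances):
    """instances: list of (feature_tuple, label). Each node stores the most
    frequent label (its default) plus arcs for the observed values of the
    current feature."""
    labels = [label for _, label in instances]
    node = {"default": Counter(labels).most_common(1)[0][0], "arcs": {}}
    if len(set(labels)) == 1 or not instances[0][0]:
        return node                      # unambiguous, or no features left
    groups = {}
    for feats, label in instances:
        groups.setdefault(feats[0], []).append((feats[1:], label))
    node["arcs"] = {value: build_tree(rest) for value, rest in groups.items()}
    return node

def classify(tree, feats):
    """Follow matching arcs; on the first unseen value, return the default
    stored at the last node reached."""
    for value in feats:
        if value not in tree["arcs"]:
            break
        tree = tree["arcs"][value]
    return tree["default"]

train = [(("sat", "on"), "the"), (("is", "on"), "fire"), (("sat", "in"), "a")]
tree = build_tree(train)
print(classify(tree, ("sat", "on")))     # prints: the
print(classify(tree, ("slept", "on")))   # unseen value, root default: the
</code></pre>
<p>
Training is a single linear pass that compresses the instance base into this trie, and classification visits at most one node per feature instead of comparing against every stored instance, which is where the orders-of-magnitude speed-up over exhaustive <i>k</i>-NN comes from.
</p>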
|
<p> |
|
Compared to Transformer-based LLMs, memory-based LLMs are, on the plus side,
|
</p> |
|
<ul> |
|
<li>very efficient in training. Training is essentially reading the data (in linear time) |
|
and compressing it into a decision tree structure. This can be done on CPUs, |
|
with sufficient RAM. In short, their <b>ecological footprint is dramatically lower</b>;</li>
|
<li>pretty efficient in generation when running with the fastest decision-tree |
|
approximations of <i>k</i>-NN classification. <b>This can be done on CPUs as well</b>;</li> |
|
<li>completely transparent in their functioning. There can also be no doubt that
<b>they memorize training data patterns</b>.</li>
|
</ul> |
|
<p>On the downside,</p> |
|
<ul> |
|
<li><b>Their performance does not yet match that of current Transformer-based LLMs</b>,
but we have not yet trained
on data sets orders of magnitude larger than 100 million words.
Watch this space!</li>
|
<li>They <b>do not have a delicate attention mechanism</b>, arguably the killer feature |
|
of Transformer-based decoders;</li> |
|
<li>Memory requirements during training are <b>heavy with large datasets</b> |
|
(more than 32 GB of RAM beyond roughly 100 million training words).</li>
|
</ul> |
|
</div> |
|
</div> |
|
</div> |
|
|
|
|
|
</div> |
|
</section> |
|
|
|
|
|
|
|
|
|
<section class="section" id="BibTeX"> |
|
<div class="container is-max-desktop content"> |
|
<h2 class="title">BibTeX</h2> |
|
<pre><code>@article{VandenBosch+09,
  author  = {A. {Van den Bosch} and P. Berck},
  journal = {The Prague Bulletin of Mathematical Linguistics},
  pages   = {17--26},
  title   = {Memory-based machine translation and language modeling},
  volume  = {91},
  year    = {2009},
  url     = {http://ufal.mff.cuni.cz/pbml/91/art-bosch.pdf}
}</code></pre>
|
</div> |
|
</section> |
|
|
|
|
|
<footer class="footer"> |
|
<div class="container"> |
|
<div class="content has-text-centered"> |
|
<a class="icon-link" href="https://github.com/LanguageMachines/wopr" target="_blank" class="external-link" disabled> |
|
<i class="fab fa-github"></i> |
|
</a> |
|
</div> |
|
<div class="columns is-centered"> |
|
<div class="column is-8"> |
|
<div class="content"> |
|
<p> |
|
This website is licensed under a <a rel="license" target="_blank" |
|
href="http://creativecommons.org/licenses/by-sa/4.0/">Creative |
|
Commons Attribution-ShareAlike 4.0 International License</a>. |
|
</p> |
|
<p> |
|
This website gladly made use of the <a target="_blank"
|
href="https://github.com/nerfies/nerfies.github.io">source code</a> of the Nerfies website. Thanks! |
|
</p> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</footer> |
|
|
|
</body> |
|
</html> |
|
|