Update index.html

index.html  (+8 -6)
@@ -109,16 +109,16 @@
 core. Everyone who took "Machine Learning 101" knows that the <i>k</i>-nearest
 neighbor classifier is among the simplest yet most robust ML classifiers out
 there, perhaps only beaten by the Naive Bayes classifier. So what happens if
-you train a <i>k</i>-NN classifier to predict words?
+you train a <i>k</i>-NN classifier to predict words?
 </p>
 <p>
-
+WOPR's engine is the
 <a href="https://github.com/LanguageMachines/timbl">TiMBL</a> classifier,
-
-
+which implements a number of fast approximations of <i>k</i>-NN classification,
+all partly based on decision-tree classification. On
+tasks like next-word prediction, exact <i>k</i>-NN is prohibitively slow, but the
 <a href="https://github.com/LanguageMachines/timbl">TiMBL</a>
-
-partly based on decision-tree classification and many orders of magnitude faster.
+approximations can classify many orders of magnitude faster.
 </p>
 <p>
 Compared to Transformer-based LLMs, on the plus side memory-based LLMs are
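Note: the question this hunk ends on is easy to prototype. Below is a minimal sketch of next-word prediction with a naive, exhaustive k-NN; it is illustration only, not WOPR's or TiMBL's code, and all names in it are invented for this note.

from collections import Counter

def make_instances(tokens, n=3):
    """Turn a token stream into (n-word context, next word) pairs."""
    return [(tuple(tokens[i:i + n]), tokens[i + n])
            for i in range(len(tokens) - n)]

def knn_predict(instances, context, k=5):
    """Rank all memorized instances by positional feature overlap with
    the query context, then vote over the labels of the k nearest."""
    def overlap(stored):
        return sum(a == b for a, b in zip(stored, context))
    nearest = sorted(instances, key=lambda inst: -overlap(inst[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

corpus = "the cat sat on the mat and the dog sat on the rug".split()
memory = make_instances(corpus, n=3)
print(knn_predict(memory, ("dog", "sat", "on")))   # -> 'the'

Even at this toy scale the cost is visible: every prediction scans the whole instance memory, which is exactly what the paragraph above says TiMBL's approximations avoid.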
@@ -138,6 +138,8 @@
 but we have not trained
 beyond data set sizes orders of magnitude above 100 million words.
 Watch this space!</li>
+<li>They <b>do not have a delicate attention mechanism</b>, arguably the killer feature
+of Transformer-based decoders;</li>
 <li>Memory requirements during training are <b>heavy with large datasets</b>
 (more than 32 GB RAM with more than 100 million words);</li>
 <li>Memory-based LLMs are not efficient at generation time when running relatively
|