antalvdb committed
Commit e14d4fe · verified · 1 Parent(s): a0f85c8

Update index.html
Files changed (1): index.html (+8 -6)
index.html CHANGED
@@ -109,16 +109,16 @@
  core. Everyone who took "Machine Learning 101" knows that the <i>k</i>-nearest
  neighbor classifier is among the simplest yet most robust ML classifiers out
  there, perhaps only beaten by the Naive Bayes classifier. So what happens if
- you train a <i>k</i>-NN classifier to predict words? ...
+ you train a <i>k</i>-NN classifier to predict words?
  </p>
  <p>
- A memory-based language model, in this case running on the
+ WOPR's engine is the
  <a href="https://github.com/LanguageMachines/timbl">TiMBL</a> classifier,
- is in the most basic sense a <i>k</i>-nearest neighbor classifier. However, on
- tasks like next-word prediction, <i>k</i>-NN becomes inhibitively slow. Fortunately,
+ which implements a number of fast approximations of <i>k</i>-NN classification,
+ all partly based on decision-tree classification. On
+ tasks like next-word prediction, <i>k</i>-NN is prohibitively slow, but the
  <a href="https://github.com/LanguageMachines/timbl">TiMBL</a>
- offers a number of fast approximations of <i>k</i>-NN classification, all
- partly based on decision-tree classification and many orders of magnitude faster.
+ approximations can classify many orders of magnitude faster.
  </p>
  <p>
  Compared to Transformer-based LLMs, on the plus side memory-based LLMs are
@@ -138,6 +138,8 @@
  but we have not trained
  beyond data set sizes with orders of magnitudes above 100 million words.
  Watch this space!</li>
+ <li>They <b>do not have a delicate attention mechanism</b>, arguably the killer feature
+ of Transformer-based decoders;</li>
  <li>Memory requirements during training are <b>heavy with large datasets</b>
  (more than 32 GB RAM with more than 100 million words);</li>
  <li>Memory-based LLMs are not efficient at generation time when running relatively
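
To make the k-NN formulation in the updated paragraph concrete, here is a minimal Python sketch. It is not WOPR or TiMBL: it uses scikit-learn's plain KNeighborsClassifier and a toy corpus (both assumptions of this illustration), but it shows the basic setup the page describes, with the preceding words as the feature vector and the word that follows as the class.

# Toy k-NN next-word prediction, sketched after the idea described in the
# diff above. Not WOPR or TiMBL; scikit-learn is assumed for illustration.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OneHotEncoder

CONTEXT = 3  # number of preceding words used as features

corpus = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat lay on the rug and the dog lay on the mat"
).split()

# Each training instance: a window of CONTEXT words (the features) and the
# word that immediately follows it (the class label).
X_words = [corpus[i : i + CONTEXT] for i in range(len(corpus) - CONTEXT)]
y = [corpus[i + CONTEXT] for i in range(len(corpus) - CONTEXT)]

# One-hot encode each context position; words unseen during training are
# ignored at prediction time rather than raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(X_words)

# Plain (exact) k-NN: predict by majority vote over the k most similar
# stored contexts. On one-hot vectors, Euclidean distance ranks neighbours
# by how many context positions overlap.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Predict the word that follows a given three-word context.
print(knn.predict(encoder.transform([["the", "dog", "sat"]])))  # ['on']

Every prediction here compares the query context against all stored contexts, which is exactly the brute-force lookup that the page says becomes prohibitively slow at next-word-prediction scale and that TiMBL's decision-tree approximations are meant to avoid.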