|
---
datasets:
- wikitext
- wikitext-103-v1
language:
- en
metrics:
- perplexity
- cross_entropy
---
|
|
|
**(!) _Don't forget to preprocess unknown tokens and substitute them with `<|endoftext|>`. Otherwise, the `<unk>` tokens in the dataset will be split into the '<', 'unk' and '>' tokens._**
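A minimal preprocessing sketch using the 🤗 `datasets` loader; the `"gpt2"` checkpoint name is an assumption, since this card does not name the base tokenizer:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# "gpt2" is an assumed checkpoint; use the tokenizer that matches this model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

dataset = load_dataset("wikitext", "wikitext-103-v1", split="test")

def replace_unk(example):
    # Substitute the dataset's <unk> placeholder with <|endoftext|> so it is
    # encoded as a single special token instead of '<', 'unk' and '>'.
    return {"text": example["text"].replace("<unk>", "<|endoftext|>")}

dataset = dataset.map(replace_unk)
```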
|
|
|
|
|
- Full-context (1024 tokens) perplexity on the test set: **13.68**
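Perplexity here is the exponential of the mean token-level cross entropy over the test set. A minimal sketch of how it can be computed with a Hugging Face causal LM (`model` and the 1024-token `input_ids` windows are assumed, not part of this card):

```python
import math
import torch

def perplexity(model, input_ids):
    # Causal LMs from transformers return the mean cross-entropy loss
    # over the predicted tokens when labels are provided.
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss
    return math.exp(loss.item())
```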
|
|
|
**Dependence of the cross-entropy loss on the context length used for prediction**

- x-axis × 128 = context length (in tokens)

- y-axis = cross-entropy loss
|
|
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c1ac8cc58fcfeac186bda2/Dpc_d3buivfBd5_-A03Vb.png) |
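A sketch of how a curve like the one above can be produced: compute per-position losses over 1024-token windows, then average them within consecutive blocks of 128 positions. `model` and `input_ids` are assumed; this is one plausible way to generate the plot, not necessarily the exact script used:

```python
import torch
import torch.nn.functional as F

def per_block_cross_entropy(model, input_ids, block=128):
    # input_ids: (batch, 1024) windows from the tokenized test set.
    with torch.no_grad():
        logits = model(input_ids).logits          # (batch, seq, vocab)
    logits = logits[:, :-1]                       # position t predicts token t+1
    targets = input_ids[:, 1:]
    losses = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape)                         # per-token loss, (batch, seq-1)
    # Average within blocks of `block` positions: one y-value per x-axis tick.
    n_blocks = losses.size(1) // block
    return losses[:, : n_blocks * block].reshape(
        losses.size(0), n_blocks, block
    ).mean(dim=(0, 2))
```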
|
|
|
|