---
datasets:
- wikitext-2-v1
- wikitext
language:
- en
metrics:
- perplexity
- cross_entropy
---

**Metrics on 1024 context**:

- valid_perplexity = 14.79
- valid_cross_entropy = 2.69
- train_perplexity = 13.77
- train_cross_entropy = 2.62
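
The perplexity and cross-entropy values above agree with the usual relationship perplexity = exp(cross entropy), assuming the cross entropy is a mean per-token loss measured in nats. A minimal check in Python follows; the helper name `perplexity_from_cross_entropy` is ours for illustration and is not taken from the model's code:

```python
import math


def perplexity_from_cross_entropy(cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


# (cross entropy, perplexity) pairs as reported in this card.
# Small gaps are expected because the loss is rounded to two decimals.
reported = {
    "valid": (2.69, 14.79),
    "train": (2.62, 13.77),
}

for split, (ce, ppl) in reported.items():
    recomputed = perplexity_from_cross_entropy(ce)
    print(f"{split}: exp({ce}) = {recomputed:.2f}, reported {ppl}")
```

If the loss were measured in bits instead, the base would be 2 rather than e; the reported numbers only match the natural-log convention.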

**Metrics on 252 context**:

- valid_perplexity = 17.35

**Metrics on 378 context**:

- valid_perplexity = 16.4

**Metrics on 504 context**:

- valid_perplexity = 15.86

**Dependence of the cross-entropy loss on the context length used for prediction**

- x-axis: context length in units of 128 (context length = x-axis value × 128)
- y-axis: cross entropy
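
A curve with the axes described above could be produced by averaging per-token losses over buckets of 128 positions. The sketch below is only an assumption about how such a curve might be computed, not the author's evaluation code; `mean_loss_by_bucket` and the synthetic `losses` list are hypothetical stand-ins for real per-token losses from the model:

```python
def mean_loss_by_bucket(token_losses, bucket_size=128):
    """Average per-token losses over consecutive position buckets.

    token_losses[i] is the cross entropy (in nats) of the token at
    position i, i.e. predicted from the i preceding tokens. Bucket k
    (0-based) covers positions [k * bucket_size, (k + 1) * bucket_size).
    """
    means = []
    for start in range(0, len(token_losses), bucket_size):
        bucket = token_losses[start:start + bucket_size]
        means.append(sum(bucket) / len(bucket))
    return means


# Synthetic losses for one 1024-token window (illustrative only):
# the loss decays as more context becomes available.
losses = [2.5 + 1.0 / (1 + pos / 64) for pos in range(1024)]

for k, mean_loss in enumerate(mean_loss_by_bucket(losses), start=1):
    print(f"x = {k} (context up to {k * 128}): mean cross entropy = {mean_loss:.3f}")
```

With real data, `token_losses` would typically be averaged over many evaluation windows before bucketing, so that each point on the curve is not dominated by a single passage.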