JonasGeiping
committed on
Commit 285bab3 · 1 Parent(s): ab7a979
Update README.md
README.md
CHANGED
@@ -17,7 +17,7 @@ tags:
 
 # crammed BERT (legacy/v1)
 
-This is one of the final models described in the **FIRST VERSION OF** "Cramming: Training a Language Model on a Single GPU in One Day". This is an *English*-language model pretrained like BERT, but with less compute. This one was trained for 24 hours on a single A6000 GPU. To use this model, you need the code from the repo at https://github.com/JonasGeiping/cramming.
+This is one of the final models described in the **FIRST VERSION OF** "Cramming: Training a Language Model on a Single GPU in One Day". This is an *English*-language model pretrained like BERT, but with less compute. This one was trained for 24 hours on a single A6000 GPU. To use this model, you need the code from the repo at https://github.com/JonasGeiping/cramming tagged v1.13.
 
 You can find the paper here (linked to the old version on arxiv): https://arxiv.org/abs/2212.14034/v1, and the abstract below:
 