Update README.md
README.md CHANGED
@@ -37,7 +37,7 @@ Refer to the base model for info on the patching process.

 Besides language modeling, another aim of this experiment was to test the accelerate library by offloading certain workloads to CPU, as well as finding the optimal number of training iterations.

-The perplexity of this model is
+The perplexity of this model is 16.12 after 400,000 steps, compared with 27.02 after 400,000 steps for the previous [attempt](https://huggingface.co/jed351/gpt2_tiny_zh-hk-shikoto).
 It took around the same time to train this model, but I only used 1 GPU here.

@@ -47,7 +47,7 @@ Please refer to the [script](https://github.com/huggingface/transformers/tree/ma
 provided by Huggingface.

-The model was trained for
+The model was trained for 400,000 steps on 1 NVIDIA Quadro RTX6000 for around 30 hours at the Research Computing Services of Imperial College London.
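For context on the perplexity figures added above: in the Huggingface language-modeling examples, perplexity is reported as the exponential of the average evaluation cross-entropy loss. A minimal sketch of that conversion, using illustrative loss values (not logged from these runs) chosen to roughly reproduce the quoted numbers:

```python
import math

# Perplexity = exp(average cross-entropy loss per token).
# The loss values below are illustrative assumptions picked to land
# near the figures in the README, not values logged from training.
eval_loss_this_model = 2.78   # exp(2.78) ~ 16.1
eval_loss_previous = 3.30     # exp(3.30) ~ 27.1

for name, loss in [("this model", eval_loss_this_model),
                   ("previous attempt", eval_loss_previous)]:
    print(f"{name}: perplexity = {math.exp(loss):.2f}")
```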
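The commit does not show the accelerate configuration that was used, so the following is only a rough sketch of the kind of accelerate-driven training loop the README refers to; the tiny model and dummy batches are placeholders, and CPU offloading would normally be selected through `accelerate config` (for example a DeepSpeed offload option) rather than in the loop itself.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from transformers import GPT2Config, GPT2LMHeadModel

# Accelerator picks up the saved `accelerate config` (device placement,
# mixed precision, any DeepSpeed/offload settings).
accelerator = Accelerator()

# Tiny randomly-initialised GPT-2 and dummy token batches, just to show the loop shape.
model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64))
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
dummy_ids = torch.randint(0, 50257, (8, 32))
loader = DataLoader(TensorDataset(dummy_ids), batch_size=2)

# prepare() moves everything to the configured devices/backends.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for (input_ids,) in loader:
    outputs = model(input_ids=input_ids, labels=input_ids)  # causal LM loss
    accelerator.backward(outputs.loss)                       # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```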