Update README.md
README.md
CHANGED
@@ -18,4 +18,47 @@ tags:
|
|
18 |
- gpt2
|
19 |
- from-scratch
|
20 |
- polish-gpt2
|
21 |
-
---
18 |
- gpt2
|
19 |
- from-scratch
|
20 |
- polish-gpt2
|
21 |
+
---
|
22 |
+
|
23 |
+
## Description
|
24 |
+
This is a Polish GPT-2 model using the small architecture.
|
25 |
+
This model was released on 11.08.2023.
|
26 |
+
|
27 |
+
|
28 |
+
## Datasets
|
29 |
+
The model was trained on the following datasets:
|
30 |
+
- clarin-knext/msmarco-pl
|
31 |
+
- clarin-knext/nq-pl
|
32 |
+
- clarin-knext/hotpotqa-pl
|
33 |
+
- clarin-knext/scidocs-pl
|
34 |
+
- clarin-knext/nfcorpus-pl
|
35 |
+
- clarin-knext/dbpedia-pl
|
36 |
+
- clarin-knext/trec-covid-pl
|
37 |
+
- clarin-knext/quora-pl
|
38 |
+
- clarin-knext/arguana-pl
|
39 |
+
- clarin-knext/fiqa-pl
|
40 |
+
- our own corpora (not yet published)
|
41 |
+
|
42 |
+
In total, this is about 10.5 GB of data.
|
43 |
+
|
44 |
+
|
45 |
+
## Metrics from W&B
|
46 |
+
|
47 |
+
- train/loss: 2.9569
|
48 |
+
- train/train_samples_per_second: 31.797
|
49 |
+
- train/epoch: 20
|
50 |
+
- train/train_steps_per_second: 3.18
|
51 |
+
- train/total_flos: 16645483478384640000
|
52 |
+
- train/train_loss: 3.106043342053213
|
53 |
+
- train/learning_rate: 2.2070550413783577e-8
|
54 |
+
- train/global_step: 3185240
|
55 |
+
- train/train_runtime: 1001735.8967
|
56 |
+
- eval/samples_per_second: 57.896
|
57 |
+
- eval/runtime: 1447.4458
|
58 |
+
- eval/steps_per_second: 5.79
|
59 |
+
- eval/loss: 2.890829086303711
|
60 |
+
- eval/accuracy: 0.4637797431547294
|
61 |
+
|
62 |
+
|
63 |
+
## Changelog
|
64 |
+
- _Nothing_
|