|
--- |
|
license: cc-by-4.0 |
|
datasets: |
|
- clarin-knext/msmarco-pl |
|
- clarin-knext/nq-pl |
|
- clarin-knext/hotpotqa-pl |
|
- clarin-knext/scidocs-pl |
|
- clarin-knext/nfcorpus-pl |
|
- clarin-knext/dbpedia-pl |
|
- clarin-knext/trec-covid-pl |
|
- clarin-knext/quora-pl |
|
- clarin-knext/arguana-pl |
|
- clarin-knext/fiqa-pl |
|
language: |
|
- pl |
|
library_name: transformers |
|
tags: |
|
- gpt2 |
|
- from-scratch |
|
- polish-gpt2 |
|
--- |
|
|
|
## Description |
|
This is a Polish GPT-2 model with the small architecture.
|
|
|
This model was released on 11.08.2023 and is now **deprecated**.
|
|
|
A new version of this model (`radlab/polish-gpt2-small-v2`) is available at https://huggingface.co/radlab/polish-gpt2-small-v2
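
A minimal generation sketch with `transformers`; the repository id `radlab/polish-gpt2-small` is an assumption inferred from the name of the v2 successor, so verify it against this model page before use:

```python
from transformers import pipeline

# Assumed repository id for this (v1) model; the v2 successor is radlab/polish-gpt2-small-v2.
model_id = "radlab/polish-gpt2-small"

# Standard text-generation pipeline for a GPT-2-style causal LM.
generator = pipeline("text-generation", model=model_id)

prompt = "Wisła jest najdłuższą rzeką w"  # "The Vistula is the longest river in"
print(generator(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"])
```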
|
|
|
|
|
## Datasets |
|
Data used to train this model:
|
- clarin-knext/msmarco-pl |
|
- clarin-knext/nq-pl |
|
- clarin-knext/hotpotqa-pl |
|
- clarin-knext/scidocs-pl |
|
- clarin-knext/nfcorpus-pl |
|
- clarin-knext/dbpedia-pl |
|
- clarin-knext/trec-covid-pl |
|
- clarin-knext/quora-pl |
|
- clarin-knext/arguana-pl |
|
- clarin-knext/fiqa-pl |
|
- our own corpora (not yet published)
|
|
|
In total, this is about 10.5 GB of data.
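
The published datasets listed above are hosted on the Hugging Face Hub and can be loaded with the `datasets` library. A minimal sketch, assuming the BEIR-PL-style `corpus` configuration and split names used by the `clarin-knext` datasets (check the individual dataset cards before relying on them):

```python
from datasets import load_dataset

# Load the passage corpus of one of the training datasets listed above.
# The "corpus" configuration/split names are an assumption based on the
# BEIR-PL layout; verify them on the dataset card.
nq_pl = load_dataset("clarin-knext/nq-pl", "corpus", split="corpus")

print(nq_pl[0])  # e.g. a record with "_id", "title" and "text" fields
```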
|
|
|
|
|
## Metrics from W&B |
|
|
|
- train/loss: 2.9569 |
|
- train/train_samples_per_second: 31.797 |
|
- train/epoch: 20 |
|
- train/train_steps_per_second: 3.18 |
|
- train/total_flos: 16645483478384640000 |
|
- train/train_loss: 3.106043342053213 |
|
- train/learning_rate: 2.2070550413783577e-8 |
|
- train/global_step: 3185240 |
|
- train/train_runtime: 1001735.8967
|
- eval/samples_per_second: 57.896 |
|
- eval/runtime: 1447.4458 |
|
- eval/steps_per_second: 5.79 |
|
- eval/loss: 2.890829086303711 |
|
- eval/accuracy: 0.4637797431547294 |
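
As a rough point of reference, the final `eval/loss` corresponds to a perplexity of about 18, assuming it is the mean per-token cross-entropy in nats (the `transformers` Trainer default). A quick check:

```python
import math

eval_loss = 2.890829086303711  # eval/loss reported above
print(f"eval perplexity ≈ {math.exp(eval_loss):.2f}")  # ≈ 18.01
```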
|
|
|
|
|
## Changelog |
|
- _11.08.2023_: first release of the model published.