Description

This is a Polish GPT-2 model in the small architecture.

This model was released on 11.08.2023 and is now deprecated.

A new version of this model (radlab/polish-gpt2-small-v2) is available at https://huggingface.co/radlab/polish-gpt2-small-v2
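
For readers who still want to try this release, the snippet below is a minimal sketch of text generation with the Hugging Face `transformers` library. It assumes the checkpoint loads through the standard `text-generation` pipeline, which is usual for GPT-2 models on the Hub but is not stated in this card. The function name `generate_polish`, the sampling settings, and the default `max_new_tokens` are illustrative choices, not values from the authors.

```python
# Model id of this (deprecated) release; pass "radlab/polish-gpt2-small-v2"
# as model_id to try the newer version instead.
MODEL_ID = "radlab/polish-gpt2-small"


def generate_polish(prompt, model_id=MODEL_ID, max_new_tokens=40):
    """Generate a continuation of `prompt` (hypothetical helper, not from the card)."""
    # Imported here so that merely defining the helper does not require
    # transformers to be installed or the model to be downloaded.
    from transformers import pipeline

    # Assumption: the checkpoint works with the generic text-generation pipeline.
    generator = pipeline("text-generation", model=model_id)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample instead of greedy decoding (illustrative setting)
        top_k=50,        # illustrative setting, not a recommended value
    )
    # The pipeline returns a list of dicts, each with a "generated_text" key.
    return outputs[0]["generated_text"]
```

Calling `generate_polish("Dzisiaj jest")` downloads the weights on first use and returns the prompt followed by a sampled continuation. Because sampling is enabled, the output differs between runs.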

Datasets

The following data were used to train this model:

  • clarin-knext/msmarco-pl
  • clarin-knext/nq-pl
  • clarin-knext/hotpotqa-pl
  • clarin-knext/scidocs-pl
  • clarin-knext/nfcorpus-pl
  • clarin-knext/dbpedia-pl
  • clarin-knext/trec-covid-pl
  • clarin-knext/quora-pl
  • clarin-knext/arguana-pl
  • clarin-knext/fiqa-pl
  • our own corpora, not yet published

In total, this is about 10.5 GB of data.

Metrics from W&B

  • train/loss: 2.9569
  • train/train_samples_per_second: 31.797
  • train/epoch: 20
  • train/train_steps_per_second: 3.18
  • train/total_flos: 16645483478384640000
  • train/train_loss: 3.106043342053213
  • train/learning_rate: 2.2070550413783577e-8
  • train/global_step: 3185240
  • train/train_runtime: 1001735.8967
  • eval/samples_per_second: 57.896
  • eval/runtime: 1447.4458
  • eval/steps_per_second: 5.79
  • eval/loss: 2.890829086303711
  • eval/accuracy: 0.4637797431547294
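
The raw numbers above are easier to read once converted into derived quantities. The sketch below does that conversion under two assumptions that this card does not confirm: that the losses are mean token-level cross-entropy in nats (the usual convention for causal language model training in `transformers`), and that `train_runtime` is in seconds. Under those assumptions the evaluation loss corresponds to a perplexity of roughly 18, the run lasted about 11.6 days, and the throughput figures imply about 10 samples per optimizer step.

```python
import math

# Values copied from the W&B metrics listed above.
eval_loss = 2.890829086303711
train_loss = 3.106043342053213
global_step = 3185240
train_runtime_s = 1001735.8967   # assumed to be seconds
samples_per_second = 31.797
steps_per_second = 3.18

# Perplexity is exp(mean cross-entropy), assuming the loss is in nats.
eval_perplexity = math.exp(eval_loss)
train_perplexity = math.exp(train_loss)

# Wall-clock training time in days.
runtime_days = train_runtime_s / 86400

# Consistency check: steps per second recomputed from step count and runtime.
derived_steps_per_second = global_step / train_runtime_s

# Samples processed per optimizer step (effective batch size across devices).
samples_per_step = samples_per_second / steps_per_second

print(f"eval perplexity:  {eval_perplexity:.2f}")
print(f"train perplexity: {train_perplexity:.2f}")
print(f"runtime:          {runtime_days:.1f} days")
print(f"steps/s (derived): {derived_steps_per_second:.3f}")
print(f"samples per step: {samples_per_step:.1f}")
```

The derived steps-per-second value agrees with the logged `train/train_steps_per_second`, which suggests the logged figures are internally consistent.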

Changelog

  • 11.08.2023: published the first release of the model.

Model size

  • 126M params, tensor type F32 (Safetensors)