---
license: cc-by-4.0
datasets:
  - clarin-knext/msmarco-pl
  - clarin-knext/nq-pl
  - clarin-knext/hotpotqa-pl
  - clarin-knext/scidocs-pl
  - clarin-knext/nfcorpus-pl
  - clarin-knext/dbpedia-pl
  - clarin-knext/trec-covid-pl
  - clarin-knext/quora-pl
  - clarin-knext/arguana-pl
  - clarin-knext/fiqa-pl
language:
  - pl
library_name: transformers
tags:
  - gpt2
  - from-scratch
  - polish-gpt2
---

## Description

This is a Polish GPT-2 model with the small architecture.

This model was released on 11.08.2023 and is now deprecated.

A new version of this model, radlab/polish-gpt2-small-v2, is available at https://huggingface.co./radlab/polish-gpt2-small-v2
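
If you still want to try this release, here is a minimal generation sketch with the transformers pipeline; the model id `radlab/polish-gpt2-small` is assumed from this repository's name, and the prompt and sampling settings are only illustrative.

```python
from transformers import pipeline

# Model id assumed from this repository's name.
generator = pipeline("text-generation", model="radlab/polish-gpt2-small")

# Generate a short Polish continuation; sampling settings are illustrative.
output = generator(
    "Wrocław to miasto",  # "Wrocław is a city"
    max_new_tokens=50,
    do_sample=True,
    top_k=50,
)
print(output[0]["generated_text"])
```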

## Datasets

The data used to train this model:

- clarin-knext/msmarco-pl
- clarin-knext/nq-pl
- clarin-knext/hotpotqa-pl
- clarin-knext/scidocs-pl
- clarin-knext/nfcorpus-pl
- clarin-knext/dbpedia-pl
- clarin-knext/trec-covid-pl
- clarin-knext/quora-pl
- clarin-knext/arguana-pl
- clarin-knext/fiqa-pl
- own corpora, not published yet

In total, this is about 10.5 GB of data.
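
As a sketch, each of the published corpora listed above can be pulled from the Hugging Face Hub with the datasets library; the `"corpus"` config name is an assumption for these BEIR-style sets and may differ per dataset.

```python
from datasets import load_dataset

# "corpus" is an assumed config name for the BEIR-style clarin-knext
# datasets; adjust it (e.g. to "queries") per dataset if needed.
ds = load_dataset("clarin-knext/msmarco-pl", "corpus")
print(ds)
```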

## Metrics from W&B

- train/loss: 2.9569
- train/train_samples_per_second: 31.797
- train/epoch: 20
- train/train_steps_per_second: 3.18
- train/total_flos: 16645483478384640000
- train/train_loss: 3.106043342053213
- train/learning_rate: 2.2070550413783577e-8
- train/global_step: 3185240
- train/train_runtime: 1001735.8967
- eval/samples_per_second: 57.896
- eval/runtime: 1447.4458
- eval/steps_per_second: 5.79
- eval/loss: 2.890829086303711
- eval/accuracy: 0.4637797431547294
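
Assuming eval/loss is the mean cross-entropy in nats (as reported by the standard Trainer), it corresponds to a perplexity of roughly e^2.8908 ≈ 18, as the small check below shows; the loss value is copied from the list above.

```python
import math

# Perplexity = exp(mean cross-entropy loss); value copied from eval/loss above.
eval_loss = 2.890829086303711
print(math.exp(eval_loss))  # ~18.0
```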

## Changelog

- 11.08.2023: published the first release of the model.