metadata

language: el
license: cc-by-sa-4.0
datasets:
  - cc100
  - oscar
  - wikipedia
widget:
  - text: Έχει πολύ καιρό που δεν έχουμε <mask>.
  - text: Ευχαριστώ για το <mask> σου.
  - text: Αυτό είναι <mask>.
  - text: Ανοιξα <mask>.
  - text: Ευχαριστώ για <mask>.
  - text: Έχει πολύ καιρό που δεν <mask>.

RoBERTa Greek small model (Uncased)

Prerequisites

transformers==4.19.2

Model architecture

This model has approximately half as many parameters as the RoBERTa base model.
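To make the size comparison concrete, the following is a minimal sketch that counts the parameters of a RoBERTa-style encoder from its hyperparameters. The helper `encoder_param_count` is hypothetical, and the "small" configuration is an assumption chosen only for illustration: this card does not state the model's hidden size or layer count. Only the vocabulary size of 50,000 comes from the card.

```python
def encoder_param_count(vocab_size: int, hidden: int, layers: int,
                        intermediate: int, max_positions: int = 514,
                        type_vocab: int = 1) -> int:
    """Count parameters of a BERT/RoBERTa-style encoder (embeddings, layers, pooler)."""
    layer_norm = 2 * hidden  # LayerNorm weight + bias
    # Word, position and token-type embedding tables, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (weights + biases), then a LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden), then a LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# RoBERTa base: 12 layers, hidden size 768, feed-forward size 3072, vocabulary 50,265.
base = encoder_param_count(vocab_size=50265, hidden=768, layers=12, intermediate=3072)

# HYPOTHETICAL small configuration (not taken from this card): narrower hidden
# and feed-forward sizes, combined with the 50,000-token vocabulary stated here.
small = encoder_param_count(vocab_size=50000, hidden=512, layers=12, intermediate=2048)

print(f"RoBERTa base:       {base:,} parameters")
print(f"Hypothetical small: {small:,} parameters ({small / base:.0%} of base)")
```

The number of attention heads does not appear in the formula because it changes how the projections are split, not how many weights they hold. The actual parameter count of this model can be read from the loaded checkpoint rather than estimated.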

Tokenizer

The tokenizer is a BPE tokenizer with a vocabulary size of 50,000.

Training Data

Subset of CC-100/el: Monolingual Datasets from Web Crawl Data
Subset of OSCAR
wiki40b/el (Greek Wikipedia)

Usage

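from transformers import pipeline

# Build a fill-mask pipeline backed by this model (downloaded from the Hugging Face Hub on first use).
unmasker = pipeline('fill-mask', model='ClassCat/roberta-small-greek')

# Returns a list of candidate fills for <mask>, each a dict with a score and the completed sentence.
unmasker("Έχει πολύ καιρό που δεν <mask>.")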
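The fill-mask pipeline returns one dictionary per candidate token. As a minimal sketch of how that output might be summarized, the helper below keeps the highest-scoring candidates as (token, score) pairs. The function names `top_predictions` and `unmask_greek` are hypothetical and not part of this model or of transformers; the `token_str` and `score` keys follow the fill-mask pipeline's output format.

```python
from typing import Dict, List, Tuple


def top_predictions(results: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    # token_str may carry a leading space from the BPE vocabulary, so strip it.
    return [(r["token_str"].strip(), round(float(r["score"]), 4)) for r in ranked[:k]]


def unmask_greek(text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Run the model on `text` (which must contain exactly one <mask>) and summarize.

    transformers is imported lazily so that top_predictions works without it.
    """
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="ClassCat/roberta-small-greek")
    return top_predictions(unmasker(text), k)
```

Calling `unmask_greek("Ευχαριστώ για το <mask> σου.")` would download the model on first use and return up to three candidates. The tokens and scores depend on the model, so no output is shown here.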