---
license: cc-by-nd-4.0
---

## Czech Metrum Validator

A validator for poetic metre (metrum), trained on Czech poetry from the corpusCzechVerse GitHub project by the Institute of Czech Literature, Czech Academy of Sciences:

https://github.com/versotym/corpusCzechVerse

## Usage

### Loading model

Download `validator.py`, which provides the model interface, then download the model and load it with PyTorch.

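```python
import torch
from validator import ValidatorInterface  # assumed to be defined in the downloaded validator.py

# args.metre_model_path_full is the path to the downloaded model file.
# On recent PyTorch versions you may need to add weights_only=False to load a full model object.
model: ValidatorInterface = torch.load(args.metre_model_path_full, map_location=torch.device('cpu'))
```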

Load the base RobeCzech tokenizer and try the model out.

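```python
from transformers import AutoTokenizer

# Checkpoint name as given; use the RobeCzech checkpoint instead if the model was trained with it.
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# `datum` is one preprocessed example holding `input_ids` and its `metre` label.
model.validate(input_ids=datum["input_ids"], metre=datum["metre"])['acc']
```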
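
To score more than one example, the per-example call can be averaged over a dataset. The following is a minimal sketch, assuming only that `validate` returns a dictionary with a numeric `'acc'` entry, as the call above suggests. `mean_accuracy`, `DummyValidator`, and the `"J"`/`"T"` labels are hypothetical placeholders for illustration and are not part of `validator.py`.

```python
from typing import Any, Iterable, Mapping


def mean_accuracy(model: Any, data: Iterable[Mapping[str, Any]]) -> float:
    """Average the 'acc' value returned by model.validate over all examples."""
    total = 0.0
    count = 0
    for datum in data:
        result = model.validate(input_ids=datum["input_ids"], metre=datum["metre"])
        total += float(result["acc"])
        count += 1
    if count == 0:
        raise ValueError("data is empty")
    return total / count


class DummyValidator:
    """Hypothetical stand-in for the real model, used only to exercise the loop."""

    def validate(self, input_ids, metre):
        # Pretend the prediction is correct whenever the label is "J".
        return {"acc": 1.0 if metre == "J" else 0.0}


# Placeholder examples; real input_ids come from the tokenizer.
examples = [
    {"input_ids": [0, 5, 2], "metre": "J"},
    {"input_ids": [0, 7, 2], "metre": "T"},
]
print(mean_accuracy(DummyValidator(), examples))
```

With the real model, pass the loaded `model` object in place of `DummyValidator()` and an iterable of preprocessed examples in place of `examples`.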

### Train Model

```python
import torch
from transformers import AutoTokenizer, Trainer, TrainingArguments

# MeterValidator, args, train_data and collate come from the project's own training code.
meter_model = MeterValidator(pretrained_model=args.pretrained_model)
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer)

training_args = TrainingArguments(
    save_strategy="no",
    logging_steps=500,
    warmup_steps=args.worm_up,
    weight_decay=0.0,
    num_train_epochs=args.epochs,
    learning_rate=args.learning_rate,
    fp16=torch.cuda.is_available(),  # mixed precision only when a GPU is present
    ddp_backend="nccl",
    lr_scheduler_type="cosine",
    logging_dir="./logs",
    output_dir="./results",
    per_device_train_batch_size=args.batch_size,
)

# Train the metre validator on the prepared dataset.
Trainer(
    model=meter_model,
    args=training_args,
    train_dataset=train_data.pytorch_dataset_body,
    data_collator=collate,
).train()
```
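
The training configuration combines `warmup_steps` with `lr_scheduler_type="cosine"`. As a rough illustration of what that combination does to the learning rate, here is a pure-Python sketch of linear warmup followed by a half-cosine decay to zero. The function name `cosine_lr` and the numbers in the example are illustrative assumptions, not values taken from this project, and the sketch is not the library's own implementation.

```python
import math


def cosine_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine wave: base_lr at progress 0, zero at progress 1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Illustrative values only: 100 warmup steps out of 1100 total.
for step in (0, 50, 100, 600, 1100):
    print(step, cosine_lr(step, base_lr=5e-5, warmup_steps=100, total_steps=1100))
```

The rate climbs during warmup, peaks at `base_lr` when warmup ends, and then falls smoothly to zero by the last step, which is why a larger `warmup_steps` delays the peak without changing its height.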