YAML Metadata Error: "language[0]" must only contain lowercase characters
YAML Metadata Error: "language[0]" with value "English" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like "code", "multilingual". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.

roberta-base for MLM

Objective: To make a Roberta Base for the Movie Domain by using various Movie Datasets as simple text for Masked Language Modeling. This is the Movie Roberta to be used in Movie Domain applications.

model_name = "thatdramebaazguy/movie-roberta-base"
pipeline(model=model_name, tokenizer=model_name, revision="v1.0", task="Fill-Mask")

Overview

Language model: roberta-base
Language: English
Downstream-task: Fill-Mask
Training data: imdb, polarity movie data, cornell_movie_dialogue, 25mlens movie names
Eval data: imdb, polarity movie data, cornell_movie_dialogue, 25mlens movie names
Infrastructure: 4x Tesla v100
Code: See example

Hyperparameters

Num examples = 4767233
Num Epochs = 2
Instantaneous batch size per device = 20
Total train batch size (w. parallel, distributed & accumulation) = 80
Gradient Accumulation steps = 1
Total optimization steps = 119182
eval_loss  = 1.6153
eval_samples = 20573
perplexity = 5.0296
learning_rate=5e-05
n_gpu = 4

Performance

perplexity = 5.0296

Some of my work:


Downloads last month
13
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train thatdramebaazguy/movie-roberta-base