YAML Metadata Error: "language[0]" must only contain lowercase characters
YAML Metadata Error: "language[0]" with value "English" is not valid. It must be an ISO 639-1, 639-2 or 639-3 code (two/three letters), or a special value like "code", "multilingual". If you want to use BCP-47 identifiers, you can specify them in language_bcp47.

roberta-base + DAPT + Domain-Specific QA

Objective: This is Roberta Base with Domain Adaptive Pretraining on Movie Corpora --> Then a changed head to do the SQuAD Task. This makes a QA model capable of answering questions in the movie domain.
https://huggingface.co/thatdramebaazguy/movie-roberta-base was used as the MovieRoberta.

model_name = "thatdramebaazguy/movie-roberta-squad"
pipeline(model=model_name, tokenizer=model_name, revision="v1.0", task="question-answering")

Overview

Language model: roberta-base
Language: English
Downstream-task: QA
Training data: imdb, polarity movie data, cornell_movie_dialogue, 25mlens movie names, SQuADv1
Eval data: MoviesQA (From https://github.com/ibm-aur-nlp/domain-specific-QA)
Infrastructure: 1x Tesla v100
Code: See example

Hyperparameters

Num examples = 88567  
Num Epochs = 10
Instantaneous batch size per device = 32  
Total train batch size (w. parallel, distributed & accumulation) = 32  

Performance

Eval on MoviesQA

  • eval_samples = 5032
  • exact_match = 51.64944
  • f1 = 65.53983

Eval on SQuADv1

  • exact_match = 81.23936
  • f1 = 89.27827

Github Repo:


Downloads last month
11
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train thatdramebaazguy/movie-roberta-squad