license: apache-2.0
datasets:
  - Lowerated/lm6-movies-reviews-aspects
language:
  - en
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - movies
  - reviews
  - lm6
  - ai
  - rating

Lowerated/lm6-movie-aspect-extraction-bert

Model Details

Model Name: Lowerated/lm6-movie-aspect-extraction-bert
Model Type: Aspect Extraction from Text
Language: English
Framework: PyTorch
License: Apache 2.0

Model Description

Lowerated/lm6-movie-aspect-extraction-bert is a bert-base-uncased model fine-tuned for aspect extraction from IMDb movie reviews. It is designed to detect seven aspects of filmmaking: Cinematography, Direction, Story, Characters, Production Design, Unique Concept, and Emotions.

Dataset

Dataset Name: Lowerated/imdb-reviews-rated
Dataset URL: IMDb Reviews Rated
Dataset Description: The dataset contains IMDb movie reviews with sentiment scores for seven aspects of filmmaking.

Usage for Rating a Movie

Install lowerated:

pip install lowerated

Now, you can use it like this:

from lowerated.rate.entity import Entity

# Example usage
if __name__ == "__main__":
    some_movie_reviews = [
        "bad movie!", "worse than other movies.", "bad.",
        "best movie", "very good movie", "the cinematography was insane",
        "story was so beautiful", "the emotional element was missing but cinematography was great",
        "didn't feel a thing watching this",
        "oooof, eliot and jessie were so good. the casting was the best",
        "yo who designed the set, that was really good",
        "such stories are rare to find"
    ]

    # Create entity object (loads the whole pipeline)
    # list of aspects. ('Cinematography', 'Direction', 'Story', 'Characters', 'Production Design', 'Unique Concept', 'Emotions')
    entity = Entity(name="Movie")

    rating = entity.rate(reviews=some_movie_reviews)

    print("LM6: ", rating["LM6"])

Usage of Model

import torch
from transformers import DebertaV2ForSequenceClassification, DebertaV2Tokenizer

# Load the fine-tuned model and tokenizer
model = DebertaV2ForSequenceClassification.from_pretrained('Lowerated/deberta-v3-lm6')
tokenizer = DebertaV2Tokenizer.from_pretrained('Lowerated/deberta-v3-lm6')

# Ensure the model is in evaluation mode
model.eval()

# Define the label mapping
label_columns = ['Cinematography', 'Direction', 'Story', 'Characters', 'Production Design', 'Unique Concept', 'Emotions']

# Function for predicting sentiment scores
def predict_sentiment(review):
    # Tokenize the input review
    inputs = tokenizer(review, return_tensors='pt', truncation=True, padding=True)
    
    # Disable gradient calculations for inference
    with torch.no_grad():
        # Get model outputs
        outputs = model(**inputs)
    
    # Drop the batch dimension to get one score per aspect
    # (no .detach() needed: gradients are already disabled above)
    predictions = outputs.logits.squeeze(0).numpy()
    return predictions

# Function to print predictions with labels
def print_predictions(review, predictions):
    print(f"Review: {review}")
    for label, score in zip(label_columns, predictions):
        print(f"{label}: {score:.2f}")


review = "The cinematography was stunning, but the story was weak."
predictions = predict_sentiment(review)
print_predictions(review, predictions)
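When several reviews are scored this way, the per-aspect scores can be combined into one summary per aspect. The sketch below is a minimal illustration that takes a plain mean; it is not how the lowerated package computes its LM6 rating, the helper name `average_by_aspect` is hypothetical, and the scores are made-up placeholders rather than model output.

```python
from statistics import fmean

label_columns = ['Cinematography', 'Direction', 'Story', 'Characters',
                 'Production Design', 'Unique Concept', 'Emotions']

# Placeholder scores, one row per review, one column per aspect.
# These are illustrative numbers only, NOT real model predictions.
scores_per_review = [
    [0.8, 0.0, -0.6, 0.0, 0.0, 0.0, 0.0],
    [0.4, 0.2, 0.2, 0.0, 0.0, 0.0, 0.4],
]

def average_by_aspect(rows, labels):
    """Return the mean score of each aspect across all reviews."""
    # zip(*rows) transposes the rows so each tuple holds one aspect's scores.
    return {label: fmean(column) for label, column in zip(labels, zip(*rows))}

averages = average_by_aspect(scores_per_review, label_columns)
for label, value in averages.items():
    print(f"{label}: {value:.2f}")
```

A plain mean treats every review equally and counts a score of zero as an opinion; a real rating scheme may want to handle unmentioned aspects differently.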

Performance

{
  'eval_loss': 0.04379426687955856,
  'eval_model_preparation_time': 0.0016,
  'eval_accuracy': 0.9845067801235796,
  'eval_f1': 0.7419,
  'eval_precision': 0.6831499999999999,
  'eval_recall': 0.86185,
  'eval_runtime': 2014.0076,
  'eval_samples_per_second': 29.451,
  'eval_steps_per_second': 3.682
}
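As a quick check on how these numbers relate, the sketch below computes the harmonic mean of the reported precision and recall. It comes out near 0.76 rather than the reported eval_f1 of 0.7419. One possible explanation is that the metrics were averaged per class before reporting, but that is an assumption, since the evaluation code is not shown here. The helper name `harmonic_f1` is hypothetical.

```python
# Values copied from the evaluation output above.
precision = 0.6831499999999999
recall = 0.86185
reported_f1 = 0.7419

def harmonic_f1(p: float, r: float) -> float:
    """F1 as the harmonic mean of a single precision/recall pair."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

f1_from_pr = harmonic_f1(precision, recall)
gap = f1_from_pr - reported_f1

print(f"F1 from reported P/R: {f1_from_pr:.4f}")
print(f"Reported eval_f1:     {reported_f1:.4f}")
print(f"Gap:                  {gap:.4f}")
```

The gap between accuracy (about 0.98) and F1 (about 0.74) also suggests that most labels are negative, so F1, precision, and recall are likely more informative than accuracy for this task.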

Example:

original review:  the story was amazing but the cinematography wasn't it

Cinematography ["the cinematography wasn't"]
Direction []
Story ['the story was amazing']
Characters []
Production Design []
Unique Concept []
Emotions []
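For downstream use, a per-aspect result like the one above can be held in a plain dictionary. The sketch below hard-codes the example result instead of calling the model, and uses a hypothetical helper, `mentioned_aspects`, to keep only the aspects with at least one extracted span.

```python
# The example result above, written as a dictionary of aspect -> spans.
# This is copied by hand from the example; the model is not called here.
extracted = {
    "Cinematography": ["the cinematography wasn't"],
    "Direction": [],
    "Story": ["the story was amazing"],
    "Characters": [],
    "Production Design": [],
    "Unique Concept": [],
    "Emotions": [],
}

def mentioned_aspects(spans_by_aspect):
    """Keep only aspects for which at least one span was extracted."""
    return {aspect: spans for aspect, spans in spans_by_aspect.items() if spans}

hits = mentioned_aspects(extracted)
for aspect, spans in hits.items():
    print(f"{aspect}: {spans}")
```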

Intended Use

This model is intended for rating movies across seven aspects of filmmaking. It can be used to build a more nuanced picture of viewer opinions and to improve movie rating systems.

Limitations

While the model performs well on the evaluation dataset, its performance may vary on different datasets. Continuous monitoring and retraining with diverse data are recommended to maintain and improve its accuracy.

Future Work

Future improvements could focus on exploring alternative methods for handling neutral values, investigating advanced techniques for addressing missing ratings, enhancing sentiment analysis methods, and expanding the range of aspects analyzed.

Citation

If you use this model in your research, please cite it as follows:

@model{lm6-movie-aspect-extraction-bert,
  author = {LOWERATED},
  title = {lm6-movie-aspect-extraction-bert},
  year = {2024},
  url = {https://huggingface.co/Lowerated/lm6-movie-aspect-extraction-bert},
}