---
license: mit
library_name: transformers
tags:
- camera level
- camera feature
- movie analysis
metrics:
- accuracy
- f1
pipeline_tag: image-classification
---

# ConvNeXt V2 fine-tuned for camera level classification

ConvNeXt V2 base-size model fine-tuned for the classification of camera levels. The [Cinescale](https://cinescale.github.io/camera_al/#dataset) dataset is used to fine-tune the model for 20 epochs.

The model classifies an image into one of six classes: *aerial, eye, ground, hip, knee, shoulder*.

## Evaluation

On the test set (test.csv), the model achieves an accuracy of 89.82% and a macro-F1 of 82.31%.

## How to use

```python
from transformers import AutoModelForImageClassification
import torch
from torchvision.transforms import v2
from torchvision.io import read_image, ImageReadMode

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-384-cinescale-level")

im_size = 384

# Demo image: https://www.pexels.com/photo/aerial-view-of-city-buildings-8783146/
image = read_image("demo/level_demo.jpg", mode=ImageReadMode.RGB)

# Resize, rescale to [0, 1], and normalize with ImageNet statistics
transform = v2.Compose([
    v2.Resize((im_size, im_size), antialias=True),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

inputs = transform(image).unsqueeze(0)  # add batch dimension

with torch.no_grad():
    outputs = model(pixel_values=inputs)

predicted_label = model.config.id2label[torch.argmax(outputs.logits).item()]
print(predicted_label)  # --> aerial
```
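
## Evaluating on the test split

The reported metrics can be reproduced by running the same preprocessing over the test split and scoring the predictions. The snippet below is a minimal sketch, assuming `test.csv` contains an `image_path` column and a class-name `label` column; the actual column names and file locations in the Cinescale release may differ.

```python
import pandas as pd
import torch
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoModelForImageClassification
from torchvision.transforms import v2
from torchvision.io import read_image, ImageReadMode

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-384-cinescale-level")
model.eval()

im_size = 384
transform = v2.Compose([
    v2.Resize((im_size, im_size), antialias=True),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Assumed layout: one row per image with a path and a class-name label
test_df = pd.read_csv("test.csv")
label2id = model.config.label2id

y_true, y_pred = [], []
for row in test_df.itertuples():
    image = read_image(row.image_path, mode=ImageReadMode.RGB)
    inputs = transform(image).unsqueeze(0)
    with torch.no_grad():
        logits = model(pixel_values=inputs).logits
    y_pred.append(torch.argmax(logits).item())
    y_true.append(label2id[row.label])

print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"macro-F1: {f1_score(y_true, y_pred, average='macro'):.4f}")
```

Batching the images and moving the model to a GPU will speed this up considerably; the single-image loop is kept here only to mirror the inference example above.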