---
license: mit
library_name: transformers
tags:
- camera angle
- camera feature
- cinescale
- film analysis
- movie style
metrics:
- accuracy
- f1
pipeline_tag: image-classification
---

# ConvNeXt V2 fine-tuned for angle classification

A ConvNeXt V2 base-size model fine-tuned to classify camera angles. The model was fine-tuned on the [Cinescale](https://cinescale.github.io/camera_al/#dataset) dataset for 30 epochs.

It classifies an image into one of five classes: *dutch, high, low, neutral, overhead*.

## Evaluation

On the test set (`test.csv`), the model achieves an accuracy of 94.85% and a macro-F1 of 92.52%.

## How to use

```python
from transformers import AutoModelForImageClassification
import torch
from torchvision.transforms import v2
from torchvision.io import read_image, ImageReadMode

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-384-cinescale-angle")
model.eval()  # inference mode
im_size = 384

# https://www.pexels.com/photo/man-in-black-dress-walking-in-between-brown-wooden-pews-9614069/
image = read_image("demo/angle_demo.jpg", mode=ImageReadMode.RGB)

# Resize the shorter side to 384, center-crop to 384x384, scale to [0, 1],
# then normalize with the ImageNet mean and std.
transform = v2.Compose([
    v2.Resize(im_size, antialias=True),
    v2.CenterCrop((im_size, im_size)),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
inputs = transform(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    outputs = model(pixel_values=inputs)

predicted_label = model.config.id2label[torch.argmax(outputs.logits).item()]
print(predicted_label)
# --> high
```
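
The usage example prints only the top label. To see how confident the model is across all five classes, the logits can be converted to probabilities with a softmax and ranked. The following is a minimal sketch using only the standard library. The helper `rank_classes`, the `ID2LABEL` mapping, and the example logits are hypothetical and for illustration only; the real label order should be read from `model.config.id2label`.

```python
import math

# Hypothetical label mapping for illustration only; in practice use
# model.config.id2label from the loaded checkpoint.
ID2LABEL = {0: "dutch", 1: "high", 2: "low", 3: "neutral", 4: "overhead"}


def rank_classes(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked]


# Made-up logits for a single image; with the real model, pass
# outputs.logits[0].tolist() instead.
example_logits = [0.2, 3.1, -0.5, 1.4, 0.0]
for label, prob in rank_classes(example_logits, ID2LABEL):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the usage example, `rank_classes(outputs.logits[0].tolist(), model.config.id2label)` would give the ranked probabilities for the demo image.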