---
license: mit
library_name: transformers
tags:
- camera angle
- camera feature
- cinescale
- film analysis
- movie style
metrics:
- accuracy
- f1
pipeline_tag: image-classification
---

# ConvNeXt V2 fine-tuned for camera angle classification

A base-size ConvNeXt V2 model fine-tuned to classify camera angles. The model was fine-tuned for 30 epochs on the [Cinescale](https://cinescale.github.io/camera_al/#dataset) dataset.

It classifies an image into one of five classes: *dutch, high, low, neutral, overhead*.


## Evaluation

On the test set (test.csv), the model reaches an accuracy of 94.85% and a macro-F1 score of 92.52%.
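
For readers who want to reproduce these two metrics on their own predictions, the sketch below shows how accuracy and macro-F1 are defined. It is not the evaluation script used for this model: the helper names `accuracy` and `macro_f1` are illustrative, and the label lists are made-up toy data rather than rows from test.csv.

```python
from collections import Counter

CLASSES = ["dutch", "high", "low", "neutral", "overhead"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # A class with no true and no predicted samples contributes 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(classes)


# Toy labels for illustration only (not taken from test.csv).
y_true = ["high", "low", "neutral", "dutch", "overhead", "neutral"]
y_pred = ["high", "low", "neutral", "neutral", "overhead", "neutral"]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"macro-F1: {macro_f1(y_true, y_pred):.4f}")
```

Because macro-F1 averages the classes with equal weight, a rarely occurring class such as *dutch* affects it as much as a frequent one, which is why it can sit below the accuracy.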

## How to use

```python
from transformers import AutoModelForImageClassification
import torch
from torchvision.transforms import v2
from torchvision.io import read_image, ImageReadMode


model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-384-cinescale-angle")
im_size = 384

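# Demo image: https://www.pexels.com/photo/man-in-black-dress-walking-in-between-brown-wooden-pews-9614069/
image = read_image("demo/angle_demo.jpg", mode=ImageReadMode.RGB)

# Resize the shorter side to im_size, center-crop to a square,
# scale uint8 pixels to [0, 1], then normalize with ImageNet statistics.
transform = v2.Compose([
    v2.Resize(im_size, antialias=True),
    v2.CenterCrop((im_size, im_size)),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Add a batch dimension: (3, H, W) -> (1, 3, H, W)
inputs = transform(image).unsqueeze(0)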

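# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(pixel_values=inputs)

# Map the highest-scoring class index back to its label
predicted_label = model.config.id2label[torch.argmax(outputs.logits, dim=-1).item()]
print(predicted_label)
# --> high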
```
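
The snippet above prints only the top label. If a score for every class is useful, the logits can be turned into probabilities with a softmax, as in the minimal sketch below. The helper `class_probabilities` is a hypothetical name, and the `example_id2label` mapping and `example_logits` values are assumptions made for illustration; the real mapping should be read from `model.config.id2label`.

```python
import math


def class_probabilities(logits, id2label):
    """Apply a softmax to raw logits and return (label, probability)
    pairs sorted from most to least likely."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    pairs = [(id2label[i], e / total) for i, e in enumerate(exps)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical mapping and logits, for illustration only.
example_id2label = {0: "dutch", 1: "high", 2: "low", 3: "neutral", 4: "overhead"}
example_logits = [0.1, 3.2, -1.0, 1.5, 0.3]

ranked = class_probabilities(example_logits, example_id2label)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

With the model loaded as above, the same helper could be called as `class_probabilities(outputs.logits[0].tolist(), model.config.id2label)`.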