---
license: mit
library_name: transformers
tags:
- camera angle
- camera feature
- cinescale
- film analysis
- movie style
- movie analysis
metrics:
- accuracy
- f1
pipeline_tag: image-classification
---

# ConvNeXt V2 fine-tuned for angle classification

ConvNeXt V2 base-size model fine-tuned to classify camera angles. The model was fine-tuned for 30 epochs on the [Cinescale](https://cinescale.github.io/camera_al/#dataset) dataset.

It classifies an image into one of five classes: *dutch, high, low, neutral, overhead*

## Evaluation

On the test set (test.csv), the model achieves an accuracy of 93.32% and a macro-F1 of 90.01%.

## How to use

```python
from transformers import AutoModelForImageClassification
import torch
from torchvision.transforms import v2
from torchvision.io import read_image, ImageReadMode

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-224-cinescale-angle")
model.eval()  # inference mode
im_size = 224

# https://www.pexels.com/photo/man-in-black-dress-walking-in-between-brown-wooden-pews-9614069/
image = read_image("demo/angle_demo.jpg", mode=ImageReadMode.RGB)

# Same preprocessing as evaluation: resize, center crop, scale to [0, 1], ImageNet normalization
transform = v2.Compose([v2.Resize(im_size, antialias=True),
                        v2.CenterCrop((im_size, im_size)),
                        v2.ToDtype(torch.float32, scale=True),
                        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

# Add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
inputs = transform(image).unsqueeze(0)

with torch.no_grad():
    outputs = model(pixel_values=inputs)

predicted_label = model.config.id2label[outputs.logits.argmax(dim=-1).item()]
print(predicted_label)
# --> high
```

## Training Details

```python
import torch
from torchvision.transforms import v2
from transformers import TrainingArguments

im_size = 224

# Training transforms
# The augmentations below are applied in a random order for each sample
randomorder = v2.RandomOrder([
    v2.RandomHorizontalFlip(),
    v2.GaussianBlur(5),
    v2.RandomAdjustSharpness(2),
    v2.RandomGrayscale(p=0.2),
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)])

train_transform = v2.Compose([v2.Resize(im_size, antialias=True),
                              v2.RandomResizedCrop((im_size, im_size), antialias=True),
                              randomorder,
                              v2.ToDtype(torch.float32, scale=True),
                              v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

# Training Arguments
training_args = TrainingArguments(
    evaluation_strategy="epoch",  # newer transformers releases name this `eval_strategy`
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=128,
    gradient_accumulation_steps=4,  # 128 * 4 = 512 samples per optimizer step on each device
    per_device_eval_batch_size=128,
    num_train_epochs=30,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    dataloader_num_workers=32,
    torch_compile=True
)
```
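
The training arguments above select the best checkpoint with `metric_for_best_model="f1"`, and the Evaluation section reports accuracy and macro-F1, but the card does not show the metric function itself. The following is a minimal sketch of how such a function could look, not the code used to train this model. The names `accuracy_and_macro_f1` and `compute_metrics` are hypothetical, and the sketch assumes the evaluation output unpacks into a `(logits, labels)` pair. It is written in plain Python so it runs without extra dependencies.

```python
from typing import Dict, Sequence


def argmax(row: Sequence[float]) -> int:
    """Return the index of the largest value in one row of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_and_macro_f1(preds: Sequence[int], labels: Sequence[int],
                          num_classes: int) -> Dict[str, float]:
    """Compute accuracy and macro-averaged F1 from predicted and true class ids.

    Hypothetical helper: macro-F1 here is the unweighted mean of per-class F1
    over all `num_classes` classes; a class that never appears in either the
    predictions or the labels contributes an F1 of 0.0.
    """
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and the same length")

    correct = sum(int(p == t) for p, t in zip(preds, labels))

    f1_scores = []
    for c in range(num_classes):
        tp = sum(int(p == c and t == c) for p, t in zip(preds, labels))
        fp = sum(int(p == c and t != c) for p, t in zip(preds, labels))
        fn = sum(int(p != c and t == c) for p, t in zip(preds, labels))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class is absent
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return {"accuracy": correct / len(labels),
            "f1": sum(f1_scores) / num_classes}


def compute_metrics(eval_pred) -> Dict[str, float]:
    """Assumed shape: eval_pred unpacks into (logits, labels)."""
    logits, labels = eval_pred
    preds = [argmax(row) for row in logits]
    return accuracy_and_macro_f1(preds, list(labels), num_classes=len(logits[0]))


if __name__ == "__main__":
    # Toy example with three classes, for illustration only
    print(accuracy_and_macro_f1([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3))
```

A function like this would be passed to `Trainer` through its `compute_metrics` argument. The `"f1"` key in the returned dictionary is what `metric_for_best_model="f1"` refers to.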