---
language:
- en
base_model:
- openai/clip-vit-large-patch14
tags:
- IQA
- computer_vision
- perceptual_tasks
- CLIP
- KonIQ-10k
---
**PerceptCLIP-IQA** is a model designed to predict **image quality assessment (IQA) scores**. This is the official model from the paper:
📄 **["Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks"](https://arxiv.org/abs/2503.13260)**.
We apply **LoRA adaptation** to the **CLIP visual encoder** and add an **MLP head** for IQA score prediction. Our model achieves **state-of-the-art results**.

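The repository's `modeling.py` (loaded in the Usage section below) defines the `clip_lora_model` class. For orientation, the sketch below shows how such a model could be assembled with `transformers` and `peft`; the LoRA rank, target modules, and head dimensions are illustrative assumptions, not the exact configuration used in the paper.

```python
# Illustrative sketch only: the real class is clip_lora_model in modeling.py.
# LoRA rank, target modules, and head sizes are assumptions, not the paper's values.
import torch.nn as nn
from transformers import CLIPVisionModel
from peft import LoraConfig, get_peft_model

class CLIPLoRAForIQA(nn.Module):  # hypothetical name, for exposition only
    def __init__(self):
        super().__init__()
        backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        hidden = backbone.config.hidden_size  # 1024 for ViT-L/14
        lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.1,
                              target_modules=["q_proj", "v_proj"])
        self.backbone = get_peft_model(backbone, lora_cfg)  # LoRA on the visual encoder
        self.head = nn.Sequential(                          # MLP regression head
            nn.Linear(hidden, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, pixel_values):
        pooled = self.backbone(pixel_values=pixel_values).pooler_output
        return self.head(pooled).squeeze(-1)  # one quality score per image
```
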
## Training Details

- *Dataset*: [KonIQ-10k](https://arxiv.org/pdf/1910.06180)
- *Architecture*: CLIP Vision Encoder (ViT-L/14) with *LoRA adaptation*
- *Loss Function*: Pearson-correlation-induced loss \( L_{PLCC} = \frac{1}{2}\left(1 - \mathrm{PLCC}(\tilde{y}, y)\right) \) (see the sketch after this list)
- *Optimizer*: AdamW
- *Learning Rate*: 5e-05
- *Batch Size*: 32

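For reference, the loss above can be written in a few lines of PyTorch. This is a minimal sketch of the formula itself, not the repository's training code; the small epsilon is an assumption added for numerical stability.

```python
import torch

def plcc_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """L_PLCC = 0.5 * (1 - PLCC(pred, target)) over a batch of scores."""
    pred_c = pred - pred.mean()
    target_c = target - target.mean()
    plcc = (pred_c * target_c).sum() / (pred_c.norm() * target_c.norm() + eps)
    return 0.5 * (1.0 - plcc)
```
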
## Installation & Requirements

You can set up the environment using `environment.yml` or manually install the dependencies:
- python=3.9.15
- cudatoolkit=11.7
- torchvision=0.14.0
- transformers=4.45.2
- peft=0.14.0

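To confirm that the installed versions match the list above, a quick optional check:

```python
import torch, torchvision, transformers, peft

# Print installed versions to compare against the pinned list above
for pkg in (torch, torchvision, transformers, peft):
    print(pkg.__name__, pkg.__version__)
```
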
## Usage

To use the model for inference:

```python
from torchvision import transforms
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
import importlib.util

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model class definition dynamically
class_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_IQA", filename="modeling.py")
spec = importlib.util.spec_from_file_location("modeling", class_path)
modeling = importlib.util.module_from_spec(spec)
spec.loader.exec_module(modeling)

# Initialize the model
ModelClass = modeling.clip_lora_model
model = ModelClass().to(device)

# Load the pretrained weights
model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_IQA", filename="perceptCLIP_IQA.pth")
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()

# Load an image
image = Image.open("image_path.jpg").convert("RGB")

# Preprocessing: resize, center-crop to 224x224, and normalize with CLIP statistics
def IQA_preprocess():
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                             std=(0.26862954, 0.26130258, 0.27577711))
    ])
    return transform

image = IQA_preprocess()(image).unsqueeze(0).to(device)

# Predict the quality score
with torch.no_grad():
    iqa_score = model(image).item()

print(f"Predicted quality score: {iqa_score:.4f}")
```