---
language:
- vi
metrics:
- accuracy
---

# Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration
✨  [Ngoc-Son Nguyen](mailto:[email protected]), [Van-Son Nguyen](mailto:[email protected]), and [Tung Le](mailto:[email protected])\
🏠  University of Science, VNU-HCM

## Installation
```bash
git clone https://github.com/ngocson1042002/ViVQA.git
cd ViVQA/beit3/HCMUS
pip install salesforce-lavis
pip install torchscale timm underthesea efficientnet_pytorch
pip install --upgrade transformers
```
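If the setup worked, the core dependencies should import cleanly. A quick, optional sanity check (a sketch, not part of the original instructions; note that `salesforce-lavis` installs as the module `lavis`):

```python
# Optional sanity check: confirm the installed dependencies import.
# "salesforce-lavis" is imported as "lavis".
import lavis
import torchscale
import timm
import underthesea
import efficientnet_pytorch
import transformers

print("transformers", transformers.__version__)
print("timm", timm.__version__)
```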
## Sample inference code
```python
from transformers import AutoModel
from processor import Processor
from PIL import Image
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model together with its custom code and move it to the target device
model = AutoModel.from_pretrained("ngocson2002/vivqa-model", trust_remote_code=True).to(device)
processor = Processor()

image = Image.open('./ViVQA/demo/1.jpg').convert('RGB')
question = "màu áo của con chó là gì?"  # "What color is the dog's shirt?"

inputs = processor(image, question, return_tensors='pt')
inputs["image"] = inputs["image"].unsqueeze(0)  # add a batch dimension
inputs = {k: v.to(device) for k, v in inputs.items()}  # keep inputs on the same device as the model

model.eval()
with torch.no_grad():
    output = model(**inputs)
logits = output.logits
idx = logits.argmax(-1).item()

print("Predicted answer:", model.config.id2label[idx])  # prints: màu đỏ ("red")
```
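Since the model classifies over a fixed answer vocabulary, it is easy to inspect more than the single best answer. A minimal sketch, assuming the `logits` and `model.config.id2label` from the snippet above and a batch size of 1, that prints the top-3 candidate answers with softmax scores:

```python
import torch.nn.functional as F

# Turn the raw logits into a probability distribution over the answer vocabulary
probs = F.softmax(logits, dim=-1)

# Show the three highest-scoring answers with their confidence
top_probs, top_ids = probs.topk(3, dim=-1)
for p, i in zip(top_probs.squeeze(0).tolist(), top_ids.squeeze(0).tolist()):
    print(f"{model.config.id2label[i]}: {p:.3f}")
```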