Alfasign D0k-tor committed on
Commit bf67bd8 · 0 Parent(s)

Duplicate from nttdataspain/Image-To-Text-Lora-ViT

Co-authored-by: Daniel Puente Viejo <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,38 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ examples/image.jpg filter=lfs diff=lfs merge=lfs -text
+ examples/example3.jpg filter=lfs diff=lfs merge=lfs -text
+ examples/example2.jpg filter=lfs diff=lfs merge=lfs -text
+ examples/example1.jpg filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ title: Image To Text Lora ViT
+ tags:
+   - image to text
+   - language models
+   - LLMs
+ emoji: 📷
+ colorFrom: white
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 3.14.0
+ app_file: app.py
+ pinned: true
+ license: mit
+ duplicated_from: nttdataspain/Image-To-Text-Lora-ViT
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,68 @@
+ import os
+ os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'  # must be set before TensorFlow is imported
+
+ import torch
+ import gradio as gr
+ import tensorflow as tf
+ from PIL import Image
+ from transformers import AutoTokenizer, ViTFeatureExtractor, VisionEncoderDecoderModel
+
+ device = 'cpu'
+
+ model_id = "nttdataspain/vit-gpt2-coco-lora"
+ model = VisionEncoderDecoderModel.from_pretrained(model_id)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ feature_extractor = ViTFeatureExtractor.from_pretrained(model_id)
+
+ # Generate a caption for a single PIL image
+ def predict(image):
+     img = image.convert('RGB')
+     model.eval()
+     pixel_values = feature_extractor(images=[img], return_tensors="pt").pixel_values
+     with torch.no_grad():
+         output_ids = model.generate(pixel_values, max_length=16, num_beams=4, return_dict_in_generate=True).sequences
+
+     preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
+     preds = [pred.strip() for pred in preds]
+     return preds[0]
+
+ examples_folder = os.path.join(os.path.dirname(__file__), "examples")
+ examples = [os.path.join(examples_folder, file) for file in os.listdir(examples_folder)]
+
+ with gr.Blocks() as demo:
+     gr.HTML(
+         """
+         <div style="text-align: center; max-width: 1200px; margin: 20px auto;">
+             <h1 style="font-weight: 900; font-size: 3rem; margin: 0rem">
+                 📸 ViT Image-to-Text with LoRA 📝
+             </h1>
+             <h2 style="text-align: left; font-weight: 450; font-size: 1rem; margin-top: 2rem; margin-bottom: 1.5rem">
+                 Fine-tuning large language models has long been a challenge: with behemoth models like GPT-3 boasting billions of parameters, the cost of adapting them to a specific task or domain has become exorbitant. Microsoft's <b>Low-Rank Adaptation (LoRA)</b> offers a way around this.
+                 <br>
+                 <br>
+                 LoRA freezes the weights of the pre-trained model and introduces trainable <b>rank-decomposition matrices in each transformer block</b>. This significantly reduces the number of trainable parameters and the GPU memory required, since gradients no longer need to be computed for the vast majority of the model's weights.
+                 <br>
+                 <br>
+                 You can find more info here: <u><a href="https://www.linkedin.com/pulse/fine-tuning-image-to-text-algorithms-with-lora-daniel-puente-viejo" target="_blank">LinkedIn article</a></u>
+             </h2>
+         </div>
+         """)
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             img = gr.inputs.Image(label="Upload any Image", type='pil', optional=True)
+             button = gr.Button(value="Describe")
+         with gr.Column(scale=1):
+             out = gr.outputs.Textbox(type="text", label="Captions")
+
+     button.click(predict, inputs=[img], outputs=[out])
+
+     gr.Examples(
+         examples=examples,
+         inputs=img,
+         outputs=out,
+         fn=predict,
+         cache_examples=True,
+     )
+
+ demo.launch(debug=True)
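The rank-decomposition idea described in the app's intro can be sketched in a few lines of PyTorch: freeze a pre-trained linear layer and add two small trainable matrices whose product approximates the weight update. The `LoRALinear` class, the rank `r`, and the scaling `alpha` below are illustrative assumptions for a 768-dimensional layer, not the actual PEFT setup used to train `nttdataspain/vit-gpt2-coco-lora`.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (illustrative sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weights
        # Rank-decomposition matrices: A is (in, r), B is (r, out); B starts at
        # zero so the layer initially behaves exactly like the frozen base.
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")  # 12288 of 602880, about 2%
```

With rank 8, only the two small matrices train (2 × 768 × 8 = 12,288 parameters) while the ~590k frozen base parameters need no gradients, which is exactly the memory saving the intro text describes.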
examples/example1.jpg ADDED

Git LFS Details

  • SHA256: 05767682eee8cd0259fea4c1430fc1ccff638174ef53ca4a923f9722c9c20171
  • Pointer size: 130 Bytes
  • Size of remote file: 53.5 kB
examples/example2.jpg ADDED

Git LFS Details

  • SHA256: dbe11e20217b3cc033df96493c82324528106af15a4bcefb25c20c0fc7f7ef75
  • Pointer size: 130 Bytes
  • Size of remote file: 62.3 kB
examples/example3.jpg ADDED

Git LFS Details

  • SHA256: f2f0889fc0e6e75a4f9379096a08ceb9f22208f33d0f367e3fc6304f564e2489
  • Pointer size: 132 Bytes
  • Size of remote file: 2.32 MB
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ streamlit
+ transformers
+ pillow
+ requests
+ torch
+ tensorflow