Update app.py
app.py CHANGED
@@ -12,13 +12,10 @@ device='cpu'
 encoder_checkpoint = "nlpconnect/vit-gpt2-image-captioning"
 decoder_checkpoint = "nlpconnect/vit-gpt2-image-captioning"
 model_checkpoint = "nlpconnect/vit-gpt2-image-captioning"
-
+
 feature_extractor = ViTFeatureExtractor.from_pretrained(encoder_checkpoint)
-print("------------------------- 2 -------------------------\n")
 tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)
-print("------------------------- 3 -------------------------\n")
 model = VisionEncoderDecoderModel.from_pretrained(model_checkpoint).to(device)
-print("------------------------- 4 -------------------------\n")
 
 
 def predict(image, max_length=64, num_beams=4):
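The hunk above references names that are imported higher in app.py, outside this diff. A plausible reconstruction of that preamble, for reading convenience only (the exact form in the file may differ):

import gradio as gr
from transformers import AutoTokenizer, ViTFeatureExtractor, VisionEncoderDecoderModel

device = 'cpu'  # matches the context line in the hunk header

Note that recent transformers releases deprecate ViTFeatureExtractor in favor of ViTImageProcessor; both can load this checkpoint.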
@@ -29,7 +26,6 @@ def predict(image, max_length=64, num_beams=4):
     caption_text = clean_text(tokenizer.decode(caption_ids))
     return caption_text
 
-print("------------------------- 5 -------------------------\n")
 input = gr.inputs.Image(label="Upload any Image", type = 'pil', optional=True)
 output = gr.outputs.Textbox(type="text",label="Captions")
 examples = ["example1.jpg"]
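The unchanged middle of predict() (old lines 25-28) is elided by the diff. A minimal sketch of how a ViT-GPT2 captioning function of this shape is typically wired, with a hypothetical clean_text stand-in for the app's own helper, which the diff does not show:

def clean_text(text):
    # hypothetical helper: drop GPT-2's end-of-text marker and trim whitespace
    return text.replace('<|endoftext|>', '').strip()

def predict(image, max_length=64, num_beams=4):
    # preprocess the PIL image into the pixel tensor the ViT encoder expects
    pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(device)
    # beam-search decode a caption with the GPT-2 decoder
    caption_ids = model.generate(pixel_values, max_length=max_length, num_beams=num_beams)[0]
    caption_text = clean_text(tokenizer.decode(caption_ids))
    return caption_text

Separately, gr.inputs and gr.outputs were deprecated in Gradio 3.x and removed in later releases; on a current Gradio the equivalent components would be:

input = gr.Image(label="Upload any Image", type='pil')  # 'optional' no longer exists
output = gr.Textbox(label="Captions")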
@@ -62,7 +58,7 @@ with gr.Blocks() as demo:
 LoRA offers a groundbreaking approach by freezing the weights of pre-trained models and introducing trainable layers known as <b>rank-decomposition matrices in each transformer block</b>. This ingenious technique significantly reduces the number of trainable parameters and minimizes GPU memory requirements, as gradients no longer need to be computed for the majority of model weights.
 <br>
 <br>
-You can find more info here: <a href="https://www.linkedin.com/pulse/fine-tuning-image-to-text-algorithms-with-lora-daniel-puente-viejo" target="_blank"
+You can find more info here: <a href="https://www.linkedin.com/pulse/fine-tuning-image-to-text-algorithms-with-lora-daniel-puente-viejo" target="_blank";>Linkedin article</a>
 </h2>
 
 </div>
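The demo text in this hunk describes LoRA's core mechanic: freeze the pretrained weights and train small rank-decomposition matrices instead. A minimal PyTorch sketch of that idea, illustrative only and not any particular library's implementation:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pretrained nn.Linear with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # frozen path plus scaled low-rank update: W x + (alpha/r) * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

Wrapping, say, nn.Linear(768, 768) this way leaves the 768x768 weight frozen and trains only the 2*r*768 low-rank parameters, which is where the savings in trainable parameters and gradient memory that the text mentions come from.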