selfDotOsman committed
Commit a458187 · verified · 1 Parent(s): ce9221c

Update app.py

Files changed (1): app.py (+4 −5)
app.py CHANGED
@@ -14,7 +14,7 @@ def analyze_image(image):
     response = pipe(
         chat=[
            {"role": "system", "content": "You are an image understanding assistant. You can see and interpret images in fine detail. Provide clear, engaging descriptions that highlight the key elements and atmosphere of the image."},
-           {"role": "user", "content": "Describe the image"},
+           {"role": "user", "content": "Describe the image briefly"},
        ],
        images=image
    )
@@ -28,9 +28,8 @@ with gr.Blocks(theme=gr.themes.Soft(
 )) as demo:
     gr.Markdown(
         """
-        # 🤖 BobVLM Image Analyzer
-        Upload an image and let BobVLM describe what it sees. BobVLM combines CLIP's vision capabilities
-        with LLaMA's language understanding to provide detailed, natural descriptions of images.
+        # 🤖 BobVLM Demo
+        This demo runs on CPU since I can't afford GPU prices here 🤧, so it is quite slow; bear with me. Upload an image and let BobVLM describe what it sees.
         """
     )
@@ -78,7 +77,7 @@ with gr.Blocks(theme=gr.themes.Soft(
     """
     ### About BobVLM
     BobVLM is a Vision Language Model that combines CLIP's visual understanding with LLaMA's language capabilities.
-    It uses a specialized adapter layer to bridge the gap between vision and language, enabling detailed and natural
+    It was born out of an experiment to train a small adapter layer and see how much it could learn from supervised fine-tuning (SFT) data. The result is a model that produces detailed and natural
     image descriptions.

     [View on GitHub](https://github.com/yourusername/BobVLM) | [Hugging Face Model](https://huggingface.co/selfDotOsman/BobVLM-1.5b)
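The About text describes the design: CLIP's visual features bridged into LLaMA's language space by a small adapter that is the only part trained with SFT. Below is a minimal, hypothetical sketch of what such an adapter could look like, assuming a plain MLP projection; the class name, dimensions, and layer choices are illustrative assumptions, not BobVLM's actual configuration.

```python
import torch
import torch.nn as nn

class VisionLanguageAdapter(nn.Module):
    """Toy adapter in the spirit of the description above: project CLIP
    image features into the language model's embedding space. Dimensions
    are assumptions, not BobVLM's real configuration."""

    def __init__(self, clip_dim: int = 768, llm_dim: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, clip_dim) -> (batch, num_patches, llm_dim);
        # the projected tokens would be fed to the LLM alongside the text
        # embeddings, so only this small module needs SFT training.
        return self.proj(image_features)

# Example: 196 CLIP patch embeddings mapped into a 2048-dim LLM space.
feats = torch.randn(1, 196, 768)
tokens = VisionLanguageAdapter()(feats)
print(tokens.shape)  # torch.Size([1, 196, 2048])
```

If the design matches this sketch, keeping the trainable part this small is what makes the experiment cheap: the vision encoder and language model stay frozen, and SFT only updates the projection weights.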