Update README.md
README.md
CHANGED
@@ -26,9 +26,9 @@ Omnivision is a compact, sub-billion (968M) multimodal model for processing both
 Omnivision is intended for **Visual Question Answering** (answering questions about images) and **Image Captioning** (describing scenes in photos), making it ideal for on-device applications.

 **Example Demo:**
-
+Generating captions for a 1046×1568 image on M4 Pro Macbook takes **< 2s processing time** and requires only 988 MB RAM and 948 MB Storage.

-<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/P8HFmA7huCdpMClWVuXZO.png" alt="Example" style="width:700px;"/>


 ## Benchmarks