Update README.md
README.md CHANGED
@@ -17,24 +17,17 @@ Omni-Vision is a sub-billion (968M) multimodal model capable of processing both
 
 Quick Links:
 1. Interact directly in the HuggingFace Space.
-2. [
+2. [Quickstart to run locally](#how-to-run-locally)
 3. Learn more details in our blogs
 
 **Feedback:** Send questions or comments about the model in our [Discord](https://discord.gg/nexa-ai)
 
 ## Intended Use Cases
-
+OmniVision is intended for Visual Question Answering (answering questions about images) and Image Captioning (describing scenes in photos), optimized for edge devices. See example below:
 
-
-2. Image Captioning: Image captioning bridges the gap between vision and language, extracting details, understanding the scene, and then crafting a sentence or two that tells the story.
+Omni-Vision generated captions for a 1046×1568 pixel poster | **Processing time: <2s** | Device: MacBook M4 Pro
 
-Example:
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/w07yBAp_lZt12E_Vz0Lyk.png" alt="Benchmark Radar Chart" style="width:250px;"/>
-```bash
->>>> caption this
-```
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/dHZSgVGY9yV_lsNIW-iRj.png)
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/PTG3_n_p7_atBHCwRLOEE.png" alt="Example" style="width:700px;"/>
 
 
 ## Benchmarks