Update README.md
README.md CHANGED
@@ -17,24 +17,17 @@ Omni-Vision is a sub-billion (968M) multimodal model capable of processing both
 
 Quick Links:
 1. Interact directly in the HuggingFace Space.
-2. [
+2. [Quickstart to run locally](#how-to-run-locally)
 3. Learn more details in our blogs
 
 **Feedback:** Send questions or comments about the model in our [Discord](https://discord.gg/nexa-ai)
 
 ## Intended Use Cases
-
+OmniVision is intended for Visual Question Answering (answering questions about images) and Image Captioning (describing scenes in photos), optimized for edge devices. See example below:
 
-
-2. Image Captioning: Image captioning bridges the gap between vision and language, extracting details, understanding the scene, and then crafting a sentence or two that tells the story.
+Omni-Vision generated captions for a 1046×1568 pixel poster | **Processing time: <2s** | Device: MacBook M4 Pro
 
-Example:
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/w07yBAp_lZt12E_Vz0Lyk.png" alt="Benchmark Radar Chart" style="width:250px;"/>
-```bash
->>>> caption this
-```
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/dHZSgVGY9yV_lsNIW-iRj.png)
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/PTG3_n_p7_atBHCwRLOEE.png" alt="Example" style="width:700px;"/>
 
 
 ## Benchmarks