unum-cloud/uform-gen

text-generation

image-captioning

visual-question-answering


ashvardanian committed on Dec 29, 2023

Commit 9f6638f · 1 Parent(s): dbe9130

Add previews

Files changed (1)

README.md +8 -8

README.md CHANGED

@@ -9,30 +9,30 @@ datasets:
 - HuggingFaceM4/VQAv2
 - ChristophSchuhmann/MS_COCO_2017_URL_TEXT
 widget:
-- text: "What is the invoice number?"
-  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
-- text: "What is the purchase amount?"
-  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/contract.jpeg"
 language:
 - en
 license: apache-2.0
 base_model: unum-cloud/uform-vl-english
 ---
 <h1 align="center">UForm</h1>
 <h3 align="center">
 Pocket-Sized Multimodal AI<br/>
 For Content Understanding and Generation<br/>
 </h3>
-<Gallery />
 ## Description
 UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model consists of two parts:
-1. [UForm Vision Encoder](https://huggingface.co/unum-cloud/uform-vl-english)
-2. [Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B) manually tuned on the instructions dataset
 The model was pre-trained on: MSCOCO, SBU Captions, Visual Genome, VQAv2, GQA and a few internal datasets.

 - HuggingFaceM4/VQAv2
 - ChristophSchuhmann/MS_COCO_2017_URL_TEXT
 widget:
+- text: "The living room is cozy, featuring a red leather chair and a white table. The chair is in the center, and the table is on the left side. A lamp on the left side illuminates the space. A large picture hangs on the wall, adding artistic flair. A vase on the table adds a decorative touch. The room is well-lit, creating a warm and inviting atmosphere."
+  src: "https://github.com/ashvardanian/usearch-images/blob/main/assets/uform-gen-interior.png?raw=true"
+- text: "A young girl stands in a grassy field, holding an umbrella to shield herself from the rain. She dons a yellow dress and seems to relish her time outdoors. The umbrella is open, offering protection from the rain. The field is bordered by trees, fostering a tranquil and natural ambiance"
+  src: "https://github.com/ashvardanian/usearch-images/blob/main/assets/uform-gen-umbrella.png?raw=true"
 language:
 - en
 license: apache-2.0
 base_model: unum-cloud/uform-vl-english
 ---
+<Gallery />
 <h1 align="center">UForm</h1>
 <h3 align="center">
 Pocket-Sized Multimodal AI<br/>
 For Content Understanding and Generation<br/>
 </h3>
 ## Description
 UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model consists of two parts:
+1. [`uform-vl-english`](https://huggingface.co/unum-cloud/uform-vl-english) visual encoder,
+2. [`Sheared-LLaMA-1.3B`](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B) language model tuned on instruction datasets.
 The model was pre-trained on: MSCOCO, SBU Captions, Visual Genome, VQAv2, GQA and a few internal datasets.