Commit 9f6638f by ashvardanian (1 parent: dbe9130)
Add previews
README.md
CHANGED
@@ -9,30 +9,30 @@ datasets:
 - HuggingFaceM4/VQAv2
 - ChristophSchuhmann/MS_COCO_2017_URL_TEXT
 widget:
-- text: "
-  src: "https://
-- text: "
-  src: "https://
+- text: "The living room is cozy, featuring a red leather chair and a white table. The chair is in the center, and the table is on the left side. A lamp on the left side illuminates the space. A large picture hangs on the wall, adding artistic flair. A vase on the table adds a decorative touch. The room is well-lit, creating a warm and inviting atmosphere."
+  src: "https://github.com/ashvardanian/usearch-images/blob/main/assets/uform-gen-interior.png?raw=true"
+- text: "A young girl stands in a grassy field, holding an umbrella to shield herself from the rain. She dons a yellow dress and seems to relish her time outdoors. The umbrella is open, offering protection from the rain. The field is bordered by trees, fostering a tranquil and natural ambiance"
+  src: "https://github.com/ashvardanian/usearch-images/blob/main/assets/uform-gen-umbrella.png?raw=true"
 language:
 - en
 license: apache-2.0
 base_model: unum-cloud/uform-vl-english
 ---
 
+<Gallery />
+
 <h1 align="center">UForm</h1>
 <h3 align="center">
 Pocket-Sized Multimodal AI<br/>
 For Content Understanding and Generation<br/>
 </h3>
 
-<Gallery />
-
 ## Description
 
 UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model consists of two parts:
 
-1. [
-2. [Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B)
+1. [`uform-vl-english`](https://huggingface.co/unum-cloud/uform-vl-english) visual encoder,
+2. [`Sheared-LLaMA-1.3B`](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B) language model tuned on instruction datasets.
 
 The model was pre-trained on: MSCOCO, SBU Captions, Visual Genome, VQAv2, GQA and a few internal datasets.
 