license: apache-2.0

UForm

Multi-Modal Inference Library
For Semantic Search Applications

UForm is a Multi-Modal Inference package designed to encode Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents into a shared vector space!

This is the repository of English and multilingual UForm models converted to CoreML MLProgram format. Currently, only unimodal parts of models are converted.

Descriptions

Each model is split into two parts, an image encoder and a text encoder:

English image-encoder: english.image-encoder.mlpackage
English text-encoder: english.text-encoder.mlpackage
Multilingual image-encoder: multilingual.image-encoder.mlpackage
Multilingual text-encoder: multilingual.text-encoder.mlpackage

Each checkpoint is a zip archive with an MLProgram of the corresponding encoder.
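
Because each checkpoint is a zip archive, it has to be unpacked before the `.mlpackage` inside can be loaded. The following is a minimal sketch using only Python's standard library; the helper name `extract_checkpoint` is hypothetical and not part of UForm, and the archive path is whatever file you downloaded:

```python
import zipfile
from pathlib import Path


def extract_checkpoint(archive_path, dest_dir):
    """Unpack a zipped checkpoint and return its top-level entries.

    `extract_checkpoint` is a hypothetical helper, not part of UForm.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        # Collect the first path component of every member, for example
        # the ".mlpackage" directory that holds the MLProgram.
        top_level = sorted(
            {name.split("/")[0] for name in archive.namelist() if name}
        )
    return [dest / name for name in top_level]
```

Calling `extract_checkpoint("path/to/checkpoint.zip", "models")` should return the extracted paths under `models`, which can then be passed to a Core ML loader.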

A text encoder has the following input fields:

input_ids: int32
attention_mask: int32
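
As an illustration of how these two fields relate, the sketch below pads or truncates a list of token ids to a fixed length and derives the matching attention mask. The helper `build_text_inputs`, the fixed `max_length`, and the padding id `0` are assumptions for the example; the real values depend on the tokenizer and on the shape the converted model expects:

```python
from array import array


def build_text_inputs(token_ids, max_length, pad_id=0):
    """Pad or truncate token ids and derive the matching attention mask.

    `max_length` and `pad_id` are assumptions; use the values that match
    the tokenizer and the converted model.
    """
    ids = list(token_ids)[:max_length]
    mask = [1] * len(ids)          # 1 marks a real token
    padding = max_length - len(ids)
    ids += [pad_id] * padding
    mask += [0] * padding          # 0 marks padding
    # Typecode "i" is a signed C int, 32 bits wide on common platforms.
    return {
        "input_ids": array("i", ids),
        "attention_mask": array("i", mask),
    }
```

The returned dictionary uses the same field names as the text encoder inputs listed above, so its values only need to be converted to whatever array type the inference runtime accepts.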

An image encoder has a single input field:

image: float32

Both encoders return:

features: float32
embeddings: float32
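
Since both encoders map into a shared vector space, a text query can be matched against images by comparing vectors. Below is a minimal sketch of that comparison in plain Python, assuming the `embeddings` output is the vector to compare and that cosine similarity is the metric; the function names are hypothetical and the vectors are toy data, not model outputs:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return image indices sorted from most to least similar to the text."""
    scores = [cosine_similarity(text_embedding, e) for e in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With real outputs, `text_embedding` would come from the text encoder and each entry of `image_embeddings` from the image encoder; the first index returned is the best match for the query.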
