metadata
license: apache-2.0
UForm
Multi-Modal Inference Library
For Semantic Search Applications
UForm is a Multi-Modal Inference package designed to encode Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents into a shared vector space!
This is the repository of English and multilingual UForm models converted to CoreML MLProgram format. Currently, only unimodal parts of models are converted.
Descriptions
Each model is separated into two parts, an image-encoder and a text-encoder:
- English image-encoder: english.image-encoder.mlpackage
- English text-encoder: english.text-encoder.mlpackage
- Multilingual image-encoder: multilingual.image-encoder.mlpackage
- Multilingual text-encoder: multilingual.text-encoder.mlpackage
Each checkpoint is a zip archive with an MLProgram of the corresponding encoder.
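Since each checkpoint is distributed as a zip archive, it has to be unpacked before the MLProgram can be loaded. The following is a minimal sketch using only the Python standard library. The function name `extract_checkpoint` and both path arguments are hypothetical, and the archive's internal layout is an assumption rather than something this repository documents.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_checkpoint(archive_path: str, dest_dir: str) -> List[str]:
    """Unpack a zipped checkpoint and return its top-level entries.

    Both arguments are placeholders: point them at the archive you
    downloaded and at a directory of your choice.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        # Keep only the first path component of every member, which is
        # assumed to be the .mlpackage directory holding the encoder.
        names = [name for name in archive.namelist() if name]
    return sorted({name.split("/", 1)[0] for name in names})
```

The returned list makes it easy to check which package ended up in the destination directory before handing its path to a CoreML loader.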
A text encoder has the following input fields:
- input_ids: int32
- attention_mask: int32
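The two text inputs are typically built together: token ids are truncated or padded to a fixed length, and the attention mask marks which positions hold real tokens. The sketch below illustrates that pairing in plain Python. The helper name `build_text_inputs`, the `max_length` value, the `pad_id` of 0, and the sample token ids are all assumptions for illustration; the real tokenizer, padding id, and sequence length depend on the model and are not specified here.

```python
from typing import List, Tuple


def build_text_inputs(
    token_ids: List[int], max_length: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Truncate or pad token ids and build the matching attention mask."""
    ids = token_ids[:max_length]
    mask = [1] * len(ids)  # 1 marks a real token
    padding = max_length - len(ids)
    # Padded positions get pad_id in input_ids and 0 in the mask.
    return ids + [pad_id] * padding, mask + [0] * padding


# Hypothetical token ids; a real tokenizer would produce these.
input_ids, attention_mask = build_text_inputs([101, 2023, 102], max_length=5)
print(input_ids)       # [101, 2023, 102, 0, 0]
print(attention_mask)  # [1, 1, 1, 0, 0]
```

Both lists still need to be converted to int32 arrays before they are passed to the encoder.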
An image encoder has a single input field:
- image: float32
Both encoders return:
- features: float32
- embeddings: float32
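Because text and image embeddings live in a shared vector space, a semantic search can rank candidates by how close their embeddings are. The sketch below uses cosine similarity for that comparison. The choice of metric and the function name `cosine_similarity` are assumptions, and the vectors in the usage lines are made-up values, not model outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for a text and an image embedding.
text_embedding = [1.0, 0.0]
image_embedding = [0.0, 1.0]
print(cosine_similarity(text_embedding, text_embedding))   # 1.0
print(cosine_similarity(text_embedding, image_embedding))  # 0.0
```

Scores closer to 1 indicate a closer match between the two inputs under this metric.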