Running on Zero • 1.35k • Chat With Janus-Pro-7B • A unified multimodal understanding and generation model.
Segformer Collection Transformer-based semantic segmentation model by Nvidia • 15 items • Updated 23 days ago • 4
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis Paper • 2412.01819 • Published Dec 2, 2024 • 35
PaliGemma 2 Release Collection Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated Dec 13, 2024 • 134
meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • Updated Dec 4, 2024 • 2.34M • 1.29k
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 71
Running on T4 • 45 • ColPali 🤝 Vespa - Visual Retrieval • Visual Retrieval with ColPali and Vespa
google/siglip-so400m-patch16-256-i18n Zero-Shot Image Classification • Updated Nov 18, 2024 • 4.72k • 28