Multimodal AI - a shi-labs Collection

updated Dec 11, 2024

Large multimodal models

🔍 OLA-VLM

Space • Running on Zero • 4
🐐 CuMo 7b Zero

Space • Running on Zero • 82
✌️ VCoder

Space • Runtime error • 63
shi-labs/vcoder_ds_llava-v1.5-13b

Text Generation • Updated Dec 20, 2023 • 10 • 4
shi-labs/CuMo-mistral-7b

Text Generation • Updated May 9, 2024 • 39 • 15
shi-labs/CuMo-mixtral-8x7b

Text Generation • Updated May 9, 2024 • 25 • 3
shi-labs/vcoder_llava-v1.5-7b

Text Generation • Updated Dec 20, 2023 • 10 • 2
shi-labs/vcoder_ds_llava-v1.5-7b

Text Generation • Updated Dec 20, 2023 • 12
shi-labs/vcoder_llava-v1.5-13b

Text Generation • Updated Dec 20, 2023 • 13 • 4
VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Paper • 2312.14233 • Published Dec 21, 2023 • 16
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Paper • 2405.05949 • Published May 9, 2024 • 2
shi-labs/OLA-VLM-CLIP-ViT-Phi3-4k-mini

Image-Text-to-Text • Updated Dec 10, 2024 • 37 • 1
shi-labs/OLA-VLM-CLIP-ConvNeXT-Llama3-8b

Image-Text-to-Text • Updated Dec 10, 2024 • 25 • 1
shi-labs/OLA-VLM-CLIP-ConvNeXT-Phi3-4k-mini

Image-Text-to-Text • Updated Dec 10, 2024 • 15 • 1
shi-labs/vpt_OLA-VLM-CLIP-ConvNeXT-Llama3-8b

Image-Text-to-Text • Updated Dec 10, 2024 • 69 • 2
shi-labs/OLA-VLM-CLIP-ViT-Llama3-8b

Image-Text-to-Text • Updated Dec 10, 2024 • 34
shi-labs/pretrain_dsg_OLA-VLM-CLIP-ViT-Llama3-8b

Image-Text-to-Text • Updated Dec 10, 2024 • 74
