Language models that take vision and/or audio input, hand-picked by the Nexa Team.
- NexaAI/gemma-3n-E4B-it-4bit-MLX
  Image-Text-to-Text • Updated • 128
- NexaAI/Qwen2.5-VL-7B-Instruct-4bit-MLX
  Image-Text-to-Text • 2B • Updated • 96
- NexaAI/SmolVLM-500M-Instruct-8bit-MLX
  Image-Text-to-Text • 0.7B • Updated • 39
- NexaAI/SmolVLM-Instruct-8bit-MLX
  Image-Text-to-Text • 0.7B • Updated • 43