HuggingFaceTB/SmolVLM2-500M-Video-Instruct Image-Text-to-Text • Updated 8 days ago • 6.92k • 41
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated 1 day ago • 441k • 1.13k
Kokoro Text-to-Speech (WebGPU) Space • Running • 288 • High-quality speech synthesis powered by Kokoro TTS
mlx-community/SmolVLM2-500M-Video-Instruct-mlx Video-Text-to-Text • Updated 22 days ago • 861 • 12
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations Paper • 2411.10818 • Published Nov 16, 2024 • 25
StyleDrop: Text-to-Image Generation in Any Style Paper • 2306.00983 • Published Jun 1, 2023 • 7
Kosmos-2: Grounding Multimodal Large Language Models to the World Paper • 2306.14824 • Published Jun 26, 2023 • 34