Post 4954

Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained on 45hrs 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture:
> WhisperSpeech/VQ for Semantic Tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)

I'm super bullish on HomeBrew/Jan and early fusion, audio and text, multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)
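To make the numbers above concrete, here is a rough sketch of what the 70/20/10 instruction mix works out to, plus the basic idea behind early fusion: sound codes become extra token ids in the same sequence as the text tokens. The codebook size, the id layout (sound ids placed right after the text vocabulary), and all function names are assumptions for illustration, not taken from the Ichigo code.

```python
# Illustrative sketch only. Values marked "assumed" are not stated in the post.

TOTAL_SAMPLES = 1_890_000  # "1.89M samples" from the post
MIX = {"speech": 0.70, "transcription": 0.20, "text": 0.10}  # from the post


def samples_per_task(total, mix):
    """Split the instruction-tuning set according to the stated percentages."""
    return {name: round(total * share) for name, share in mix.items()}


TEXT_VOCAB_SIZE = 128_256  # Llama 3.1 tokenizer vocabulary size
NUM_SOUND_CODES = 512      # assumed codebook size, hypothetical


def sound_code_to_token_id(code):
    """Map a quantizer code to a token id placed after the text vocabulary (assumed layout)."""
    if not 0 <= code < NUM_SOUND_CODES:
        raise ValueError(f"sound code out of range: {code}")
    return TEXT_VOCAB_SIZE + code


def fuse(text_ids, sound_codes):
    """Early fusion: one flat token sequence holding both text and sound tokens."""
    return list(text_ids) + [sound_code_to_token_id(c) for c in sound_codes]


if __name__ == "__main__":
    print(samples_per_task(TOTAL_SAMPLES, MIX))
    print(fuse([1, 2, 3], [0, 7, 511]))
```

The point of the single sequence is that the language model needs no separate audio encoder branch at inference time: it only sees a larger vocabulary.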

Post 3005

NEW: Open Source Text/Image to video model is out - MIT licensed - Rivals Gen-3, Pika & Kling 🔥

> Pyramid Flow: Training-efficient Autoregressive Video Generation method
> Utilizes Flow Matching
> Trains on open-source datasets
> Generates high-quality 10-second videos
> Video resolution: 768p
> Frame rate: 24 FPS
> Supports image-to-video generation
> Model checkpoints available on the hub 🤗: rain1011/pyramid-flow-sd3

SAM 2.1
Collection of SAM 2.1 model checkpoints

facebook/sam2.1-hiera-large • Mask Generation • Updated 24 days ago • 14.6k • 27
facebook/sam2.1-hiera-base-plus • Mask Generation • Updated 24 days ago • 694 • 4
facebook/sam2.1-hiera-small • Mask Generation • Updated 24 days ago • 1.43k • 4
facebook/sam2.1-hiera-tiny • Mask Generation • Updated 24 days ago • 1.61k • 3