Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant Paper • 2410.15316 • Published Oct 20, 2024 • 10 • 4
🍓 Ichigo v0.3 Collection The experimental family designed to train LLMs to understand sound natively. • 6 items • Updated Nov 11, 2024 • 17
Post
Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥
> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained on 45hrs 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed

⚡ Architecture:
> WhisperSpeech/VQ for Semantic Tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)

I'm super bullish on HomeBrew/Jan and early fusion, audio and text, multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)
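For readers new to early fusion, here is a rough sketch of the idea behind the "MLS -> WhisperVQ tokens -> Llama 3.1" line in the post: discrete audio codes become extra vocabulary entries, so speech and text sit in one token sequence. This is only an illustration. The marker names (`<|sound_start|>`, `<|sound_end|>`, `<|sound_NNNN|>`), the toy codebook size, and the helper functions are assumptions made up for this example, not the actual WhisperVQ quantizer, Llama tokenizer, or Ichigo training code.

```python
from typing import Dict, List

# Assumed marker strings, chosen for illustration only.
SOUND_START = "<|sound_start|>"
SOUND_END = "<|sound_end|>"


def sound_token(code: int, width: int = 4) -> str:
    """Map one discrete audio code to a hypothetical vocabulary string."""
    if code < 0:
        raise ValueError("audio codes must be non-negative")
    return f"<|sound_{code:0{width}d}|>"


def build_vocab(text_vocab: List[str], codebook_size: int) -> Dict[str, int]:
    """Extend a text vocabulary with two markers plus one token per audio code."""
    vocab = {tok: i for i, tok in enumerate(text_vocab)}
    new_tokens = [SOUND_START, SOUND_END]
    new_tokens += [sound_token(c) for c in range(codebook_size)]
    for tok in new_tokens:
        vocab[tok] = len(vocab)  # new ids are appended after the text ids
    return vocab


def fuse(audio_codes: List[int], text_tokens: List[str],
         vocab: Dict[str, int]) -> List[int]:
    """Build a single id sequence: wrapped audio codes followed by text tokens."""
    seq = [SOUND_START]
    seq += [sound_token(c) for c in audio_codes]
    seq += [SOUND_END]
    seq += list(text_tokens)
    return [vocab[t] for t in seq]


if __name__ == "__main__":
    vocab = build_vocab(["hello", "world"], codebook_size=8)
    print(fuse([3, 0], ["hello"], vocab))
```

In a real setup the quantizer would produce the audio codes and the model's embedding table would be resized to cover the new ids; both steps are left out here.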
alandao/raw_audio_with_audio_tokens_for_pretraining_just_tokens Viewer • Updated Aug 10, 2024 • 2.42M • 38