
Alan Dao

alandao

https://about.alandao.net

tikikun


Recent Activity

upvoted a paper 17 days ago

AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO

replied to tianchez's post 18 days ago

Introducing VLM-R1! GRPO has helped DeepSeek R1 to learn reasoning. Can it also help VLMs perform stronger for general computer vision tasks? The answer is YES and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and eval on RefCOCO Val and RefGTA (an OOD task). https://github.com/om-ai-lab/VLM-R1

authored a paper 20 days ago

AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO


alandao's activity

upvoted a paper 17 days ago

AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO

Paper • 2502.14669 • Published 21 days ago • 11

replied to tianchez's post 18 days ago

Great job, guys! Reasoning brings so much potential!

We also have a similar idea, but only applied to mazes:

https://huggingface.co/homebrewltd/AlphaMaze-v0.2-1.5B

authored a paper 20 days ago

AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO

Paper • 2502.14669 • Published 21 days ago • 11

updated a model 2 months ago

homebrewltd/Ichigo-whisper-v0.1

Audio-Text-to-Text • Updated Jan 3 • 2 • 19

upvoted a paper 3 months ago

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published Dec 19, 2024 • 51

updated 2 models 4 months ago

alandao/f5-tts-mlx-4bit

Updated Nov 21, 2024 • 1

alandao/f5-tts-mlx-8bit

Updated Nov 21, 2024 • 1

liked a model 4 months ago

bartowski/Ichigo-llama3.1-s-instruct-v0.4-GGUF

Text Generation • Updated Nov 11, 2024 • 622 • 2

upvoted a collection 4 months ago

OpenCoder Datasets

Collection

OpenCoder datasets! • 6 items • Updated Nov 15, 2024 • 39

commented on a paper 5 months ago

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 10

upvoted a paper 5 months ago

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 10

authored a paper 5 months ago

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 10

upvoted a collection 5 months ago

🍓 Ichigo v0.3

Collection

The experimental family designed to train LLMs to understand sound natively. • 6 items • Updated Nov 11, 2024 • 17

updated a model 5 months ago

homebrewltd/mini-Ichigo-llama3.2-3B-s-instruct

Audio-Text-to-Text • Updated Nov 19, 2024 • 28 • 34

updated a collection 5 months ago

🍓 Ichigo v0.3

Collection

The experimental family designed to train LLMs to understand sound natively. • 6 items • Updated Nov 11, 2024 • 17

reacted to reach-vb's post with 😎 5 months ago


Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained for 45 hrs on 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture:
> WhisperSpeech/ VQ for Semantic Tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)
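
As a rough illustration of what "early fusion" means here, the sketch below maps discrete audio codes into the same token-id space as text, so that one decoder sees a single mixed sequence. It is a minimal sketch under stated assumptions, not the Ichigo implementation: the vocabulary size, codebook size, marker tokens, and the function names `audio_codes_to_token_ids` and `build_fused_sequence` are all hypothetical and chosen only for the example.

```python
from typing import List

# Assumptions for illustration only; these values are not taken from the post.
TEXT_VOCAB_SIZE = 128_256  # assumed size of the text tokenizer vocabulary
CODEBOOK_SIZE = 512        # assumed number of entries in the audio VQ codebook

# Hypothetical marker tokens appended right after the text vocabulary.
SOUND_START_ID = TEXT_VOCAB_SIZE
SOUND_END_ID = TEXT_VOCAB_SIZE + 1
SOUND_TOKEN_OFFSET = TEXT_VOCAB_SIZE + 2


def audio_codes_to_token_ids(codes: List[int]) -> List[int]:
    """Shift VQ codebook indices into the extended vocabulary and wrap them in markers."""
    for code in codes:
        if not 0 <= code < CODEBOOK_SIZE:
            raise ValueError(f"audio code {code} is outside the codebook")
    return [SOUND_START_ID] + [SOUND_TOKEN_OFFSET + c for c in codes] + [SOUND_END_ID]


def build_fused_sequence(audio_codes: List[int], text_ids: List[int]) -> List[int]:
    """Concatenate sound tokens and text tokens into one sequence for a single decoder."""
    for token_id in text_ids:
        if not 0 <= token_id < TEXT_VOCAB_SIZE:
            raise ValueError(f"text token id {token_id} is outside the text vocabulary")
    return audio_codes_to_token_ids(audio_codes) + list(text_ids)


if __name__ == "__main__":
    # Two audio codes followed by two text token ids.
    print(build_fused_sequence([0, 5], [10, 11]))
```

The point of the design is that no separate audio encoder branch is needed at the language-model level: sound tokens are just extra vocabulary entries, and the model's embedding table would be enlarged to cover them.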

I'm super bullish on HomeBrew/Jan and on early-fusion, audio-and-text multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)

liked a Space 7 months ago

113

Llama3.1 S V0.2 Checkpoint 2024 08 20

😻

Convert text to audio and vice versa

liked a dataset 7 months ago

homebrewltd/instruction-speech-encodec-v1

Viewer • Updated Aug 19, 2024 • 493k • 557 • 15