FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation • Paper • 2502.13995 • Published 23 days ago
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation • Paper • 2502.07531 • Published about 1 month ago
Stable Flow: Vital Layers for Training-Free Image Editing • Paper • 2411.14430 • Published Nov 21, 2024
Zero-Shot Voice Cloning • Collection • TTS models that support zero-shot voice cloning • 7 items • Updated Oct 26, 2024
steiner-preview • Collection • Reasoning models trained on synthetic data using reinforcement learning • 3 items • Updated Oct 20, 2024
Moshi v0.1 Release • Collection • MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024
Audio Dialogues: Dialogues dataset for audio and music understanding • Paper • 2404.07616 • Published Apr 11, 2024