Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21 • 21
MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction Paper • 2410.02241 • Published Oct 3 • 6
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published Sep 24 • 32
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 69
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Paper • 2409.08278 • Published Sep 12 • 10
GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers Paper • 2409.04196 • Published Sep 6 • 14
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 180
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 65
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Paper • 2405.12970 • Published May 21 • 22
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53
view article Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1 • 69
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Apr 29 • 76
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22 • 21
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 27
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study Paper • 2404.14047 • Published Apr 22 • 44
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10 • 25