Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 6 days ago • 56
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers Paper • 2501.03931 • Published 6 days ago • 13
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 45
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published Nov 22, 2024 • 53
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion Paper • 2412.09593 • Published Dec 12, 2024 • 18
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 44
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 105
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance Paper • 2408.08189 • Published Aug 15, 2024 • 17
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization Paper • 2408.05939 • Published Aug 12, 2024 • 14
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12, 2024 • 53
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 254
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 702
AniClipart: Clipart Animation with Text-to-Video Priors Paper • 2404.12347 • Published Apr 18, 2024 • 12
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance Paper • 2306.00943 • Published Jun 1, 2023 • 5
MGM-Data Collection Official data collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 2 items • Updated Apr 21, 2024 • 7
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3, 2024 • 47