-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 8 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 71 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
Collections
Discover the best community collections!
Collections including paper arxiv:2412.07589
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 25 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 12 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 40 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 20
-
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
Paper • 2412.09593 • Published • 18 -
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Paper • 2412.16112 • Published • 21 -
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Paper • 2412.14171 • Published • 24 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45
-
Causal Diffusion Transformers for Generative Modeling
Paper • 2412.12095 • Published • 23 -
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Paper • 2412.09619 • Published • 20 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45 -
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 25
-
Animate-X: Universal Character Image Animation with Enhanced Motion Representation
Paper • 2410.10306 • Published • 54 -
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
Paper • 2411.05003 • Published • 70 -
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Paper • 2411.04709 • Published • 25 -
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Paper • 2410.07171 • Published • 42
-
XLabs-AI/flux-RealismLora
Text-to-Image • Updated • 235k • • 948 -
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper • 2412.07744 • Published • 19 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45 -
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Paper • 2412.09501 • Published • 44
-
80📈
Dailypapershackernews
-
Prithvi WxC: Foundation Model for Weather and Climate
Paper • 2409.13598 • Published • 40 -
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles
Paper • 2410.05262 • Published • 9 -
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Paper • 2410.15316 • Published • 10