Collections
Discover the best community collections!
Collections including paper arxiv:2411.02327
-
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Paper • 2406.19280 • Published • 60 -
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Paper • 2411.02327 • Published • 11 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 35
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 12 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 31
-
TextCraftor: Your Text Encoder Can be Image Quality Controller
Paper • 2403.18978 • Published • 13 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 20 -
OmniFusion Technical Report
Paper • 2404.06212 • Published • 74 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 11
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 3 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 6 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2