matlok's Collections
Papers - Image - Fine-tuning
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181
Visual Instruction Tuning
Paper • 2304.08485 • Published • 13
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 16
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86
Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 10
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 30
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper • 2404.03653 • Published • 33
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Paper • 2404.03673 • Published • 14
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 3
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 29
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper • 2404.13686 • Published • 28
Capabilities of Gemini Models in Medicine
Paper • 2404.18416 • Published • 23
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Paper • 1707.02968 • Published • 1
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Paper • 2404.05669 • Published • 1
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Paper • 2203.03897 • Published • 1
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 113
DETRs Beat YOLOs on Real-time Object Detection
Paper • 2304.08069 • Published • 13