- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602
- CLEAR: Character Unlearning in Textual and Visual Modalities
  Paper • 2410.18057 • Published • 193
- Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
  Paper • 2410.22366 • Published • 70
Collections including paper arxiv:2410.22366
- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 31
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 25
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 121
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 21

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 21
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 80
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 143
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- Learning Video Representations without Natural Videos
  Paper • 2410.24213 • Published • 13
- Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks
  Paper • 2410.24032 • Published • 7
- Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
  Paper • 2410.22366 • Published • 70
- Stealing User Prompts from Mixture of Experts
  Paper • 2410.22884 • Published • 13

- MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms
  Paper • 2410.18977 • Published • 13
- FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
  Paper • 2410.16271 • Published • 80
- GS^3: Efficient Relighting with Triple Gaussian Splatting
  Paper • 2410.11419 • Published • 10
- ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion
  Paper • 2410.08168 • Published • 7

- FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
  Paper • 2410.13925 • Published • 21
- BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
  Paper • 2410.14672 • Published • 7
- Scalable Ranked Preference Optimization for Text-to-Image Generation
  Paper • 2410.18013 • Published • 14
- DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
  Paper • 2410.18666 • Published • 17

- On the Scalability of Diffusion-based Text-to-Image Generation
  Paper • 2404.02883 • Published • 17
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
  Paper • 2404.02733 • Published • 20
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
  Paper • 2404.03653 • Published • 33
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
  Paper • 2404.07987 • Published • 47

- FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
  Paper • 2403.06775 • Published • 3
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  Paper • 2010.11929 • Published • 6
- Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
  Paper • 2110.07040 • Published • 2
- A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
  Paper • 1811.00056 • Published • 2