- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 28
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 4
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 30
Collections
Collections including paper arxiv:2312.07532
- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 40
- Interfacing Foundation Models' Embeddings
  Paper • 2312.07532 • Published • 10
- Point Transformer V3: Simpler, Faster, Stronger
  Paper • 2312.10035 • Published • 17
- TheBloke/quantum-v0.01-GPTQ
  Text Generation • Updated • 19 • 2

- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 538k • 2.68k
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
  Paper • 2311.13384 • Published • 50
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
  Paper • 2311.12454 • Published • 29

- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
  Paper • 2309.10020 • Published • 40
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 50
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
  Paper • 2309.16058 • Published • 55
- Jointly Training Large Autoregressive Multimodal Models
  Paper • 2309.15564 • Published • 8