OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published 13 days ago
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance Paper • 2409.15759 • Published Sep 24
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers Paper • 2409.15760 • Published Sep 24
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator Paper • 2411.15466 • Published Nov 23
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22
Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation Paper • 2403.10911 • Published Mar 16
M-RewardBench: Evaluating Reward Models in Multilingual Settings Paper • 2410.15522 • Published Oct 20
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published Oct 2
The MAMe Dataset: On the relevance of High Resolution and Variable Shape image properties Paper • 2007.13693 • Published Jul 27, 2020
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25
Present and Future Generalization of Synthetic Image Detectors Paper • 2409.14128 • Published Sep 21
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28
Self-Directed Synthetic Dialogues and Revisions Technical Report Paper • 2407.18421 • Published Jul 25
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models Paper • 2311.18232 • Published Nov 30, 2023