Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 3 days ago • 76
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization Paper • 2406.04312 • Published Jun 6 • 1
CursorCore: Assist Programming through Aligning Anything Paper • 2410.07002 • Published 9 days ago • 12
NVLM 1.0 Collection A family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks and text-only tasks. • 1 item • Updated 17 days ago • 42
Oryx Collection Oryx: One Multi-Modal LLM for On-Demand Spatial-Temporal Understanding • 5 items • Updated 29 days ago • 11
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published Sep 5 • 30
Sapiens Collection Foundation models for human tasks. Code: https://github.com/facebookresearch/sapiens • 72 items • Updated 29 days ago • 40
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 30 days ago • 138
LongVILA Collection A series of VILA models that specialize in long-context abilities • 4 items • Updated Aug 21 • 4
XGen-MM-1 models and datasets Collection A collection of all XGen-MM (Foundation LMM) models! • 14 items • Updated 10 days ago • 34
Qwen2-Audio Collection Audio-language model series based on Qwen2 • 4 items • Updated 30 days ago • 41
Generative Multimodal Models are In-Context Learners Paper • 2312.13286 • Published Dec 20, 2023 • 34
TokenCompose: Grounding Diffusion with Token-level Supervision Paper • 2312.03626 • Published Dec 6, 2023 • 5
Single-Image 3D Human Digitization with Shape-Guided Diffusion Paper • 2311.09221 • Published Nov 15, 2023 • 20
ILLUME: Rationalizing Vision-Language Models through Human Interactions Paper • 2208.08241 • Published Aug 17, 2022 • 2
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion Paper • 2303.09604 • Published Mar 16, 2023 • 6
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170
Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events Paper • 2307.06439 • Published Jul 12, 2023 • 9
Example-based Motion Synthesis via Generative Motion Matching Paper • 2306.00378 • Published Jun 1, 2023 • 6
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning Paper • 2307.04725 • Published Jul 10, 2023 • 64
Focused Transformer: Contrastive Training for Context Scaling Paper • 2307.03170 • Published Jul 6, 2023 • 11
Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing Paper • 2306.17848 • Published Jun 30, 2023 • 8
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization Paper • 2306.16928 • Published Jun 29, 2023 • 38
Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust Paper • 2305.20030 • Published May 31, 2023 • 8
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture Paper • 2301.08243 • Published Jan 19, 2023 • 6
TART: A plug-and-play Transformer module for task-agnostic reasoning Paper • 2306.07536 • Published Jun 13, 2023 • 11
Weakly supervised information extraction from inscrutable handwritten document images Paper • 2306.06823 • Published Jun 12, 2023 • 4
Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions Paper • 2306.06212 • Published Jun 9, 2023 • 9
FasterViT: Fast Vision Transformers with Hierarchical Attention Paper • 2306.06189 • Published Jun 9, 2023 • 30