-
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 67 -
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 46 -
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • 2404.05892 • Published • 36 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 143
Collections
Discover the best community collections!
Collections including paper arxiv:2501.04519
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 126 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 52 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 14 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 66
-
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 65 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 48 -
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 30
-
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Paper • 2401.12474 • Published • 36 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25 -
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
Paper • 2310.00746 • Published • 1 -
LESS: Selecting Influential Data for Targeted Instruction Tuning
Paper • 2402.04333 • Published • 3
-
TheBloke/Nous-Capybara-34B-GGUF
Updated • 3.8k • 165 -
NousResearch/Nous-Hermes-2-SOLAR-10.7B
Text Generation • Updated • 5.67k • 205 -
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 128 -
bartowski/Codestral-22B-v0.1-GGUF
Text Generation • Updated • 7.25k • 185
-
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 36 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 64 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 180 -
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 1
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 12 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 48
-
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Paper • 2310.03731 • Published • 29 -
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Paper • 2310.13227 • Published • 13 -
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper • 2311.00176 • Published • 9 -
Language Models can be Logical Solvers
Paper • 2311.06158 • Published • 23
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 38 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 77 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 85 -
Language Modeling Is Compression
Paper • 2309.10668 • Published • 83