OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 114
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 5 days ago • 48
YuLan-Mini Collection A highly capable 2.4B-parameter lightweight LLM trained on only 1T tokens of pre-training data, with all training details released. • 5 items • Updated 29 days ago • 12
Article Saving Memory Using Padding-Free Transformer Layers during Finetuning By mayank-mishra • Jun 11, 2024 • 15
OLMo 2 Collection Artifacts for the second set of OLMo models. • 22 items • Updated 21 days ago • 75
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 44
Gated Delta Networks: Improving Mamba2 with Delta Rule Paper • 2412.06464 • Published Dec 9, 2024 • 10
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 49
Hierarchically Gated Recurrent Neural Network for Sequence Modeling Paper • 2311.04823 • Published Nov 8, 2023 • 2
Gated Linear Attention Transformers with Hardware-Efficient Training Paper • 2312.06635 • Published Dec 11, 2023 • 6
Gated Slot Attention for Efficient Linear-Time Sequence Modeling Paper • 2409.07146 • Published Sep 11, 2024 • 20
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 23
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20, 2024 • 74
Fast Matrix Multiplications for Lookup Table-Quantized LLMs Paper • 2407.10960 • Published Jul 15, 2024 • 12
Parallelizing Linear Transformers with the Delta Rule over Sequence Length Paper • 2406.06484 • Published Jun 10, 2024 • 3