Llama Nemotron Collection Open, Production-ready Enterprise Models • 11 items • Updated 3 days ago • 65
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Paper • 2505.01441 • Published Apr 28 • 39
view article Article Releasing the largest multilingual open pretraining dataset By Pclanglais and 2 others • Nov 13, 2024 • 102
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20, 2024 • 95
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation By yuxiang630 and 8 others • Apr 29, 2024 • 79
LongNet: Scaling Transformers to 1,000,000,000 Tokens Paper • 2307.02486 • Published Jul 5, 2023 • 81
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023 • 9