Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2410.02089

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Paper • 2410.02089 • Published Oct 2 • 11

Research papers - find implementations in github

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Paper • 2410.02089 • Published Oct 2 • 11

LLM Reasoning Papers

improve reasoning capabilities of LLMs

Let's Verify Step by Step

Paper • 2305.20050 • Published May 31, 2023 • 9
LLM Critics Help Catch LLM Bugs

Paper • 2407.00215 • Published Jun 28
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Paper • 2407.21787 • Published Jul 31 • 3
Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27 • 13

Embodiment enables interaction of model with environment. Key is to anticipate what change could've come with its current action.

Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation

Paper • 2408.11812 • Published Aug 21 • 4
WebArena: A Realistic Web Environment for Building Autonomous Agents

Paper • 2307.13854 • Published Jul 25, 2023 • 23
Agent Workflow Memory

Paper • 2409.07429 • Published Sep 11 • 27
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12 • 43

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 62
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 39
Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7 • 46
Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11 • 29

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs