-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 8 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 71 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
Collections
Collections including paper arxiv:2401.07382
-
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Paper • 2305.11738 • Published • 8 -
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Paper • 2402.14809 • Published • 3 -
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
Paper • 2401.07382 • Published • 2
-
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Paper • 2305.11738 • Published • 8 -
Shepherd: A Critic for Language Model Generation
Paper • 2308.04592 • Published • 31 -
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Paper • 2402.14809 • Published • 3 -
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
Paper • 2401.07382 • Published • 2
-
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 57 -
WARM: On the Benefits of Weight Averaged Reward Models
Paper • 2401.12187 • Published • 18 -
RewardBench: Evaluating Reward Models for Language Modeling
Paper • 2403.13787 • Published • 21 -
DreamReward: Text-to-3D Generation with Human Preference
Paper • 2403.14613 • Published • 35
-
Measuring the Effects of Data Parallelism on Neural Network Training
Paper • 1811.03600 • Published • 2 -
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2 -
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Paper • 1905.11946 • Published • 3 -
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62
-
Moral Foundations of Large Language Models
Paper • 2310.15337 • Published • 1 -
Specific versus General Principles for Constitutional AI
Paper • 2310.13798 • Published • 2 -
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 24 -
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper • 2309.00267 • Published • 47