Scaling test-time compute • Running • 535 • Enhance math problem solving by scaling test-time compute
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 70
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 180 • 25
paulofinardi/OIG_small_chip2_portuguese_brasil • Updated Mar 19, 2023 • 210k • 132 • 16