mlfoundations-dev/instruction_filtering_scale_up_code_base_embedding_filter_mean_16K_new • Text Generation • Updated about 3 hours ago
mlfoundations-dev/instruction_filtering_scale_up_code_base_embedding_filter_mean_per_domain_16K_new • Text Generation • Updated about 3 hours ago
mlfoundations-dev/instruction_filtering_scale_up_code_base_askllm_16K_new • Text Generation • Updated about 3 hours ago
mlfoundations-dev/instruction_filtering_scale_up_code_base_fasttext_per_domain_16K_new • Text Generation • Updated about 4 hours ago
mlfoundations-dev/instruction_filtering_scale_up_code_base_gemini_length_16K_new • Text Generation • Updated about 6 hours ago
mlfoundations-dev/instruction_filtering_scale_up_code_base_random_filtering_16K_new • Text Generation • Updated about 7 hours ago
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs • Paper • 2502.19413 • Published 11 days ago • 19
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models • Paper • 2502.17387 • Published 13 days ago • 5
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use • Paper • 2502.15872 • Published 16 days ago • 4
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature • Paper • 2501.07171 • Published Jan 13 • 50