Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published 12 days ago • 27
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published 18 days ago • 68
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 78
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 90
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 575
Running on CPU Upgrade • 7.89k • Kolors Virtual Try-On • Upload images to try on clothes virtually
Article The LASER technique: Evaluating SVD compression By fractalego • Apr 4, 2024 • 8
nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347 • Published Jun 20, 2024 • 99