I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive! https://x.com/casper_hansen_/status/1875872309996855343

Together with the recent PRIME method [2] for scaling RL, reasoning for open models is looking pretty exciting for 2025!

[1] Training Large Language Models to Reason in a Continuous Latent Space (arXiv:2412.06769)
[2] https://huggingface.co/blog/ganqu/prime
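Quick sketch of what [1] actually does, for anyone who hasn't read it: instead of decoding a chain-of-thought token at every step, Coconut feeds the model's last hidden state straight back in as the next input embedding, so the "thought" stays in latent space. The snippet below is a minimal illustration of that loop, not the paper's code: `latent_steps`, the prompt, and the choice of `gpt2` are all placeholders, and the real method trains the model with a curriculum so those latent steps become useful.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch of Coconut-style latent reasoning (arXiv:2412.06769).
# Core idea: during "continuous thought" steps, skip token decoding and feed
# the last hidden state back in as the next input embedding. A base model
# that wasn't trained this way (like gpt2 here) won't produce useful
# thoughts; this only demonstrates the mechanics of the loop.

model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Question: 2 + 3 * 4 = ?\nAnswer:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

latent_steps = 4  # number of continuous thoughts; a hypothetical setting
for _ in range(latent_steps):
    out = model(inputs_embeds=inputs_embeds)
    # The final position's last hidden state acts as the next "token".
    # GPT-2's hidden size equals its embedding size, so this type-checks.
    last_hidden = out.hidden_states[-1][:, -1:, :]
    inputs_embeds = torch.cat([inputs_embeds, last_hidden], dim=1)

# Switch back to ordinary autoregressive decoding for the final answer.
generated = model.generate(
    inputs_embeds=inputs_embeds,
    max_new_tokens=16,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

The appeal of keeping reasoning in the hidden state is that a single continuous vector can encode a superposition of candidate next steps, rather than committing to one discrete token per step.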