Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 32
Vision-Language Models Can Self-Improve Reasoning via Reflection Paper • 2411.00855 • Published Oct 30 • 4