LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published Feb 19 • 25
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published Feb 11 • 29
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6 • 29
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22 • 57
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 85
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 84
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21 • 63
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 41
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 100
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 90
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16, 2024 • 31
Open LLM Leaderboard Space • Track, rank and evaluate open LLMs and chatbots • 12.7k
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29, 2024 • 56
Post: If you're trying to run MoE Mixtral-8x7B under DeepSpeed with HF Transformers, it's likely to hang on the first forward pass. The solution is here: https://github.com/microsoft/DeepSpeed/pull/4966#issuecomment-1989671378 and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix.
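A minimal sketch of guarding against this at startup. The version floor and model name come from the post; the guard itself and the config path are illustrative assumptions, not from the post:

```python
# Illustrative startup guard: deepspeed>=0.13.0 carries the MoE fix from
# https://github.com/microsoft/DeepSpeed/pull/4966; older versions are likely
# to hang on the first forward pass with MoE models such as Mixtral-8x7B.
import deepspeed
from packaging import version

MIN_DEEPSPEED = "0.13.0"

if version.parse(deepspeed.__version__) < version.parse(MIN_DEEPSPEED):
    raise RuntimeError(
        f"deepspeed {deepspeed.__version__} is installed; upgrade to "
        f">={MIN_DEEPSPEED} before training Mixtral-8x7B with HF Transformers"
    )

# With a recent DeepSpeed in place, the usual HF Transformers integration
# applies, e.g. TrainingArguments(deepspeed="ds_config.json") with your own
# ZeRO config ("ds_config.json" is a placeholder path, not from the post).
```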