LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published Feb 19 • 25
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published Feb 11 • 29
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6 • 29
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22 • 57
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 85
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 84
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21 • 63
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 41
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 100
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 90
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16, 2024 • 31
Open LLM Leaderboard Space • Track, rank and evaluate open LLMs and chatbots • 12.7k
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29, 2024 • 56
Post: If you're trying to run MoE Mixtral-8x7B under DeepSpeed with HF Transformers, it's likely to hang on the first forward pass. The solution is here: https://github.com/microsoft/DeepSpeed/pull/4966#issuecomment-1989671378 and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix.
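A minimal sketch of guarding against this at startup. The version floor and model name come from the post; the guard itself and the config path are illustrative assumptions, not from the post:

```python
# Illustrative startup guard: deepspeed>=0.13.0 carries the MoE fix from
# https://github.com/microsoft/DeepSpeed/pull/4966; older versions are likely
# to hang on the first forward pass with MoE models such as Mixtral-8x7B.
import deepspeed
from packaging import version

MIN_DEEPSPEED = "0.13.0"

if version.parse(deepspeed.__version__) < version.parse(MIN_DEEPSPEED):
    raise RuntimeError(
        f"deepspeed {deepspeed.__version__} is installed; upgrade to "
        f">={MIN_DEEPSPEED} before training Mixtral-8x7B with HF Transformers"
    )

# With a recent DeepSpeed in place, the usual HF Transformers integration
# applies, e.g. TrainingArguments(deepspeed="ds_config.json") with your own
# ZeRO config ("ds_config.json" is a placeholder path, not from the post).
```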