Maozhou Ge

Gmc2

GHGmc2

AI & ML interests

None yet

Organizations

None yet

Gmc2's activity

liked a Space 8 days ago

1.78k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

upvoted an article 18 days ago

Article

Open R1: Update #2

and 6 others • 18 days ago • 191

upvoted an article 21 days ago

Article

Open-source DeepResearch – Freeing our search agents

25 days ago • 1.11k

upvoted an article 29 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28 • 782

liked a model about 1 month ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 5 days ago • 4.63M • 10.5k

upvoted a paper about 2 months ago

DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 53

liked a model about 2 months ago

facebook/multi-token-prediction

Updated Jun 18, 2024 • 364

upvoted a paper 3 months ago

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Paper • 2411.14405 • Published Nov 21, 2024 • 58

liked a model 4 months ago

deepseek-ai/DeepSeek-V2-Lite

Text Generation • Updated Jun 25, 2024 • 60.7k • 126

upvoted a paper 4 months ago

GPT-4o System Card

Paper • 2410.21276 • Published Oct 25, 2024 • 84

upvoted 3 papers 5 months ago

Baichuan-Omni Technical Report

Paper • 2410.08565 • Published Oct 11, 2024 • 85

Pixtral 12B

Paper • 2410.07073 • Published Oct 9, 2024 • 64

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

Paper • 2408.14158 • Published Aug 26, 2024 • 3

upvoted a paper 6 months ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published Sep 10, 2024 • 56

liked a Space 6 months ago

Pipeline Parallelism with Controllable Memory

🏆

Calculate and visualize different scheduling strategies

upvoted a paper 6 months ago

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 42

upvoted 3 papers 7 months ago

upvoted a paper 8 months ago

Scaling Diffusion Transformers to 16 Billion Parameters

Paper • 2407.11633 • Published Jul 16, 2024 • 26