Yihua Zhang

NormalUhr

https://www.yihua-zhang.com

AI & ML interests

None yet

Recent Activity

commented on their article 5 days ago

MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression

published an article 6 days ago

Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on “Approximating KL Divergence”

published an article 8 days ago

From GRPO to DAPO and GSPO: What, Why, and How

commented on MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression 5 days ago

The issues have been fixed. Thanks for letting me know. They occurred because the math engine on Hugging Face differs from the one we usually use: it wraps inline equations in "\(" and "\)" rather than "(" and ")". I was not aware of that before.

published an article 6 days ago

Article

Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on “Approximating KL Divergence”

6 days ago

published an article 8 days ago

Article

From GRPO to DAPO and GSPO: What, Why, and How

8 days ago

• 9

liked a model 12 days ago

openai/gpt-oss-20b

Text Generation • 22B • Updated 4 days ago • 3.42M • • 3.04k

upvoted a paper 14 days ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 57

published an article 2 months ago

Article

Decorators in Machine Learning

Jun 8

upvoted a paper 3 months ago

Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment

Paper • 2505.11821 • Published May 17 • 14

published an article 6 months ago

Article

DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Training Background

Feb 28

• 11

published an article 6 months ago

Article

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Feb 11

• 58

published an article 6 months ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7

• 207

published an article 6 months ago

Article

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons

Feb 4

• 14

published an article 6 months ago

Article

From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning

Feb 4

• 16

published an article 6 months ago

Article

MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression

Feb 4

• 15

upvoted an article 11 months ago

Article

Optimizing your LLM in production

Sep 15, 2023

• 19

New activity in OPTML-Group/UnlearnCanvas about 1 year ago

NonMatchingSplitsSizeError

#2 opened over 1 year ago

authored a paper over 1 year ago

UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models

Paper • 2402.11846 • Published Feb 19, 2024 • 1

updated a dataset over 1 year ago

OPTML-Group/UnlearnCanvas

Viewer • Updated Mar 6, 2024 • 1.76k • 1.14k • 2

liked a dataset over 1 year ago

OPTML-Group/UnlearnCanvas

Viewer • Updated Mar 6, 2024 • 1.76k • 1.14k • 2

liked a Space over 1 year ago

UnlearnCanvas Benchmark

Filter and compare unlearning methods for benchmarking

liked a Space about 2 years ago

MusicGen

Generate music from text descriptions and optional melodies