
Yihua Zhang

NormalUhr

NormalUhr's activity

published an article about 9 hours ago

DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Training Background

published an article 17 days ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

• 6

published an article 21 days ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

• 44

published an article 24 days ago

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons

• 2

published an article 24 days ago

From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning

• 11

published an article 24 days ago

MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression

• 5