Kaiyue Wen

KaiyueWen

AI & ML interests

None yet

Organizations

None yet

KaiyueWen's activity

authored 3 papers 18 days ago

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Paper • 2307.11007 • Published Jul 20, 2023

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

Paper • 2402.18510 • Published Feb 28, 2024

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published 19 days ago • 63

upvoted a paper 2 months ago

APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published Dec 6, 2024 • 38

upvoted a paper 7 months ago

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

Paper • 2407.08348 • Published Jul 11, 2024 • 51