Sung Kim
sungkim.bsky.social
did:plc:cq4gg3odxz2pzmkx2fuac3u3
Muon is Scalable for LLM Training
They found that the Muon optimizer can be scaled up using the following techniques:
• Adding weight decay
• Carefully adjusting the per-parameter update scale
📚 Code: https://github.com/MoonshotAI/Moonlight
🤗 Model: huggingface.co/moonshotai
📜 Paper: https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf
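
To make the two techniques listed above concrete, here is a minimal NumPy sketch of a single Muon-style update for one 2-D weight matrix. The post gives no formulas, so everything here is an assumption for illustration: the function names (`newton_schulz`, `muon_step`), the hyperparameter defaults, the Newton-Schulz coefficients (taken from the public Muon reference implementation), and the `0.2 * sqrt(max(rows, cols))` update scale as I recall it from the linked paper. Nesterov momentum is left out for brevity. Check the linked code before relying on any of it.

```python
import numpy as np


def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G with a quintic Newton-Schulz iteration."""
    # Coefficients assumed from the public Muon reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)  # Frobenius norm bounds singular values by 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation so X @ X.T is the smaller product
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X


def muon_step(W, grad, momentum_buf, lr=0.02, mu=0.95,
              weight_decay=0.1, scale=0.2):
    """One illustrative Muon update; all defaults are hypothetical."""
    # Plain heavy-ball momentum (no Nesterov term in this sketch).
    momentum_buf = mu * momentum_buf + grad
    # Orthogonalized update direction.
    O = newton_schulz(momentum_buf)
    # Technique 2: per-parameter update scale that depends on the matrix shape.
    update_scale = scale * np.sqrt(max(W.shape))
    # Technique 1: decoupled (AdamW-style) weight decay added to the update.
    W_new = W - lr * (update_scale * O + weight_decay * W)
    return W_new, momentum_buf


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
grad = rng.standard_normal((4, 8))
W, buf = muon_step(W, grad, np.zeros_like(W))
```

The scale factor is applied per matrix, so layers with different shapes receive updates of comparable size, while the weight decay term shrinks the weights independently of the gradient.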
[contains quote post or other embedded content]
2025-02-22T21:12:34.349Z