Implementation of the PSGD optimizer in JAX
Updated Dec 31, 2024 - Python
Preconditioned optimizers for MoE training at scale, with out-of-the-box MuP support and FSDP support for Muon, built on top of Megatron-LM and TransformerEngine.