Keras/TF implementation of AdamW, SGDW, NadamW, Warm Restarts, and Learning Rate multipliers
Updated Jan 6, 2022 - Python
Implements the AdamW optimizer (https://arxiv.org/abs/1711.05101), a cosine learning rate scheduler, and "Cyclical Learning Rates for Training Neural Networks" (https://arxiv.org/abs/1506.01186) for the PyTorch framework
Newton-Muon and preconditioned optimizers for MoE training at scale, with out-of-the-box MuP support and FSDP support for Muon, built on top of Megatron-LM and TransformerEngine.
Quasi Hyperbolic Rectified DEMON Adam/Amsgrad with AdaMod, Gradient Centralization, Lookahead, iterative averaging and decorrelated Weight Decay
PyTorch implementation of the Lookahead optimizer (https://arxiv.org/pdf/1907.08610.pdf) and RAdam (https://arxiv.org/pdf/1908.03265.pdf)
Nadir: Cutting-edge PyTorch optimizers for simplicity & composability! 🔥🚀💻
Literature survey of convex optimizers and optimisation methods for deep-learning; made especially for optimisation researchers with ❤️
[ICLR 2026] LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
SCAO is a sparse, second-order PyTorch optimizer designed as a high-throughput, drop-in replacement for AdamW.
GYRO is an optimizer for deep neural networks that augments Adam with a geometric rotation step applied to the gradient before momentum buffers are updated.
Lightweight, zero-dependency C++ Feedforward & Recurrent Neural Network library with native Python bindings (via pybind11).
Clean-room GPT-2/GPT-3 implementation: tokenizers, architecture blocks, training loop with AdamW + cosine decay, CLI scripts, inference tools, and pytest suite. Covers OpenWebText-10k & WikiText-103 workflows. Designed as an academic reference for understanding and scaling decoder-only transformers
Kaggle's plant disease image classification competition. Finetuning pre-trained CNN models, loss functions, and optimizers in order to achieve better results.
Computational Graph Library for Neural Network Training
Reproduces the Adam, AdamW, and Adafactor optimizers in PyTorch and introduces popular optimizers used in LLM training.
Drop-in PyTorch optimizer that beats AdamW with lower variance
11th place solution for the U-Tokyo Deep Learning Course MLP Competition (Top 0.8%). High-performance MLP implemented from scratch in NumPy, featuring AdamW, EMA, SWA, and MC Dropout.
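Several of the entries above pair AdamW with cosine learning rate decay. The following is a minimal sketch of that combination in plain Python, assuming the decoupled weight decay formulation from https://arxiv.org/abs/1711.05101 and a simple half-cosine schedule without warm restarts. The names `cosine_lr` and `adamw_step` are hypothetical and are not taken from any of the listed repositories.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine decay from base_lr at step 0 down to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def adamw_step(param, grad, m, v, t, lr,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (t is the 1-based step count)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Decoupled weight decay: shrink the parameter directly rather than
    # adding weight_decay * param to the gradient (which would be L2 regularisation).
    param = param - lr * weight_decay * param
    # Adam step on the bias-corrected moments.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy usage (illustrative only): minimise f(w) = (w - 3)^2.
w, m, v = 0.0, 0.0, 0.0
total_steps = 200
for t in range(1, total_steps + 1):
    lr = cosine_lr(t - 1, total_steps, base_lr=0.1)
    grad = 2.0 * (w - 3.0)
    w, m, v = adamw_step(w, grad, m, v, t, lr)
print(w)
```

The point to notice is that the decay term multiplies the parameter itself and never enters the moment estimates `m` and `v`, which is what distinguishes AdamW from Adam with an L2 penalty. Real implementations operate on tensors and parameter groups; this scalar version only illustrates the update rule.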