Comments (4)
Thanks for the update!
Yeah, disabling weight decay for both optimizers makes for a meaningful and fair comparison, thank you!
from lion-pytorch.
@xiangning-chen Hi Xiangning! Thank you for this interesting paper.
So far I have only been testing with weight decay turned off. A lot of networks are still trained with just plain Adam, and I wanted to see how Lion fares against Adam alone.
@xiangning-chen but yes, I have noted the section in the paper where you say the weight decay needs to be higher for Lion.
Let me add that to the readme to increase the chances that people train with it correctly.
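To make the weight decay point concrete, here is a minimal sketch of the rule of thumb as I understand it from the paper: use a learning rate roughly 3-10x smaller than for AdamW and a decoupled weight decay roughly 3-10x larger, so that their product stays about the same. The helper name `adamw_to_lion` and the default `factor` are made up for illustration and are not part of lion-pytorch.

```python
from typing import Tuple


def adamw_to_lion(lr: float, weight_decay: float, factor: float = 3.0) -> Tuple[float, float]:
    """Hypothetical helper: turn AdamW hyperparameters into a Lion starting point.

    Assumption: the learning rate is divided by `factor` and the weight decay
    is multiplied by the same `factor`, which keeps lr * weight_decay unchanged.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return lr / factor, weight_decay * factor


# Example: an AdamW setup with lr=3e-4 and weight_decay=1e-2, scaled by 3.
lion_lr, lion_wd = adamw_to_lion(lr=3e-4, weight_decay=1e-2, factor=3.0)
print(lion_lr, lion_wd)
```

The resulting values are only a starting point for tuning. With weight decay turned off (as in my tests so far), only the learning-rate scaling applies.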
@xiangning-chen ok good luck! hope this technique holds up to scrutiny!