
leopard-ai / betty

322 stars · 9 watchers · 28 forks · 2.91 MB

Betty: an automatic differentiation library for generalized meta-learning and multilevel optimization

Home Page: https://leopard-ai.github.io/betty/

License: Apache License 2.0

Python 100.00%
autodiff automatic-differentiation bilevel-optimization meta-learning multilevel-optimization hyperparameter-optimization neural-architecture-search reinforcement-learning artificial-intelligence machine-learning
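
For context, here is a minimal plain-PyTorch illustration (not Betty's API) of the kind of bilevel computation the library automates: an outer variable, a vector of per-example training weights, is updated by differentiating a validation loss through one unrolled inner SGD step. All tensors are toy placeholders.

    # Plain-PyTorch illustration, not Betty's API: differentiate a validation loss
    # through one unrolled inner SGD step to obtain a hypergradient for the outer
    # variable (per-example training weights).
    import torch

    w = torch.randn(5, requires_grad=True)            # inner model parameters
    log_weights = torch.zeros(8, requires_grad=True)  # outer variable
    x_tr, y_tr = torch.randn(8, 5), torch.randn(8)
    x_va, y_va = torch.randn(8, 5), torch.randn(8)

    # Inner step: weighted training loss; create_graph=True keeps the update
    # differentiable with respect to the outer variable.
    train_loss = (log_weights.softmax(0) * (x_tr @ w - y_tr) ** 2).sum()
    g = torch.autograd.grad(train_loss, w, create_graph=True)[0]
    w_unrolled = w - 0.1 * g                          # one unrolled SGD step

    # Outer step: the validation loss at the unrolled parameters yields the
    # hypergradient for log_weights via ordinary backpropagation.
    val_loss = ((x_va @ w_unrolled - y_va) ** 2).mean()
    val_loss.backward()
    print(log_weights.grad)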

betty's People

Contributors

dakshitbabbar, ramtinhoss, ruiyiz5, sangkeun00, willieneis


betty's Issues

[Suggestion] torch.func in PyTorch 2.1

Have you considered changing the implementation of IterativeProblem from functorch to torch.func, since the functorch APIs will be deprecated in future versions?
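
For reference, a minimal sketch (not Betty's actual implementation) of the torch.func counterparts of the functorch make_functional/grad pattern; the toy linear model and loss are placeholders.

    # Minimal sketch, not Betty's implementation: stateless forward and gradient
    # via torch.func in PyTorch >= 2.0. The model and data are placeholders.
    import torch
    import torch.nn as nn
    from torch.func import functional_call, grad

    model = nn.Linear(4, 1)
    params = dict(model.named_parameters())
    x, y = torch.randn(8, 4), torch.randn(8, 1)

    def loss_fn(p, x, y):
        pred = functional_call(model, p, (x,))   # stateless forward pass
        return nn.functional.mse_loss(pred, y)

    # grad() differentiates with respect to the first argument and returns a dict
    # of per-parameter gradients, matching the functorch workflow it replaces.
    grads = grad(loss_fn)(params, x, y)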

WandB logging issue

wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: ERROR Error while calling W&B API: entity pod-tuning not found during upsertBucket (<Response [404]>)
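
The 404 usually means that the configured W&B entity (pod-tuning here) does not exist or is not accessible with the supplied API key. A minimal, Betty-agnostic sketch of pointing the run at an entity the key can access; the entity and project names below are placeholders, not values from this issue.

    # Betty-agnostic sketch; "my-team" and "betty-experiments" are placeholders.
    import os
    import wandb

    os.environ["WANDB_ENTITY"] = "my-team"         # or export it in the shell
    run = wandb.init(project="betty-experiments",  # project is created on first use
                     entity="my-team")
    run.log({"loss": 0.0})
    run.finish()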

[BUG] Training gets slower over iterations with IterativeProblem

Training with IterativeProblem gets slower over iterations. Below is the log from the logistic regression example; the wall-clock time per 1,000 global steps grows from about 3 s at the start to roughly 7-8 s by step 22,000. The same bug was also observed with the MAML example.

[2022-07-01 11:01:53] [INFO] [Problem "outer"] [Global Step 1000] [Local Step 10] loss: 0.5296119451522827
[2022-07-01 11:01:56] [INFO] [Problem "outer"] [Global Step 2000] [Local Step 20] loss: 0.3373554050922394
[2022-07-01 11:01:59] [INFO] [Problem "outer"] [Global Step 3000] [Local Step 30] loss: 0.31969979405403137
[2022-07-01 11:02:02] [INFO] [Problem "outer"] [Global Step 4000] [Local Step 40] loss: 0.31455692648887634
[2022-07-01 11:02:05] [INFO] [Problem "outer"] [Global Step 5000] [Local Step 50] loss: 0.31011053919792175
[2022-07-01 11:02:08] [INFO] [Problem "outer"] [Global Step 6000] [Local Step 60] loss: 0.3047352433204651
[2022-07-01 11:02:12] [INFO] [Problem "outer"] [Global Step 7000] [Local Step 70] loss: 0.301718533039093
[2022-07-01 11:02:15] [INFO] [Problem "outer"] [Global Step 8000] [Local Step 80] loss: 0.30068764090538025
[2022-07-01 11:02:19] [INFO] [Problem "outer"] [Global Step 9000] [Local Step 90] loss: 0.29966291785240173
[2022-07-01 11:02:22] [INFO] [Problem "outer"] [Global Step 10000] [Local Step 100] loss: 0.2992149293422699
[2022-07-01 11:02:27] [INFO] [Problem "outer"] [Global Step 11000] [Local Step 110] loss: 0.2989771068096161
[2022-07-01 11:02:31] [INFO] [Problem "outer"] [Global Step 12000] [Local Step 120] loss: 0.2986523509025574
[2022-07-01 11:02:36] [INFO] [Problem "outer"] [Global Step 13000] [Local Step 130] loss: 0.29848340153694153
[2022-07-01 11:02:42] [INFO] [Problem "outer"] [Global Step 14000] [Local Step 140] loss: 0.29845142364501953
[2022-07-01 11:02:47] [INFO] [Problem "outer"] [Global Step 15000] [Local Step 150] loss: 0.2984345257282257
[2022-07-01 11:02:53] [INFO] [Problem "outer"] [Global Step 16000] [Local Step 160] loss: 0.2983992397785187
[2022-07-01 11:03:00] [INFO] [Problem "outer"] [Global Step 17000] [Local Step 170] loss: 0.2983682453632355
[2022-07-01 11:03:08] [INFO] [Problem "outer"] [Global Step 18000] [Local Step 180] loss: 0.29832738637924194
[2022-07-01 11:03:15] [INFO] [Problem "outer"] [Global Step 19000] [Local Step 190] loss: 0.29827484488487244
[2022-07-01 11:03:23] [INFO] [Problem "outer"] [Global Step 20000] [Local Step 200] loss: 0.29820242524147034
[2022-07-01 11:03:31] [INFO] [Problem "outer"] [Global Step 21000] [Local Step 210] loss: 0.29809457063674927
[2022-07-01 11:03:38] [INFO] [Problem "outer"] [Global Step 22000] [Local Step 220] loss: 0.2979184091091156
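
Since consecutive log lines are 1,000 global steps apart, the slowdown can be quantified directly from the timestamps. A standalone sketch, where "outer.log" is a placeholder for wherever the log above is saved:

    # Standalone sketch, not part of Betty: seconds per 1,000 global steps,
    # computed from the timestamps of the log above (assumed to contain only
    # the log lines shown).
    from datetime import datetime

    with open("outer.log") as f:
        stamps = [datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S") for line in f]

    for prev, cur in zip(stamps, stamps[1:]):
        print(f"{(cur - prev).total_seconds():.0f} s per 1,000 steps")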

[Request] Improve distributed training performance

PyTorch minimizes throughput degradation by overlapping communication and computation in distributed training.
However, Betty currently performs the computation first and then manually performs gradient synchronization, so it does not benefit from this overlap.
This is mainly because hypergradient calculation often requires second-order gradient computation as well as multiple forward-backward passes.
To improve distributed training performance, we can:

  1. make use of PyTorch's native communication-computation overlap by replacing torch.autograd.grad with torch.autograd.backward (see the sketch after this list)
  2. keep most of the hypergradient computation local and perform a single gradient synchronization at the end.
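
A minimal sketch of point 1, with a toy model and abbreviated distributed setup (launch with torchrun): torch.autograd.grad returns gradients directly and bypasses DDP's reducer hooks, while loss.backward(), i.e. torch.autograd.backward, fires those hooks so the all-reduce of earlier buckets overlaps with the remaining backward computation.

    # Sketch of point 1; toy model, launch with torchrun --nproc_per_node=<gpus>.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    ddp_model = DDP(torch.nn.Linear(10, 1).to(local_rank), device_ids=[local_rank])
    loss = ddp_model(torch.randn(4, 10, device=local_rank)).sum()

    # (a) Current style: gradients are returned directly, DDP's bucket hooks never
    #     fire, so nothing is all-reduced and synchronization must be done manually.
    grads = torch.autograd.grad(loss, ddp_model.parameters(), retain_graph=True)

    # (b) Proposed style: populates .grad and triggers DDP's reducer, which
    #     all-reduces earlier buckets while later gradients are still being computed.
    loss.backward()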

How to control the number of optimizer.step() calls for different levels?

Hello, I'm not an expert on MLO. If I understand correctly, in one iteration the level-2 module and the level-1 module are each updated once (that is, optimizer.step() is called once for each)?

My question is how I can control this. For example, I want level-2 to call optimizer.step() 100 times, and then level-1 to call optimizer.step() once. Thanks!
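
As I understand Betty's documented examples, the unroll_steps field of a problem's Config controls how many lower-level updates run per upper-level update, so something along the lines of the sketch below should give the 100-to-1 ratio. This is a hedged sketch: the constructor wiring is omitted and the exact option names may differ from the installed version.

    # Hedged sketch based on the Config options shown in Betty's examples;
    # module/optimizer/data-loader wiring is omitted.
    from betty.configs import Config

    # Level-2 (more frequently updated) problem: unrolled for 100 optimizer.step()
    # calls before each level-1 update.
    level2_config = Config(type="darts", unroll_steps=100)

    # Level-1 problem: one optimizer.step() per completed level-2 unroll.
    level1_config = Config(retain_graph=True)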

[REQUEST] Distributed data parallel training

Currently, Betty only supports torch.nn.DataParallel. Compared to torch.nn.parallel.DistributedDataParallel, torch.nn.DataParallel is much slower, even in single-machine multi-GPU settings. Therefore, we need to replace torch.nn.DataParallel with torch.nn.parallel.DistributedDataParallel for better training speed and multi-machine multi-GPU support; the standard replacement pattern is sketched below.
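
For reference, the standard PyTorch pattern (not Betty-specific) that replaces DataParallel with DistributedDataParallel plus a DistributedSampler, launched with one process per GPU via torchrun; the toy model and dataset are placeholders.

    # Standard DDP pattern; launch with `torchrun --nproc_per_node=<gpus> train.py`.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)           # shards data across processes
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    for x, y in loader:
        x, y = x.to(local_rank), y.to(local_rank)
        loss = torch.nn.functional.mse_loss(ddp_model(x), y)
        opt.zero_grad()
        loss.backward()                             # gradients are all-reduced here
        opt.step()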
