cooper-org / cooper Goto Github PK
View Code? Open in Web Editor NEWA general-purpose, deep learning-first library for constrained optimization in PyTorch
Home Page: https://cooper.readthedocs.io/
License: MIT License
A general-purpose, deep learning-first library for constrained optimization in PyTorch
Home Page: https://cooper.readthedocs.io/
License: MIT License
The current package configuration requires Python 38. I believe there is nothing specific keeping us from enabling Python 37. This can be helpful to make Cooper easier to use in Colab which currently supports Python 37.
Enable "schedulers" for the constraints.
Consider a base CMP with objective function
$$ \min_x f(x) \quad \text{s.t.} \quad g(x) \le \epsilon + \psi_t$$
such that the "slack"
Theoretically, this sequence of optimization problems should yield equivalent solutions to that of the base CMP. However, specific (implementations of) algorithms can benefit from relaxing the optimization problem, especially towards the beginning of the optimization.
In the end, we might care about achieving an (approximately) feasible solution of the base CMP that has a good value of the objective function. Thus, there is no need to be overly strict at the beginning of training trying to achieve the "final" constraint level
"Curriculum learning" (Bengio et al., 2009) successfully the idea of gradually adjusting the problem difficulty for supervised learning tasks in ML.
Pytorch's learning rate schedulers are usually tied to a particular optimizer. For this reason they might not be directly portable for implementing constraint schedulers, but some of their scheduler framework and implementations could be re-used.
Context: Many current training pipelines involve tweaking the step size/learning rate throughout training to achieve good performance. Two common scenarios for this are:
Proposal: Add capabilities to use Pytorch's LR schedulers along with Cooper.
Challenges:
primal_optimizer
as it is "fully instantiated" by the user before creating the ConstrainedOptimizer
.dual_optimizer
as we don't have access to the full optimizer before the Lagrange multipliers have been initialized.lr_scheduler.step()
?Originally posted by @gallego-posada in #16
Make Cooper as easy as possible to integrate with a vanilla Pytorch pipeline.
In particular, allow for the user to populate a CMPState
and provide it to Cooper without having to create a custom ConstrainedMinimizationProblem.
Note that we should not remove the CMP altogether. In particular, extrapolation and alternating updates would still require internal calls to cmp.closure
and cmp.defect_fn
.
This would require a major overhaul of the documentation, tutorials, and examples, showcasing that both the previous CMP approach and this new approach are admissible. Love for the documentation has also been requested in #53 and #29.
Users are currently required to implement a custom ConstrainedMinimizationProblem with a closure method.
This overhead can be detrimental to Cooper attracting ML researchers and practitioners who would otherwise compute the loss and constraint violations themselves.
For minimal integrations with Cooper which do not need "fancy features" like Augmented Lagrangian or Extrapolation, simply asking for the loss and constraint defects makes the user's life easier.
It may be possible to remove the CMP completely. Nonetheless, we would still require a closure
for an internal call during extrapolation steps and a defect_fc
for internal use by the AlternatingConstrainedOptimizer
.
Checkpoint the ConstrainedOptimizer. This entails saving and loading both the primal_optimizer.state_dict()
and dual_optimizer.state_dict()
.
This enhancement would facilitate checkpointing for projects which depend on Cooper. Currently, optimizer checkpoints must be handled manually by the user.
Pytorch's Saving and Loading a General Checkpoint: https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html
The call to dual_scheduler.step()
happens at the end of ConstrainedOptimizer.step()
. This means that dual_scheduler
steps happen at the end of every optimization step and not every optimization epoch.
cooper/cooper/constrained_optimizer.py
Lines 235 to 240 in eae7c5a
This goes against the Pytorch convention on learning rate schedulers.
Calls to dual_scheduler.step()
should happen at the end of every epoch.
We could:
(i) Remove the call to dual_scheduler.step()
from inside the ConstrainedOptimizer
and let the user handle it. The instantiation of the dual scheduler could still be performed internally so that the partial_lr_scheduler
logic is preserved (and for the user to not worry about it).
(ii) Indicate whether a step is the last of its epoch and only then call dual_scheduler.step()
.
(iii) Create a class which wraps both the primal and dual schedulers (like how the ConstrainedOptimizer
wraps primal and dual optimizers), whose step()
method is manually called by the user at the end of an epoch. The class could internally handle the initialization of the dual scheduler. This approach is similar to (i)
Implement a "Multiplier Model" in Cooper. This idea comes from the paper by Narasimhan et al.
Instead of tracking the Lagrange Multipliers of a constrained optimization problem explicitly, consider a model which takes as input features associated with each constraint and outputs their corresponding multiplier.
This requires a new kind of Multiplier in Cooper different from a DenseMultiplier. In turn, we might want to create and abstract class for multipliers outlining how a custom multiplier needs to be implemented.
This also requires a new formulation or the modification of an existing one. Note that the dual_parameters
do not match the multipliers in this new functionality. Multipliers are the outputs of the Multiplier Model, whereas the dual_parameters
would be the actual weights of said model.
Consider problems with an extremely large (or even infinite) number of constraints. It becomes computationally intractable to evaluate and keep track of each constraint. Moreover, when solving the problem with an (explicit) Lagrangian formulation, there is the requirement to store and update an equally large number of multipliers.
With a Multiplier Model, Lagrange multipliers can be computed "on demand", given the constraints that were evaluated at each time step. As long as the Multiplier Model has less parameters than the total possible number of multipliers, there is a gain in terms of storage.
Also, note that the Multiplier Model's parameters are shared across constraints. Therefore, they can learn to have similar multipliers for constraints which tend to be similar to one another.
Remove existing aliases to objects in torch.optim
. Update documentation accordingly.
Having these aliases makes the interface opaque for the user: it is hard to tell that cooper.optim.SGD
is in fact the same implementation as torch.optim.SGD
.
This PR #52 made significant changes to the structure of the code for constrained optimizers. However, this change has not been fully integrated into the docs.
To aid the understanding of this great library one could add a very simple example problem, a proposal is shown below.
Currently, the optimization problem in the examples are rather complex. To make using the library more intuitive, one can add a dummy example that is so simple that it can be performed by hand, but that would similarly make the approach easy to understand.
I propose solving the following: minimize the distance to the origin such that it is on the line
an example problem discussed in the (Platt and Barr 1987) paper on constrained optimizations and led to the development of another Lagrange Multiplier Optimization library called mdmm
. The cooper
implementation is:
"""
Pet problem from Constrain Differential Optimization (Plat and Barr 1987).
The paper discusses general problems of the following format.
min f(x)
s.t.: g(x)
And discusses among others the example of finding the closest point to the origin such that it is on the line y=1-x.
This can be found by minimizing the following equation:
min x^2 + y^2
s.t.: y + x - 1 =0
Which is being solved by this example.
Bram van der Heijden
2023
"""
import matplotlib.pyplot as plt
import numpy as np
import cooper
import torch
class ConstraintOptimizer(cooper.ConstrainedMinimizationProblem):
def __init__(self, loss, constraint):
self.loss = loss
self.constraint = constraint
super().__init__(is_constrained=True)
def closure(self, x):
# Compute the cost function.
loss = self.loss(x)
# Compute the violation of the constraint function.
eq_defect = self.constraint(x)
return cooper.CMPState(loss=loss, eq_defect=eq_defect)
def f(x):
"""Cost function representing the square distance from the origin"""
loss = x[0]**2 + x[1]**2
return loss
def g(x):
"""Constraint function, representing a linear line y=1-x"""
loss = x[0] + x[1] - 1
return loss
# Define optimization problem.
cmp = ConstraintOptimizer(f, g)
formulation = cooper.LagrangianFormulation(cmp)
# Define the primal variables and optimizer.
x = torch.tensor([4, 3], dtype=torch.float32, requires_grad=True)
primal_optimizer = cooper.optim.ExtraAdam([x], lr=0.2)
# Define the dual optimizer, not fully instantiated.
dual_optimizer = cooper.optim.partial_optimizer(cooper.optim.ExtraAdam, lr=0.05)
# Wrap the optimizer together.
constraint_optimizer = cooper.ConstrainedOptimizer(formulation, primal_optimizer, dual_optimizer)
# Perform the training.
max_itr = 250
itr = np.arange(max_itr)
loss_itr = np.zeros(max_itr)
cons_itr = np.zeros(max_itr)
for i in range(max_itr):
# Print iteration updates.
loss_itr[i] = float(f(x))
cons_itr[i] = float(g(x))
print(f"Itr: {i:7,d}, loss {loss_itr[i]:4g}, conts {cons_itr[i]:4g}")
# Perform actual minimization.
constraint_optimizer.zero_grad()
lagrangian = formulation.composite_objective(cmp.closure, x)
formulation.custom_backward(lagrangian)
constraint_optimizer.step(cmp.closure, x)
# Exact solution is at (0.5, 0.5)
print("")
print("Result is: ", x.detach().numpy())
print("Error to exact is: ", float(torch.norm(x-torch.tensor([0.5, 0.5]))))
# Plot the convergence graph.
fig, ax = plt.subplots(1, 2)
ax[0].set_ylabel(f"Cost")
ax[0].set_xlabel(f"Iteration number")
ax[0].plot(itr, loss_itr)
ax[1].set_ylabel(f"Constraint")
ax[1].set_xlabel(f"Iteration number")
ax[1].plot(itr, cons_itr)
plt.show()
When using the bibtex in the README: "How to cite this work?", the name of Cooper does not render the correct capital letters.
The bibtex entry in the README requires an edit on its curly braces in order to avoid this.
When a dual restart is triggered, the dual variables are reset to their initial value 0.
Nonetheless, the state of the primal and dual optimizer remains the same. This may include running averages for momentum mechanisms.
These could be reset along with dual restarts when feasibility is achieved.
This would represent a full reset of the optimization protocol when the constraint is being satisfied. Currently, the reset is "half baked" in the sense that only dual variables are reset.
Reset the state of a Pytorch optimizer: https://discuss.pytorch.org/t/reset-adaptive-optimizer-state/14654/5
Modularize the constrained_optimizer.py
script to have a BaseConstrainedOptimizer
class from which specific optimization approaches can stem.
For instance, a SimultaneousOptimizer
, AlternatingOptimizer
, ExtrapolationOptimizer
, and in the future perhaps an ExtrapolationFromThePastOptimizer
.
Current implementation of ConstrainedOptimizer
is very flexible: it allows for unconstrained optimization, constrained optimization with simultaneous and alternating updates and extragradient updates.
Because of this, the logic inside the ConstrainedOptimizer
class has become overly complex.
cooper/cooper/constrained_optimizer.py
Lines 268 to 270 in f1d9554
A more modularized implementation would make the code easier to understand and less error prone. The added flexibility would enable implementing new optimization approaches, like extrapolation from the past [1].
A similar approach was taken with the Formulations, where a BaseLagrangianFormulation
underpins the LagrangianFormulation
and ProxyLagrangianFormulation
.
Hi, I was wondering if cooper can use cuda to accelerate the computation. (I thought it could, but there is no place to claim something like probs.cuda() )
Checkpointing capacities were enabled in #27, but the documentation and a minimal example have not been written.
There is a minus sign on the documentation of Lagrangian formulations which should not be there.
The Lagrange multipliers aim to maximize the Lagrangian (Boyd, $5.1). Therefore,
The current file says that the multiplier maximizes -L instead.
The minus sign should be erased, so that the explanation is mathematically correct.
Correct documentation.
No changes required in the source code.
No environment needed.
Error on documentation
Pytorch optimizers include a maximize
flag (Pytorch Issue)*. When set to True
, the sign of gradients is flipped inside optimizer.step()
before computing parameter updates. This enables gradient ascent steps natively.
NOTE: This sign flip does not affect the parameter's
.grad
attribute.
Cooper currently populates the gradients of dual variables with their negative value, so that descent steps performed by the dual optimizer are in fact ascent steps towards optimizing the problem formulation.
cooper/cooper/constrained_optimizer.py
Lines 401 to 406 in 09df759
We should not do the sign flipping manually but rather force set maximize=True
when instantiating the dual optimizer.
* This has been implemented on on Pytorch's master branch for every optimizer but LBFGS . On v1.12, Adam, SGD and AdaGrad support the flag, but not RMSProp. An assert could be included to ensure that the requested dual optimizer supports the flag.
โ This change would break compatibility with versions of Pytorch prior to 1.12.
Manually flipping gradients immediately after calculating them (thus ensuring that this happens before calls to dual_optimizer.step()
) is error prone.
Moreover, keeping track of the fact that gradients have a sign flipped is inconvenient.
By implementing this change we would adopt the official Pytorch approach for performing ascent steps.
The current implementation is functional.
Create wrappers cooper.backward
for formulation.custom_backward
and cooper.compute_lagrangian
for formulation.composite_objective
.
This would change the way a user interacts with Cooper. However, it does not represent a change in Cooper's back-end.
A cleaner pipeline for the user, where there is no need to access the methods of a formulation:
for step in range(num_steps):
...
constrained_optimizer.zero_grad()
lagrangian = cooper.compute_lagrangian(formulation, ...)
cooper.backward(formulation, lagrangian)
constrained_optimizer.step()
As opposed to the current pipeline:
for step in range(num_steps):
...
constrained_optimizer.zero_grad()
lagrangian = formulation.composite_objective(...)
formulation.custom_backward(lagrangian)
constrained_optimizer.step()
Alternating updates are currently untested. Should add a test for this.
Implement the augmented Lagrangian method (ALM; aka "method of multipliers").
The ALM (approximately) solves a sequence of unconstrained minimization problems on the primal variables, and updates the multipliers at the end of each minimization. Explicit Lagrange multiplier estimates help avoid the ill-conditioning that is inherent in the quadratic penalty function.
Consider a CMP with objective function
Given a positive penalty parameter sequence
$$ x^k = \arg \min_x L_{c^k}(x, \lambda^k) \triangleq f(x) + {\lambda^k}^{\top} h(x) + \frac{c^k}{2} ||h(x)||^2$$
with respect to
$$ \lambda^{k+1} = \lambda^{k} + c^k h(x^k).$$
The main advantage of the ALM compared to the quadratic penalty method is that (under some reasonable assumptions), the algorithm can be successful without increasing the penalty parameter
Bertsekas (1999) Section 4.2.1 suggests a way to handle inequality constraints.
Tseng and Bertsekas (1993) propose the "exponential multiplier method" (EMM), where an exponential, rather than a quadratic penalty is used. The resulting Lagrangian in EMM is twice differentiable if the objective and constraint functions are.
However, 2nd-order methods are challenging to scale to modern ML problems. Implementing EMM could be considered as an addition to the "standard" ALM, rather than a replacement.
constrained_optimizer.primal_optimizer
to be applied simultaneously across the set of optimizers.See Pytorch Lightning: Optimization for an example where this is implemented.
Consider optimization problems where different groups of variables are optimized with different approaches (optimizer classes).
In addition, consider setting different optimization hyper-parameters for each group of variables.
This is currently not supported by Cooper, as one primal optimizer is used to update all primal variables.
In the paper Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints, the authors employ different learning rates for the "weights" and "gates" parameters of their models.
They implement this in a hacky way using Cooper: by defining one primal optimizer and setting different parameter_groups
to have different learning rates. If they wanted to use different optimizer classes for weights and gates, this approach would not work.
Finalize the code coverage addition.
Code coverage action and badge are part of the Cooper codebase, but they are not fully functional yet.
The documentation of this library is very limited and it is hard to parse how it should be applied for different applications. For instance, in Many RL algorithms such as PPO and MPO, an entropy regularization terms or the KL constraints between policies employed to stabilise standard RL objectives. Is it possible to give a practical example of how one can use this library for a dual problem with lagrangian multipliers for such applications?
Many thanks.
The StateLogger
in cooper.utils.state_logger.py
is not compatible with the latest version of Cooper's dev branch.
It assumes that a Formulation
will have a ConstrainedMinimizationProblem
as an attribute:
cooper/cooper/utils/state_logger.py
Lines 42 to 48 in 3e0aa6f
This is not the case since merging #56.
Update the interface of StateLogger
. store_metrics
should receive a CMPState
or the unpacked metrics.
Enable the user to provide several partially instantiated "dual optimizers". These could be handled "as a single" optimizer by applying all operations currently applied to constrained_optimizer.dual_optimizer simultaneously across the set of optimizers.
See #45 for a PR which enables support for multiple primal optimizers.
Note that dual optimizers are currently instantiated internally by Cooper after the instantiation of dual variables by the formulation. For a LagrangianFormulation
, this happens after a first call to cmp.closure
.
Therefore, too much granularity or flexibility for the the grouping of dual variables under a shared optimizer is trickier than in #45.
If restricted to groupings of constraints which are easily derived from the formulation/cmp (like inequality and equality constraints), this feature should be relatively easy.
Setting up different optimizer classes or hyperparameters across (groups of) constraints. For instance, one for equality constraints and another for inequality constraints.
A natural follow-up feature for [#45] would be [to allow for] multiple (partially instantiated) dual optimizers.
It might be tricky to allow for too much granularity (say a different optimizer for each constraint). But allowing for different optimizers for equality and inequality constraints could be a good first step.
However, this does not seem to be a pressing feature to implement at the moment.
Originally posted by @gallego-posada in #45 (comment)
Implement the extrapolation from the past algorithm (Popov, 1980). A good and modern source is Gidel et al. (2019).
This is an algorithm for computing parameter updates similar to extragradient: it computes the direction for updating parameters based on a "lookahead step". It is less intensive computationally than extragradient and enjoys similar convergence results for some class problems (Gidel et al., 2019).
Whereas extragradient requires two gradient computations per parameter update, extrapolation from the past stores and re-uses gradients from previous extrapolation steps for use during current extrapolation steps. This means less computational intensity in terms of gradient calculations, which may be helpful in some settings.
However, storing the previous gradients still means an overhead in terms of storage as opposed to gradient descent-ascent.
The instantiation of the dual_scheduler
inside ConstrainedOptimizer.load_from_state_dict
method has a bug. The variable containing the dual scheduler class is called dual_scheduler_class
as opposed to dual_scheduler
.
cooper/cooper/constrained_optimizer.py
Line 491 in f75dff3
Replace current line to dual_scheduler = dual_scheduler_class(dual_optimizer)
This should enable the loading of a dual_scheduler
from a checkpoint.
A tutorial example for the generation of time series data w.r.t. to constrains on the values.
An open question to me is if the inequalities below should result in a scalar ineq_defect
tensor, i.e., so I need to sum up the defects per time step, or if it is fine the way it is.
I want to use cooper to create experiments, e.g. for system identification. In order to ensure that the commanded inputs do not destroy the device, I want the resulting trajectories to obey some device-dependent constraints. I am interested how non-differentiable constraints can be added in this context (yes, I saw the classification demo but am still a bit confused).
There are for sure plenty, since I just started playing with cooper 1h ago :)
I know this code is unpolished, but I first wanted to know what you think of this idea before I put work into it.
from typing import Callable
import cooper
import matplotlib.pyplot as plt
import torch
def generate_traj(time: torch.Tensor, coeffs: torch.Tensor) -> torch.Tensor:
""" Helper function to generate a 5th order polynomial. """
return (
coeffs[0]
+ coeffs[1] * time
+ coeffs[2] * time ** 2
+ coeffs[3] * time ** 3
+ coeffs[4] * time ** 4
+ coeffs[5] * time ** 5
)
class MinDeviation(cooper.ConstrainedMinimizationProblem):
"""
Generate a trajectory which has the minimal L2 distance from a reference trajectory, which satisfying
an (arbitrarily chosen) inequality constraint on its values.
"""
def __init__(
self,
traj_ref: torch.Tensor,
time_ref: torch.Tensor,
traj_gen_fcn: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
):
# I know this class is far from elegant, but let's keep it for now.
super().__init__(is_constrained=True)
self.traj_ref = traj_ref.detach().clone()
self.time_ref = time_ref.detach().clone()
self.traj_gen_fcn = traj_gen_fcn
self.loss_fcn = torch.nn.MSELoss()
def closure(self, coeffs: torch.Tensor) -> cooper.CMPState:
# Generate the trajectory which we want to constrain.
traj = self.traj_gen_fcn(self.time_ref, coeffs)
# Evaluate the cost function.
loss = self.loss_fcn(self.traj_ref, traj)
# Evaluate the equality constraints g(x) = 0, i.e., the violation thereof.
eq_defect = None
# Evaluate the inequality constraints h(x) <= 0, i.e., the violation thereof.
ineq_defect = traj - torch.tensor([7.0]) # all values should be less or equal than 7
return cooper.CMPState(loss, ineq_defect, eq_defect)
if __name__ == "__main__":
# Configure the experimental design.
time = torch.linspace(0, 2, 51)
coeffs = torch.tensor([4, -3, 3, 2, 0.5, -1])
fig, axs = plt.subplots(1, 1, figsize=(12, 8))
traj_unconstr = generate_traj(time, coeffs)
axs.plot(time.numpy(), traj_unconstr.numpy())
# Define the problem and its Lagrangian formulation.
cmp = MinDeviation(traj_ref=traj_unconstr, time_ref=time, traj_gen_fcn=generate_traj)
formulation = cooper.LagrangianFormulation(cmp)
# Define the primal parameters and optimizer.
coeffs_param = torch.nn.Parameter(coeffs.clone()) # start with the param values of the unconstrained traj
primal_optimizer = cooper.optim.ExtraSGD([coeffs_param], lr=3e-3, momentum=0.7)
# Define the dual optimizer. Note that this optimizer has NOT been fully instantiated
# yet. Cooper takes care of this, once it has initialized the formulation state.
dual_optimizer = cooper.optim.partial_optimizer(cooper.optim.ExtraSGD, lr=9e-4, momentum=0.7)
# Wrap the formulation and both optimizers inside a ConstrainedOptimizer
coop = cooper.ConstrainedOptimizer(formulation, primal_optimizer, dual_optimizer)
# Here is the actual training loop.
# The steps follow closely the `loss -> backward -> step` Pytorch workflow.
for iter in range(3000):
coop.zero_grad()
lagrangian = formulation.composite_objective(cmp.closure, coeffs_param)
formulation.custom_backward(lagrangian)
coop.step(cmp.closure, coeffs_param)
assert not torch.any(torch.isnan(coeffs_param)), f"Observed the first NaN value at iter {iter}!"
# Get trajectory with the final coefficients.
with torch.no_grad():
traj_constr = generate_traj(time, coeffs_param)
axs.plot(time.numpy(), traj_constr.numpy(), ls="--")
plt.show()
Even after reading the classification tutorial, I did not fully understand when and how to use the proxy_ineq_defect
.
In my case, I am running a dynamics simulation given a set of inputs that result from a parameterized module.
These parameters should be optimized such that the observations made from the (differentiable) sim suffice some criteria.
These criteria could be that at all time steps the 2nd dim of the obs should be < 0.5 or that 90% of the values of the 3rd obs dim should be > 0.8. Let's say these criteria can be arbitrary.
What works so far is, that I have a function that computes the ineq_defect
passed to the CMP's state.
Using plain inequalities, I can optimize the input generator such that these criteria are met.
During this process, I run exactly one forward sim of my deterministic system.
Now, I understood your tutorial in a way that I thought if I want to have a non-differentiable criterion (e.g., the 90% one form above) I need to use the proxy_ineq_defect
.
However, when I use the same function as for the ineq_defect
(while removing that one from the CMP state), I get this error:
AttributeError: 'NoneType' object has no attribute 'mul_'
. I originates from the dual_step
method.
Any hints what I am missing?
P.S: When opening this issue, I could only choose between bug and enhancement which both don't apply here.
Modify README to include a more comprehensive example of how to use Cooper for machine learning problems. Some features to cover include:
torch.nn.Module
We could to have two versions:
Maybe only include the first one in the README, and keep the second version for the docs.
__init__
files throughout Cooper are not fully consistent.
For instance, Cooper.__init__
imports UnconstrainedFormulation
and LagrangianFormulation
, but not AugmentedLagrangianFormulation
.
Lines 18 to 22 in a8b3400
For consistency, we should import the AugmentedLagrangianFormulation
as well.
Note: Creating an issue as there may be other instances of this throughout the library.
Hello, i have discovered your library recently and i am trying to use it for my research. While trying to solve a constrained minimization problem, i encountered an error.
Here is the full error stack in a simplified case
Traceback (most recent call last):
File "/home/david.danan/psinns/Tests/ReportIssue.py", line 137, in <module>
EnergySolver(nodes=gridNodes,
File "/home/david.danan/psinns/Tests/ReportIssue.py", line 112, in EnergySolver
coop.step(loss.closure,predictedField,positions)
File "/opt/miniconda/envs/FEMINNSEnv/lib/python3.9/site-packages/cooper/constrained_optimizer.py", line 222, in step
lagrangian = self.formulation.composite_objective(
File "/opt/miniconda/envs/FEMINNSEnv/lib/python3.9/site-packages/cooper/lagrangian_formulation.py", line 232, in composite_objective
cmp_state = closure(*closure_args, **closure_kwargs)
File "/home/david.danan/psinns/Tests/ReportIssue.py", line 84, in closure
duxdxyz = grad(displacement[:, 0].unsqueeze(1), position, torch.ones(position.size()[0], 1, device=device), create_graph=True, retain_graph=True)[0]
File "/opt/miniconda/envs/FEMINNSEnv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 300, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
As far as i can understand, the case where the loss function requires an explicit call to autograd to compute its value is not handled for now.
I had tried to redefine the behaviour of cooper.LagrangianFormulation by inheriting from it and adding an option to the backward call
class LagrangianFormulationConserveGraph(cooper.LagrangianFormulation):
@no_type_check
def _populate_gradients(
self, lagrangian: torch.Tensor, ignore_primal: bool = False
):
if ignore_primal and self.cmp.is_constrained:
pass
else:
lagrangian.backward(retain_graph=True)
if self.cmp.is_constrained:
for violation_for_update in self.state_update:
dual_vars = [_ for _ in self.state() if _ is not None]
violation_for_update.backward(inputs=dual_vars)
The error did disappear but the optimization was stucked (neither the loss nor the constraints changes throught the iterations). Therefore, i guess it was not the correct way to do it.
The full script is enclosed below and should be enough to reproduce this bug.
Be able to run the script and seing actual change throught the iteration process regarding the loss and constraints
For cooper, i followed the instruction on the Readme (pip)
cooper: 0.1.dev8+geae7c5a
All the other dependencies where installed within a conda environment:
-python=3.9.16
-numpy=1.22.3
-torch: 1.13.1
Since the whole script has some external dependencies not related to this issue, i share below the problematic part mentionned in the error stack in a simplified shorter version.
In a nutshell, here is what i am trying to do:
-i had an input field that i know (input) and another that i don't (output)
-both fields live on a grid ( 3 values per nodes in both cases)
-Using a neural network, i would like to describe the relationship between these 2 fields
-The loss function denotes a property of the output field, the optimal solution should minimize this quantity
-The constraints enforce the value of the field on a specific part of the grid (here, 0).
The only explicit dependencies within the script are numpy, cooper and torch.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import torch
from torch.autograd import grad
import torch.nn as nn
import cooper
from cooper.problem import CMPState, Formulation
device = torch.device('cpu')
if torch.cuda.is_available():
print("CUDA is available, running on GPU")
device = torch.device('cuda')
torch.set_default_tensor_type('torch.cuda.FloatTensor')
else:
print("CUDA not available, running on CPU")
class TorchStraightForwardBuilder():
def __init__(self,
operations):
self.operations=operations
def BuildModel(self):
seqOperations=self.BuildSeqOperationsFromBlockOperations()
return TorchClassicalStraightForwardNetwork(operations=seqOperations)
def BuildSeqOperationsFromBlockOperations(self):
seqOperations=[self.BuildNNComponent(nnComponent) for nnComponent in self.operations]
return seqOperations
def BuildNNComponent(self,nnComponent:tuple):
brickName,params=nnComponent
if params is None:
return CreateTorchComponent(brickName)
return CreateTorchComponent(brickName,params)
class TorchClassicalStraightForwardNetwork(nn.Module):
def __init__(self,operations:list):
super(TorchClassicalStraightForwardNetwork, self).__init__()
self.allComponents=operations
self.torchComponents = nn.ModuleList([op for op in operations if isinstance(op,nn.Module)])
def forward(self, x):
inputVal=x
for component in self.allComponents:
output=component(inputVal)
inputVal=output
return output
def CreateTorchComponent(name,ops=None):
if ops==None:
if name=="ReLU":
return nn.ReLU()
else:
raise Exception("Activation function not recognized")
else:
if name=="Linear":
return nn.Linear(**ops)
else:
raise Exception("Layer not recognized")
def trapz(y,dx,axis=-1):
nd = y.ndimension()
slice1 = [slice(None)] * nd
slice2 = [slice(None)] * nd
slice1[axis] = slice(1, None)
slice2[axis] = slice(None, -1)
integralValue = torch.sum(dx * (y[tuple(slice1)] + y[tuple(slice2)]) / 2.0, axis)
return integralValue
class MinimizationEnergyProblem(cooper.ConstrainedMinimizationProblem):
def __init__(self,
constraintCondition,
integrationDomain):
super().__init__(is_constrained=True)
self.constraintIndices,self.constraintValues=constraintCondition
self.integrationDomain=integrationDomain
def closure(self, displacement, position):
duxdxyz = grad(displacement[:, 0].unsqueeze(1), position, torch.ones(position.size()[0], 1, device=device), create_graph=True, retain_graph=True)[0]
internalEnergy = duxdxyz[:, 0].unsqueeze(1) + 1
dx,dy,dz=self.integrationDomain["spacing"]
nx,ny,nz=self.integrationDomain["nbNodes"]
integrand = internalEnergy.reshape(nx,ny,nz)
internalEnergyIntegrated=trapz(trapz(trapz(integrand, dx=dz), dx=dy), dx=dx)
eq_defect=torch.flatten(displacement[self.constraintIndices] - self.constraintValues)
return cooper.CMPState(loss=internalEnergyIntegrated, eq_defect=eq_defect)
def EnergySolver(nodes,model,constraintCondition,integrationDomain):
modelFC=TorchStraightForwardBuilder(model)
modelOutput=modelFC.BuildModel().to(device)
loss = MinimizationEnergyProblem(constraintCondition=constraintCondition,
integrationDomain=integrationDomain)
formulation = cooper.LagrangianFormulation(loss)#LagrangianFormulationConserveGraph(loss)
positions=torch.from_numpy(nodes).to(device).requires_grad_(True)
predictedField = modelOutput(positions)
primal_optimizer = cooper.optim.ExtraSGD(modelOutput.parameters(), lr=3e-3, momentum=0.5)
dual_optimizer = cooper.optim.partial_optimizer(cooper.optim.ExtraSGD, lr=9e-3, momentum=0.5)
coop = cooper.ConstrainedOptimizer(formulation, primal_optimizer, dual_optimizer)
state_history = cooper.StateLogger(save_metrics=["loss", "eq_defect", "eq_multipliers"])
for iter_num in range(10):
coop.zero_grad()
lagrangian = formulation.composite_objective(loss.closure,predictedField,positions)
formulation.custom_backward(lagrangian)
coop.step(loss.closure,predictedField,positions)
if __name__ == '__main__':
model=[
("Linear",{"in_features":3, "out_features":50}),
("ReLU",None),
("Linear",{"in_features":50, "out_features":50}),
("ReLU",None),
("Linear",{"in_features":50, "out_features":3})
]
nNodesX,nNodesY,nNodesZ=10,20,30
x = np.linspace(0, 1, nNodesX).astype(np.float32)
y = np.linspace(0, 1, nNodesY).astype(np.float32)
z = np.linspace(0, 1, nNodesZ).astype(np.float32)
x,y,z=np.meshgrid(x,y,z)
x,y,z=x.flatten(),y.flatten(),z.flatten()
gridNodes=np.vstack((x,y,z)).transpose()
constraintIndices=np.where(x==0)[0]
constraintCondition=np.where(x==0)[0],torch.zeros((len(constraintIndices),3))
hSpacings=1/(nNodesX-1),1/(nNodesY-1),1/(nNodesZ-1)
integrationDomain={"spacing":hSpacings,"nbNodes":(nNodesX,nNodesY,nNodesZ)}
EnergySolver(nodes=gridNodes,
model=model,
constraintCondition=constraintCondition,
integrationDomain=integrationDomain)
The current code for tests includes substantial code repetition. A more systematic use of fixtures can help reduce this code redundancy.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.