corail-research / SeaPearl.jl

Julia hybrid constraint programming solver enhanced by a reinforcement learning driven search.

Home Page: https://corail-research.github.io/SeaPearl.jl/dev/

License: BSD 3-Clause "New" or "Revised" License

Julia 100.00%
constraint-programming reinforcement-learning julialang dynamic-programming research graphs

SeaPearl.jl's Introduction


SeaPearl is a Constraint Programming solver that can use Reinforcement Learning agents as value-selection heuristics, using graphs as the input to the agent's approximator. It is intended as a research tool that makes it possible to go above and beyond what has already been done with it.

The paper accompanying this solver can be found on arXiv. If you use SeaPearl in your research, please cite our work.

The RL agents are defined using ReinforcementLearning.jl, and their inputs are handled with Flux.jl. The CP part, inspired by MiniCP, is focused on readability. The code is meant to be clear and modular so that researchers can easily access the CP data and use it as input for their ML models.

Installation

]add SeaPearl
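
Or, equivalently, from a script or the Julia REPL (outside the Pkg prompt):

using Pkg
Pkg.add("SeaPearl")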

Use

Working examples can be found in SeaPearlZoo, and the documentation can be found here.

SeaPearl can be used either as a classic CP solver with predefined variable and value selection heuristics, or as a Reinforcement-Learning-driven CP solver that is capable of learning through solving automatically generated instances of a given problem (knapsack, TSPTW, graph coloring, Eternity II, ...).

SeaPearl as a classic CP solver:

To use SeaPearl as a classic CP solver, one needs to:

  1. declare a variable selection heuristic:
YourVariableSelectionHeuristic{TakeObjective} <: SeaPearl.AbstractVariableSelection{TakeObjective}
  2. declare a value selection heuristic:
BasicHeuristic <: ValueSelection
  3. create a Constraint Programming model:
trailer = SeaPearl.Trailer()
model = SeaPearl.CPModel(trailer)

# create a variable:
SeaPearl.addVariable!(...)

# add constraints:
SeaPearl.addConstraint!(model, SeaPearl.AbstractConstraint(...))

# add an optional objective function:
SeaPearl.addObjective!(model, ObjectiveVar)
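
Putting these pieces together, here is a minimal end-to-end sketch. The predefined MinDomainVariableSelection heuristic, the zero-argument BasicHeuristic constructor and the SeaPearl.solve! keyword names are taken from the SeaPearlZoo examples; check the documentation for the exact signatures.

using SeaPearl

trailer = SeaPearl.Trailer()
model = SeaPearl.CPModel(trailer)

# two variables whose sum must be at least 5
x = SeaPearl.IntVar(1, 5, "x", trailer)
y = SeaPearl.IntVar(1, 5, "y", trailer)
SeaPearl.addVariable!(model, x)
SeaPearl.addVariable!(model, y)
SeaPearl.addConstraint!(model, SeaPearl.SumGreaterThan(SeaPearl.AbstractIntVar[x, y], 5, trailer))
SeaPearl.addObjective!(model, y)   # optional: minimize y

# predefined heuristics, then search; solutions and statistics end up in `model`
variableSelection = SeaPearl.MinDomainVariableSelection{false}()
valueSelection = SeaPearl.BasicHeuristic()
SeaPearl.solve!(model; variableHeuristic=variableSelection, valueSelection=valueSelection)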

SeaPearl as an RL-driven CP solver:

To use SeaPearl as an RL-driven CP solver, one needs to:

  1. declare a variable selection heuristic:
CustomVariableSelectionHeuristic{TakeObjective} <: SeaPearl.AbstractVariableSelection{TakeObjective}
  2. declare a learned value selection heuristic:
LearnedHeuristic{SR<:AbstractStateRepresentation, R<:AbstractReward, A<:ActionOutput} <: ValueSelection
  3. define an agent:
agent = RL.Agent(
    policy=(...),
    trajectory=(...),
)
  4. optionally, declare a custom reward:
CustomReward <: SeaPearl.AbstractReward
  5. optionally, declare a custom state representation (instead of the default tripartite-graph representation):
CustomStateRepresentation <: SeaPearl.AbstractStateRepresentation
  6. optionally, declare a custom featurization for the state representation:
CustomFeaturization <: SeaPearl.AbstractFeaturization
  7. create a generator for your problem, which will produce the different instances used during the learning process:
CustomProblemGenerator <: AbstractModelGenerator
  8. set a number of training epochs, declare an evaluator, a search strategy and a metric for benchmarking:
nb_epochs = 3000
CustomStrategy <: SearchStrategy # DFS, RBS, ILDS

CustomEvaluator <: AbstractEvaluator # or use a predefined one: SeaPearl.SameInstancesEvaluator(...)
function CustomMetricsFun
  9. launch the training:
metricsArray, eval_metricsArray = SeaPearl.train!(
    valueSelectionArray=valueSelectionArray,
    generator=tsptw_generator,
    nbEpisodes=nbEpisodes,
    strategy=strategy,
    eval_strategy=eval_strategy,
    variableHeuristic=variableSelection,
    out_solver=true,
    verbose=true,
    evaluator=SeaPearl.SameInstancesEvaluator(valueSelectionArray, tsptw_generator; evalFreq=evalFreq, nbInstances=nbInstances, evalTimeOut=evalTimeOut),
    restartPerInstances=restartPerInstances
)

Contributing to SeaPearl

All contributions are welcome! Have a look at our contributing guidelines.

SeaPearl.jl's People

Contributors

ilancoulon, felixchalumeau, jardinetsouffleton, navaxel, pierretsr, github-actions[bot], max9294d, 3rdcore, gostreap, marco-novaes98, qcappart, louis-gautier, sandertom, kimriouxparadis, casbex, pitmonticone, ziadelassal


SeaPearl.jl's Issues

[CP] No easy way to generate the opposite of a variable, y = -x

Currently, to generate the opposite of a variable, y = -x, one needs to proceed as follows:

x = SeaPearl.IntVar(1, 5, "x", trailer)
y = SeaPearl.IntDomainViewOpposite(x.domain)

It would be nice to overload the base operator - so that it automatically generates an IntDomainViewOpposite variable. This would facilitate future CP problem implementations and would be a nice good first issue for a newcomer; a possible sketch is shown below.
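
A minimal sketch of what such an overload could look like, reusing the IntVarViewOpposite view mentioned in other issues; the id field holding the variable's name is an assumption about the IntVar struct:

# Possible sketch (not implemented): overload unary minus for variables.
# Assumes the variable's name is stored in an `id` field (an assumption).
Base.:-(x::SeaPearl.AbstractIntVar) = SeaPearl.IntVarViewOpposite(x, "-" * x.id)

# so that the opposite can simply be written as:
# y = -x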

Action space of variable size

I'm working on this in #17: the approximator needs to output a vector of Q-values with the same size as the number of possible actions given in the input (which is actually the size of the domain we are branching on).

This is not trivial, but I think I'll be able to do something not too ugly.
The issue I'm seeing is that it will be impossible to use batches with that: I don't think you can batch vectors of (potentially very) different sizes.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Adding common generic value ordering heuristics

It would be easy (and convenient) to have at least the following basic generic value heuristics:

  • Lexicographic (see #63)
  • ReverseLexicographic
  • Median
  • Random

Edit: After discussing briefly with Gilles, Median is not that great an idea: it cannot be done easily in O(1) and is not experimentally reported as being better than Lexicographic.
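
In the meantime, these orderings can already be expressed as BasicHeuristics with a custom selection function (the same constructor mentioned in the "Make Lexicographic the default" issue below); a small sketch, assuming the domain is iterable as minimum(x.domain) suggests:

# Sketches of the proposed orderings as BasicHeuristics with selection functions
lexicographic = SeaPearl.BasicHeuristic(x -> minimum(x.domain))
reverse_lexicographic = SeaPearl.BasicHeuristic(x -> maximum(x.domain))
random_value = SeaPearl.BasicHeuristic(x -> rand(collect(x.domain)))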

CPModification not correctly updated in presence of views

In this example, the CPModification object does not get correctly updated:

trailer = SeaPearl.Trailer()
vars = SeaPearl.AbstractIntVar[]

x = SeaPearl.IntVar(2, 3, "x", trailer)
push!(vars, x)
y = SeaPearl.IntVar(5, 15, "y", trailer)
ax = SeaPearl.IntVarViewMul(x, 3, "3x")
push!(vars, ax)

minusY = SeaPearl.IntVarViewOpposite(y, "-y")
push!(vars, minusY)
constraint = SeaPearl.SumGreaterThan(vars, 5, trailer)

toPropagate = Set{SeaPearl.Constraint}()
prunedDomains = SeaPearl.CPModification()
SeaPearl.propagate!(constraint, toPropagate, prunedDomains)

After the propagate! call, we have

prunedDomains = Dict{String,Array{Int64,N} where N}(
   "3x" => [6],
   "-y" => [-8, -9, -10, -11, -12, -13, -14, -15]
)

but the value 2 was removed from x.domain and this modification is not visible.

Change the way evaluation of a learned heuristic is made

As proposed by @qcappart during the last meeting, it could be interesting to select a "pool" of n instances before the training and to evaluate the learned heuristic on this pool regularly during the training. Although a bit slower, it is a more thorough way of evaluating policies.

DP model vs CP model

We are now able to launch experiments to see the effect of the DP modelling on solving problems: I guess the knapsack could be the first problem we could try this on, as Ilan has already created both models.
We only need to create a generator for the DP model of the knapsack, but this shouldn't take very long.

Make it possible to easily change the way the RL representation is made

It is already possible to easily change the features, the rewards and the NN architecture. But the last important thing one would like to change while doing research is the "RL representation". We should definitely think about a clean way to make it possible for a user to change the RL representation from outside the package.

Try to give some metrics as input of the RL agent

It could be interesting to try to add some metrics to the input of the RL agent, basically concatenating metrics to an existing state vector.

It's far from being a priority but I'll put it here as it could be interesting to try one day or another.

Training with graphs of variable number of nodes

We must work on letting our agents train on graphs with varying numbers of nodes. This is already needed when working with Knapsack, as stated in #17.

The issue is actually in the buffer, which has a fixed size. What we could do to begin with is fix a maximum number of nodes and store in the state array, as an extra column, which rows should be considered as nodes (e.g. a one-hot mask); see the sketch below.

I think I'll work on that.
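
A rough illustration of the padding idea described above, with a hypothetical helper (not the actual buffer code):

# Hypothetical sketch: pad a node-feature matrix to max_nodes and append a
# 0/1 column telling which rows are real nodes, so every state has a fixed shape.
function pad_node_features(features::Matrix{Float32}, max_nodes::Int)
    n_nodes, n_features = size(features)
    padded = zeros(Float32, max_nodes, n_features + 1)
    padded[1:n_nodes, 1:n_features] .= features
    padded[1:n_nodes, end] .= 1f0   # mask: 1 for real nodes, 0 for padding rows
    return padded
end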

Create more advanced possibilities for reward engineering

The fact that our RL agent is trained inside the solver offers a lot of opportunities in terms of reward engineering, to give the agent very complete insights about the consequences of its choices. We should definitely make it possible to create more precise and insightful rewards.

A first possibility is to get more statistics about what is happening inside the solver: we want to know when a solution is found, when we reach an infeasible point, etc. But we also want to know how it happens. As a matter of fact, quickly reaching an infeasible point is not necessarily bad: for instance, if you reach the best solution in your first tries, you will still go through a few infeasible points before stopping the search and yielding the solution with the optimal status. Encountering infeasible points is not bad in itself, it will always happen; what is bad is going all the way down the tree before finding out the branch is infeasible.

This quick example is just to show that we could achieve very interesting reward engineering with more insight into what is happening during a search.

I can definitely assign myself to this.

Cleaning SeaPearlZoo

SeaPearlZoo deserves a bit of cleaning. We can clearly delete a lot of useless files and clean up the remaining ones.
Plus, the READMEs should be filled in.

Get more insights about what the agent is doing

I sometimes find it hard to analyse what is happening with the agent I train, for example:

  • is it always picking minimum values, or is it able to do something more sophisticated?
  • is it excellent at finding a good solution but very bad at proving optimality?

We should definitely work on those aspects so that we can have a more consistent "research process"!

Use of MOI

MathOptInterface does not seem ready for our CPModel as it is.

My PR over at jump-dev/MathOptInterface.jl#1101 could help with that, but I had trouble getting it merged.

To simplify, I think the easiest way will be to develop our own intermediary structure, and do the bridges by ourselves. That's what I'll be working on for now.

Bug in the TSPTW test

An inconsistent behavior has been spotted in the test Search known instance, line 166 of test/datagen/tsptw.jl.

The test aims to check the correct generation of TSPTW instances by solving already-solved problems and checking the existence of a solution. However, this test randomly fails despite the specification of a seed, which makes the construction of the instance deterministic.

It appears that the issue doesn't come from the problem generation, but from the CP solver itself. After a quick investigation, I managed to find the point in the search tree where the bug occurs. Indeed, as both heuristics are deterministic in this test, the solver always explores the same path in the search tree. But at some point, a variable assignment triggers a random behavior in fixPoint!, resulting in a valid or invalid assignment without any obvious explanation.

I tweaked the original test a little bit to reproduce the bug:

using SeaPearl

trailer = SeaPearl.Trailer()
model = SeaPearl.CPModel(trailer)

n_city = 21
grid_size = 150
max_tw_gap = 3
max_tw = 8

generator = SeaPearl.TsptwGenerator(n_city, grid_size, max_tw_gap, max_tw)

SeaPearl.fill_with_generator!(model, generator; seed=42)

SeaPearl.assign!(model.variables["a_1"], 19)
SeaPearl.assign!(model.variables["a_2"], 20)
SeaPearl.assign!(model.variables["a_3"], 16)
SeaPearl.assign!(model.variables["a_4"], 11)
SeaPearl.assign!(model.variables["a_5"], 14)
SeaPearl.assign!(model.variables["a_6"], 8)
SeaPearl.assign!(model.variables["a_7"], 17)

SeaPearl.fixPoint!(model)

SeaPearl.assign!(model.variables["a_8"], 5)

status, prunedDomains = SeaPearl.fixPoint!(model, SeaPearl.getOnDomainChange(model.variables["a_8"]))

On my system on successive runs, fixPoint! returns false a third of the time.
On every failed run I've had in the original test, this last assignment was always the one triggering the failure.

I double-checked that the distance matrix and time_windows generation are consistent (supposedly the only values randomly generated in TsptwGenerator). I haven't been able to check that the initialization of every variable/constraint is the same at each run, but they aren't supposed to be random. I also haven't been able to check yet which constraint or part of fixPoint! is responsible for this behavior.

However, this part becomes tricky because there are 1500+ constraints in the model and fixPoint! doesn't seem to call them all in the same order. I'll keep digging and will post any new leads.

My current leads are:

  • a failing constraint pruning too many values
  • a mishandling of a variable in fixPoint!

[CP] No easy way to generate an offsetted variable, y = x + c

Currently, to generate an offset of a variable of the form y = x + c, one needs to proceed as follows:

c = 3
x = SeaPearl.IntVar(1, 5, "x", trailer)
y = SeaPearl.IntDomainViewOffset(x.domain, c)

It would be nice to overload the base operator + so that it automatically generates an IntDomainViewOffset variable, following the same pattern as the opposite-variable sketch above. This would facilitate future CP problem implementations and would be a nice good first issue for a newcomer.

Make Lexicographic the default value of a BasicHeuristic

I think that the Lexicographic value ordering BasicHeuristic(x -> minimum(x.domain)) is more common in CP than BasicHeuristic(x -> maximum(x.domain)). The naming is also more explicit. I propose simply replacing the default BasicHeuristic by a lexicographic ordering. What do you think?

Because this could cause reproducibility problems in your personal experiments, I will wait for your green light.
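
The change could be as small as swapping the selection function used by the zero-argument constructor (a sketch, assuming the current default uses maximum):

# Proposed default: lexicographic (smallest value first)
BasicHeuristic() = BasicHeuristic(x -> minimum(x.domain))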

Representable variables

Inspired by #71. In the same way that we can choose whether a variable is branchable, it could be interesting to have the flexibility to choose whether a variable should (or should not) be added to the DefaultStateRepresentation.

Not a priority though.

metricsFun can not handle more than 1 classic heuristic

BUG: While running experiments, metricsFun doesn't make any distinction between the different classic heuristics while filling MetricsArray, making the results irrelevant for analysis.

SeaPearl should maybe provide some standard metrics functions to help users analyze the performance of their models.

Explore the idea of having two (or more) agents to solve a problem

Finding the best solution as fast as possible and proving optimality are not the same challenge at all, and it is very hard for an agent to understand that it should "switch modes". It would hence be very interesting to try working with two agents: one learning to find the best solution as quickly as possible, and one learning to prove optimality as quickly as possible.

prunedDomains update check

During the propagate! function, the variable prunedDomains is supposed to store all the domain modifications. However, @ilancoulon warned us that the update isn't entirely reliable yet. As it is a very useful feature, it would be great to ensure that this variable is always properly updated.

Make it possible to give rewards later (not directly after an action is taken)

Sometimes, we are much better able to judge an action later, by analysing the subtrees that were explored. Thus it would be very interesting to make the framework able to give the associated reward later, instead of right after the action.

This is not straightforward and not a priority, but rather an interesting thing to try if possible.

[CP] No easy way to generate a multiple of a variable, y = ax

Currently, to generate a multiple of a variable of the form y = ax, one needs to proceed as follows:

a = 3
x = SeaPearl.IntVar(1, 5, "x", trailer)
y = SeaPearl.IntDomainViewMul(x.domain, a)

It would be nice to overload the base operator * so that it automatically generates an IntDomainViewMul variable. This would facilitate future CP problem implementations and would be a nice good first issue for a newcomer.

Impractical type casting for vector constraints

Currently, all the constraints that take a vector of variables as input type it as Vector{AbstractIntVar}. This is useful in order to define constraints on IntVar but also on IntVarView.
However, done that way, the user is forced to cast their vector to Vector{AbstractIntVar} even though it contains only a more specific variable type, which is quite impractical.

This could be fixed easily by typing the input of these constraints as Vector{<:AbstractIntVar}, and it would even allow more specific dispatch in the constraints if needed; see the sketch below.
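
A small illustration of why the cast is needed today (the function and variable names here are throwaway, only the type annotations matter): Julia's parametric types are invariant, so a Vector{IntVar} is not a Vector{AbstractIntVar}, but it does match Vector{<:AbstractIntVar}.

using SeaPearl

trailer = SeaPearl.Trailer()

current_style(vars::Vector{SeaPearl.AbstractIntVar}) = length(vars)      # today's signature style
proposed_style(vars::Vector{<:SeaPearl.AbstractIntVar}) = length(vars)   # proposed signature style

xs = [SeaPearl.IntVar(1, 5, "x$i", trailer) for i in 1:3]   # a Vector{SeaPearl.IntVar}

# current_style(xs)                                    # MethodError: Vector types are invariant
current_style(SeaPearl.AbstractIntVar[x for x in xs])  # the explicit cast users must write today
proposed_style(xs)                                     # works directly with Vector{<:AbstractIntVar}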

Making the package more flexible for research experiments

I find the package not well adapted to experiments at the moment, as we need to go into the source code almost every time we want to test something. Thus, it would be great to change a few things:

  • be able to redefine rewards easily
  • be able to define the node features
  • split the GCN into something more modular: for instance, if we create a layer like "filter" which extracts the node features we want from the feature matrix, we will be able to create our approximator directly inside our experiment files and change it very easily, without having to touch the source code!

I can assign myself to this if you also think we should improve those aspects!

[CP] No easy way to generate a linear constraint of the form x + y + ... = 0

Given two or more IntVar or IntVarView variables x and y, it is not possible to set up a constraint in this way: x + y = 0. You have to proceed as follows, which is laborious:

x = SeaPearl.IntVar(2, 3, "x", trailer)
y = SeaPearl.IntVar(2, 3, "y", trailer)
constraint = SeaPearl.SumToZero([x,y], trailer)

It would be nice to somehow overload the base operator + between AbstractIntVar variables to automatically generate a SumToZero constraint. This would facilitate future CP problem implementations and would be a nice good first issue for a newcomer; a rough sketch follows.
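
Since = itself cannot be overloaded in Julia, one possible middle ground is to make + between variables build a vector and expose a small helper that posts the SumToZero constraint. The sketch below is illustrative only; sumtozero! is a hypothetical name, not an existing API:

# + collects variables into a Vector{AbstractIntVar} ...
Base.:+(x::SeaPearl.AbstractIntVar, y::SeaPearl.AbstractIntVar) = SeaPearl.AbstractIntVar[x, y]
Base.:+(xs::Vector{SeaPearl.AbstractIntVar}, y::SeaPearl.AbstractIntVar) = push!(xs, y)

# ... and a hypothetical helper wraps the existing SumToZero constructor:
sumtozero!(model, trailer, vars) = SeaPearl.addConstraint!(model, SeaPearl.SumToZero(vars, trailer))

# usage: sumtozero!(model, trailer, x + y + z)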

Possible TableCT optimisation

After discussing the problem Eternity 2 with @tomsander1998 we figured out a potential optimisation of the table constraint.

Indeed, in Eternity 2, each piece is represented by a tuple of 5 variables which must obey a table constraint. However, each table constraint has the same table and thus the same supports.

With the current implementation, the table is copied each time and the support is rebuilt each time, which uses #pieces times more memory than if we had a single table/support dictionary.

What should be done is define a new constructor for TableConstraint that takes a table and a support dictionary as input and sets the corresponding attributes as references to these.

Tackle the "stochastic" issue while training: taking the difficulty of an episode into account.

The title might not be explicit enough, so let me explain this point.

When training an agent to play MountainCar, CartPole, CarRacing, etc., the best scores it can get are rather close from one episode to another. With the generators we have (at least for graph coloring at the moment), the difficulty can change a lot from one episode to another (in terms of the number of nodes visited). This can lead to a good policy appearing bad, which we should avoid!

A rather simple solution would be to use a simple heuristic to give insights about the difficulty of the current episode (it is indeed very hard to control the difficulty from the generator's point of view).
This won't be supervised learning at all, as we are not trying to imitate what the heuristic does; we are just using the heuristic to give us more information about what's happening.
The time spent on the heuristic search won't make our experiments explode, as the training process is much bigger in terms of time consumption.

This is a first basic idea which can certainly be refined later!

TSPTW as a great benchmark tool.

It would be great to make it possible to work on the TSPTW, as it is one of the problems used by Quentin in his paper.

In addition, even if it could be a little time-consuming, I think it could be very interesting to try to reproduce his experiment with our solver. Not a perfect reproduction, as we are learning inside the solver, but using the same graph representation as his.

Even if this won't be part of the "unified representation goal" we are trying to achieve, I would see it as a "sanity check", to make sure our performance is good and to prove that learning inside the solver is a real breakthrough.

To do it, we should:

  • create a tsptw generator (similar to Quentin's)
  • resolve #21 first (making sure we can easily change the RL representation outside the package).
  • make sure we can use a similar architecture to his (not necessarily exactly the same, though)

PS: the comparison with Quentin can also be done on knapsack or portfolio, but in the end I think we should have the TSPTW in our examples.

Make SeaPearl find the dimension variables on its own

We should make sure the user does not have to guess the dimensions of what is generated by our generators.

It could be as simple as creating new functions like get_max_constraint_number, get_max_variable_number and get_max_value_number that would have to be implemented for each generator.

We could then create, for the DefaultStateRepresentation, a get_maxnumberofCPnodes function.
By the way, the cpnodes_max argument of the LearnedHeuristic constructor is a bit too specific, don't you think? We should find a way to avoid that.

Maybe we should do all of that together next week.
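
A hypothetical interface sketch using the function names from this issue (the bodies are placeholders, and the assumption is that the tripartite DefaultStateRepresentation has one node per variable, value and constraint):

# Each generator would report its own upper bounds (placeholder bodies):
get_max_variable_number(gen::SeaPearl.AbstractModelGenerator) = error("not implemented for $(typeof(gen))")
get_max_value_number(gen::SeaPearl.AbstractModelGenerator) = error("not implemented for $(typeof(gen))")
get_max_constraint_number(gen::SeaPearl.AbstractModelGenerator) = error("not implemented for $(typeof(gen))")

# The DefaultStateRepresentation could then derive the graph size itself,
# making the cpnodes_max argument of LearnedHeuristic unnecessary:
get_maxnumberofCPnodes(gen) =
    get_max_variable_number(gen) + get_max_value_number(gen) + get_max_constraint_number(gen)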

TsptwReward weird behaviour with numberOfDecisions

I don't know why, but when computing the DecisionPhase reward of TsptwReward, the current_turn variable has to be set to total_decisions - 1 in order to get non-zero rewards during training.
However, for the test, the reward will be zero if current_turn is not set to total_decisions (I had to comment out the test in cef068b).

I think the latter behaviour is the one we want. We should investigate why there seems to be a "hidden" decision when training with generated TSPTW instances.

Set up documentation

Before/while becoming public, we should set up nice documentation using Documenter.jl.
There are already a lot of comments in the code, which should allow the documentation to become quite complete very quickly.
We just need to write a few pages to explain the overall project (of course, reusing some parts of the paper would be a good idea imo).

Warnings during precompile

We currently have these warnings during precompilation:

WARNING: using ReinforcementLearningBase.render in module CPRL conflicts with an existing identifier.
WARNING: using ReinforcementLearningBase.update! in module CPRL conflicts with an existing identifier.
WARNING: using JuMP.index in module CPRL conflicts with an existing identifier.

I think this slows down the compilation quite a lot, and we should remove them. It's not that complicated.
I'll remove the JuMP.index one; @felixchalumeau, can you deal with the other two?
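
These warnings typically come from using a package whose exported names clash with identifiers already defined in the module. A common way to silence them (a sketch, not the exact fix applied here) is to import the package under a short name and qualify the clashing symbols:

# Instead of `using ReinforcementLearningBase`, which brings render and update!
# into scope and clashes with existing identifiers, qualify them explicitly:
import ReinforcementLearningBase
const RLBase = ReinforcementLearningBase

# ... then call RLBase.update!(...) and RLBase.render(...) where needed,
# and do the same for JuMP.index via `import JuMP`.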

Add a LatinSquares generator

Latin Square is a common and simple benchmark problem in CP. It would not be too hard to provide a default generator.
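
A possible sketch of such a generator, following the AbstractModelGenerator / fill_with_generator! pattern used by TsptwGenerator above; the field names, the access to the model's trailer and the way constraints are posted are assumptions, and the row/column constraints would still need to be implemented:

# Hypothetical sketch of a Latin square generator.
struct LatinSquaresGenerator <: SeaPearl.AbstractModelGenerator
    n::Int   # side length of the square
end

function SeaPearl.fill_with_generator!(model::SeaPearl.CPModel, gen::LatinSquaresGenerator; seed=nothing)
    n = gen.n
    grid = Matrix{SeaPearl.IntVar}(undef, n, n)
    for i in 1:n, j in 1:n
        # assumes CPModel exposes the trailer it was built with
        grid[i, j] = SeaPearl.IntVar(1, n, "x_$(i)_$(j)", model.trailer)
        SeaPearl.addVariable!(model, grid[i, j])
    end
    # Row and column "all different" constraints would be added here with
    # whichever constraint SeaPearl provides for that purpose.
    return model
end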
