SeaPearl is a Constraint Programming solver that can use Reinforcement Learning agents as value-selection heuristics, taking graphs as input for the agent's approximator. It is intended as a research tool that makes it easy to build on and go beyond what has already been done with it.
The paper accompanying this solver can be found on arXiv. If you use SeaPearl in your research, please cite our work.
The RL agents are defined using ReinforcementLearning.jl, and their function approximators are built with Flux.jl. The CP part, inspired by MiniCP, is focused on readability. The code is meant to be clear and modular, so that researchers can easily access CP data and use it as input for their ML models.
In the Julia REPL's package mode:
]add SeaPearl
Working examples can be found in SeaPearlZoo and documentation can be found here.
SeaPearl can be used either as a classic CP solver, with predefined variable and value selection heuristics, or as a Reinforcement-Learning-driven CP solver that learns by solving automatically generated instances of a given problem (knapsack, TSPTW, graph coloring, Eternity II, ...).
To use SeaPearl as a classic CP solver, one needs to (a minimal end-to-end sketch is given after these steps):
- declare a variable selection heuristic:
YourVariableSelectionHeuristic{TakeObjective} <: SeaPearl.AbstractVariableSelection{TakeObjective}
- declare a value selection heuristic:
BasicHeuristic <: ValueSelection
- create a Constraint Programming model:
trailer = SeaPearl.Trailer()
model = SeaPearl.CPModel(trailer)
# create variables:
SeaPearl.addVariable!(...)
# add constraints:
SeaPearl.addConstraint!(model, SeaPearl.AbstractConstraint(...))
# add an optional objective function:
SeaPearl.addObjective!(model, ObjectiveVar)
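Putting the steps above together, here is a minimal sketch that only uses building blocks shown on this page; the heuristic constructors and the final solve call (SeaPearl.solve! and its keyword arguments) are assumptions and may differ from the actual API:
using SeaPearl

trailer = SeaPearl.Trailer()
model = SeaPearl.CPModel(trailer)

# create two variables and register them in the model
x = SeaPearl.IntVar(1, 5, "x", trailer)
y = SeaPearl.IntVar(1, 5, "y", trailer)
SeaPearl.addVariable!(model, x)
SeaPearl.addVariable!(model, y)

# post a sum constraint over x and y
SeaPearl.addConstraint!(model, SeaPearl.SumGreaterThan(SeaPearl.AbstractIntVar[x, y], 6, trailer))

# predefined heuristics (default constructors assumed)
variableHeuristic = SeaPearl.MinDomainVariableSelection()
valueSelection = SeaPearl.BasicHeuristic()

# assumed solve entry point (name and keywords may differ)
SeaPearl.solve!(model; variableHeuristic=variableHeuristic, valueSelection=valueSelection)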
To use SeaPearl as an RL-driven CP solver, one needs to:
- declare a variable selection heuristic:
CustomVariableSelectionHeuristic{TakeObjective} <: SeaPearl.AbstractVariableSelection{TakeObjective}
- declare a learned value-selection heuristic:
LearnedHeuristic{SR<:AbstractStateRepresentation, R<:AbstractReward, A<:ActionOutput} <: ValueSelection
- define an agent:
agent = RL.Agent(
policy=(...),
trajectory=(...),
)
- optionally, declare a custom reward:
CustomReward <: SeaPearl.AbstractReward
- optionally, declare a custom StateRepresentation (instead of the default tripartite-graph representation):
CustomStateRepresentation <: SeaPearl.AbstractStateRepresentation
- optionally, declare a custom featurization for the StateRepresentation:
CustomFeaturization <: SeaPearl.AbstractFeaturization
- create a generator for the given problem, which will produce the instances used during the learning process:
CustomProblemGenerator <: AbstractModelGenerator
- set the number of training epochs, and declare an evaluator, a search strategy and a metric for benchmarking:
nb_epochs = 3000
CustomStrategy <: SearchStrategy #DFS, RBS, ILDS
CustomEvaluator <: AbstractEvaluator #or use a predefined one: SeaPearl.SameInstancesEvaluator(...)
function CustomMetricsFun
- launch the training:
metricsArray, eval_metricsArray = SeaPearl.train!(
valueSelectionArray=valueSelectionArray,
generator=tsptw_generator,
nbEpisodes=nbEpisodes,
strategy=strategy,
eval_strategy=eval_strategy,
variableHeuristic=variableSelection,
out_solver = true,
verbose = true,
evaluator=SeaPearl.SameInstancesEvaluator(valueSelectionArray,tsptw_generator; evalFreq = evalFreq, nbInstances = nbInstances, evalTimeOut = evalTimeOut),
restartPerInstances = restartPerInstances
)
All contributions are welcome! Have a look at our contributing guidelines.
seapearl.jl's Issues
[CP] No easy way to generate the opposite of a variable, y = -x
Currently, to generate the opposite of a variable, y = -x, one needs to proceed as follows:
x = SeaPearl.IntVar(1, 5, "x", trailer)
y = SeaPearl.IntDomainViewOpposite(x.domain)
It would be nice to overload the base operator - so that it automatically generates an IntDomainViewOpposite variable. This would facilitate future CP problem implementations and could be a nice good first issue for a newcomer.
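A minimal sketch of such an overload, assuming the IntVarViewOpposite(x, id) constructor used elsewhere on this page and an id field holding the variable's name (both assumptions):
using SeaPearl
trailer = SeaPearl.Trailer()

# hypothetical overload: -x builds an opposite view of x
Base.:-(x::SeaPearl.AbstractIntVar) = SeaPearl.IntVarViewOpposite(x, "-" * x.id)

x = SeaPearl.IntVar(1, 5, "x", trailer)
y = -x   # y now behaves as the opposite of x during propagation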
Make branching on objective an option rather than a default behaviour
It's great to have the flexibility to branch on the objective if needed, but I really think this should be an option rather than the default behaviour.
Action space of variable size
I'm working on this in #17; the approximator needs to output a vector of Q-values whose size matches the number of possible actions given as input (in practice, the size of the domain we are branching on).
This is not trivial, but I think I'll be able to do something not that ugly.
The issue I'm seeing is that it will be impossible to use batches with this approach: I don't think you can batch vectors of (potentially very) different sizes.
TagBot trigger issue
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
Adding common generic value ordering heuristics
It would be easy (and convenient) to have at least the following basic generic value heuristics:
- Lexicographic (see #63)
- ReverseLexicographic
- Median
- Random
Edit: After discussing briefly with Gilles, Median is not that great of an idea: it cannot be done easily in O(1) and is not experimentally reported as being better than Lexicographic.
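A sketch of what these heuristics could look like, reusing the BasicHeuristic(selectValue) pattern shown further down this page; the Random variant assumes the domain object can be collected into a vector of values:
lexicographic = SeaPearl.BasicHeuristic(x -> minimum(x.domain))
reverse_lexicographic = SeaPearl.BasicHeuristic(x -> maximum(x.domain))
random_value = SeaPearl.BasicHeuristic(x -> rand(collect(x.domain)))   # assumes iterable domains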
CPModification not correctly updated in presence of views
In this example, the CPModification object does not get correctly updated:
trailer = SeaPearl.Trailer()
vars = SeaPearl.AbstractIntVar[]
x = SeaPearl.IntVar(2, 3, "x", trailer)
push!(vars, x)
y = SeaPearl.IntVar(5, 15, "y", trailer)
ax = SeaPearl.IntVarViewMul(x, 3, "3x")
push!(vars, ax)
minusY = SeaPearl.IntVarViewOpposite(y, "-y")
push!(vars, minusY)
constraint = SeaPearl.SumGreaterThan(vars, 5, trailer)
toPropagate = Set{SeaPearl.Constraint}()
prunedDomains = SeaPearl.CPModification()
SeaPearl.propagate!(constraint, toPropagate, prunedDomains)
After the propagate! call, we have
prunedDomains = Dict{String,Array{Int64,N} where N}(
"3x" => [6],
"-y" => [-8, -9, -10, -11, -12, -13, -14, -15]
)
but the value 2 was removed from x.domain and this modification is not visible.
Change the way evaluation of a learned heuristic is made
As proposed by @qcappart during the last meeting, it could be interesting to select a "pool" of n instances before the training and to evaluate the learned heuristic on this pool regularly during the training. Although a bit slower, it is a more thorough way of evaluating policies.
DP model vs CP model
We are now able to launch experiments to study the effect of the DP model used to solve problems. I guess the knapsack could be the first problem we try this on, as Ilan has already created both models.
We only need to create a generator for the DP model of the knapsack, but this shouldn't take very long.
Make it possible to easily change the way the RL representation is made
It is already easy to change the features, the rewards and the NN architecture. But the last important thing one would like to change while doing research is the "RL representation". We should definitely think about a clean way to let a user change the RL representation from outside the package.
Try to give some metrics as input of the RL agent
It could be interesting to try to add some metrics to the input of the RL agent, basically concatenating metrics to an existing state vector.
It's far from being a priority, but I'll put it here as it could be interesting to try one day.
Testset for CPreward2
Add a testset for CPreward with the second term order.
Originally posted by @3rdCore in #189 (comment)
Training with graphs of variable number of nodes
We need to let our agents train on graphs with varying numbers of nodes. This is already needed when working with Knapsack, as stated in #17.
The issue actually lies in the buffer, which has a fixed size. What we could do to begin with is set a maximum number of nodes and store the number of nodes in the state array as an extra column (e.g. a one-hot-style flag telling which rows should be considered as nodes).
I think I'll work on that.
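A rough, framework-agnostic sketch of the padding-plus-mask idea above (all sizes are made up for illustration):
max_nodes, feature_dim = 50, 8
n_nodes = 37                                  # actual number of nodes for this state
features = zeros(Float32, max_nodes, feature_dim)
node_mask = zeros(Float32, max_nodes)         # flag column telling which rows are real nodes
node_mask[1:n_nodes] .= 1
state = hcat(features, node_mask)             # fixed-size array, safe to store in the buffer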
Create more advanced possibilities for reward engineering
The fact that our RL agent is trained inside the solver offers a lot of opportunities in terms of reward engineering, giving the agent very complete insights into the consequences of its choices. We should definitely make it possible to create more precise and insightful rewards.
A first possibility is to expose more statistics about what is happening inside the solver: we want to know when a solution is found, when we reach an infeasible point, and so on. But we also want to know how it happens. Quickly reaching infeasible points is not necessarily bad: if you find the best solution in your first tries, you will still hit a few infeasible points before the search stops and yields the solution with optimal status, so encountering infeasible points is unavoidable. What is bad is going all the way down the tree before discovering that the assignment is infeasible.
This quick example is just to show that, with more insight into what happens during a search, we could do very interesting reward engineering.
I can definitely assign myself to this.
Cleaning SeaPearlZoo
SeaPearlZoo deserves a bit of cleaning. We can clearly delete a lot of useless files and clean up the remaining ones.
The READMEs should also be filled in.
Get more insights about what the agent is doing
I sometimes find it hard to analyse what the agent I am training is doing, for example:
- is it always picking minimum values, or is it able to do something more sophisticated?
- is it excellent at finding a good solution but very bad at proving optimality?
We should definitely work on these aspects to get a more consistent "research process"!
Use of MOI
MathOptInterface does not seem ready for our CPModel as it is.
My PR right there jump-dev/MathOptInterface.jl#1101 could help with that but I had trouble getting it merged.
To simplify, I think the easiest way will be to develop our own intermediary structure, and do the bridges by ourselves. That's what I'll be working on for now.
Make the variable heuristic an RL agent learned heuristic as well
Our first concern is value selection, and we will wait until we are strong on that before moving to this topic. But let's not forget that, in a second phase, it would be great to make our agent able to learn how to select a variable as well!
Bug in the TSPTW test
An inconsistent behaviour has been spotted in the test Search known instance, line 166 of test/datagen/tsptw.jl.
The test aims at checking the correct generation of TSPTW instances, by solving already-solved problems and checking the existence of a solution. However, this test randomly fails despite the specification of a seed, which makes the construction of the instance deterministic.
It appears that the issue doesn't come from the problem generation, but from the CP solver itself. After a quick investigation, I managed to find the point where the bug occurs in the search tree. Indeed, as both heuristics are deterministic in this test, the solver always explores the same path in the search tree. But at some point, a variable assignment triggers a random behaviour in fixPoint!, resulting in a valid or invalid assignment without any obvious explanation.
I tweaked the original test a little bit to reproduce the bug:
using SeaPearl
trailer = SeaPearl.Trailer()
model = SeaPearl.CPModel(trailer)
n_city = 21
grid_size = 150
max_tw_gap = 3
max_tw = 8
generator = SeaPearl.TsptwGenerator(n_city, grid_size, max_tw_gap, max_tw)
SeaPearl.fill_with_generator!(model, generator; seed=42)
SeaPearl.assign!(model.variables["a_1"], 19)
SeaPearl.assign!(model.variables["a_2"], 20)
SeaPearl.assign!(model.variables["a_3"], 16)
SeaPearl.assign!(model.variables["a_4"], 11)
SeaPearl.assign!(model.variables["a_5"], 14)
SeaPearl.assign!(model.variables["a_6"], 8)
SeaPearl.assign!(model.variables["a_7"], 17)
SeaPearl.fixPoint!(model)
SeaPearl.assign!(model.variables["a_8"], 5)
status, prunedDomains = SeaPearl.fixPoint!(model, SeaPearl.getOnDomainChange(model.variables["a_8"]))
On my system, on successive runs, fixPoint! returns false about a third of the time.
On every failed run I've had in the original test, this last assignment always was the one triggering the failure.
I double-checked that the distance matrix and time_windows generation is consistent (supposedly the only randomly generated data in TsptwGenerator). I haven't been able to check that the initialization of every variable/constraint is the same at each run, but they aren't supposed to be random. I also haven't been able to check yet which constraint or which part of fixPoint! is responsible for this behaviour.
However, this part gets tricky because there are 1500+ constraints in the model and fixPoint! doesn't seem to call them all in the same order. I'll keep digging and will post any new lead.
My current leads are:
- a failing constraint pruning too many values
- a mishandling of a variable in fixPoint!
[CP] No easy way to generate an offsetted variable, y = x + c
Currently, to generate an offset of a variable, of the form y = x + c, one needs to proceed as follows:
c = 3
x = SeaPearl.IntVar(1, 5, "x", trailer)
y = SeaPearl.IntDomainViewOffset(x.domain, c)
It would be nice to overload the base operator + so that it automatically generates an IntDomainViewOffset variable. This would facilitate future CP problem implementations and could be a nice good first issue for a newcomer.
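A minimal sketch of the overload, assuming an IntVarViewOffset(x, c, id) constructor analogous to IntVarViewMul/IntVarViewOpposite, and an id field on the variable (all assumptions):
# hypothetical overloads
Base.:+(x::SeaPearl.AbstractIntVar, c::Int) = SeaPearl.IntVarViewOffset(x, c, "$(x.id)+$c")
Base.:+(c::Int, x::SeaPearl.AbstractIntVar) = x + c
# with x defined as in the snippet above: y = x + 3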
Weird output shape of FixedOutputGCN
Discussed in #6
We should investigate
Create a tutorial for an easy onboarding
Let's create a tutorial to easily get started with CPRL.jl.
We should try to have it ready for Friday!
Make Lexicographic the default value of a BasicHeuristic
I think that the Lexicographic value ordering, BasicHeuristic(x -> minimum(x.domain)), is more common in CP than BasicHeuristic(x -> maximum(x.domain)). The naming is also more explicit. I propose simply making the lexicographic ordering the default of BasicHeuristic. What do you think?
Because this could cause reproducibility problems in your personal experiments, I can wait for your green light.
Add a tsp generator (even a tsptw if possible)
It would be great to have a TSP generator soon.
Add the all different constraint
All different is a very common constraint in CP. It would thus be cool to have it in the solver!
Representable variables
Inspired by #71. In the same way that we can choose whether a variable is branchable, it could be interesting to have the flexibility to choose whether a variable should be added to the DefaultStateRepresentation.
Not a priority though.
metricsFun cannot handle more than one classic heuristic
Bug: while running experiments, metricsFun does not distinguish between the different classic heuristics when filling metricsArray, making the results irrelevant for analysis.
SeaPearl should maybe provide some standard metrics functions to help the user analyze the performance of their model.
Explore the idea of having two (or more) agents to solve a problem
Finding the best solution as fast as possible and proving optimality are not the same challenge at all, and it is very hard for an agent to understand that it should "switch mode". It would hence be very interesting to try working with two agents: one learning to find the best solution as soon as possible, and the other learning to prove optimality as soon as possible.
prunedDomains update check
During the propagate! function, the variable prunedDomains is supposed to store all the domain modifications. However, @ilancoulon warned us that the update isn't entirely reliable yet. As it is a very useful feature, it would be great to ensure that this variable is always properly updated.
Make JuMP GraphColoring tests deterministic
The results are different on my machine and on Travis.
I don't really know why; maybe it's a question of the order of the variables in the variable dictionary.
See commit a417602
Make it possible to give rewards later (not directly after an action is taken)
Sometimes we are much better able to judge an action later, by analysing the subtrees that were explored. It would thus be very interesting to make the framework able to give the associated reward later, instead of immediately after the action.
This is not straightforward and not a priority, but rather an interesting thing to try if possible.
[CP] No easy way to generate a multiple of a variable, y = ax
Currently, to generate a multiple of a variable, of the form y = ax, one needs to proceed as follows:
a = 3
x = SeaPearl.IntVar(1, 5, "x", trailer)
y = SeaPearl.IntDomainViewMul(x.domain, a)
It would be nice to overload the base operator * so that it automatically generates an IntDomainViewMul variable. This would facilitate future CP problem implementations and could be a nice good first issue for a newcomer.
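A minimal sketch of the overload, using the IntVarViewMul(x, a, id) constructor shown elsewhere on this page and assuming an id field on the variable:
# hypothetical overloads
Base.:*(a::Int, x::SeaPearl.AbstractIntVar) = SeaPearl.IntVarViewMul(x, a, "$(a)$(x.id)")
Base.:*(x::SeaPearl.AbstractIntVar, a::Int) = a * x
# with x defined as in the snippet above: y = 3 * x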
Graph coloring generator should be improved now
The current one was a good start to get to a first experiment as fast as possible, but it should now be improved.
We could explore existing ones: http://webdocs.cs.ualberta.ca/~joe/Coloring/#Graph.generators
Or at least fix some issues with the current one:
- it can theoretically be non-connected: this is very unlikely with a high density, but it can happen with a lower one.
- it sometimes creates duplicate links (5 diff from 1 AND 1 diff from 5); see the small de-duplication sketch below.
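A tiny sketch of how such duplicate links could be filtered out, assuming edges are stored as (Int, Int) tuples (a made-up representation, for illustration only):
edges = [(5, 1), (1, 5), (2, 3)]
normalise(e) = (min(e...), max(e...))
unique_edges = unique(normalise.(edges))    # [(1, 5), (2, 3)]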
Create default variable selection heuristics
For now, only 2:
- MinDomainVariableSelection
- RandomVariableSelection
Add other types of search strategies
It would be great to have CPRL.jl work with other kinds of search strategies (with the learning part supporting them).
Unpractical type casting for Vector Constraints
Currently, all the constraints that take a vector of variables as input require it to be typed as Vector{AbstractIntVar}. This is useful in order to define constraints on IntVar but also on IntVarView.
However, doing it that way, the user is forced to cast their vector to AbstractIntVar even though it contains only a more specific variable type, which is quite impractical.
This could be fixed easily by typing the input of these constraints as Vector{<:AbstractIntVar}, which would even allow more specific dispatch in the constraints if needed.
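A self-contained illustration of the typing point with toy types (not SeaPearl's): parametric types are invariant in Julia, so a Vector{AbstractIntVar} argument does not accept a Vector{IntVar}, whereas Vector{<:AbstractIntVar} does:
abstract type AbstractIntVar end
struct IntVar <: AbstractIntVar end

f(xs::Vector{AbstractIntVar}) = length(xs)      # forces the user to cast
g(xs::Vector{<:AbstractIntVar}) = length(xs)    # accepts any concrete element type

g([IntVar(), IntVar()])                         # works
# f([IntVar(), IntVar()])                       # MethodError: Vector{IntVar} is not a Vector{AbstractIntVar}
f(AbstractIntVar[IntVar(), IntVar()])           # the current workaround: an explicit cast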
Making the package more flexible for research experiments
I find the package not flexible enough for experiments at the moment, as we need to go into the source code almost every time we want to test something. It would thus be great to change a few things:
- be able to redefine rewards easily
- be able to define the node features
- split the GCN into something more modular: if we create a layer like "filter", which extracts the node features we want from the feature matrix, we will be able to build our approximator directly inside our experiment files and change it very easily, without having to touch the source code!
I can assign myself to this if you also think we should improve these aspects!
[CP] No easy way to generate a linear constraint of the form x + y + ... = 0
Given two or more IntVar or IntVarView variables x and y, it is not possible to set up a constraint of the form x + y = 0. You have to proceed as follows, which is laborious:
x = SeaPearl.IntVar(2, 3, "x", trailer)
y = SeaPearl.IntVar(2, 3, "y", trailer)
constraint = SeaPearl.SumToZero([x,y], trailer)
It would be nice to somehow overload the base operators + and == on AbstractIntVar to automatically generate a SumToZero constraint. This would facilitate future CP problem implementations and could be a nice good first issue for a newcomer.
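A rough sketch of the idea, assuming the SumToZero(vars, trailer) constructor shown above; the == part is left aside and a small helper is used instead (everything here is hypothetical):
Base.:+(x::SeaPearl.AbstractIntVar, y::SeaPearl.AbstractIntVar) = SeaPearl.AbstractIntVar[x, y]
Base.:+(xs::Vector{SeaPearl.AbstractIntVar}, y::SeaPearl.AbstractIntVar) = push!(copy(xs), y)

sumToZero(xs::Vector{SeaPearl.AbstractIntVar}, trailer) = SeaPearl.SumToZero(xs, trailer)

# with x, y and trailer as in the snippet above:
constraint = sumToZero(x + y, trailer)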
Possible TableCT optimisation
After discussing the problem Eternity 2 with @tomsander1998 we figured out a potential optimisation of the table constraint.
Indeed, in Eternity 2, each piece is represented by a tuple of 5 variables which must obey a table constraint. However, each table constraint has the same table and thus the same supports.
With the current implementation, the table is copied and the supports are rebuilt for each constraint, which uses #pieces times more memory than a single shared table/support dictionary would.
What should be done is to define a new constructor for TableConstraint that takes a table and a support dictionary as input and sets the corresponding attributes as references to these.
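A generic sketch of the sharing idea (the cache and the build_supports callback are hypothetical, not SeaPearl API): build the supports once per distinct table and hand the same reference to every constraint built from it:
const SUPPORT_CACHE = IdDict{Any, Any}()

# build (or fetch) the supports for a given table; build_supports stands for whatever
# expensive construction TableConstraint currently performs internally
function shared_supports(table, build_supports::Function)
    get!(() -> build_supports(table), SUPPORT_CACHE, table)
end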
Tackle the "stochastic" issue while training: taking the difficulty of an episode into account.
The title might not be explicit enough, let me explain this point.
When training an agent to play MountainCar, CartPole, CarRacing, etc., the best scores it can get are rather close from one episode to another. With the generators we have (at least for graph coloring at the moment), the difficulty can change a lot from one episode to another (in terms of the number of nodes visited). This can make a good policy appear bad, which we should avoid!
A rather simple solution would be to use a simple heuristic to give insights about the current episode (it is indeed very hard to control the difficulty from the generator's point of view).
This is not supervised learning at all, as we are not trying to imitate what the heuristic does; we are just using the heuristic to give us more information about what's happening.
The time spent on a heuristic search won't make our experiments explode, as the training process is much bigger in terms of time consumption.
This is a first basic idea which can be completed later for sure !
Weird offset for selecting variable feature in fixedOutputGCN
Discussed in #6
TSPTW as a great benchmark tool.
It would be great to make it possible to work on the TSPTW, as it is one of the problems used by Quentin in his paper.
In addition, even if it could be a little time-consuming, I think it would be very interesting to try to reproduce his experiment with our solver. Not a perfect reproduction, since we are learning inside the solver, but using the same graph representation as he does.
Even if this is not part of the "unified representation" goal we are trying to achieve, I would see it as a "sanity check", to make sure our performance is good and to show that learning inside the solver is a real breakthrough.
To do it, we should:
- create a TSPTW generator (similar to Quentin's)
- resolve #21 first (making sure we can easily change the RL representation outside the package)
- make sure we can use an architecture similar to his (not necessarily exactly the same)
PS: the comparison with Quentin's work could also be done on knapsack or portfolio, but in the end, I think we should have the TSPTW in our examples.
New feature: making it possible to save an agent
Asked by @qcappart.
We should add the possibility to save an agent after an experiment and to load it afterwards.
Make SeaPearl find the dimension variables on its own
We should make sure the user does not have to guess the dimensions of what is generated by our generators.
It could be as simple as creating new functions like get_max_constraint_number, get_max_variable_number and get_max_value_number that would have to be implemented for each generator.
We could then create, for the DefaultStateRepresentation, a get_maxnumberofCPnodes function.
By the way, the cpnodes_max argument of the LearnedHeuristic constructor is a bit too specific, don't you think? We should find a way to avoid that.
Maybe we should do all of that together next week.
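A sketch of what the proposed interface could look like (the function names come from this issue, everything else is assumed); for the tripartite DefaultStateRepresentation, the node bound would then simply be the sum of the three quantities:
using SeaPearl

get_max_constraint_number(gen::SeaPearl.AbstractModelGenerator) = error("implement for $(typeof(gen))")
get_max_variable_number(gen::SeaPearl.AbstractModelGenerator) = error("implement for $(typeof(gen))")
get_max_value_number(gen::SeaPearl.AbstractModelGenerator) = error("implement for $(typeof(gen))")

get_maxnumberofCPnodes(gen) =
    get_max_constraint_number(gen) + get_max_variable_number(gen) + get_max_value_number(gen)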
Support Max sense for @objective
From #4, we should be able to maximize objective functions (not that complicated).
This could be tagged as a good first issue.
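For reference, here is what the Max sense looks like at the JuMP level (plain JuMP, no backend attached, since the MOI/bridge question above is still open):
using JuMP

model = Model()
@variable(model, 1 <= x <= 5, Int)
@objective(model, Max, x)   # currently only minimization is handled; this issue asks for Max support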
TsptwReward weird behaviour with numberOfDecisions
I don't know why, but when computing the DecisionPhase reward of TsptwReward, the current_turn variable has to be set to total_decisions - 1 in order to have non-zero rewards during training.
However, for the test, the reward is zero unless current_turn is set to total_decisions (I had to comment out the test in cef068b).
I think the latter behaviour is the one we want. We should investigate to know why there seems to be a "hidden" decision when training with generated TSPTW instances.
Update ReinforcementLearning & ReinforcementLearningCore
Set up documentation
Before/while becoming public, we should set up nice documentation using Documenter.jl.
There are already a lot of comments in the code, which should allow the documentation to become quite complete very quickly.
We just need to write a few pages to explain the overall project (of course, using some parts of the paper would be a good idea imo).
Current base NN dense layers don't use any activation functions
The basic agent declaration uses a GNN and a dense neural network for Q-value approximation. We discovered that the dense layers don't include activation functions: the dense NN only performs a linear regression on our data...
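A minimal Flux sketch of the problem and of the fix (layer sizes made up for illustration): without activations, stacked Dense layers compose into a single affine map, i.e. a linear regression; inserting a non-linearity such as relu between them restores the approximator's expressive power:
using Flux

linear_only = Chain(Dense(16, 32), Dense(32, 4))         # collapses to one linear map
with_relu   = Chain(Dense(16, 32, relu), Dense(32, 4))   # non-linear Q-value head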
Warnings during precompile
We currently have these warnings during precompilation:
WARNING: using ReinforcementLearningBase.render in module CPRL conflicts with an existing identifier.
WARNING: using ReinforcementLearningBase.update! in module CPRL conflicts with an existing identifier.
WARNING: using JuMP.index in module CPRL conflicts with an existing identifier.
I think this slows down the compilation very much, and we should remove them. Not that complicated.
I'll remove the JuMP.index one, @felixchalumeau can you deal with the other two?
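A sketch of one way to silence such conflicts, assuming they come from the using statements pulling clashing exported names into the CPRL module: import the modules explicitly and qualify the calls instead.
# instead of: using JuMP / using ReinforcementLearningBase (which bring every exported name in)
import JuMP
import ReinforcementLearningBase
# ...and call the clashing functions with an explicit prefix, e.g.
# JuMP.index(...), ReinforcementLearningBase.update!(...), ReinforcementLearningBase.render(...)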
Add a LatinSquares generator
Latin Square is a common and simple benchmark problem in CP. It would not be too hard to provide a default generator.