rl-integer-programming's Introduction

IEOR4575 Project

Instructor: Professor Shipra Agrawal
Assistants: Yunhao Tang, Abhi Gupta

Info

The final report generated can be found under 4575_final_report_tkt2120_submission.pdf


To Run

To run any of the training scripts on instances generated or downloaded, use

python run_ppo.py

For now, hyperparameters must be changed within the .py file.

State-Action Description

State s is an array with five components:

  • s[0]: constraint matrix $A$ of the current LP ($\max -c^Tx \text{ s.t. } Ax \le b$). Its dimension is $m \times n$; check it by printing s[0].shape. Here $n$ is the (fixed) number of variables. For instances of size 60 by 60 used in the above command, $n$ stays fixed at 60. $m$ is the current number of constraints. Initially, $m$ equals the number of constraints in the IP instance (for instances generated with --num-c=60, $m$ is 60 at the first step), and it increases by one at every step of the episode, because each action adds one new constraint (cut).
  • s[1]: right-hand side $b$ of the current LP ($Ax \le b$). Its dimension is $m$, the number of rows of $A$.
  • s[2]: coefficient vector $c$ from the LP objective ($-c^Tx$). Its dimension is the number of variables, i.e., $n$.
  • s[3], s[4]: Gomory cuts available in the current round of Gomory's cutting plane algorithm. Each cut $i$ is of the form $D_i x \le d_i$. s[3] gives the matrix $D$ of cuts (of dimension $k \times n$) and s[4] gives the right-hand side $d$ (of dimension $k$). The number of available cuts $k$ changes from round to round; you can find it by printing the size of the last component of the state, i.e., s[4].size or s[-1].size.
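
The layout of the state can be made concrete with a small sketch. It is not taken from the project code: the helpers `state_dims` and `apply_cut` are hypothetical, and the sketch assumes that an action is the index of one of the $k$ available cuts, which the description suggests but does not state. The toy state uses plain nested lists; the same `len` and indexing calls also work on NumPy arrays.

```python
def state_dims(s):
    """Return (m, n, k) for a state s = (A, b, c, D, d)."""
    A, b, c, D, d = s
    m, n, k = len(A), len(c), len(d)
    # Consistency checks implied by the description of the state.
    assert len(b) == m and len(D) == k
    assert all(len(row) == n for row in A)
    assert all(len(row) == n for row in D)
    return m, n, k


def apply_cut(s, action):
    """Append cut number `action` (row D[action], rhs d[action]) to A and b.

    Assumption: an action is the index of the chosen cut. The cuts offered
    in the next round come from re-solving the LP, which this sketch does
    not do, so only the grown (A, b) and the unchanged c are returned.
    """
    A, b, c, D, d = s
    new_A = [list(row) for row in A] + [list(D[action])]
    new_b = list(b) + [d[action]]
    return new_A, new_b, list(c)


# Toy state with m = 2 constraints, n = 3 variables, k = 2 candidate cuts.
toy_state = (
    [[1, 2, 0], [0, 1, 1]],   # s[0]: A, shape m x n
    [4, 3],                   # s[1]: b, length m
    [1, 1, 1],                # s[2]: c, length n
    [[1, 0, 0], [0, 0, 1]],   # s[3]: D, shape k x n
    [2, 1],                   # s[4]: d, length k
)

print(state_dims(toy_state))   # (2, 3, 2)
new_A, new_b, _ = apply_cut(toy_state, action=1)
print(len(new_A), new_b)       # 3 [4, 3, 1]
```

After one action, $A$ has $m + 1$ rows and $b$ has $m + 1$ entries, while $c$ and $n$ are unchanged.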

Example

You can use the following script to familiarize yourself with the cutting plane environment that we have built for you.

$ python example.py

Training Performance Evaluation

There are two environment settings on which your training performance will be evaluated. These can be loaded by using the following two configs (see example.py). Each mode is characterized by a set of parameters that define the cutting plane environment.

The easy setup defines the environment as follows:

easy_config = {
    "load_dir"        : 'instances/train_10_n60_m60',
    "idx_list"        : list(range(10)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}

For your reference, the maximum total sum of rewards achievable in any given episode in the easy mode is 2.947 +- 0.5469.

The hard setup defines the environment as follows:

hard_config = {
    "load_dir"        : 'instances/train_100_n60_m60',
    "idx_list"        : list(range(99)),
    "timelimit"       : 50,
    "reward_type"     : 'obj'
}

On average, the maximum total sum of rewards achievable in any given episode in the hard mode is 2.985 +- 0.8427.

The main difference between the easy and hard modes is the number of training instances: easy contains 10 instances, while hard contains 100. Please read the example.py script for further details about what these environment parameters mean.
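
As a quick check of how the two setups differ, the sketch below compares the two dictionaries exactly as listed above. The helper `differing_keys` is hypothetical and only inspects the dictionaries; it does not load any instances or construct an environment.

```python
easy_config = {
    "load_dir": "instances/train_10_n60_m60",
    "idx_list": list(range(10)),
    "timelimit": 50,
    "reward_type": "obj",
}

hard_config = {
    "load_dir": "instances/train_100_n60_m60",
    "idx_list": list(range(99)),
    "timelimit": 50,
    "reward_type": "obj",
}


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two configs."""
    return sorted(key for key in a if a[key] != b[key])


print(differing_keys(easy_config, hard_config))
# ['idx_list', 'load_dir']
print(len(easy_config["idx_list"]), len(hard_config["idx_list"]))
# 10 99
```

The time limit and reward type are shared; only the instance directory and the index list change. Note that `list(range(99))` selects indices 0 through 98, so the hard config as written covers 99 of the instances in its directory.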

Generalization

For the first phase of the project, your task is to reach the best possible performance on the two training modes described above. We will introduce another test mode for the environment later in the semester where your agent will be tested on a cutting plane environment with unseen instances (of size 60 by 60).

Generating New Instances

To make sure your algorithm generalizes to instances beyond those in the instances folder, you can create new environments with random IP instances and train/test on those. To generate new instances, run the following script. This will create 100 new instances with 60 constraints and 60 variables.

$ python generate_randomip.py --num-v 60 --num-c 60 --num-instances 100

The above instances will be saved in a directory named 'instances/randomip_n60_m60'. We can then load the instances into the gym environment and train a cutting agent. The following command loads the instance with index 50 and runs an episode with horizon 50:

python testgymenv.py --timelimit 50 --instance-idx 50 --instance-name randomip_n60_m60

Step information should be printed until the episode ends.

If you do not provide --instance-idx, the environment will load a random instance out of the 100 instances in every episode. It is sometimes easier to start by training on a single instance instead of a pool of instances.
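
To illustrate what one episode with horizon 50 looks like from the agent's side, here is a minimal sketch with a random policy. Everything in it is an assumption made for illustration: `ToyCutEnv` is a hypothetical stand-in, not the project's environment, and it assumes a Gym-style interface in which `reset()` returns the state and `step(action)` returns `(state, reward, done, info)`, with the action being the index of the chosen cut. The toy environment solves no LP, always returns a reward of 0.0, and ends only when the time limit is reached.

```python
import random


class ToyCutEnv:
    """Hypothetical stand-in for the cutting plane environment.

    It only mimics the assumed interface (reset/step, five-component
    state); it solves no LP and its reward is always 0.0.
    """

    def __init__(self, n=3, m=2, timelimit=50, seed=0):
        self.n, self.m0, self.timelimit = n, m, timelimit
        self.rng = random.Random(seed)

    def _cuts(self):
        k = self.rng.randint(1, 4)  # number of candidate cuts varies by round
        D = [[self.rng.randint(0, 3) for _ in range(self.n)] for _ in range(k)]
        d = [self.rng.randint(1, 5) for _ in range(k)]
        return D, d

    def _state(self):
        return self.A, self.b, self.c, self.D, self.d

    def reset(self):
        self.t = 0
        self.A = [[1] * self.n for _ in range(self.m0)]
        self.b = [5] * self.m0
        self.c = [1] * self.n
        self.D, self.d = self._cuts()
        return self._state()

    def step(self, action):
        # The chosen cut becomes a new constraint, so m grows by one.
        self.A.append(self.D[action])
        self.b.append(self.d[action])
        self.t += 1
        self.D, self.d = self._cuts()
        done = self.t >= self.timelimit
        return self._state(), 0.0, done, {}


def run_random_episode(env, seed=0):
    """Pick a uniformly random cut each step.

    Returns (number of steps, total reward, final number of constraints).
    """
    rng = random.Random(seed)
    s = env.reset()
    steps, total, done = 0, 0.0, False
    while not done:
        k = len(s[4])  # number of cuts on offer this round
        s, r, done, _ = env.step(rng.randrange(k))
        steps += 1
        total += r
    return steps, total, len(s[0])


steps, total, m_final = run_random_episode(ToyCutEnv(timelimit=50))
print(steps, m_final)  # 50 52
```

Starting from $m = 2$ constraints, 50 steps leave 52 rows in $A$, matching the rule that each action adds one cut. Replacing the random choice with a learned policy over the $k$ candidate cuts is where a cutting agent would plug in.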

rl-integer-programming's Issues

Regarding the missing files

While running the code, it shows that some files are missing:
FileNotFoundError: [Errno 2] No such file or directory: 'records/train_10_n60_m60/idx_0_9/ppo_actor_dense_critic_None_rnd_None/models_20210419-224441/actor_500_20210421-135321.pt

Can you please tell me how to generate this file or where I can find it?
