
l2d's Introduction

Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning

This repository is the official PyTorch implementation of the algorithms in the following paper:

Cong Zhang, Wen Song, Zhiguang Cao, Jie Zhang, Puay Siew Tan, Chi Xu. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. 34th Conference on Neural Information Processing Systems (NeurIPS), 2020. [PDF]

If you make use of the code/experiments or the L2D algorithm in your work, please cite our paper (BibTeX below).

@inproceedings{NEURIPS2020_11958dfe,
 author = {Zhang, Cong and Song, Wen and Cao, Zhiguang and Zhang, Jie and Tan, Puay Siew and Chi, Xu},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {1621--1632},
 publisher = {Curran Associates, Inc.},
 title = {Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning},
 url = {https://proceedings.neurips.cc/paper/2020/file/11958dfee29b6709f48a9ba0387a2431-Paper.pdf},
 volume = {33},
 year = {2020}
}

Installation

PyTorch 1.6

Gym 0.17.3

Docker install

Clone this repo and, from within the repo folder, run the following commands.

Create image l2d-image:

sudo docker build -t l2d-image .

Create container l2d-container from l2d-image, and activate it:

sudo docker run --gpus all --name l2d-container -it l2d-image

Reproduce the results in the paper

Change the device type in the Params.py file and run:

python3 test_learned.py

Or, to evaluate on the public benchmark instances, run:

python3 test_learned_on_benchmark.py
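
For reference, the device switch usually looks like the following (a sketch only; the actual setting name in Params.py may differ, so check the file):

import torch

# Point the device setting at the GPU when available, otherwise the CPU
# (hypothetical attribute; Params.py may name it differently).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")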


l2d's Issues

Removing the old arcs can make the graph embedding more accurate.

Maybe we should remove the old arc when a new operation is inserted between two operations; removing the old disjunctive arc makes the graph embedding more accurate.
All we need is to add the following code to JSSP_Env.py -> def step(self, action):

if flag and precd != action and succd != action:  # Remove the old arc when a new operation inserts between two operations
    self.adj[succd, precd] = 0

Questions regarding the graph design

Thanks for the great work.
Still, after going through the paper and the implementation, I have a few questions regarding the graph design and the arc-update strategy.

  1. The add-arcs strategy does not seem to encode information about future operations that require the same machine.

Since there are no arcs between the blue/red/green nodes at the beginning, the RL agent cannot consider the relationship between future operations on the same machine during scheduling. I understand that the state representation of the add-arcs strategy is more efficient for GIN to process, but isn't the remove-arcs strategy more reasonable?

  2. Why does the RL agent choose actions/operations by looking back only at already-scheduled nodes instead of also considering future operations?

The aggregation only collects information from incoming neighbors, which means the RL agent picks the next operation based on the already-scheduled nodes. Shouldn't it also consider future operations when choosing the next action? Is there a particular reason for the current design?

How to test bigger and irregular instances

Why can your models, trained on small instances, be tested on bigger instances? How is the difference in size handled?
Why does each operation of an instance need a different machine, with no machine repeated within a job?
How can I test on irregular instances, i.e. ones where jobs have different numbers of operations?
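
On the first question, a minimal sketch of why size transfer is possible: a GIN-style layer's weights act on per-node feature vectors, not on a fixed number of nodes, so the same trained weights apply to graphs of any size. The 2-dimensional raw feature mirrors the paper's per-operation features; the layer itself is a simplification, not the repo's GIN:

import torch

# One simplified GIN-style layer: the MLP's weights are independent of the
# number of nodes N, so weights trained on 6x6 graphs run on 15x15 ones.
mlp = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 64))
for N in (36, 225):                          # 6x6 and 15x15 instances
    adj = (torch.rand(N, N) < 0.1).float()   # random adjacency stand-in
    x = torch.rand(N, 2)                     # 2 raw features per operation
    h = mlp(adj @ x + x)                     # aggregate neighbors + self
    print(h.shape)                           # torch.Size([N, 64])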

How to train the model?

Hi!
I have read your paper and the code over the last few days, and now I have a question about the code. In the project I found test_learned.py and validation.py, which are used for testing and validation, with the trained network loaded directly in those files. So how can I train the model? @zcajiayin

About the problem modeling

One thing I don't quite understand: why find the earliest feasible time period to allocate a_t on the required machine, rather than simply appending a_t to the end of the corresponding machine's queue?
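
For readers unfamiliar with the left-shift idea the question refers to, here is a self-contained sketch of finding the earliest feasible start on a machine; this is an editor's illustration, not the repo's permissibleLS implementation:

def earliest_feasible_start(job_ready, dur, busy):
    # Earliest start for an operation of length `dur` that becomes ready at
    # `job_ready`, given `busy`: a time-sorted list of (start, end)
    # intervals already occupied on the required machine.
    t = job_ready
    for s, e in busy:
        if t + dur <= s:   # the operation fits in the gap before (s, e)
            return t
        t = max(t, e)      # otherwise wait until this interval ends
    return t               # append after the last busy interval

# Machine busy on [0, 3) and [7, 9); a 2-unit operation ready at t=1 can be
# left-shifted into the gap and start at t=3 instead of being appended at 9.
print(earliest_feasible_start(1, 2, [(0, 3), (7, 9)]))  # -> 3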

Question about the PPO

Thanks for your great work!

I found that the new and the old policy initially use the same model, which means that all the parameters of the GNN/Actor/Critic networks are the same.

But then why does the same environment produce different logprob values under policy_old and policy?

FJSP

Hi,
Me again.
Your paper is very good, but dynamic flexible job shop scheduling is very common in real production. How could this problem be addressed by building on your paper?

Bug in updating the LBs matrix!

I found what might be a bug in the computation of the lower bounds ("LBs" in the code). Sometimes the updated lower bounds are wrong, and this could be a critical issue!
Please set "env_num=1" and debug to check whether the LBs are correct.
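
For reference while debugging, below is a simplified, self-contained recomputation of what the per-operation lower bounds should contain, assuming (as the dispatching rules imply) that the scheduled operations form a prefix of each job. This is an editor's sketch, not the repo's implementation:

import numpy as np

def recompute_lbs(end_times, dur):
    # end_times: (J, M) actual completion times for scheduled operations,
    # 0 for unscheduled ones. dur: (J, M) processing times.
    # An unscheduled operation's lower bound is the previous operation's
    # bound (or actual completion time) plus its own duration.
    lbs = end_times.astype(float).copy()
    J, M = dur.shape
    for i in range(J):
        for j in range(M):
            if lbs[i, j] == 0:                     # not scheduled yet
                prev = lbs[i, j - 1] if j > 0 else 0.0
                lbs[i, j] = prev + dur[i, j]
    return lbs

# Toy check: job 0's first op finished at t=5; the rest cumulate from it.
print(recompute_lbs(np.array([[5, 0, 0]]), np.array([[5, 2, 4]])))
# -> [[ 5.  7. 11.]]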

Some questions about permissibleLS.py

Dear author,

Hello!

I have read your code, and I have some questions about the functions in permissibleLS.py:

  1. The meaning of the statement mch_a = np.take(mchMat, a) - 1. In this module, statements that subtract one from a machine number appear frequently, and the resulting variables are often used as matrix indices, e.g. opIDsOnMchs[mch_a][np.where(opIDsOnMchs[mch_a] >= 0)][-1]. I don't quite understand this: as far as I can tell, np.take(mchMat, a) extracts the machine number corresponding to an action, which seems to have nothing to do with a row index.

  2. The meaning of the variable opIDsOnMchs.

  3. How actions relate to the precedence constraints within a job. After reading the whole project, I believe actions are encoded as J * M flat ids; for example, a 15 * 15 instance has 225 actions numbered 0-224. However, the output of the permissibleLS.py test code, print('opIDsOnMchs\n', opIDsOnMchs), shakes this understanding. In one complete iteration the final opIDsOnMchs output was [[3 1 8] [0 4 6] [2 5 7]]; I don't know how to interpret each number, or how to read the jobs' precedence constraints from it. (The machine matrix for this example is [[2 1 3] [1 2 3] [2 3 1]].)

  4. You propose the "add arcs" strategy in the paper. Under this strategy, how does the model identify during training which operations Oij share the same machine? Under the "remove arcs" strategy, the Oij on the same machine are connected by disjunctive arcs, so only the arc directions need to be learned; under the "add arcs" strategy there are no disjunctive arcs up front, so how is this guaranteed?

I would be grateful if you could take the time to reply. I am looking forward to your response. Thank you very much!
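
On question 1, a small self-contained sketch of one plausible reading, built from the matrices quoted in the issue (this is an editor's interpretation, not an authoritative answer): the machine ids in mchMat appear to be 1-based, so subtracting 1 turns a machine id into a 0-based row index, and each row of opIDsOnMchs lists the flat operation ids already placed on that machine.

import numpy as np

mchMat = np.array([[2, 1, 3],
                   [1, 2, 3],
                   [2, 3, 1]])
a = 4                              # flat operation id: job 1, operation 1
mch_a = np.take(mchMat, a) - 1     # machine 2 -> row index 1
opIDsOnMchs = np.array([[3, 1, 8],
                        [0, 4, 6],
                        [2, 5, 7]])
# Last operation already placed on that machine (ids < 0 mean empty slots):
last_op = opIDsOnMchs[mch_a][np.where(opIDsOnMchs[mch_a] >= 0)][-1]
print(mch_a, last_op)              # -> 1 6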

How to load instances

Dear authors,

I would like to know how to load instances that are not present in your BenchDataNmpy folder.
Which format do you accept? How do you generate the .npy files?

The classic way to represent instances is the standard specification: http://jobshop.jjvh.nl/explanation.php#standard_def
How can I load standard-specification instances to benchmark against your approach?

Best,
Pierre
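
One way to bridge the two formats is sketched below. The stacked (2, J, M) .npy layout and the 1-based machine ids are assumptions inferred from the code discussed elsewhere on this page, so verify them against a file shipped in BenchDataNmpy; the file name is hypothetical.

import numpy as np

def load_standard_instance(path):
    # Parse a standard-specification JSSP file: the first two numbers are
    # n_jobs and n_machines; each job line then lists (machine, duration)
    # pairs, with machines 0-indexed in the file.
    tokens = open(path).read().split()
    it = iter(tokens)
    n_j, n_m = int(next(it)), int(next(it))
    dur = np.zeros((n_j, n_m), dtype=np.int64)
    mch = np.zeros((n_j, n_m), dtype=np.int64)
    for i in range(n_j):
        for j in range(n_m):
            mch[i, j] = int(next(it)) + 1   # shift to 1-based machine ids
            dur[i, j] = int(next(it))
    return dur, mch

dur, mch = load_standard_instance('ta01.txt')   # hypothetical file name
np.save('ta01.npy', np.stack((dur, mch)))       # assumed (2, J, M) layout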

PDR results

Hello!
I noticed that the code does not include the PDR baselines, such as SPT and MOPNR, mentioned in the experiment section of the paper.
Is that so?
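
For reference, here is a toy sketch of one such priority dispatching rule (SPT: always start the ready operation with the shortest processing time) on the duration/machine matrix representation; this is an editor's illustration, not the authors' experiment code:

import numpy as np

def spt_makespan(dur, mch):
    # Greedy SPT dispatcher: repeatedly pick the ready operation with the
    # shortest processing time and append it to its machine's schedule.
    J, M = dur.shape
    nxt = np.zeros(J, dtype=int)   # index of each job's next operation
    job_free = np.zeros(J)         # time at which each job becomes free
    mch_free = {}                  # time at which each machine becomes free
    for _ in range(J * M):
        ready = [j for j in range(J) if nxt[j] < M]
        j = min(ready, key=lambda jj: dur[jj, nxt[jj]])   # the SPT choice
        m = mch[j, nxt[j]]
        end = max(job_free[j], mch_free.get(m, 0.0)) + dur[j, nxt[j]]
        job_free[j], mch_free[m] = end, end
        nxt[j] += 1
    return job_free.max()

print(spt_makespan(np.array([[3, 2], [2, 4]]),
                   np.array([[1, 2], [2, 1]])))  # -> 7.0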

A small bug in the update process

In the update function of PPO_jssp_multiInstances.py:
[screenshot of the update function with two lines highlighted]

Taking a 6x6 instance as an example: at the first highlighted line (line 138), rewards_all_env[i] has shape (36,) while vals has shape (36, 1). Because of torch.Tensor broadcasting, the resulting advantages has shape (36, 36), and the subsequent surr1 and surr2 are also (36, 36); at the second highlighted line (line 142), p_loss is likewise (36, 36), so the final loss has shape (36, 36) as well.
This does not match the PPO formula, although the model can still converge during training.

I changed the two lines to conform to the PPO formula; training still converges normally, and the results are slightly better than before.
[screenshot of the modified lines]
Alternatively, modifying only line 142 achieves the same effect.

Training and test curves are shown below (orange: original, blue: modified). During training, all parameters in Params were unchanged except the instance size, which was set to 6x6.
[training/test curves]
After preliminary hyperparameter tuning, the test results improve. As shown below (green: tuned), the best result is 566.8, slightly better than the 574.09 reported in the paper.
[test curve after tuning]
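
A self-contained sketch reproducing the broadcasting issue and the fix described above (variable names follow the issue; the surrounding update code is assumed):

import torch

T = 36                               # episode length for a 6x6 instance
rewards = torch.randn(T)             # stands in for rewards_all_env[i]
vals = torch.randn(T, 1)             # critic output with shape (T, 1)
ratios = torch.exp(torch.randn(T))   # stands in for the PPO ratio
eps_clip = 0.2

# The broadcasting bug: (T,) minus (T, 1) silently yields a (T, T) tensor.
bad = rewards - vals.detach()
assert bad.shape == (T, T)

# The fix: squeeze the critic output so advantages keep shape (T,).
advantages = rewards - vals.detach().squeeze(-1)
surr1 = ratios * advantages
surr2 = torch.clamp(ratios, 1 - eps_clip, 1 + eps_clip) * advantages
p_loss = -torch.min(surr1, surr2).mean()
print(advantages.shape, p_loss.shape)  # torch.Size([36]) torch.Size([])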

How to get the final schedule?

Hi,

I have gone through your paper and code as well; it's very good. I have one question: how do I obtain the resulting schedule from the RL agent?

Confused about PPO update

I'm a bit confused about the PPO update process. At line 110:
[screenshot of the reward-normalization line]
The rewards in a single episode are normalized by subtracting the mean and dividing by the standard deviation. Why should the rewards be scaled like this? I found that, after normalization, some genuinely bad rewards are rescaled and important information is lost.
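
For reference, the pattern the issue refers to is the common PPO return-scaling trick, in sketch form (the repo's exact line may differ):

import torch

returns = torch.tensor([3.0, -1.0, 0.5, -7.0])
# Zero-mean, unit-std scaling; the small epsilon avoids division by zero.
returns = (returns - returns.mean()) / (returns.std() + 1e-5)
print(returns)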

How to train the model?

Your work is very meaningful, but I don't know how to train the model. Could you share your training code?

About the getActionNbghs function

Hello, I'm not quite sure about the definition of the getActionNbghs function. I traced its inputs and outputs:
Input: a matrix and an integer N.
Output: if N's right neighbor in the matrix is >= 0, it returns N and the value of that right neighbor; otherwise (neighbor < 0), it returns N and N itself.

The call site of getActionNbghs in JSSP_Env:

# adj matrix
precd, succd = self.getNghbs(action, self.opIDsOnMchs)
self.adj[action] = 0
self.adj[action, action] = 1
if action not in self.first_col:
    self.adj[action, action - 1] = 1
self.adj[action, precd] = 1
self.adj[succd, action] = 1

So is getActionNbghs meant to obtain the value of the next operation? If the next operation is < 0, does that mean it cannot be started yet? And why are both [action, precd] and [succd, action] set to 1?
