
l2d's Introduction

Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning

This repository is the official PyTorch implementation of the algorithms in the following paper:

Cong Zhang, Wen Song, Zhiguang Cao, Jie Zhang, Puay Siew Tan, Chi Xu. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. 34th Conference on Neural Information Processing Systems (NeurIPS), 2020. [PDF]

If you make use of the code/experiments or the L2D algorithm in your work, please cite our paper (BibTeX below).

@inproceedings{NEURIPS2020_11958dfe,
 author = {Zhang, Cong and Song, Wen and Cao, Zhiguang and Zhang, Jie and Tan, Puay Siew and Chi, Xu},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {1621--1632},
 publisher = {Curran Associates, Inc.},
 title = {Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning},
 url = {https://proceedings.neurips.cc/paper/2020/file/11958dfee29b6709f48a9ba0387a2431-Paper.pdf},
 volume = {33},
 year = {2020}
}

Installation

PyTorch 1.6

Gym 0.17.3

Docker install

Clone this repo and, from within the repo folder, run the following commands.

Create image l2d-image:

sudo docker build -t l2d-image .

Create container l2d-container from l2d-image, and activate it:

sudo docker run --gpus all --name l2d-container -it l2d-image

Reproduce the results in the paper

Change the device type in the Params.py file and run:

python3 test_learned.py

Or, to evaluate on the public benchmark instances, run:

python3 test_learned_on_benchmark.py
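
For reference, the device switch usually looks like the following (a sketch only; the actual setting name in Params.py may differ, so check the file):

import torch

# Point the device setting at the GPU when available, otherwise the CPU
# (hypothetical attribute; Params.py may name it differently).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")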


l2d's Issues

Removing the old arcs can make the graph embedding more accurate.

Maybe we should remove the old arc when a new operation is inserted between two operations; removing the old disjunctive arc makes the graph embedding more accurate.
All we need is to add the following code to JSSP_Env.py -> def step(self, action):

if flag and precd != action and succd != action:  # Remove the old arc when a new operation inserts between two operations
    self.adj[succd, precd] = 0

Questions regarding the graph design

Thanks for the great work.
Still, after going through the paper and the implementation, I have a few questions regarding the graph design and the arc-update strategy.

  1. The add-arcs strategy does not seem to encode information about future operations that require the same machine.

Since there are no arcs between the blue/red/green nodes at the beginning, the RL agent cannot consider the relationship between future operations on the same machine during scheduling. I understand that the state representation of the add-arcs strategy is more efficient for GIN to process, but isn't the remove-arcs strategy more reasonable?

  2. Why does the RL agent choose actions/operations by looking back only at already-scheduled nodes instead of also considering future operations?

The aggregation only collects information from incoming neighbors, which means the RL agent picks the next operation based on the already-scheduled nodes. Shouldn't it also consider future operations when choosing the next action? Is there a particular reason for the current design?

How to test bigger and irregular instances

Why can your models, trained on small instances, be tested on bigger instances? How is the difference in size handled?
Why does each operation of an instance need a different machine, with no machine repeated within a job?
How can I test on irregular instances, i.e. ones where jobs have different numbers of operations?
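
On the first question, a minimal sketch of why size transfer is possible: a GIN-style layer's weights act on per-node feature vectors, not on a fixed number of nodes, so the same trained weights apply to graphs of any size. The 2-dimensional raw feature mirrors the paper's per-operation features; the layer itself is a simplification, not the repo's GIN:

import torch

# One simplified GIN-style layer: the MLP's weights are independent of the
# number of nodes N, so weights trained on 6x6 graphs run on 15x15 ones.
mlp = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 64))
for N in (36, 225):                          # 6x6 and 15x15 instances
    adj = (torch.rand(N, N) < 0.1).float()   # random adjacency stand-in
    x = torch.rand(N, 2)                     # 2 raw features per operation
    h = mlp(adj @ x + x)                     # aggregate neighbors + self
    print(h.shape)                           # torch.Size([N, 64])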

How to train the model?

Hi!
I have read your paper and the code over the last few days, and now I have a question about the code. In the project I found test_learned.py and validation.py, which are used for testing and validation, with the trained network loaded directly in those files. So how can I train the model? @zcajiayin

About the problem modeling

One thing I don't quite understand: why find the earliest feasible time period to allocate a_t on the required machine, rather than simply appending a_t to the end of the corresponding machine's queue?
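
For readers unfamiliar with the left-shift idea the question refers to, here is a self-contained sketch of finding the earliest feasible start on a machine; this is an editor's illustration, not the repo's permissibleLS implementation:

def earliest_feasible_start(job_ready, dur, busy):
    # Earliest start for an operation of length `dur` that becomes ready at
    # `job_ready`, given `busy`: a time-sorted list of (start, end)
    # intervals already occupied on the required machine.
    t = job_ready
    for s, e in busy:
        if t + dur <= s:   # the operation fits in the gap before (s, e)
            return t
        t = max(t, e)      # otherwise wait until this interval ends
    return t               # append after the last busy interval

# Machine busy on [0, 3) and [7, 9); a 2-unit operation ready at t=1 can be
# left-shifted into the gap and start at t=3 instead of being appended at 9.
print(earliest_feasible_start(1, 2, [(0, 3), (7, 9)]))  # -> 3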

Question about the PPO

Thanks for your great work!

I found that the new and the old policy initially use the same model, which means that all the parameters of the GNN/Actor/Critic networks are the same.

But then why does the same environment produce different logprob values under policy_old and policy?

FJSP

Hi,
Me again.
Your paper is very good, but dynamic flexible job shop scheduling is very common in real production. How could this problem be addressed by building on your paper?

Bug in updating the LBs matrix!

I found what might be a bug in the computation of the lower bounds ("LBs" in the code). Sometimes the updated lower bounds are wrong, and this could be a critical issue!
Please set "env_num=1" and debug to check whether the LBs are correct.
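
For reference while debugging, below is a simplified, self-contained recomputation of what the per-operation lower bounds should contain, assuming (as the dispatching rules imply) that the scheduled operations form a prefix of each job. This is an editor's sketch, not the repo's implementation:

import numpy as np

def recompute_lbs(end_times, dur):
    # end_times: (J, M) actual completion times for scheduled operations,
    # 0 for unscheduled ones. dur: (J, M) processing times.
    # An unscheduled operation's lower bound is the previous operation's
    # bound (or actual completion time) plus its own duration.
    lbs = end_times.astype(float).copy()
    J, M = dur.shape
    for i in range(J):
        for j in range(M):
            if lbs[i, j] == 0:                     # not scheduled yet
                prev = lbs[i, j - 1] if j > 0 else 0.0
                lbs[i, j] = prev + dur[i, j]
    return lbs

# Toy check: job 0's first op finished at t=5; the rest cumulate from it.
print(recompute_lbs(np.array([[5, 0, 0]]), np.array([[5, 2, 4]])))
# -> [[ 5.  7. 11.]]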

Some questions about permissibleLS.py

Dear author,

Hello!

I have read your code, and I have some questions about the functions in permissibleLS.py:

  1. The meaning of the statement mch_a = np.take(mchMat, a) - 1. In this module, statements that subtract one from a machine number appear frequently, and the resulting variables are often used as matrix indices, e.g. opIDsOnMchs[mch_a][np.where(opIDsOnMchs[mch_a] >= 0)][-1]. I don't quite understand this: as far as I can tell, np.take(mchMat, a) extracts the machine number corresponding to an action, which seems to have nothing to do with a row index.

  2. The meaning of the variable opIDsOnMchs.

  3. How actions relate to the precedence constraints within a job. After reading the whole project, I believe actions are encoded as J * M flat ids; for example, a 15 * 15 instance has 225 actions numbered 0-224. However, the output of the permissibleLS.py test code, print('opIDsOnMchs\n', opIDsOnMchs), shakes this understanding. In one complete iteration the final opIDsOnMchs output was [[3 1 8] [0 4 6] [2 5 7]]; I don't know how to interpret each number, or how to read the jobs' precedence constraints from it. (The machine matrix for this example is [[2 1 3] [1 2 3] [2 3 1]].)

  4. You propose the "add arcs" strategy in the paper. Under this strategy, how does the model identify during training which operations Oij share the same machine? Under the "remove arcs" strategy, the Oij on the same machine are connected by disjunctive arcs, so only the arc directions need to be learned; under the "add arcs" strategy there are no disjunctive arcs up front, so how is this guaranteed?

I would be grateful if you could take the time to reply. I am looking forward to your response. Thank you very much!
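
On question 1, a small self-contained sketch of one plausible reading, built from the matrices quoted in the issue (this is an editor's interpretation, not an authoritative answer): the machine ids in mchMat appear to be 1-based, so subtracting 1 turns a machine id into a 0-based row index, and each row of opIDsOnMchs lists the flat operation ids already placed on that machine.

import numpy as np

mchMat = np.array([[2, 1, 3],
                   [1, 2, 3],
                   [2, 3, 1]])
a = 4                              # flat operation id: job 1, operation 1
mch_a = np.take(mchMat, a) - 1     # machine 2 -> row index 1
opIDsOnMchs = np.array([[3, 1, 8],
                        [0, 4, 6],
                        [2, 5, 7]])
# Last operation already placed on that machine (ids < 0 mean empty slots):
last_op = opIDsOnMchs[mch_a][np.where(opIDsOnMchs[mch_a] >= 0)][-1]
print(mch_a, last_op)              # -> 1 6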

How to load instances

Dear authors,

I would like to know how to load instances that are not present in your BenchDataNmpy folder.
Which format do you accept? How do you generate the .npy files?

The classic way to represent instances is the standard specification: http://jobshop.jjvh.nl/explanation.php#standard_def
How can I load standard-specification instances to benchmark against your approach?

Best,
Pierre
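
One way to bridge the two formats is sketched below. The stacked (2, J, M) .npy layout and the 1-based machine ids are assumptions inferred from the code discussed elsewhere on this page, so verify them against a file shipped in BenchDataNmpy; the file name is hypothetical.

import numpy as np

def load_standard_instance(path):
    # Parse a standard-specification JSSP file: the first two numbers are
    # n_jobs and n_machines; each job line then lists (machine, duration)
    # pairs, with machines 0-indexed in the file.
    tokens = open(path).read().split()
    it = iter(tokens)
    n_j, n_m = int(next(it)), int(next(it))
    dur = np.zeros((n_j, n_m), dtype=np.int64)
    mch = np.zeros((n_j, n_m), dtype=np.int64)
    for i in range(n_j):
        for j in range(n_m):
            mch[i, j] = int(next(it)) + 1   # shift to 1-based machine ids
            dur[i, j] = int(next(it))
    return dur, mch

dur, mch = load_standard_instance('ta01.txt')   # hypothetical file name
np.save('ta01.npy', np.stack((dur, mch)))       # assumed (2, J, M) layout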

PDR results

Hello!
I noticed that the code does not include the PDR baselines, such as SPT and MOPNR, mentioned in the experiment section of the paper.
Is that so?
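
For reference, here is a toy sketch of one such priority dispatching rule (SPT: always start the ready operation with the shortest processing time) on the duration/machine matrix representation; this is an editor's illustration, not the authors' experiment code:

import numpy as np

def spt_makespan(dur, mch):
    # Greedy SPT dispatcher: repeatedly pick the ready operation with the
    # shortest processing time and append it to its machine's schedule.
    J, M = dur.shape
    nxt = np.zeros(J, dtype=int)   # index of each job's next operation
    job_free = np.zeros(J)         # time at which each job becomes free
    mch_free = {}                  # time at which each machine becomes free
    for _ in range(J * M):
        ready = [j for j in range(J) if nxt[j] < M]
        j = min(ready, key=lambda jj: dur[jj, nxt[jj]])   # the SPT choice
        m = mch[j, nxt[j]]
        end = max(job_free[j], mch_free.get(m, 0.0)) + dur[j, nxt[j]]
        job_free[j], mch_free[m] = end, end
        nxt[j] += 1
    return job_free.max()

print(spt_makespan(np.array([[3, 2], [2, 4]]),
                   np.array([[1, 2], [2, 1]])))  # -> 7.0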

A small bug in the update process

In the update function of PPO_jssp_multiInstances.py:
[screenshot of the update function with two lines highlighted]

Taking a 6x6 instance as an example: at the first highlighted line (line 138), rewards_all_env[i] has shape (36,) while vals has shape (36, 1). Because of torch.Tensor broadcasting, the resulting advantages has shape (36, 36), and the subsequent surr1 and surr2 are also (36, 36); at the second highlighted line (line 142), p_loss is likewise (36, 36), so the final loss has shape (36, 36) as well.
This does not match the PPO formula, although the model can still converge during training.

I changed the two lines to conform to the PPO formula; training still converges normally, and the results are slightly better than before.
[screenshot of the modified lines]
Alternatively, modifying only line 142 achieves the same effect.

Training and test curves are shown below (orange: original, blue: modified). During training, all parameters in Params were unchanged except the instance size, which was set to 6x6.
[training/test curves]
After preliminary hyperparameter tuning, the test results improve. As shown below (green: tuned), the best result is 566.8, slightly better than the 574.09 reported in the paper.
[test curve after tuning]
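
A self-contained sketch reproducing the broadcasting issue and the fix described above (variable names follow the issue; the surrounding update code is assumed):

import torch

T = 36                               # episode length for a 6x6 instance
rewards = torch.randn(T)             # stands in for rewards_all_env[i]
vals = torch.randn(T, 1)             # critic output with shape (T, 1)
ratios = torch.exp(torch.randn(T))   # stands in for the PPO ratio
eps_clip = 0.2

# The broadcasting bug: (T,) minus (T, 1) silently yields a (T, T) tensor.
bad = rewards - vals.detach()
assert bad.shape == (T, T)

# The fix: squeeze the critic output so advantages keep shape (T,).
advantages = rewards - vals.detach().squeeze(-1)
surr1 = ratios * advantages
surr2 = torch.clamp(ratios, 1 - eps_clip, 1 + eps_clip) * advantages
p_loss = -torch.min(surr1, surr2).mean()
print(advantages.shape, p_loss.shape)  # torch.Size([36]) torch.Size([])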

How to get the final schedule?

Hi,

I have gone through your paper and code as well; it's very good. I have one question: how do I obtain the resulting schedule from the RL agent?

Confused about PPO update

I'm a bit confused about the PPO update process. At line 110:
[screenshot of the reward-normalization line]
The rewards in a single episode are normalized by subtracting the mean and dividing by the standard deviation. Why should the rewards be scaled like this? I found that, after normalization, some genuinely bad rewards are rescaled and important information is lost.
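
For reference, the pattern the issue refers to is the common PPO return-scaling trick, in sketch form (the repo's exact line may differ):

import torch

returns = torch.tensor([3.0, -1.0, 0.5, -7.0])
# Zero-mean, unit-std scaling; the small epsilon avoids division by zero.
returns = (returns - returns.mean()) / (returns.std() + 1e-5)
print(returns)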

How to train the model?

Your work is very meaningful, but I don't know how to train the model. Could you share your training code?

About the getActionNbghs function

Hello, I'm not quite sure about the definition of the getActionNbghs function. I traced its inputs and outputs:
Input: a matrix and an integer N.
Output: if N's right neighbor in the matrix is >= 0, it returns N and the value of that right neighbor; otherwise (neighbor < 0), it returns N and N itself.

The call site of getActionNbghs in JSSP_Env:

# adj matrix
precd, succd = self.getNghbs(action, self.opIDsOnMchs)
self.adj[action] = 0
self.adj[action, action] = 1
if action not in self.first_col:
    self.adj[action, action - 1] = 1
self.adj[action, precd] = 1
self.adj[succd, action] = 1

So is getActionNbghs meant to obtain the value of the next operation? If the next operation is < 0, does that mean it cannot be started yet? And why are both [action, precd] and [succd, action] set to 1?
