I have git cloned the repository and run ./setup.py install

Should I run setup again? (latplan issue, closed)

guicho271828 commented on May 18, 2024

Should I run setup again?

Comments (9)

guicho271828 commented on May 18, 2024

I don't think regenerating the dataset is necessary.
Regarding setup.py, run git diff HEAD..refs/tags/4.1.3; if there is no diff, you should be good to go.
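
The check above can also be scripted. This is a minimal sketch, assuming git is on the PATH and the 4.1.3 tag has been fetched locally; the helper names diff_command and matches_tag are hypothetical and not part of latplan. It relies on git diff --quiet exiting with 0 when there are no differences and 1 when there are.

```python
import subprocess


def diff_command(tag, path=None):
    """Build the git command comparing HEAD against a release tag."""
    cmd = ["git", "diff", "--quiet", f"HEAD..refs/tags/{tag}"]
    if path is not None:
        # Restrict the comparison to a single file, e.g. setup.py.
        cmd += ["--", path]
    return cmd


def matches_tag(tag, path=None, cwd="."):
    """Return True when the checkout in `cwd` has no diff against the tag."""
    result = subprocess.run(diff_command(tag, path), cwd=cwd)
    if result.returncode not in (0, 1):
        # Any other exit code means git itself failed (unknown tag, not a repo).
        raise RuntimeError(f"git diff failed with exit code {result.returncode}")
    return result.returncode == 0
```

Calling matches_tag("4.1.3", "setup.py") from the repository root would then say whether setup.py is unchanged relative to the release.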

wn4github commented on May 18, 2024

Thank you for the suggestion. In the end, though, I couldn't get the code to run because of incompatibilities between nvidia-tensorflow 1.15 and Keras. I have to use nvidia-tensorflow because the RTX 30 series does not support CUDA 10, while prebuilt TensorFlow 1.x does not support CUDA 11.

I'm wondering if you are currently porting the code to PyTorch?

guicho271828 commented on May 18, 2024

Yes, that aspect is also something I am struggling with. My lab cluster is also transitioning to CUDA 11, so I am attempting to rewrite part of the code, but the development is slow.

guicho271828 commented on May 18, 2024

You could try to build tf 1.15 for CUDA 11.

guicho271828 commented on May 18, 2024

I just learned that NVIDIA (not Google) provides a backward-compatible version of tensorflow 1.15 that works on cuda 11.
tensorflow/tensorflow#43629
https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/
It seems its package name is nvidia-tensorflow.

wn4github commented on May 18, 2024

Thank you so much for your help, Asai-san.

I too discovered that nvidia-tensorflow works with CUDA 11, and I managed to get a working tf 1.15 with:

nvidia-tensorflow 1.15.4+nv20.12
Keras 2.2.5
keras-adabound 0.6.0
Keras-Applications 1.0.8
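
To confirm that an environment matches the pins listed above, something like the following could be used. It is only a sketch: the helper names are hypothetical, it assumes Python 3.8+ for importlib.metadata, and it assumes the installed packages report exactly these version strings.

```python
from importlib import metadata

# Versions reported as working in this thread.
EXPECTED = {
    "nvidia-tensorflow": "1.15.4+nv20.12",
    "Keras": "2.2.5",
    "keras-adabound": "0.6.0",
    "Keras-Applications": "1.0.8",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

An empty result from find_mismatches(EXPECTED) means every pin is satisfied; otherwise each entry shows the wanted and the found version.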

Now I am using the 4.1.3 version from the release source code, and running ./setup-dataset.sh is successful. However, I noticed a difference in setup-dataset.sh between 4.1.3 and the latest commit: in 4.1.3, no .npz files are downloaded.

Anyway, the problem I run into now is an error when executing ./train_all.sh. I only uncommented the first line for training: task-planning learn_plot_dump_summary. The trace is as follows:

Fancy Traceback (most recent call last):
  File ./strips.py line 294 function <module> : main()
                    mode = 'learn_plot_dump_summary'
                sae_path = 'puzzle_mnist_3_3_5000_None_None_None_False_ConcreteDetNormalizedLogitAddEffectTransitionAE_planning'
      default_parameters = {'epoch': 200, 'batch_size': 500, 'optimizer': 'radam', 'max_temperature': 5.0, 'min_temperature': 0.7, 'M': 2, 'train_gumbel': True, 'train_softmax': True, 'test_gumbel': False, 'test_softmax': False, 'locality': 0.0, 'locality_delay': 0.0, 'aeclass': 'ConcreteDetNormalizedLogitAddEffectTransitionAE'}
              parameters = {'beta': [-0.3, -0.1, 0.0, 0.1, 0.3], 'lr': [0.1, 0.01, 0.001], 'N': [100, 200, 500, 1000], 'M': [2], 'layer': [1000], 'clayer': [16], 'dropout': [0.4], 'noise': [0.4], 'dropout_z': [False], 'activation': ['relu'], 'num_actions': [100, 200, 400, 800, 1600], 'aae_width': [100, 300, 600], 'aae_depth': [0, 1, 2], 'aae_activation': ['relu', 'tanh'], 'aae_delay': [0], 'direct': [0.1, 1.0, 10.0], 'direct_delay': [0.05, 0.1, 0.2, 0.3, 0.5], 'zerosuppress': [0.1, 0.2, 0.5], 'zerosuppress_delay': [0.05, 0.1, 0.2, 0.3, 0.5], 'loss': ['BCE'], 'type': ['mnist'], 'width': [3], 'height': [3], 'num_examples': [5000], 'stop_gradient': [False], 'aeclass': ['ConcreteDetNormalizedLogitAddEffectTransitionAE'], 'comment': ['planning']}

  File ./strips.py line 290 function main : globals()[task](*map(myeval,sys.argv))
                    task = 'puzzle'

  File ./strips.py line 208 function puzzle : show_summary(ae, train, test)
                    type = 'mnist'
                   width = 3
                  height = 3
            num_examples = 5000
                       N = None
             num_actions = None
                  direct = None
           stop_gradient = False
                 aeclass = 'ConcreteDetNormalizedLogitAddEffectTransitionAE'
                 comment = 'planning'
                    name = 'comment'
                   value = 'planning'
                    path = '/home/wn/workspace/latplan/latplan-4.1.3_original/latplan/puzzles/puzzle-mnist-3-3.npz'
                    data = <numpy.ndarray float32  (5000, 2, 42, 42)>
             pre_configs = <numpy.ndarray float64  (5000, 9)>
             suc_configs = <numpy.ndarray float64  (5000, 9)>
                    pres = <numpy.ndarray float32  (5000, 42, 42)>
                    sucs = <numpy.ndarray float32  (5000, 42, 42)>
             transitions = <numpy.ndarray float32  (2, 5000, 42, 42)>
                  states = <numpy.ndarray float32  (10000, 42, 42)>
                   train = <numpy.ndarray float32  (4500, 2, 42, 42)>
                     val = <numpy.ndarray float32  (250, 2, 42, 42)>
                    test = <numpy.ndarray float32  (250, 2, 42, 42)>
                      ae = None

  File ./strips.py line 180 function show_summary : ae.summary()
                      ae = None
                   train = <numpy.ndarray float32  (4500, 2, 42, 42)>
                    test = <numpy.ndarray float32  (250, 2, 42, 42)>

AttributeError: 'NoneType' object has no attribute 'summary'

I am also considering using the trained weights directly if training cannot be done, but I need some guidance.

guicho271828 commented on May 18, 2024

> Now I am using the 4.1.3 version from the release source code, and running ./setup-dataset.sh is successful. However, I noticed a difference in setup-dataset.sh between 4.1.3 and the latest commit: in 4.1.3, no .npz files are downloaded.

setup-dataset.sh also downloads unrelated .npz files that are not used in the IJCAI paper (but are used in other papers). Sorry for the confusion; this entire repository is a kind of "lab environment" of mine that sets up everything I use for all of my papers. The downloads that failed for photorealistic-blocksworld are not used, so no worries. All datasets needed to reproduce the IJCAI paper are instead rendered locally using a script included in this repo.

> Anyway, the problem I run into now is an error when executing ./train_all.sh.

Since you already have the trained weights, running this script is not necessary. All results, including the CSV dump and the PDDL domain file, are included in the archive.

> AttributeError: 'NoneType' object has no attribute 'summary'

Here is what is happening: task-planning learn_plot_dump_summary tries to run the training. However, since samples/*/grid_search.log already has more entries than the specified limit of hyperparameter configurations (300), it did not run the training. Thus the model instance (ae) is None.
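
The failure mode described above can be illustrated with a small sketch. This is not the code in strips.py: the function names are hypothetical, and it assumes (without having checked the real log format) that grid_search.log holds one completed configuration per non-empty line. Only the limit of 300 comes from this thread.

```python
from pathlib import Path


def count_logged_runs(log_path):
    """Count non-empty lines in a log, treating a missing file as empty."""
    path = Path(log_path)
    if not path.exists():
        return 0
    with path.open() as f:
        return sum(1 for line in f if line.strip())


def should_train(log_path, limit=300):
    """Training is skipped once the log already holds `limit` entries."""
    return count_logged_runs(log_path) < limit


def summarize(ae):
    """Guard against the AttributeError seen when no model was produced."""
    if ae is None:
        return "no model was trained or loaded; skipping summary"
    ae.summary()
    return "summary printed"
```

With a guard like summarize, a full log would produce a readable message instead of a traceback from calling summary() on None.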

If you want to regenerate the reconstructions etc., task-planning plot_dump_summary will load the stored weights, produce a reconstruction plot, and dump several files necessary for generating PDDL files. Make sure the archive is decompressed in the correct directory, so that the samples/ directory is in the root of the repository.

The hyperparameter search is completely parallelized at the process level. So, if you have an 8-core, 8-GPU machine, just run 8 processes in parallel.
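
One way to set up such a run is to give each process its own GPU through the CUDA_VISIBLE_DEVICES environment variable. The sketch below only builds the launch plan; launch_plan is a hypothetical helper, and pinning one GPU per process is an assumption rather than something the repository is known to require.

```python
import os


def launch_plan(num_workers, command=("./train_all.sh",)):
    """Return one (command, environment) pair per worker process."""
    plan = []
    for gpu in range(num_workers):
        # Each worker sees exactly one GPU, numbered from 0.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        plan.append((list(command), env))
    return plan
```

Each pair can then be started with subprocess.Popen(cmd, env=env) from the repository root, and the parent can wait on all of the returned process objects.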

guicho271828 commented on May 18, 2024

If you do want to train the model, you may also want to prune some hyperparameters by looking at samples/*/grid_search.log. For example, this one is the best hyperparameter configuration for the mandrill 15-puzzle. You can then edit the parameters dictionary in strips.py accordingly.
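
To see why pruning matters, it helps to count how many configurations the dictionary describes. The sketch below copies the multi-valued entries of parameters from the traceback earlier in this thread (single-valued entries do not change the count). The helpers grid_size and prune are hypothetical, and the values fixed in the example are arbitrary illustrations, not the best configuration from any log.

```python
from math import prod

# Multi-valued entries of `parameters` as printed in the traceback.
PARAMETERS = {
    "beta": [-0.3, -0.1, 0.0, 0.1, 0.3],
    "lr": [0.1, 0.01, 0.001],
    "N": [100, 200, 500, 1000],
    "num_actions": [100, 200, 400, 800, 1600],
    "aae_width": [100, 300, 600],
    "aae_depth": [0, 1, 2],
    "aae_activation": ["relu", "tanh"],
    "direct": [0.1, 1.0, 10.0],
    "direct_delay": [0.05, 0.1, 0.2, 0.3, 0.5],
    "zerosuppress": [0.1, 0.2, 0.5],
    "zerosuppress_delay": [0.05, 0.1, 0.2, 0.3, 0.5],
}


def grid_size(parameters):
    """Number of distinct configurations in the full grid."""
    return prod(len(values) for values in parameters.values())


def prune(parameters, fixed):
    """Return a copy with the given keys narrowed to a single value."""
    pruned = dict(parameters)
    for key, value in fixed.items():
        if value not in parameters[key]:
            raise ValueError(f"{value!r} is not a candidate for {key!r}")
        pruned[key] = [value]
    return pruned


print(grid_size(PARAMETERS))  # 1215000
# Hypothetical choice of values, for illustration only.
pruned = prune(PARAMETERS, {"beta": 0.0, "lr": 0.001})
print(grid_size(pruned))  # 81000
```

Against a limit of 300 configurations, the full grid is far too large to cover, so fixing the values that the log shows to work well narrows the search considerably.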

wn4github commented on May 18, 2024

My immediate goal is to use the Cube-space AE to encode some MNIST 8-puzzle images. Then, to better understand Latplan, I am planning to train the network and get my hands dirty with the implementation. But this is off topic for this issue, so maybe I should open a new one. Thank you for all your help.
