Comments (9)
I don't think regenerating the dataset is necessary.
Regarding `setup.py`: run `git diff HEAD..refs/tags/4.1.3`; if there is no diff, it should be good to go.
---
Thank you for the suggestion. But in the end, I couldn't get the code to run due to incompatibility issues between nvidia-tensorflow 1.15 and Keras. I have to use nvidia-tensorflow because the RTX 30 series does not support CUDA 10, while prebuilt TensorFlow 1.x does not support CUDA 11.
I'm wondering if you are currently porting the code to PyTorch?
---
Yes, that aspect is also something I am struggling with. My lab cluster is also transitioning to CUDA 11, so I am attempting to rewrite part of the code, but the development is slow.
---
You could try to build tf 1.15 for CUDA 11.
---
I just learned that NVIDIA (not Google) provides a backward-compatible version of TensorFlow 1.15 that works on CUDA 11:
tensorflow/tensorflow#43629
https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/
It seems its package name is `nvidia-tensorflow`.
---
Thank you so much for your help, Asai-san.
I too discovered that `nvidia-tensorflow` could work with CUDA 11, and I managed to get a working tf 1.15 with:
- nvidia-tensorflow 1.15.4+nv20.12
- Keras 2.2.5
- keras-adabound 0.6.0
- Keras-Applications 1.0.8
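For reference, a quick sanity check that this pinned environment loads correctly (a minimal sketch using the TF 1.x API):

```python
# Sanity check for the pinned versions above (TF 1.x API).
import tensorflow as tf
import keras

print(tf.__version__)              # expect 1.15.4 from the nvidia build
print(keras.__version__)           # expect 2.2.5
print(tf.test.is_gpu_available())  # True if this build sees the GPU under CUDA 11
```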
Now I am using the 4.1.3 version from the release source code, and running `./setup-dataset.sh` is successful. However, I notice a difference in `setup-dataset.sh` between 4.1.3 and the latest commit: in 4.1.3, no `.npz` files are downloaded.
Anyway, the problem I run into now is an error when executing `./train_all.sh`. I only uncommented the first line for training: `task-planning learn_plot_dump_summary`. The trace is as follows:
```
Fancy Traceback (most recent call last):
File ./strips.py line 294 function <module> : main()
mode = 'learn_plot_dump_summary'
sae_path = 'puzzle_mnist_3_3_5000_None_None_None_False_ConcreteDetNormalizedLogitAddEffectTransitionAE_planning'
default_parameters = {'epoch': 200, 'batch_size': 500, 'optimizer': 'radam', 'max_temperature': 5.0, 'min_temperature': 0.7, 'M': 2, 'train_gumbel': True, 'train_softmax': True, 'test_gumbel': False, 'test_softmax': False, 'locality': 0.0, 'locality_delay': 0.0, 'aeclass': 'ConcreteDetNormalizedLogitAddEffectTransitionAE'}
parameters = {'beta': [-0.3, -0.1, 0.0, 0.1, 0.3], 'lr': [0.1, 0.01, 0.001], 'N': [100, 200, 500, 1000], 'M': [2], 'layer': [1000], 'clayer': [16], 'dropout': [0.4], 'noise': [0.4], 'dropout_z': [False], 'activation': ['relu'], 'num_actions': [100, 200, 400, 800, 1600], 'aae_width': [100, 300, 600], 'aae_depth': [0, 1, 2], 'aae_activation': ['relu', 'tanh'], 'aae_delay': [0], 'direct': [0.1, 1.0, 10.0], 'direct_delay': [0.05, 0.1, 0.2, 0.3, 0.5], 'zerosuppress': [0.1, 0.2, 0.5], 'zerosuppress_delay': [0.05, 0.1, 0.2, 0.3, 0.5], 'loss': ['BCE'], 'type': ['mnist'], 'width': [3], 'height': [3], 'num_examples': [5000], 'stop_gradient': [False], 'aeclass': ['ConcreteDetNormalizedLogitAddEffectTransitionAE'], 'comment': ['planning']}
File ./strips.py line 290 function main : globals()[task](*map(myeval,sys.argv))
task = 'puzzle'
File ./strips.py line 208 function puzzle : show_summary(ae, train, test)
type = 'mnist'
width = 3
height = 3
num_examples = 5000
N = None
num_actions = None
direct = None
stop_gradient = False
aeclass = 'ConcreteDetNormalizedLogitAddEffectTransitionAE'
comment = 'planning'
name = 'comment'
value = 'planning'
path = '/home/wn/workspace/latplan/latplan-4.1.3_original/latplan/puzzles/puzzle-mnist-3-3.npz'
data = <numpy.ndarray float32 (5000, 2, 42, 42)>
pre_configs = <numpy.ndarray float64 (5000, 9)>
suc_configs = <numpy.ndarray float64 (5000, 9)>
pres = <numpy.ndarray float32 (5000, 42, 42)>
sucs = <numpy.ndarray float32 (5000, 42, 42)>
transitions = <numpy.ndarray float32 (2, 5000, 42, 42)>
states = <numpy.ndarray float32 (10000, 42, 42)>
train = <numpy.ndarray float32 (4500, 2, 42, 42)>
val = <numpy.ndarray float32 (250, 2, 42, 42)>
test = <numpy.ndarray float32 (250, 2, 42, 42)>
ae = None
File ./strips.py line 180 function show_summary : ae.summary()
ae = None
train = <numpy.ndarray float32 (4500, 2, 42, 42)>
test = <numpy.ndarray float32 (250, 2, 42, 42)>
AttributeError: 'NoneType' object has no attribute 'summary'
```
I am also considering using the trained weights directly if training cannot be done, but I need some guidance.
---
> Now I am using the 4.1.3 version from the release source code, and running `./setup-dataset.sh` is successful. However, I notice a difference in `setup-dataset.sh` between 4.1.3 and the latest commit: in 4.1.3, no `.npz` files are downloaded.
`setup-dataset.sh` also downloads unrelated npz files that are not used in the IJCAI paper (but are used in other papers). Sorry for the confusion; this entire repository is a kind of "lab environment" of mine that sets up everything I use for all of my papers. The failed downloads for photorealistic-blocksworld are not used, so no worries. Instead, all datasets needed for reproducing the IJCAI paper are rendered locally using a script included in this repo.
> Anyway, the problem I run into now is an error when executing `./train_all.sh`.
Since you already have the trained weights, running this script is not necessary. All results, including the csv dump and the PDDL domain file, are included in the archive.
> `AttributeError: 'NoneType' object has no attribute 'summary'`
Here is what is happening: `task-planning learn_plot_dump_summary` tries to run the training. However, since `samples/*/grid_search.log` already has more entries than the specified limit of hyperparameter configurations (300), it did not run the training. Thus the model instance (`ae`) is `None`.
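In other words, the control flow is roughly the following (a minimal sketch; all names here are hypothetical stand-ins for the actual training code):

```python
# Sketch of the skip logic described above (names are hypothetical).
def train_with_grid_search(num_logged_configs, limit=300):
    if num_logged_configs >= limit:
        # The search budget is already exhausted: nothing is trained,
        # and no model instance is returned.
        return None
    return object()  # stand-in for a freshly trained model

ae = train_with_grid_search(num_logged_configs=300)
print(ae)  # None -> a later ae.summary() raises AttributeError
```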
If you want to regenerate the reconstructions etc., then `task-planning plot_dump_summary` would load the stored weights, produce a reconstruction plot, and dump several files necessary for generating PDDL files. Be sure that the archive is decompressed into the correct directory: the `samples/` directory should be in the root of the repository.
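For reference, the expected layout after extraction would look roughly like this (only `samples/` and the files mentioned in this thread are shown):

```
latplan/                   # repository root
├── samples/               # extracted from the weights archive
│   └── */grid_search.log
├── strips.py
├── setup-dataset.sh
└── train_all.sh
```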
The hyperparameter search is completely parallelized at the process level, so if you have an 8-core, 8-GPU machine, just run 8 processes in parallel.
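For example, such a launch could look like the following (a hypothetical sketch: pinning one GPU per process via `CUDA_VISIBLE_DEVICES` is a common CUDA convention, not a documented latplan flag):

```python
# Hypothetical launcher: one search process per GPU.
import os
import subprocess

procs = []
for gpu in range(8):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    procs.append(subprocess.Popen(["./train_all.sh"], env=env))

for p in procs:
    p.wait()  # wait for all searches to finish
```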
---
If you do want to train the model, you may also want to prune some hyperparameters by looking at `samples/*/grid_search.log`. For example, this one is the best hyperparameter for the mandrill 15-puzzle. Then you can edit `strips.py` and edit the dictionary.
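Pruning would amount to narrowing the candidate lists in the `parameters` dictionary shown in the traceback above; for instance (the particular values below are illustrative, not known-best settings):

```python
# Illustrative pruning of the search grid in strips.py: keep only the
# candidate values you still want to explore.
parameters = {
    'lr'          : [0.001],     # was [0.1, 0.01, 0.001]
    'N'           : [500],       # was [100, 200, 500, 1000]
    'num_actions' : [400, 800],  # was [100, 200, 400, 800, 1600]
    # ... keep the remaining keys from the original dictionary ...
}
```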
---
My immediate goal is to use the Cube-Space AE to encode some MNIST 8-puzzle images. Then, to better understand Latplan, I am planning to train the network and get my hands dirty with the implementation. But this is off-topic for this issue; maybe I should open a new one. Thank you for all your help.