How can I run this code? (tf-gqn issue, closed)

ogroth commented on June 8, 2024

How can I run this code?

Comments (13)

ogroth commented on June 8, 2024

Hey Yangshell, there are no official instructions for running the code yet. We're still experimenting with the architecture to find a training setup that replicates the paper's results on the rooms_ring_camera dataset. Once we have a stable version and working model snapshots, we will merge the dev branch into master and write a detailed Readme.
In the meantime, you can check https://github.com/ogroth/tf-gqn/blob/rooms_ring_camera_training/train_gqn_draw.py
This is the run script that trains the GQN (provided you have downloaded the training data). However, we haven't managed to produce great visual results yet.

ogroth commented on June 8, 2024

We have released a stable version of GQN which trains on the rooms_ring_camera dataset with the default parameters we provide. The training script is: https://github.com/ogroth/tf-gqn/blob/master/train_gqn_draw.py
Please see the Readme for detailed instructions on how to set up and run the code.

wlred commented on June 8, 2024

I downloaded the rooms_ring_camera dataset, but after I run the script, the system kills it. My computer has one GTX 1080 Ti GPU. Is that not enough? Do I need a better one?

ogroth commented on June 8, 2024

Hi wlred, could you please give more details about the error you get? Which script did you run (with which parameters), and what happened after it was launched? Did it run into an out-of-memory error? A GTX 1080 Ti is definitely sufficient to train the model.
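One way to answer the out-of-memory question is to launch the run from a small Python wrapper and check how the child process ended. On POSIX systems, `subprocess` reports a child terminated by signal N with a return code of -N, and the Linux out-of-memory killer terminates processes with SIGKILL. This is a sketch assuming Linux; `run_and_report` and `TRAIN_CMD` are hypothetical names, and the command line is the one posted in this thread. A SIGKILL only points toward the OOM killer; the kernel log (`dmesg`) is where an actual OOM kill gets recorded.

```python
import signal
import subprocess
import sys

# Hypothetical constant: the training command from this thread (adjust paths).
TRAIN_CMD = [
    sys.executable, "train_gqn_draw.py",
    "--data_dir", "/tmp/data/gqn-dataset",
    "--dataset", "rooms_ring_camera",
    "--model_dir", "/tmp/models/gqn",
]


def run_and_report(cmd):
    """Run cmd to completion and describe how it terminated (hypothetical helper)."""
    proc = subprocess.run(cmd)
    # On POSIX, a child killed by signal N has returncode == -N.
    if proc.returncode == -signal.SIGKILL:
        # SIGKILL is what the Linux OOM killer sends; confirm with dmesg.
        return "killed by SIGKILL (possibly the out-of-memory killer)"
    if proc.returncode < 0:
        return "terminated by signal %d" % -proc.returncode
    return "exited with code %d" % proc.returncode
```

Calling `print(run_and_report(TRAIN_CMD))` from the repository root would print the SIGKILL message if the kernel had stepped in, which narrows the problem down to memory rather than a Python exception.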

wlred commented on June 8, 2024

Hi ogroth, here are my steps:
(1) Downloaded the rooms_ring_camera dataset.
(2) Ran the commands:
1> source venv/bin/activate
2> python3 train_gqn_draw.py --data_dir /tmp/data/gqn-dataset --dataset rooms_ring_camera --model_dir /tmp/models/gqn

The output log is:
Training a GQN.
FLAGS: Namespace(batch_size=36, chkpt_steps=10000, data_dir='/tmp/data/gqn-dataset', dataset='rooms_ring_camera', debug=False, initial_eval=False, log_steps=100, memcap=1.0, model_dir='/tmp/models/gqn', queue_buffer=64, queue_threads=4, train_epochs=40)
UNPARSED_ARGV: ['--mode_dir', '/tmp/models/gqn']
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_task_id': 0, '_service': None, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
allow_growth: true
}
, '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_master': '', '_task_type': 'worker', '_tf_random_seed': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fec2a484f28>, '_model_dir': '/tmp/models/gqn', '_save_checkpoints_steps': 10000, '_keep_checkpoint_every_n_hours': 10000, '_num_worker_replicas': 1, '_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_train_distribute': None}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
kill

(3) About 20 seconds after I start the train_gqn_draw.py script, the system kills the process, and my computer becomes slow.

ogroth commented on June 8, 2024

You seem to have a typo in your CLI parameters when calling the script:

UNPARSED_ARGV: ['--mode_dir', '/tmp/models/gqn']

That should read: --model_dir /tmp/models/gqn
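The log format (`FLAGS: Namespace(...)` followed by `UNPARSED_ARGV: [...]`) suggests the script reads its flags with argparse's `parse_known_args`, which sets unrecognized options aside instead of aborting. That is an assumption about the script's internals. The sketch below declares only the three flags used in this thread (`build_parser` and `parse` are hypothetical names); the `model_dir` default is taken from the second log, where the flag was omitted, and the other defaults are placeholders. It shows why the misspelled `--mode_dir` was silently ignored.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction: three of the script's flags only.
    parser = argparse.ArgumentParser(description="Training a GQN.")
    parser.add_argument("--data_dir", type=str, default=None)   # placeholder default
    parser.add_argument("--dataset", type=str, default=None)    # placeholder default
    parser.add_argument("--model_dir", type=str, default="/tmp/models/gqn")
    return parser


def parse(argv):
    # parse_known_args returns (namespace, leftover_args) rather than
    # exiting with an error on options it does not recognize.
    return build_parser().parse_known_args(argv)


if __name__ == "__main__":
    flags, unparsed = parse([
        "--data_dir", "/tmp/data/gqn-dataset",
        "--dataset", "rooms_ring_camera",
        "--mode_dir", "/tmp/models/gqn",  # typo: mode_dir instead of model_dir
    ])
    print("FLAGS:", flags)
    print("UNPARSED_ARGV:", unparsed)
```

This prints `UNPARSED_ARGV: ['--mode_dir', '/tmp/models/gqn']`, the same line as in the log, while `model_dir` quietly keeps its default. Using `parse_args` instead would have turned the typo into an immediate error.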

wlred commented on June 8, 2024

I ran the command: python3 train_gqn_draw.py --data_dir /tmp/data/gqn-dataset --dataset rooms_ring_camera
It still gets killed.

wlred commented on June 8, 2024

The output log is:
Training a GQN.
FLAGS: Namespace(batch_size=36, chkpt_steps=10000, data_dir='/tmp/data/gqn-dataset', dataset='rooms_ring_camera', debug=False, initial_eval=False, log_steps=100, memcap=1.0, model_dir='/tmp/models/gqn', queue_buffer=64, queue_threads=4, train_epochs=40)
UNPARSED_ARGV: []
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_task_id': 0, '_service': None, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
allow_growth: true
}
, '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_master': '', '_task_type': 'worker', '_tf_random_seed': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fec2a484f28>, '_model_dir': '/tmp/models/gqn', '_save_checkpoints_steps': 10000, '_keep_checkpoint_every_n_hours': 10000, '_num_worker_replicas': 1, '_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_train_distribute': None}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
kill

ogroth commented on June 8, 2024

Have you tried monitoring your system with htop and nvidia-smi to check for unusual CPU/GPU usage or memory allocation? That's the only thing I can think of off the top of my head that could cause the OS to kill the process. Which OS are you using?
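Alongside htop, a few lines of Python can log how much RAM remains available while training runs, so a steady slide toward zero (the usual prelude to the kernel killing a process) is easy to spot. This is a Linux-only sketch that reads `/proc/meminfo`, whose values are reported in kB; `parse_meminfo`, `available_gib`, `watch`, and the 5-second interval are illustrative choices, not part of tf-gqn. GPU memory is still best checked with `nvidia-smi`.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            # Most fields are in kB; HugePages_* fields are plain counts.
            info[key.strip()] = int(parts[0])
    return info


def available_gib(path="/proc/meminfo"):
    """Return MemAvailable in GiB (field present on Linux 3.14 and later)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / (1024 ** 2)  # kB -> GiB


def watch(duration_s=600, interval_s=5.0):
    """Print available RAM every interval_s seconds for duration_s seconds."""
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        print("MemAvailable: %.2f GiB" % available_gib(), flush=True)
        time.sleep(interval_s)
```

Calling `watch()` from a second terminal while `train_gqn_draw.py` starts up shows whether available memory collapses in the roughly 20 seconds before the process is killed.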

wlred commented on June 8, 2024

Ubuntu 16.04.

wlred commented on June 8, 2024

Hi ogroth, how much RAM does your computer have?

ogroth commented on June 8, 2024

We've trained on machines with 32GB of RAM, but training never occupied more than 8GB at any time.

wlred commented on June 8, 2024

Interesting. I have run your code on three computers, and on all of them the process gets killed. Maybe a lot of people have the same problem.
