
How can I run this code? (tf-gqn, closed)

ogroth commented on June 8, 2024
How can I run this code?

from tf-gqn.

Comments (13)

ogroth commented on June 8, 2024

Hey Yangshell, there are no official instructions for running the code yet. We're still experimenting with the architecture to find a training setup that replicates the paper's results on the rooms_ring_camera dataset. Once we have a stable version and working model snapshots, we will merge the dev branch into master and write a detailed Readme.
In the meantime, you can check https://github.com/ogroth/tf-gqn/blob/rooms_ring_camera_training/train_gqn_draw.py
This is the run script for training the GQN (provided you have downloaded the training data). However, we haven't managed to produce great visual results yet.


ogroth commented on June 8, 2024

We have released a stable version of GQN which trains on the rooms_ring_camera dataset with the default parameters we provide. The training script is: https://github.com/ogroth/tf-gqn/blob/master/train_gqn_draw.py
Please see the Readme for detailed instructions on how to set up and run the code.


wlred commented on June 8, 2024

I downloaded the rooms_ring_camera dataset, but after I run the script, the system kills it. My computer has one GTX 1080 Ti GPU. Is that not enough? Do I need a better one?


ogroth commented on June 8, 2024

Hi wlred, could you please give a more detailed description of the error you get? Which script did you run (with which parameters), and what happened after it was launched? Did it run into an out-of-memory error? A GTX 1080 Ti is definitely sufficient to train the model.


wlred commented on June 8, 2024

Hi ogroth, my steps:
(1) Downloaded the rooms_ring_camera dataset.
(2) Ran the commands:
1> source venv/bin/activate
2> python3 train_gqn_draw.py --data_dir /tmp/data/gqn-dataset --dataset rooms_ring_camera --model_dir /tmp/models/gqn

The output log is:
Training a GQN.
FLAGS: Namespace(batch_size=36, chkpt_steps=10000, data_dir='/tmp/data/gqn-dataset', dataset='rooms_ring_camera', debug=False, initial_eval=False, log_steps=100, memcap=1.0, model_dir='/tmp/models/gqn', queue_buffer=64, queue_threads=4, train_epochs=40)
UNPARSED_ARGV: ['--mode_dir', '/tmp/models/gqn']
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_task_id': 0, '_service': None, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
allow_growth: true
}
, '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_master': '', '_task_type': 'worker', '_tf_random_seed': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fec2a484f28>, '_model_dir': '/tmp/models/gqn', '_save_checkpoints_steps': 10000, '_keep_checkpoint_every_n_hours': 10000, '_num_worker_replicas': 1, '_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_train_distribute': None}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
kill

(3) About 20 seconds after I launch the train_gqn_draw.py script, the system kills the process, and my computer slows down.
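A process that dies with a bare "Killed" after a few seconds, while the machine becomes sluggish, is one common symptom of the Linux kernel's out-of-memory (OOM) killer, which terminates the process with SIGKILL and logs a "Killed process <pid> (<name>)" line to the kernel ring buffer. The minimal Python sketch below shows one way to check for that, assuming a Linux host. `find_oom_kills`, `was_sigkilled` and `run_and_check` are hypothetical helpers, not part of tf-gqn, and reading `dmesg` may require sudo on systems that restrict it.

```python
import re
import signal
import subprocess

# Kernel OOM-killer log lines look like "Killed process 4242 (python3) total-vm:...".
# Older kernels also print "Out of memory: Kill process ...", which this pattern
# deliberately skips so that each kill is counted once.
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def find_oom_kills(kernel_log):
    """Return (pid, process_name) pairs reported by the OOM killer."""
    return [(int(pid), name) for pid, name in OOM_RE.findall(kernel_log)]


def was_sigkilled(returncode):
    """subprocess reports death by a signal as the negated signal number (-9 here)."""
    return returncode == -signal.SIGKILL


def run_and_check(cmd):
    """Run cmd; if it was SIGKILLed, return the matching OOM-killer entries from dmesg."""
    proc = subprocess.Popen(cmd)
    returncode = proc.wait()
    if not was_sigkilled(returncode):
        return None  # exited normally or failed for another reason
    # Reading the kernel log may need sudo where kernel.dmesg_restrict=1.
    log = subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return [kill for kill in find_oom_kills(log) if kill[0] == proc.pid]
```

For example, `run_and_check(["python3", "train_gqn_draw.py", "--data_dir", "/tmp/data/gqn-dataset", "--dataset", "rooms_ring_camera"])` returns a non-empty list only if the kernel recorded an OOM kill for that PID. In an interactive bash session, the same signal shows up as exit status 137 (128 + 9) in `echo $?`.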


ogroth commented on June 8, 2024

You seem to have a typo in your CLI parameters when calling the script:

UNPARSED_ARGV: ['--mode_dir', '/tmp/models/gqn']

That should read: --model_dir /tmp/models/gqn
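The FLAGS and UNPARSED_ARGV lines look like the output of Python's `argparse` `parse_known_args()`, which does not reject unknown arguments but returns them separately, so a misspelled flag is silently ignored and the default is used instead. The sketch below assumes that is what train_gqn_draw.py does; `build_parser` and `parse_flags_strict` are hypothetical, reduced to three of the flags visible in the log, and are not the script's actual code.

```python
import argparse


def build_parser():
    """Hypothetical reduced parser with three flags taken from the FLAGS log line."""
    parser = argparse.ArgumentParser(description="Training a GQN.")
    parser.add_argument("--data_dir", type=str, default="/tmp/data/gqn-dataset")
    parser.add_argument("--dataset", type=str, default="rooms_ring_camera")
    parser.add_argument("--model_dir", type=str, default="/tmp/models/gqn")
    return parser


def parse_flags_strict(argv):
    """Like parse_known_args(), but fail loudly on any unrecognised argument."""
    flags, unparsed = build_parser().parse_known_args(argv)
    if unparsed:
        raise ValueError("Unrecognised arguments: {}".format(unparsed))
    return flags


if __name__ == "__main__":
    argv = ["--data_dir", "/tmp/data/gqn-dataset",
            "--dataset", "rooms_ring_camera",
            "--mode_dir", "/tmp/models/gqn"]  # typo: mode_dir instead of model_dir
    flags, unparsed = build_parser().parse_known_args(argv)
    print(flags.model_dir)  # /tmp/models/gqn, i.e. the default, not the CLI value
    print(unparsed)         # ['--mode_dir', '/tmp/models/gqn']
```

In this thread the default model_dir happens to equal the intended path, which is consistent with both runs showing `model_dir='/tmp/models/gqn'` in FLAGS; the typo alone therefore would not explain the kill.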


wlred commented on June 8, 2024

I ran the command: python3 train_gqn_draw.py --data_dir /tmp/data/gqn-dataset --dataset rooms_ring_camera
It still gets killed.


wlred commented on June 8, 2024

The output log:
Training a GQN.
FLAGS: Namespace(batch_size=36, chkpt_steps=10000, data_dir='/tmp/data/gqn-dataset', dataset='rooms_ring_camera', debug=False, initial_eval=False, log_steps=100, memcap=1.0, model_dir='/tmp/models/gqn', queue_buffer=64, queue_threads=4, train_epochs=40)
UNPARSED_ARGV: []
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_task_id': 0, '_service': None, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
allow_growth: true
}
, '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_master': '', '_task_type': 'worker', '_tf_random_seed': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fec2a484f28>, '_model_dir': '/tmp/models/gqn', '_save_checkpoints_steps': 10000, '_keep_checkpoint_every_n_hours': 10000, '_num_worker_replicas': 1, '_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_train_distribute': None}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
kill


ogroth commented on June 8, 2024

Have you tried monitoring your system with htop and nvidia-smi to check whether there is any unusual behaviour in CPU / GPU usage or memory allocation? That's the only thing I can think of off the top of my head that could cause the OS to kill the process. Which OS are you using?
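A small Python sketch of that kind of monitoring, meant to run in a second terminal while training starts up. It is an illustration, not part of tf-gqn: it assumes a Linux host (it reads /proc/meminfo) and, optionally, an NVIDIA driver that provides nvidia-smi; all helper names are hypothetical.

```python
import subprocess
import time
from pathlib import Path

# nvidia-smi query; with "nounits" the memory columns are plain MiB numbers.
GPU_QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"]


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def used_fraction(info):
    """Fraction of RAM in use, based on MemAvailable (present since Linux 3.14)."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def parse_gpu_csv(text):
    """Parse GPU_QUERY output into (used_MiB, total_MiB) tuples, one per GPU."""
    return [tuple(int(v) for v in line.split(","))
            for line in text.strip().splitlines()]


def watch_memory(interval_s=2.0, samples=30):
    """Print host RAM and GPU memory usage every interval_s seconds."""
    for _ in range(samples):
        info = parse_meminfo(Path("/proc/meminfo").read_text())
        try:
            out = subprocess.run(GPU_QUERY, stdout=subprocess.PIPE,
                                 universal_newlines=True, check=True).stdout
            gpus = parse_gpu_csv(out)
        except (FileNotFoundError, subprocess.CalledProcessError):
            gpus = []  # nvidia-smi not installed or no usable driver
        print("RAM used: {:.1%}, available: {:.2f} GiB, GPU (used, total MiB): {}".format(
            used_fraction(info), info["MemAvailable"] / 1024 ** 2, gpus))
        time.sleep(interval_s)
```

Calling `watch_memory()` before launching the training script gives a trace of MemAvailable. If it drops towards zero in the seconds before the kill while GPU memory stays well below its total, host RAM, not the GPU, is the likely bottleneck.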


wlred commented on June 8, 2024

Ubuntu 16.04.


wlred commented on June 8, 2024

Hi ogroth, how much RAM does your computer have?


ogroth commented on June 8, 2024

We've trained on machines with 32GB of RAM, but training never occupied more than 8GB at any time.
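To put the 8 GB figure in perspective, a rough back-of-envelope estimate of how much host RAM the prefetched input batches alone could occupy may help. The sketch below rests on assumptions not confirmed in this thread: that each rooms_ring_camera example decodes to 10 frames of 64×64 RGB stored as float32, and that `queue_buffer` counts whole batches. `buffered_input_bytes` is a hypothetical helper; its defaults for `batch_size` (36) and `queue_buffer` (64) are the values shown in the FLAGS log line.

```python
def buffered_input_bytes(batch_size=36, queue_buffer=64, frames_per_example=10,
                         height=64, width=64, channels=3, bytes_per_value=4):
    """Rough bytes held by `queue_buffer` prefetched batches of decoded images.

    Assumptions (hypothetical, not taken from tf-gqn): images are decoded to
    float32, queue_buffer counts whole batches, and camera poses and other
    per-example fields are small enough to ignore.
    """
    per_example = frames_per_example * height * width * channels * bytes_per_value
    return per_example * batch_size * queue_buffer


if __name__ == "__main__":
    total = buffered_input_bytes()
    print("{:,} bytes = {:.2f} GiB".format(total, total / 2 ** 30))
    # prints: 1,132,462,080 bytes = 1.05 GiB
```

Under these assumptions the buffered batches come to roughly 1 GiB, well below the 8 GB peak mentioned above, and the estimate scales linearly with both batch_size and queue_buffer.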


wlred commented on June 8, 2024

Interesting. I have run your code on three computers, and it was killed on all of them. Maybe a lot of people have the same problem.
