Training starts, but breaks partway through. I have had to restart it several times from the point where it last broke.
It looks like a hardware limitation, but I'm using a powerful laptop: Windows 10, GTX 1060 GPU, 16GB DDR3, Intel i7-7700HQ 2.8GHz, with CUDA 9.0 and cuDNN 7.05 installed. I hope someone can give me advice on this. Thank you.
Below is the training output while it is running, followed by the error that stops it.
INFO:tensorflow:global step 345: loss = 0.2802 (0.166 sec/step)
INFO:tensorflow:global step 346: loss = 0.8893 (0.191 sec/step)
INFO:tensorflow:global step 346: loss = 0.8893 (0.191 sec/step)
2018-05-29 12:27:57.173433: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at queue_ops.cc:105 : Invalid argument: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,1,591,500,3]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,1,591,500,3]
[[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, ..., DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, IteratorGetNext, Shape_9, IteratorGetNext:1, Shape_5, RandomHorizontalFlip/cond_1/Merge, Shape_1, IteratorGetNext:3, Shape_10, IteratorGetNext:4, Shape_8, IteratorGetNext:5, Shape_6, IteratorGetNext:6, Shape, IteratorGetNext:7, Shape_2, ExpandDims_1, Shape_4, IteratorGetNext:9, Shape_9, IteratorGetNext:10, Shape_9, IteratorGetNext:11, Shape_9)]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,1,591,500,3]
[[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, ..., DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, IteratorGetNext, Shape_9, IteratorGetNext:1, Shape_5, RandomHorizontalFlip/cond_1/Merge, Shape_1, IteratorGetNext:3, Shape_10, IteratorGetNext:4, Shape_8, IteratorGetNext:5, Shape_6, IteratorGetNext:6, Shape, IteratorGetNext:7, Shape_2, ExpandDims_1, Shape_4, IteratorGetNext:9, Shape_9, IteratorGetNext:10, Shape_9, IteratorGetNext:11, Shape_9)]]
INFO:tensorflow:global step 347: loss = 0.4380 (0.185 sec/step)
INFO:tensorflow:global step 347: loss = 0.4380 (0.185 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\default.LAPTOP-2CI68M4P\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\tensorflow1\models\research\object_detection\trainer.py", line 399, in train
saver=saver)
File "C:\Users\default.LAPTOP-2CI68M4P\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 784, in train
ignore_live_threads=ignore_live_threads)
File "C:\Users\default.LAPTOP-2CI68M4P\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\training\supervisor.py", line 828, in stop
ignore_live_threads=ignore_live_threads)
File "C:\Users\default.LAPTOP-2CI68M4P\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "C:\Users\default.LAPTOP-2CI68M4P\Anaconda3\envs\tensorflow1\lib\site-packages\six.py", line 693, in reraise
raise value
File "C:\Users\default.LAPTOP-2CI68M4P\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "C:\Users\default.LAPTOP-2CI68M4P\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\client\session.py", line 1244, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "C:\Users\default.LAPTOP-2CI68M4P\Anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,1,591,500,3]
[[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, ..., DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, IteratorGetNext, Shape_9, IteratorGetNext:1, Shape_5, RandomHorizontalFlip/cond_1/Merge, Shape_1, IteratorGetNext:3, Shape_10, IteratorGetNext:4, Shape_8, IteratorGetNext:5, Shape_6, IteratorGetNext:6, Shape, IteratorGetNext:7, Shape_2, ExpandDims_1, Shape_4, IteratorGetNext:9, Shape_9, IteratorGetNext:10, Shape_9, IteratorGetNext:11, Shape_9)]]
(tensorflow1) C:\tensorflow1\models\research\object_detection>
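
For anyone reading the log: the message says the queue expected a rank-4 image tensor `[1,?,?,3]` but received a rank-5 tensor `[1,1,591,500,3]`, so there is one extra leading dimension. Below is a minimal sketch in plain Python (no TensorFlow) of that comparison as I understand it. The helper name `is_compatible` is hypothetical and not part of any library, and `None` stands in for the `?` wildcard. It only illustrates the mismatch; it does not say which image or record produced it.

```python
from typing import Optional, Sequence


def is_compatible(expected: Sequence[Optional[int]], actual: Sequence[int]) -> bool:
    """Return True if `actual` fits `expected`.

    Hypothetical helper for illustration only: ranks must be equal, and a
    None entry in `expected` (the `?` in the log) matches any size.
    """
    if len(expected) != len(actual):
        return False
    return all(e is None or e == a for e, a in zip(expected, actual))


# Shapes copied from the error message in the log above.
expected_shape = [1, None, None, 3]   # [1,?,?,3]
got_shape = [1, 1, 591, 500, 3]       # [1,1,591,500,3]

# Rank 5 against rank 4, so the shapes cannot match.
print(is_compatible(expected_shape, got_shape))

# Dropping the extra leading dimension leaves [1,591,500,3], which fits.
print(is_compatible(expected_shape, got_shape[1:]))
```

The first call prints `False` and the second prints `True`, which suggests that whatever was enqueued at that step carried one dimension more than the other images in the dataset.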