All datasets in this repository are released under the CC BY 4.0 International
license, which can be found here:
https://creativecommons.org/licenses/by/4.0/legalcode. All source files in this
repository are released under the Apache 2.0 license, the text of which can be
found in the LICENSE file.
Because the repo is large, we recommend downloading only the subdirectory of
interest:
Use the GitHub web editor to open the project. To open the editor, change the URL
from github.com to github.dev in the address bar.
In the left navigation panel, right-click the folder of interest and select
Download.
If you'd like to submit a pull request, you'll need to clone the repository;
we recommend making a shallow clone (without history).
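For example, a shallow clone looks like this (standard git; the sparse-checkout step is optional, needs git 2.25+, and demogen stands in for whichever subdirectory you want):
git clone --depth 1 https://github.com/google-research/google-research.git
cd google-research
git sparse-checkout set demogen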
resnet cifar10 resnet_wide_1.0x_batchnorm_aug_decay_0.0_2
2019-07-31 13:22:08.859328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:22:08.859377: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:22:08.859386: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:22:08.859393: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:22:08.859405: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:22:08.859413: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:22:08.859420: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:22:08.859428: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:22:08.859802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:22:08.859822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 13:22:08.859826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-07-31 13:22:08.859829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-07-31 13:22:08.860214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
Collecting 3072 neurons from 4 layers (5024 samples, 10 objects)
W0731 13:22:08.969132 139832240281408 deprecation.py:323] From ~/google-research/demogen/models/resnet.py:47: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.BatchNormalization instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.batch_normalization` documentation).
2019-07-31 13:22:10.279506: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key resnet/group_norm/beta not found in checkpoint
Traceback (most recent call last):
File "demogen/parse_tuning.py", line 84, in <module>
all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
File "~/google-research/demogen/extract_layers_util.py", line 98, in extract_layers
model_config.load_parameters(param_path, sess)
File "~/google-research/demogen/model_config.py", line 262, in load_parameters
saver.restore(tf_session, model_dir)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 1302, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Key resnet/group_norm/beta not found in checkpoint
[[node save_2/RestoreV2 (defined at ~/google-research/demogen/model_config.py:261) ]]
(1) Not found: Key resnet/group_norm/beta not found in checkpoint
[[node save_2/RestoreV2 (defined at ~/google-research/demogen/model_config.py:261) ]]
[[save_2/RestoreV2/_383]]
0 successful operations.
0 derived errors ignored.
Original stack trace for u'save_2/RestoreV2':
File "demogen/parse_tuning.py", line 84, in <module>
all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
File "~/google-research/demogen/extract_layers_util.py", line 98, in extract_layers
model_config.load_parameters(param_path, sess)
File "~/google-research/demogen/model_config.py", line 261, in load_parameters
saver = tf.train.Saver(model_var_list)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
self.build()
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
self._build(self._filename, build_save=True, build_restore=True)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
build_restore=build_restore)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError logs:
resnet cifar10 resnet_wide_1.0x_groupnorm_aug_decay_0.0_1
W0731 13:25:40.192379 140184543594304 deprecation_wrapper.py:119] From ~/google-research/demogen/extract_layers_util.py:68: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2019-07-31 13:25:40.193601: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-31 13:25:40.530512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:25:40.530699: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:25:40.531574: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:25:40.532371: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:25:40.532577: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:25:40.533520: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:25:40.534268: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:25:40.536462: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:25:40.537216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:25:40.537544: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-07-31 13:25:40.596500: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b0d5471f80 executing computations on platform CUDA. Devices:
2019-07-31 13:25:40.596528: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-07-31 13:25:40.627506: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3300000000 Hz
2019-07-31 13:25:40.628479: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b0d3885e60 executing computations on platform Host. Devices:
2019-07-31 13:25:40.628495: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-07-31 13:25:40.628967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:25:40.629006: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:25:40.629014: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:25:40.629021: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:25:40.629036: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:25:40.629043: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:25:40.629066: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:25:40.629073: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:25:40.629738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:25:40.629757: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:25:40.630519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 13:25:40.630526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-07-31 13:25:40.630529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-07-31 13:25:40.631262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
W0731 13:25:40.637204 140184543594304 deprecation.py:323] From ~/.local/lib64/python2.7/site-packages/tensor2tensor/data_generators/problem.py:680: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0731 13:25:40.649481 140184543594304 deprecation_wrapper.py:119] From ~/.local/lib64/python2.7/site-packages/tensor2tensor/data_generators/image_utils.py:169: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.
W0731 13:25:40.820377 140184543594304 deprecation.py:323] From ~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/image_ops_impl.py:1514: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
W0731 13:25:40.825278 140184543594304 deprecation.py:323] From ~/google-research/demogen/data_util.py:76: make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
W0731 13:25:40.837275 140184543594304 deprecation_wrapper.py:119] From ~/google-research/demogen/extract_layers_util.py:76: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
Collecting 3072 neurons from 4 layers (5024 samples, 10 objects)
W0731 13:25:40.838011 140184543594304 deprecation_wrapper.py:119] From ~/google-research/demogen/models/resnet.py:383: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
W0731 13:25:40.838236 140184543594304 deprecation.py:323] From ~/google-research/demogen/models/resnet.py:136: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
W0731 13:25:41.450165 140184543594304 deprecation.py:323] From ~/google-research/demogen/models/resnet.py:430: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
W0731 13:25:41.451108 140184543594304 deprecation.py:506] From ~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0731 13:25:42.371058 140184543594304 deprecation_wrapper.py:119] From ~/google-research/demogen/model_config.py:261: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
W0731 13:25:42.421227 140184543594304 deprecation.py:323] From ~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
File "demogen/parse_tuning.py", line 84, in <module>
all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
File "~/google-research/demogen/extract_layers_util.py", line 98, in extract_layers
model_config.load_parameters(param_path, sess)
File "~/google-research/demogen/model_config.py", line 262, in load_parameters
saver.restore(tf_session, model_dir)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 1322, in restore
err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [1,32,1,1] rhs shape= [32]
[[node save/Assign_50 (defined at ~/google-research/demogen/model_config.py:261) ]]
[[save/RestoreV2/_120]]
(1) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [1,32,1,1] rhs shape= [32]
[[node save/Assign_50 (defined at ~/google-research/demogen/model_config.py:261) ]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node save/Assign_50:
resnet/group_norm_15/beta (defined at ~/google-research/demogen/models/resnet.py:66)
Input Source operations connected to node save/Assign_50:
resnet/group_norm_15/beta (defined at ~/google-research/demogen/models/resnet.py:66)
Original stack trace for u'save/Assign_50':
File "demogen/parse_tuning.py", line 84, in <module>
all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
File "~/google-research/demogen/extract_layers_util.py", line 98, in extract_layers
model_config.load_parameters(param_path, sess)
File "~/google-research/demogen/model_config.py", line 261, in load_parameters
saver = tf.train.Saver(model_var_list)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
self.build()
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
self._build(self._filename, build_save=True, build_restore=True)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
build_restore=build_restore)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 350, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 72, in restore
self.op.get_shape().is_fully_defined())
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 227, in assign
validate_shape=validate_shape)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 66, in assign
use_locking=use_locking, name=name)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
ValueError logs:
resnet cifar10 resnet_wide_1.0x_groupnorm__decay_0.002_lr_0.001_3
2019-07-31 13:19:29.317723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:19:29.317779: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:19:29.317789: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:19:29.317803: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:19:29.317811: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:19:29.317819: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:19:29.317826: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:19:29.317834: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:19:29.318213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:19:29.318235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 13:19:29.318239: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-07-31 13:19:29.318243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-07-31 13:19:29.318637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
Collecting 3072 neurons from 4 layers (5024 samples, 10 objects)
2019-07-31 13:19:36.919128: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key resnet/batch_normalization/beta not found in checkpoint
resnet cifar10 resnet_wide_2.0x_batchnorm_aug_decay_0.0_1
2019-07-31 13:19:37.275569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:19:37.275624: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:19:37.275643: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:19:37.275652: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:19:37.275659: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:19:37.275667: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:19:37.275675: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:19:37.275691: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:19:37.276063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:19:37.276086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 13:19:37.276090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-07-31 13:19:37.276094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-07-31 13:19:37.276490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
Collecting 3072 neurons from 4 layers (5024 samples, 10 objects)
Traceback (most recent call last):
File "demogen/parse_tuning.py", line 84, in <module>
all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
File "~/google-research/demogen/extract_layers_util.py", line 89, in extract_layers
end_points_collection=end_points_collection)
File "~/google-research/demogen/models/resnet.py", line 391, in __call__
strides=self.conv_stride, data_format=self.data_format)
File "~/google-research/demogen/models/resnet.py", line 136, in conv2d_fixed_padding
data_format=data_format)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/convolutional.py", line 424, in conv2d
return layer.apply(inputs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
return self.__call__(inputs, *args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 591, in __call__
self._maybe_build(inputs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1881, in _maybe_build
self.build(input_shapes)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/layers/convolutional.py", line 165, in build
dtype=self.dtype)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/base.py", line 450, in add_weight
**kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 384, in add_weight
aggregation=aggregation)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/tracking/base.py", line 663, in _add_variable_with_custom_getter
**kwargs_for_getter)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
aggregation=aggregation)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
aggregation=aggregation)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
aggregation=aggregation)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
aggregation=aggregation)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 869, in _get_single_variable
(name, shape, found_var.get_shape()))
ValueError: Trying to share variable resnet/conv2d/kernel, but specified shape (3, 3, 3, 32) and found shape (3, 3, 3, 16).
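This last ValueError typically means the second model was built into the same default graph as the first, so get_variable tried to reuse resnet/conv2d/kernel at the old 1.0x width. A minimal sketch of one common fix, assuming the models are evaluated in a loop (build_model and model_configs are hypothetical names; param_path is from the traceback above):

import tensorflow as tf

for model_config in model_configs:
    # Start from a fresh graph so variables created for the previous width
    # (e.g. a (3, 3, 3, 16) kernel) are not reused at the new width.
    tf.reset_default_graph()
    with tf.Session() as sess:
        logits = build_model(model_config)
        model_config.load_parameters(param_path, sess)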
Hi @gariel-google, are the models you provide trained on 416x128 images? When I tried inference at other resolutions, it didn't work well at all.
If it is indeed 416x128, have you tried training at higher resolutions? I know some previous work uses 416x128 for training, but most recent methods use higher resolutions, and experiments have demonstrated that higher resolutions lead to better results. Is this related to GPU memory constraints?
We've been doing some symbolic computation/mathematics for PyMC in the symbolic-pymc project, and, since we're moving to TensorFlow [Probability] and you folks have also done related things in TFP & Edward2, I would like to get your input on this kind of work in the context of TF[P].
More specifically, is anyone else working on tools for symbolic assessment/manipulation of TF graphs? We've had to do a bit of work to make TF graphs "symbolically manipulable", and I'm always wondering whether there's a better way, or whether I'm missing out on any larger, concerted efforts to do so.
File "C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1449, in import_meta_graph
**kwargs)[0]
File "C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1463, in _import_meta_graph_with_return_elements
meta_graph_def = meta_graph.read_meta_graph_file(meta_graph_or_file)
File "C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\meta_graph.py", line 695, in read_meta_graph_file
text_format.Merge(file_content.decode("utf-8"), meta_graph_def)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 1: invalid continuation byte
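As an aside, read_meta_graph_file only falls back to UTF-8 text parsing when binary parsing fails, so this usually points at a truncated or corrupted .meta file. A quick diagnostic sketch (meta_path is a placeholder for the file being imported):

from tensorflow.core.protobuf import meta_graph_pb2

mg = meta_graph_pb2.MetaGraphDef()
with open(meta_path, 'rb') as f:
    # Raises google.protobuf.message.DecodeError if the file is not a
    # valid binary MetaGraphDef.
    mg.ParseFromString(f.read())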
We tried to recreate some of the results in dql_grasping. After setting up the environment according to the requirements, we ran run_random_collect_oss.sh and then run_train_collect_eval_oss.sh with dqn on-policy and dqn off-policy. The results shown in the images below suggest the training didn't converge to policies with the expected success rate. What steps should we take to reproduce results similar to those presented in the paper?
I was trying to train a BAM model using the command python -m bam.run_classifier rte-mrpc-bam-model $BAM_DIR '{"task_names": ["rte", "mrpc"], "distill": true, "teachers": {"rte": "rte-model", "mrpc": "mrpc-model"}}' given in the README.md, and an error occurred:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1758, in <module>
main()
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1752, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1147, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/yirongli/Desktop/bam/run_classifier.py", line 281, in <module>
tf.app.run()
File "C:\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "C:/Users/yirongli/Desktop/bam/run_classifier.py", line 274, in main
model_runner.write_outputs([task], trial, split)
File "C:/Users/yirongli/Desktop/bam/run_classifier.py", line 196, in write_outputs
distill_input_fn, _, _ = self._preprocessor.prepare_predict(tasks, split)
File "C:\Users\yirongli\Desktop\bam\data\preprocessing.py", line 69, in prepare_predict
return self._serialize_dataset(tasks, False, split)
File "C:\Users\yirongli\Desktop\bam\data\preprocessing.py", line 108, in _serialize_dataset
self.serialize_examples(examples, is_training, tfrecords_path)
File "C:\Users\yirongli\Desktop\bam\data\preprocessing.py", line 127, in serialize_examples
tf_example = self._example_to_tf_example(example, is_training)
File "C:\Users\yirongli\Desktop\bam\data\preprocessing.py", line 136, in _example_to_tf_example
example, is_training))
File "C:\Users\yirongli\Desktop\bam\task_specific\classification\classification_tasks.py", line 136, in featurize
self._distill_inputs[eid])
KeyError: 2490
When I tried to figure this out, I found that in configure.py line 131 the model always loads distill_outputs from _train_predictions_1.pkl, regardless of whether the student model requires training or testing predictions from the teacher model (run_classifier.py line 269).
So it obviously goes wrong when fetching the testing predictions; I hope this can be fixed.
@gariel-google Dear author, thanks for sharing the source code of the paper.
I was trying to reproduce the results of the paper using your code. However, training from scratch with your default settings (batch size=4, learning_rate=0.0002, etc.), the result I got is quite far from what you stated in the paper (Abs Rel 0.147 for the best checkpoint, at around the 370k-th step, vs 0.128 in the paper). For your information, I am using the evaluation code from sfmlearner, as struct2depth does.
Therefore, may I know the settings for obtaining the paper's results? Or is any critical part missing from the currently released code (a pretrained checkpoint, for example)?
Thank you in advance.
I'm getting this error while trying to train the model on a new dataset:
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_<lambda>_59670}} Name: , Feature list 'frame_labels' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
[[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNext]]
[Op:__inference_<lambda>_59670]
While checking the unprocessing code, I could not build a model similar to the one uploaded to Google Drive.
When I trained the model in our environment, the graph did not contain the shot noise and read noise tensors.
I got the error below when I tried to denoise with the model created from the code committed to GitHub:
The name 'stddev/shot_noise:0' refers to a Tensor which does not exist. The operation, 'stddev/shot_noise', does not exist in the graph.
Could you check whether the training code shared is the latest one?
Hi,
What's the best way to handle mismatched line counts between the target and prediction files? Currently, the code fails at line 111:
# Check whether num_targets < num_predictions
if next(pred_gen, None) is not None or next(pred_gen, None) is not None:
raise ValueError("Must have equal number of lines across target and "
"prediction files. Mismatch between files: %s, %s." %
(target_filename, prediction_filename))
My prediction files have more sentences than the target files, and the pyrouge package, which is built on the Perl ROUGE, handles this without breaking. Should I be pre-processing my target files to have the same number of lines as the prediction files?
Could this be handled at the rouge package level instead, similar to the Perl one?
Commenting out this check leads to scores about 10 points lower on all metrics compared with the pyrouge outputs.
Let me know if any additional code/output files are required to better understand this issue.
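If pre-processing turns out to be the answer, a minimal sketch that trims the prediction file to the target's line count (file names are placeholders; whether trimming is the right policy depends on why the counts diverge):

with open('target.txt') as t, open('predictions.txt') as p:
    targets = t.readlines()
    predictions = p.readlines()[:len(targets)]  # drop trailing extra predictions
with open('predictions_trimmed.txt', 'w') as out:
    out.writelines(predictions)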
Hi, can you provide the unprocessing_srgb_loss code? I ran process.py to process Bayer RGGB into sRGB images on the DND dataset, but the color of the restored images is not right.
Hi, I am running into a FailedPreconditionError (2 root errors) when I try to run train.py from Unprocessing Images for Learned Raw Denoising. Do you have any idea why this happens and how to address it? I am using tf_gpu 1.12.0, Python 3.5.8, CUDA 9.0, Linux. Thanks
This concerns the distance function that is passed as the argument for the dist parameter in the align function.
Because that function is the negative of a matmul call, I believe the dynamic time warping finds the worst possible path. Experimentally, I have verified that the reconstruction error is 0 when the negation is removed.
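For reference, the pattern being described looks roughly like this sketch (my reconstruction, not the repository's exact code): a dot-product similarity negated into a "distance", which DTW then minimizes.

import numpy as np

def dist_fn(x, y):
    # Negating a similarity makes DTW, a cost minimizer, prefer
    # dissimilar frame pairs -- the behavior reported above.
    return -1.0 * np.dot(x, y)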
Hello, I'm trying to get prediction results (depth and camera parameters) given Image1, Image2, and masks, but I'm having a hard time figuring out how, assuming I have these images as numpy matrices, I can make this prediction by calling a single method. I tried doing this myself, but the code is overwhelmingly complicated and I pretty much got lost.
I'm running the training code with a single step and it executes without any errors, but I can't find prediction results anywhere; the TensorBoard image folder is empty.
Essentially, what I am trying to create is a method with the following signature: depth_image, camera_calibration = model.predict(previous_image, current_image, mask)
I have found a method that returns self.est_depth, which I presume is the depth map, but I couldn't find out how to supply the inputs to that method or how to retrieve the predicted camera calibration.
+Update
Setting the summary frequency to 1 and training for 10 steps on a single data instance using the provided code seems to generate images on TensorBoard: python -m depth_from_video_in_the_wild.train --checkpoint_dir=***\trained_models --data_dir=***\depth_from_video_in_the_wild\data_example --train_steps=10
@sarahooker I really enjoyed the paper. I'd like to engage in a little speculation and I hope you'll indulge me.
Knowledge transfer during iterative sparsification
The lottery ticket result surprises me. I think you should be able to retrain to much closer to the same accuracy given the same initialization and a sparse mask. However, I speculate that the magnitude pruning method induces knowledge transfer, which prevents this.
Because the sparsity-inducing mask changes during the iterative process, you're dealing with some number of subnets. If they were fully disjoint, you would transfer knowledge using one as the teacher and the other as the student. In the iterative process, however, you get a gradual knowledge transfer, which means that the representations (and the ultimate accuracy of the sparsified network) are no longer a function of the sparse initial weights + training, but of the full initial weights and the sparsification procedure.
If this is the case I suspect that if you do a single-step sparsification at the end of training and use that sparse mask along with the same initial weights (lottery ticket) you should see much closer accuracies.
(Iterative pruning is still a better way to do pruning of course.)
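Concretely, the experiment suggested above could look like this sketch (numpy; w_final and w_init are hypothetical arrays of trained and initial weights): compute a one-shot magnitude mask at the end of training and apply it to the saved initialization.

import numpy as np

def one_shot_mask(final_weights, sparsity=0.9):
    # Keep the (1 - sparsity) fraction of weights with the largest magnitude.
    threshold = np.percentile(np.abs(final_weights), sparsity * 100)
    return np.abs(final_weights) >= threshold

mask = one_shot_mask(w_final)        # mask from the end of training
lottery_ticket_init = w_init * mask  # same initialization, final sparse mask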
Knowledge reconstitution
I'm curious how much work has been done in the area of densifying sparse nets. For example, can you perfectly reverse the accuracy-loss curves by decreasing sparsity and retraining? Does it work better in one step (going from 90% sparsity to 70% sparsity by initializing a lot of random weights) or iteratively (90->85->80->75->70)?
Ultimately, the question is: do you think a sparse bottleneck + densification + retraining procedure can produce a highly efficient and compressed version of finetuning?
Hi,
I was wondering if you could provide the pre-trained models for the CVPR 2019 paper titled 'Unprocessing Images for Learned Raw Denoising'. I have looked into the 'unprocessing' dir and found no models or inference code; do you intend to release them?
I am trying to run the demogen example.py in a conda environment, with either Python 2.7 or 3.7, and get the following error:
File ".../google-research/demogen/data_util.py", line 73, in input_data
dataset = prob.dataset(mode)
File ".../anaconda3/envs/demogen/lib/python3.7/site-packages/tensor2tensor/data_generators/problem.py", line 631, in dataset
assert data_dir
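The assertion fires because tensor2tensor's Problem.dataset needs a populated data_dir. A sketch of generating the CIFAR-10 files first, using the tensor2tensor API as I understand it (directory paths are placeholders):

from tensor2tensor import problems

prob = problems.problem('image_cifar10')
# Writes TFRecords into data_dir; tmp_dir holds the raw download.
prob.generate_data('/path/to/data_dir', '/path/to/tmp_dir')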
Each .py file in /mol_dqn/experimental contains imports like these:
from mol_dqn.chemgraph.mcts import deep_q_networks
from mol_dqn.chemgraph.mcts import molecules as molecules_mdp
from mol_dqn.chemgraph.mcts import run_dqn
I was trying to use the rouge package in a multi-reference scenario. Is it possible to provide an example in rouge/README on how to do that?
In the current example in "How to run", where do you define the directory of the files?
Also, I am wondering how you compute ROUGE in the multi-reference scenario. Looking into the code, you do some sort of bootstrap aggregation, whereas the original paper (https://www.aclweb.org/anthology/W04-1013) does simple micro-averaging in section 2, while in section 2.1 it seems they do maximization over pairwise computations!
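In the meantime, one common workaround for multiple references is maximization over references, as in section 2.1 of the original ROUGE paper. A sketch using the package's scorer API (the max-over-references policy is my assumption, not something the package does):

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

def multi_ref_score(references, prediction):
    # Score against each reference, keep the best F-measure per ROUGE type.
    scores = [scorer.score(ref, prediction) for ref in references]
    return {t: max(s[t].fmeasure for s in scores) for t in ['rouge1', 'rougeL']}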
Hi, what exactly is tensorflow.google? I checked the official TF API and related topics online but found nothing. My TF version is 1.12.0.
Thanks
After working around a problem in example.py as described in a previous issue, I could load NIN models but not resnet models. The error is below:
I0716 13:20:53.269759 140097876026944 saver.py:1280] Restoring parameters from data/demogen_models/RESNET_CIFAR10/resnet_wide_1.0x_batchnorm__decay_0.0_1/model.ckpt-150000
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "~/google-research/demogen/example.py", line 62, in <module>
tf.app.run(main)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "~/.local/lib64/python2.7/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "~/.local/lib64/python2.7/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "~/google-research/demogen/example.py", line 57, in main
load_and_run(model_config, root_dir)
File "~/google-research/demogen/example.py", line 44, in load_and_run
sess.run(logits)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnimplementedError: Generic conv implementation only supports NHWC tensor format for now.
[[node resnet/conv2d/Conv2D (defined at /tmp/tmpdCZJAJ.py:12) ]]
Errors may have originated from an input operation.
Input Source operations connected to node resnet/conv2d/Conv2D:
transpose (defined at demogen/data_util.py:79)
resnet/conv2d/kernel/read (defined at demogen/models/resnet.py:136)
Original stack trace for u'resnet/conv2d/Conv2D':
File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "~/google-research/demogen/example.py", line 62, in <module>
tf.app.run(main)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "~/.local/lib64/python2.7/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "~/.local/lib64/python2.7/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "~/google-research/demogen/example.py", line 57, in main
load_and_run(model_config, root_dir)
File "~/google-research/demogen/example.py", line 41, in load_and_run
logits = model_fn(image, is_training=False)
File "demogen/models/resnet.py", line 391, in __call__
strides=self.conv_stride, data_format=self.data_format)
File "demogen/models/resnet.py", line 136, in conv2d_fixed_padding
data_format=data_format)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/convolutional.py", line 424, in conv2d
return layer.apply(inputs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
return self.__call__(inputs, *args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "/tmp/tmpdCZJAJ.py", line 12, in tf__call
outputs = ag__.converted_call('_convolution_op', self, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (inputs, self.kernel), None)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
return f(*args)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1079, in __call__
return self.conv_op(inp, filter)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 635, in __call__
return self.call(inp, filter)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 234, in __call__
name=self.name)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1953, in conv2d
name=name)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
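This UnimplementedError usually means an NCHW ('channels_first') convolution ended up running on CPU, which only supports NHWC. A workaround sketch (whether the demogen model constructor accepts a data_format argument this way is an assumption, though resnet.py does pass data_format through its layers):

import tensorflow as tf

# Use NCHW only when a GPU kernel is actually available.
data_format = 'channels_first' if tf.test.is_gpu_available() else 'channels_last'
# Hypothetical usage: pass data_format when constructing the model, e.g.
# model_fn = resnet.Model(data_format=data_format, ...)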
My understanding is that we don't need per-frame labels to train the TCC model, right? But when I feed a tfrecord without per-frame labels, it throws an error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: {{function_node __inference_<lambda>_61009}} Name: , Feature list 'frame_labels' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
[[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNext]]
[[conv3_block4_1_bn/beta0/buckets/cond/else/_865/Identity/_2076]]
(1) Invalid argument: {{function_node __inference_<lambda>_61009}} Name: , Feature list 'frame_labels' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
[[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_<lambda>_61009]
Seems like the frame_labels feature is required here. So can I just fill in dummy labels?
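If dummy labels are acceptable, writing a per-frame zero label under the 'frame_labels' key could look like this sketch (the key comes from the error message; the rest of the record layout is an assumption):

import tensorflow as tf

num_frames = 32  # example value: number of frames in the video

# One int64 label (0) per frame under the 'frame_labels' feature list.
labels = tf.train.FeatureList(feature=[
    tf.train.Feature(int64_list=tf.train.Int64List(value=[0]))
    for _ in range(num_frames)])

seq_example = tf.train.SequenceExample(
    feature_lists=tf.train.FeatureLists(feature_list={'frame_labels': labels}))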
Hello,
Thanks for releasing the code for 'Unprocessing Images ... Raw Denoising'. Upon trying the training process, I see that the training gets stuck at this point.
This is my run command: python train.py --model_dir='./ckpts/' --train_pattern=/disk1/aashishsharma/Datasets/MIRFlickr_Dataset/train/* --test_pattern=/disk1/aashishsharma/Datasets/MIRFlickr_Dataset/test/*
Does anybody know this problem? Any workaround? Thanks!
Hello,
a section of the TCC code in visualize_alignment.py has high potential for confusion and misuse. The align function is defined as follows:
def align(candidate_feats, query_feats, use_dtw):
  """Align videos based on nearest neighbor in embedding space."""
  if use_dtw:
    _, _, _, path = dtw(candidate_feats, query_feats, dist=dist_fn)
    _, uix = np.unique(path[0], return_index=True)
    nns = path[1][uix]
  else:
    nns = []
    for i in range(len(candidate_feats)):
      nn_frame_id, _ = get_nn(query_feats, candidate_feats[i])
      nns.append(nn_frame_id)
  return nns
The function call is: nns.append(align(embs[query], embs[candidate], use_dtw))
The positional arguments for the query and candidate features are reversed. Clearly, we do not want to iterate over the candidate frames, matching them to the reference. There is no logical error, since the arguments are passed into the function in reverse order, but it may lead to issues downstream if these functions are built upon.
The function definition should read:
def align(query_feats, candidate_feats, use_dtw):
  """Align videos based on nearest neighbor in embedding space."""
  if use_dtw:
    _, _, _, path = dtw(query_feats, candidate_feats, dist=dist_fn)
    _, uix = np.unique(path[0], return_index=True)
    nns = path[1][uix]
  else:
    nns = []
    for i in range(len(query_feats)):
      nn_frame_id, _ = get_nn(query_feats[i], candidate_feats)
      nns.append(nn_frame_id)
  return nns
Hello, the downloaded Penn Action dataset referred to in your paper does not include the key events and phase labels you mentioned. Could you release those annotations so that your work can be reproduced? @debidatta
Thank you
Hi, I suspect that data_utils.get_num_dialog_examples() returns an incorrect number.
For dstc8_single_domain, train split, data_utils.get_num_dialog_examples() returns 82588, but the number of examples in dstc8_single_domain_train_examples.tf_record is 41294. I think these two numbers should be the same. Is that right? (The former is exactly double the latter because get_num_dialog_examples() counts USER and SYSTEM turns together; I think that's wrong.)
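A quick way to check this hypothesis (a sketch against the dataset's JSON layout, where each dialogue has a 'turns' list and each turn a 'speaker' field; the shard name is an example):

import json

with open('dialogues_001.json') as f:
    dialogs = json.load(f)

user_turns = sum(1 for d in dialogs for t in d['turns'] if t['speaker'] == 'USER')
all_turns = sum(len(d['turns']) for d in dialogs)
# The report above suggests get_num_dialog_examples() behaves like all_turns,
# while the tf_record holds one example per USER turn (82588 = 2 * 41294).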
[Line 226, graph_attention_learning.py, Watch Your Step] return tf.transpose(d_sum) * GetNumNodes() * 80, feed_dict
Why is this arbitrary scaling by the number of nodes done? I am not sure whether it is reported in the Watch Your Step paper.
When I remove the scaling, the results decrease somewhat on most of the datasets.
Results and Ablation Study:
PPI(without scaling,learn attention) - 90.84
PPI(with scaling,learn attention) - 91.8
PPI(uniform attention,scaling) - 91.7
Configs:
Embedding dimension:128
Share Embeddings: False
Transition_powers: 5
Loss: nlgl
context_regularizer: 0.1
learnable attention - softmax over 5 hops
uniform attention - equal attention for each of the 5 hops (0.2 per hop)
A couple more questions:
1. Why is a validation set (validation positive edges, negative edges) not used for stopping? I know that Learning Edge Representations via Low-Rank Asymmetric Projections, the other baselines, and much of the graph-embedding literature don't use one for link prediction, but I feel that stopping based on validation ROC-AUC scores is more appropriate than stopping at the best train ROC-AUC.
2.a) I see that attention makes only a small contribution to the performance; uniform attention works well. For example, on PPI the paper reports that the learned attention favors the first hop, but not learning any attention (in other words, uniform attention) also performs equally well. This is true even for Soc-Facebook, Wiki-Vote, ca-HepTh, and ca-AstroPh.
2.b) Why is the stopping criterion at line 441,
if i - 100 > eval_metrics['i at best train']:
  LogMsg('Reached peak a while ago. Terminating...')
  break
based on training error? Shouldn't the stopping criterion always be based on validation error?
2.c) At line 340, eval_metrics['test auc at best train'] = float(test_auc) logs and reports the test AUC at the best train AUC. Why? We normally report metrics for, and save, the model that performs best on the validation set. If the code is modified to report the AUC/precision scores at the lowest validation error, there is little to no difference between uniform and trainable attention.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/dhlee347_estsoft_com/google-research/schema_guided_dst/baseline/train_and_predict.py", line 908, in
tf.compat.v1.app.run(main)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/dhlee347_estsoft_com/google-research/schema_guided_dst/baseline/train_and_predict.py", line 854, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1411, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1169, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1158, in _run
self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 487, in init
self._assert_fetchable(graph, fetch.op)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 500, in _assert_fetchable
'Operation %r has been marked as not fetchable.' % op.name)
ValueError: Operation u'truediv' has been marked as not fetchable.
ERROR:tensorflow:Closing session due to error Step was cancelled by an explicit call to Session::Close().
Questions regarding the denoiser models from the paper "Unprocessing Images for Learned Raw Denoising": can I run inference on an arbitrary, already-noisy RGB image if I don't have access to the noise information (shot and read noise) from metadata?
Specifically, in the dnd_denoise.py file, I can see the feed_dict is comprised of the noisy image and the read and shot noise tensors.
Hi, I have a few questions about the Amazon-2M dataset:
1. Can the cluster_gcn code run on the Amazon-2M dataset?
2. For the Amazon-2M dataset:
(1) Were the stopwords removed?
(2) What is the ratio of training to test data?
I tried to run run_classifier.sh on CPU, and it runs OK. But when I run it on GPU, it sometimes works and sometimes fails. I thought it might have something to do with seq_max_length and batch_size, but reducing seq_max_length and batch_size doesn't help.
2019-03-21 14:21:12.342706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2019-03-21 14:21:13.673814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-21 14:21:13.673891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2 3
2019-03-21 14:21:13.673904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N N N N
2019-03-21 14:21:13.673912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: N N N N
2019-03-21 14:21:13.673918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: N N N N
2019-03-21 14:21:13.673925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3: N N N N
2019-03-21 14:21:13.675250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15119 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2019-03-21 14:21:13.676003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15119 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:08.0, compute capability: 6.0)
2019-03-21 14:21:13.676420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15119 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:09.0, compute capability: 6.0)
2019-03-21 14:21:13.676764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15119 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:0a.0, compute capability: 6.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ./mayi_model_2/model.ckpt.
2019-03-21 14:22:44.995671: E tensorflow/core/kernels/check_numerics_op.cc:185] abnormal_detected_host @0x1085900ab00 = {1, 0} Found Inf or NaN global norm.
INFO:tensorflow:Error recorded from training_loop: Found Inf or NaN global norm. : Tensor had NaN values
[[node VerifyFinite/CheckNumerics (defined at /var/log//bert_chi/optimization.py:74) = CheckNumerics[T=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"](global_norm/global_norm)]]
I thought the embeddings should be independent of the --frames_per_batch parameter and I should get consistent results; is there something I'm missing?
Hi there, a problem occurred when running bam/run_classifier.py.
TensorFlow seems to lock the events.out.tfevents file until the whole program ends. When the command utils.rmkdir(config.checkpoints_dir) executes at run_classifier.py line 271, tf.gfile.DeleteRecursively cannot complete and raises the error given below:
Traceback (most recent call last):
File "D:/bam/run_classifier.py", line 281, in <module>
tf.app.run()
File "C:\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "D:/bam/run_classifier.py", line 276, in main
utils.rmkdir(config.checkpoints_dir)
File "D:\bam\helpers\utils.py", line 71, in rmkdir
rmrf(path)
File "D:\bam\helpers\utils.py", line 60, in rmrf
tf.gfile.DeleteRecursively(path)
File "C:\Python36\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 563, in delete_recursively
delete_recursively_v2(dirname)
File "C:\Python36\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 577, in delete_recursively_v2
pywrap_tensorflow.DeleteRecursively(compat.as_bytes(path), status)
File "C:\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: Failed to remove a directory: bam_dir\models\debug-model\checkpoints; Directory not empty
The environment currently in use: tensorflow-gpu 1.13.1 on Windows 10.
However, when running this code on CentOS, the problem doesn't exist.
I would really appreciate some advice.
If a posted opening does not match what you can contribute best, is there contact information where you can discuss it or send in a proposal or speculative application?
Into the persona graph embedding?
Also, node2vec provides several sample graphs (like the FB egonet, etc.) at https://snap.stanford.edu/node2vec/. Should I first convert those graphs to the NetworkX format before running?
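If conversion is needed, loading a SNAP edge list into NetworkX is a one-liner (a sketch; facebook_combined.txt is the FB egonet file from the SNAP page, and treating it as an undirected graph is my assumption):

import networkx as nx

# SNAP edge lists are whitespace-separated node-id pairs, one edge per line.
g = nx.read_edgelist('facebook_combined.txt', nodetype=int, create_using=nx.Graph())
print(g.number_of_nodes(), g.number_of_edges())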