pudae / tensorflow-densenet Goto Github PK
View Code? Open in Web Editor NEWTensorflow-DenseNet with ImageNet Pretrained Models
License: Apache License 2.0
Tensorflow-DenseNet with ImageNet Pretrained Models
License: Apache License 2.0
Thanks for making this available!
I couldn't figure out the naming scheme of DenseNet in the paper, and would like to ask a quick question here. In table 1 of the paper, there are "DenseNet-121", "DenseNet-169", "DenseNet-201" and "DenseNet-264", but didn't mention "DenseNet-161", which you provided in the pre-trained model. The paper did mention "our largest model (DenseNet-161) ...", but I couldn't find more details about it. Could you provide some pointer or more details about that please? Thanks!
Hi, pudae:
When I try to fine-tune in my own dataset use tf_densenet121 imagenet model, it gives the following error:
2017-11-09 11:42:15.694613: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./model/tf-densenet121/tf_densenet121.ckpt
is there problem with the pre-trained model?
I understand that MNIST is a grayscale image so how to make this code work with GrayScale images, it works fine with RGB images
normal output:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,1024,10] rhs shape= [1,1,1024,5]
[[Node: save/Assign_1299 = Assign[T=DT_FLOAT, _class=["loc:@densenet121/logits/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](densenet121/logits/weights/RMSProp, save/RestoreV2/_7)]]
after editing preprocessing output:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [7,7,1,64] rhs shape= [7,7,3,64]
[[Node: save/Assign_8 = Assign[T=DT_FLOAT, _class=["loc:@densenet121/conv1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](densenet121/conv1/weights, save/RestoreV2/_2667)]]
Hi all,
I use Python 2.7.13 and Tensorflow 1.3.0 on CPU.
I want to use DensNet( https://github.com/pudae/tensorflow-densenet ) for regression problem. My data contains 60000 jpeg images with 37 float labels for each image.
I saved my data into tfrecords files by:
`
def Read_Labels(label_path):
labels_csv = pd.read_csv(label_path)
labels = np.array(labels_csv)
return labels[:,1:]
def load_image(addr):
# read an image and resize to (224, 224)
img = cv2.imread(addr)
img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img.astype(np.float32)
return img
def Shuffle_images_with_labels(shuffle_data, photo_filenames, labels):
if shuffle_data:
c = list(zip(photo_filenames, labels))
shuffle(c)
addrs, labels = zip(*c)
return addrs, labels
def image_to_tfexample_mine(image_data, image_format, height, width, label):
return tf.train.Example(features=tf.train.Features(feature={
'image/encoded': bytes_feature(image_data),
'image/format': bytes_feature(image_format),
'image/class/label': _float_feature(label),
'image/height': int64_feature(height),
'image/width': int64_feature(width),
}))
def _convert_dataset(split_name, filenames, labels, dataset_dir):
assert split_name in ['train', 'validation']
num_per_shard = int(math.ceil(len(filenames) / float(_NUM_SHARDS)))
with tf.Graph().as_default():
for shard_id in range(_NUM_SHARDS):
output_filename = _get_dataset_filename(dataset_path, split_name, shard_id)
with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
start_ndx = shard_id * num_per_shard
end_ndx = min((shard_id+1) * num_per_shard, len(filenames))
for i in range(start_ndx, end_ndx):
sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
i+1, len(filenames), shard_id))
sys.stdout.flush()
img = load_image(filenames[i])
image_data = tf.compat.as_bytes(img.tostring())
label = labels[i]
example = image_to_tfexample_mine(image_data, image_format, height, width, label)
# Serialize to string and write on the file
tfrecord_writer.write(example.SerializeToString())
sys.stdout.write('\n')
sys.stdout.flush()
def run(dataset_dir):
labels = Read_Labels(dataset_dir + '/training_labels.csv')
photo_filenames = _get_filenames_and_classes(dataset_dir + '/images_training')
shuffle_data = True
photo_filenames, labels = Shuffle_images_with_labels(
shuffle_data,photo_filenames, labels)
training_filenames = photo_filenames[_NUM_VALIDATION:]
training_labels = labels[_NUM_VALIDATION:]
validation_filenames = photo_filenames[:_NUM_VALIDATION]
validation_labels = labels[:_NUM_VALIDATION]
_convert_dataset('train',
training_filenames, training_labels, dataset_path)
_convert_dataset('validation',
validation_filenames, validation_labels, dataset_path)
print('\nFinished converting the Flowers dataset!')`
And I decode it by:
`
with tf.Session() as sess:
feature = {
'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
'image/class/label': tf.FixedLenFeature(
[37,], tf.float32, default_value=tf.zeros([37,], dtype=tf.float32)),
}
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
features = tf.parse_single_example(serialized_example, features=feature)
image = tf.decode_raw(features['image/encoded'], tf.float32)
print(image.get_shape())
label = tf.cast(features['image/class/label'], tf.float32)
image = tf.reshape(image, [224, 224, 3])
images, labels = tf.train.shuffle_batch([image, label], batch_size=10, capacity=30, num_threads=1, min_after_dequeue=10)
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
sess.run(init_op)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
for batch_index in range(6):
img, lbl = sess.run([images, labels])
img = img.astype(np.uint8)
print(img.shape)
for j in range(6):
plt.subplot(2, 3, j+1)
plt.imshow(img[j, ...])
plt.show()
coord.request_stop()
coord.join(threads)`
It's all fine up to this point. But when I use the bellow commands for decoding TFRecord files:
`
reader = tf.TFRecordReader
keys_to_features = {
'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
'image/format': tf.FixedLenFeature((), tf.string, default_value='raw'),
'image/class/label': tf.FixedLenFeature(
[37,], tf.float32, default_value=tf.zeros([37,], dtype=tf.float32)),
}
items_to_handlers = {
'image': slim.tfexample_decoder.Image('image/encoded'),
'label': slim.tfexample_decoder.Tensor('image/class/label'),
}
decoder = slim.tfexample_decoder.TFExampleDecoder(
keys_to_features, items_to_handlers)`
I get the following error.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [Unable to decode bytes as JPEG, PNG, GIF, or BMP]
[[Node: case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/Assert_1/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/is_bmp, case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/Assert_1/Assert/data_0)]]
INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:sensorflow:Finished training! Saving model to disk.
To use Densenet for my problem, I should fix this error first.
Could anybody please help me out of this problem. This code works perfectly for the datasets like flowers, MNIST and CIFAR10 available at https://github.com/pudae/tensorflow-densenet/tree/master/datasets but does not work for my data.
In densenet.py Line: 48
shouldn't it be ??
if dropout_rate:
>>>>>net = tf.nn.dropout(net,keep_prob=dropout_rate)
Using the inspect_checkpoint
tool on the pretrained model, I can see that the convolutional layers miss their biases.
densenet169/dense_block4/conv_block4/x1/BatchNorm/beta (DT_FLOAT) [736]
densenet169/dense_block4/conv_block4/x1/BatchNorm/gamma (DT_FLOAT) [736]
densenet169/dense_block4/conv_block4/x1/BatchNorm/moving_mean (DT_FLOAT) [736]
densenet169/dense_block4/conv_block4/x1/BatchNorm/moving_variance (DT_FLOAT) [736]
densenet169/dense_block4/conv_block4/x1/Conv/weights (DT_FLOAT) [1,1,736,128]
densenet169/dense_block4/conv_block4/x2/BatchNorm/beta (DT_FLOAT) [128]
densenet169/dense_block4/conv_block4/x2/BatchNorm/gamma (DT_FLOAT) [128]
densenet169/dense_block4/conv_block4/x2/BatchNorm/moving_mean (DT_FLOAT) [128]
densenet169/dense_block4/conv_block4/x2/BatchNorm/moving_variance (DT_FLOAT) [128]
densenet169/dense_block4/conv_block4/x2/Conv/weights (DT_FLOAT) [3,3,128,32]
How were the Keras checkpoints converted? Did something go wrong?
Hello I'm trying to train densenet as part of Faster-RCNN architecture using pre-trained checkpoint, I would like to know if there is required preprocessing image step required to successfully run the model. Thanks!
Hi, thanks for your work. I found the DenseNet-121 on ImageNet's feature map size in
each block is [56,28,14,7], however in the pre-trained model is [55,27,13,6]
densenet121/conv1/convolution [-1, 112, 112, 64]
densenet121/dense_block1/conv_block1/x1/Conv/convolution [-1, 55, 55, 128]
densenet121/dense_block1/conv_block1/x2/Conv/convolution [-1, 55, 55, 32]
densenet121/dense_block1/conv_block2/x1/Conv/convolution [-1, 55, 55, 128]
densenet121/dense_block1/conv_block2/x2/Conv/convolution [-1, 55, 55, 32]
densenet121/dense_block1/conv_block3/x1/Conv/convolution [-1, 55, 55, 128]
densenet121/dense_block1/conv_block3/x2/Conv/convolution [-1, 55, 55, 32]
densenet121/dense_block1/conv_block4/x1/Conv/convolution [-1, 55, 55, 128]
densenet121/dense_block1/conv_block4/x2/Conv/convolution [-1, 55, 55, 32]
densenet121/dense_block1/conv_block5/x1/Conv/convolution [-1, 55, 55, 128]
densenet121/dense_block1/conv_block5/x2/Conv/convolution [-1, 55, 55, 32]
densenet121/dense_block1/conv_block6/x1/Conv/convolution [-1, 55, 55, 128]
densenet121/dense_block1/conv_block6/x2/Conv/convolution [-1, 55, 55, 32]
densenet121/transition_block1/blk/Conv/convolution [-1, 55, 55, 128]
densenet121/transition_block1/AvgPool2D/AvgPool [-1, 27, 27, 128]
densenet121/dense_block2/conv_block1/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block1/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block2/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block2/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block3/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block3/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block4/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block4/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block5/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block5/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block6/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block6/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block7/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block7/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block8/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block8/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block9/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block9/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block10/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block10/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block11/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block11/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/dense_block2/conv_block12/x1/Conv/convolution [-1, 27, 27, 128]
densenet121/dense_block2/conv_block12/x2/Conv/convolution [-1, 27, 27, 32]
densenet121/transition_block2/blk/Conv/convolution [-1, 27, 27, 256]
densenet121/transition_block2/AvgPool2D/AvgPool [-1, 13, 13, 256]
densenet121/dense_block3/conv_block1/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block1/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block2/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block2/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block3/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block3/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block4/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block4/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block5/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block5/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block6/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block6/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block7/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block7/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block8/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block8/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block9/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block9/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block10/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block10/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block11/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block11/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block12/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block12/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block13/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block13/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block14/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block14/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block15/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block15/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block16/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block16/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block17/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block17/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block18/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block18/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block19/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block19/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block20/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block20/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block21/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block21/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block22/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block22/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block23/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block23/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/dense_block3/conv_block24/x1/Conv/convolution [-1, 13, 13, 128]
densenet121/dense_block3/conv_block24/x2/Conv/convolution [-1, 13, 13, 32]
densenet121/transition_block3/blk/Conv/convolution [-1, 13, 13, 512]
densenet121/transition_block3/AvgPool2D/AvgPool [-1, 6, 6, 512]
densenet121/dense_block4/conv_block1/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block1/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block2/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block2/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block3/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block3/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block4/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block4/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block5/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block5/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block6/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block6/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block7/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block7/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block8/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block8/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block9/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block9/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block10/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block10/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block11/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block11/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block12/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block12/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block13/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block13/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block14/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block14/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block15/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block15/x2/Conv/convolution [-1, 6, 6, 32]
densenet121/dense_block4/conv_block16/x1/Conv/convolution [-1, 6, 6, 128]
densenet121/dense_block4/conv_block16/x2/Conv/convolution [-1, 6, 6, 32]
do we need to change the this line to the following in densenet.py
net = slim.conv2d(net, num_filters, 7, stride=2, scope='conv1')
to
net = slim.conv2d(net, num_filters, 7, stride=2, scope='conv1',padding="VALID")
Looking forward to your reply
Hello, I met a problem when using densenet121 pre-trained model in tensorflow using saver.restore(sess, FLAGS.checkpoint_path). The error said that can not find some parameters in checkpoints. part log is below:
2019-02-25 20:38:07.183012: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key densenet121/dense_block1/conv_block2/x2/Conv/biases not found in che ckpoint
2019-02-25 20:38:07.185169: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key densenet121/conv1/biases not found in checkpoint
2019-02-25 20:38:07.185359: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key densenet121/dense_block1/conv_block1/x1/Conv/biases not found in che ckpoint
2019-02-25 20:38:07.185769: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key densenet121/dense_block1/conv_block2/x2/Conv/biases not found in che ckpoint
[[Node: save_1/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/j ob:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_2 3/tensor_names, save_1/RestoreV2_23/shape_and_slices)]]
2019-02-25 20:38:07.186000: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key densenet121/dense_block1/conv_block2/x2/Conv/biases not found in che ckpoint
[[Node: save_1/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/j ob:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_2 3/tensor_names, save_1/RestoreV2_23/shape_and_slices)]]
2019-02-25 20:38:07.186009: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key densenet121/dense_block1/conv_block2/x2/Conv/biases not found in che ckpoint
[[Node: save_1/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/j ob:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_2 3/tensor_names, save_1/RestoreV2_23/shape_and_slices)]]
2019-02-25 20:38:07.186139: W tensorflow/core/framework/op_kernel.cc:1158] Not found: Key densenet121/dense_block1/conv_block2/x2/Conv/biases not found in che ckpoint
and I print pre-trained model parameters names, and actually I also can not find some parameters such as densenet121/dense_block1/conv_block2/x2/Conv/biases
I'm not very familiar with this code enough๏ผcan you help me๏ผ
Hello pudae:
I try densenet121 base on your code for FR project, but it seems that the Conv structure ( BN+ReLU+Conv) is not work for me.
I modify the Conv structure to Conv+BN+ReLU, the training is ok but the accuracy is lower.
So I try to modify the Conv structure to BN+ReLU+Conv+BN+ReLU, it seems that the training is ok and the accuracy is better than above two structure.
I am confused for that. Do you have any suggenstion for that? My part of densenet code is as below:
reduction=0.5
growth_rate=32
num_filters=64
compression = 1.0 - reduction
num_layers=[6,12,24,16]
num_dense_blocks = len(num_layers)
with slim.arg_scope([slim.conv2d, slim.fully_connected],
weights_initializer=slim.xavier_initializer_conv2d(uniform=True),
weights_regularizer=slim.l2_regularizer(weight_decay),
activation_fn=None,
biases_initializer=None):
with tf.variable_scope('densenet121', [images], reuse=reuse):
with slim.arg_scope([slim.batch_norm],
scale=True,
decay=0.99,
epsilon=1.1e-5), \
slim.arg_scope([slim.batch_norm, slim.dropout], is_training=phase_train), \
slim.arg_scope([_conv], dropout_rate=None):
# initial convolution
print ("input size: ", images.get_shape())
net = slim.conv2d(images, num_filters, 7, stride=2, scope='conv1')
print ("conv1 size: ", net.get_shape())
net = slim.batch_norm(net)
net = tf.nn.relu(net)
net = slim.max_pool2d(net, 3, stride=2, padding='SAME')
print ("max pool size: ", net.get_shape())
# blocks
for i in range(num_dense_blocks - 1):
# dense blocks
net, num_filters = _dense_block(net, num_layers[i], num_filters, growth_rate, scope='dense_block' + str(i+1))
print ("dense block %d size: %s" % (i, net.get_shape()))
# Add transition_block
net, num_filters = _transition_block(net, num_filters, compression=compression, scope='transition_block' + str(i+1))
print ("transition block %d size: %s" % (i, net.get_shape()))
net, num_filters = _dense_block(
net, num_layers[-1], num_filters,
growth_rate,
scope='dense_block' + str(num_dense_blocks))
print ("dense block %d size: %s" % (i+1, net.get_shape()))
# final blocks
with tf.variable_scope('final_block', [images]):
net = slim.batch_norm(net)
net = tf.nn.relu(net)
net = tf.reduce_mean(net, [1,2], name='global_avg_pool', keep_dims=False)
print ("global ave pooling size: %s" % (net.get_shape()))
net = slim.batch_norm(net)
net = tf.nn.relu(net)
net = slim.fully_connected(net, bottleneck_layer_size, activation_fn=None,
scope='logits', reuse=False)
print ("fully connection size: %s" % (net.get_shape()))
return net, None
Hi, thanks for your code.
I have two confusions about the pre-trained model.
Hello, thanks for your great work. I need your pretrained model for my own experiment. I'll appreciate it if you can give me the access. My google mail address: [email protected]
I am trying to use DensNet for regression problem with TF-Slim. My data contains 60000 jpeg images with 37 float labels for each image. I divided my data into three different tfrecords files of a train set (60%), a validation set (20%) and a test set (20%).
I need to evaluate validation set during training loop and make a plot like image.
In TF-Slim documentation they just explain train loop and evaluation loop separately. I can just evaluate validation or test set after training loop finished. While as I said I need to evaluate during training.
I tried to use slim.evaluation.evaluation_loop function instead of slim.evaluation.evaluate_once. But it doesn't help.
slim.evaluation.evaluation_loop(
master=FLAGS.master,
checkpoint_dir=checkpoint_path,
logdir=FLAGS.eval_dir,
num_evals=num_batches,
eval_op=list(names_to_updates.values()) + print_ops,
variables_to_restore=variables_to_restore,
summary_op = tf.summary.merge(summary_ops),
eval_interval_secs = eval_interval_secs )
I tried evaluation.evaluate_repeatedly as well.
from tensorflow.contrib.training.python.training import evaluation
evaluation.evaluate_repeatedly(
master=FLAGS.master,
checkpoint_dir=checkpoint_path,
eval_ops=list(names_to_updates.values()) + print_ops,
eval_interval_secs = eval_interval_secs )
In both of these functions, they just read the latest available checkpoint from checkpoint_dir and apparently waiting for the next one, however when the new checkpoints are generated, they don't perform at all.
I use Python 2.7.13 and Tensorflow 1.3.0 on CPU.
Any help will be highly appreciated.
Can you give the hand-by-hand instruction for evaluating and testing?
Pudae, apologies you mentioned this in another issue but I couldn't get it work.
If I create symbol like so:
densenet.densenet_arg_scope(data_format='NCHW')
I get
TypeError: densenet_arg_scope() got an unexpected keyword argument 'data_format'
If I do this:
dense_args = densenet.densenet_arg_scope()
dense_args['data_format'] = "NCHW" # This doesn't work!
print(dense_args)
with slim.arg_scope(dense_args):
base_model, _ = densenet.densenet121(in_tensor,
num_classes=out_features,
is_training=is_training)
Tensorflow just seems to ignore it and use NHWC
Hi, I want to ask how to determine the maximum number of training steps if fine-tune this pretrained densenet using my own dataset ?
Hi! Thank you for your great code! I got a excellent performance by using your pre-trained model.
But when I trained 121-densenet on ImageNet from scratch by using your network code, I got a very pool evaluation result. So, did you train the model from scratch using your code? Thanks~~~
Hi, thanks for your work, can you provide a DOI for citing your repository and DenseNet-121 pre-trained model on ImageNet? Thanks a lot
In tensorflow, we could use these commends to set the usage of gpu
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9 # ๅ ็จGPU90%็ๆพๅญ
session = tf.Session(config=config)
If I use the package of slim, how to set the usage of gpu?
Hi, thanks for your code.
TensorFlow-Slim Models' page is not found
do you have any other link?
I am starting to learn tf and I am very much appreciate your work. I use your prediction code and run it successfully, however, the output is only the top-1 guess. How can I get top-5 guesses and the top-5 probabilities? I try to use tf.nn.top_k
and tf.nn.in_top_k
but I ran into problems that I cannot solve.
Hi @pudae ,
Do we have any options to control the number of threads in TF-Slim both in training and evaluation processes?
Specifically, I use this network for my classification problem. I changed the evaluation part in a way that runs train and evaluation in parallel like this code. I can run it on my own CPU without any problem. But I can't execute them on a supercomputer. It seems that it is related to the very large number of threads which are being created by Tensorflow. If the number of threads exceeds the maximum number of threads pre-set in SLURM (= 28) then the job will fail. Since it's unable to create new threads it will end up with error "resource temporarily unavailable".
This error provided when the code tries to restore parameters from checkpoints. If there is no limitation on the number of threads (like on my pc) it works fine:
INFO:tensorflow:Restoring parameters from ./model.ckpt-0
INFO:tensorflow:Starting evaluation at
I tensorflow/core/kernels/logging_ops.cc:79] eval/Accuracy[0]
I tensorflow/core/kernels/logging_ops.cc:79] eval/Recall_5[0]
INFO:tensorflow:Evaluation [1/60]
However, when there is a limitation on the number of threads (like SLURM job submission on supercomputers) we get:
INFO:tensorflow:Restoring parameters from ./model.ckpt-0
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
I tried to limit the number of CPU threads used by Tensorflow to 1 by creating config like:
FLAGS.num_preprocessing_threads=1
config = tf.ConfigProto()
config.intra_op_parallelism_threads = FLAGS.num_preprocessing_threads
config.inter_op_parallelism_threads = FLAGS.num_preprocessing_threads
slim.evaluation.evaluation_loop(
master=FLAGS.master,
checkpoint_path=each_ckpt,
logdir=FLAGS.eval_dir,
num_evals=num_batches,
eval_op=list(names_to_updates.values()) + print_ops,
variables_to_restore=variables_to_restore,
session_config=config)
But unfortunately, that didn't help. In my opinion, the main problem we are having here is the fact that we are not able to control the number of threads here. Although we set it to 1 with various TF options you can actually see that this job is creating many more threads on the node:
slurm_scriptโโฌโpythonโโโ128*[{python}]
โโpythonโโโ8*[{python}]
Training script is creating 128 threads and evaluation script is creating 8 (both numbers vary over time).
Any idea on the way to control the thread numbers will be highly appreciated because I do need to fix this issue urgently.
Ellie
P.S. I'm using Python 2.7.13 and Tensorflow 1.3.0.
Training a model from scratch.
If i want to use 8 GPUs to train a model๏ผa server๏ผ๏ผhow to configure these flags's value?
flags:
num_clones
worker_replicas
num_ps_tasks
hi,
when I restore your pre_trained model, there raise some mistake
NotFoundError (see above for trackback): Key densenet121/dense_block1/conv_block2/*1/Conv/biases not found in the checkpoint
here is my code
x=tf.placeholder("float",[None,224,224,3])
y=tf.placeholder("float",[None,num_classes])
logits, collection= densenet.densenet121(x,num_classes=num_classes)
predictions = collection['predictions']
loss=-tf.reduce_sum(y*tf.log(predictions))
optimizer=tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)
accuracy=tf.reduce_mean(tf.cast(tf.equal(tf.argmax(predictions, 1), tf.argmax(y, 1)),"float"))
saver=tf.train.Saver()
with tf.Session() as sess:
saver.restore(sess,"tf-densenet121.ckpt")
Hi @pudae,
Do you have any idea how can I run "eval_image_classifier.py" on GPUs? Should I change any functions or do any modifications? Or whether there exist any other specific functions for evaluation on GPUs?
I can already run "train_image_classifier.py" on GPUs because of having the associated flag for switching between CPU and GPU:
tf.app.flags.DEFINE_boolean('clone_on_cpu', False,
'Use CPUs to deploy clones.')
I did try to add the same line to eval_image_classifier.py
, but it had no effect. I'm using Python 2.7.13 and Tensorflow 1.3.0 .
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import tensorflow as tf
from deployment import model_deploy
from datasets import dataset_factory
from nets import nets_factory
from preprocessing import preprocessing_factory
slim = tf.contrib.slim
tf.app.flags.DEFINE_integer(
'batch_size', 32, 'The number of samples in each batch.')
tf.app.flags.DEFINE_integer(
'max_num_batches', None,
'Max number of batches to evaluate by default use all.')
tf.app.flags.DEFINE_string(
'master', '', 'The address of the TensorFlow master to use.')
tf.app.flags.DEFINE_string(
'checkpoint_path', '...',
'The directory where the model was written to or an absolute path to a '
'checkpoint file.')
tf.app.flags.DEFINE_string(
'eval_dir', '...',
'Directory where the results are saved to.')
tf.app.flags.DEFINE_integer('num_clones', 1,
'Number of model clones to deploy.')
tf.app.flags.DEFINE_boolean('clone_on_cpu', False,
'Use CPUs to deploy clones.')
tf.app.flags.DEFINE_integer('worker_replicas', 1, 'Number of worker replicas.')
tf.app.flags.DEFINE_integer(
'num_readers', 4,
'The number of parallel readers that read data from the dataset.')
tf.app.flags.DEFINE_integer(
'num_ps_tasks', 0,
'The number of parameter servers. If the value is 0, then the parameters '
'are handled locally by the worker.')
tf.app.flags.DEFINE_integer(
'num_preprocessing_threads', 4,
'The number of threads used to create the batches.')
tf.app.flags.DEFINE_string(
'dataset_name', '...', 'The name of the dataset to load.')
tf.app.flags.DEFINE_string(
'dataset_split_name', 'validation', 'The name of the train/test split.')
tf.app.flags.DEFINE_string(
'dataset_dir', '...',
'The directory where the dataset files are stored.')
tf.app.flags.DEFINE_integer(
'labels_offset', 0,
'An offset for the labels in the dataset. This flag is primarily used to '
'evaluate the VGG and ResNet architectures which do not use a background '
'class for the ImageNet dataset.')
tf.app.flags.DEFINE_string(
'model_name', 'densenet161', 'The name of the architecture to evaluate.')
tf.app.flags.DEFINE_string(
'preprocessing_name', None, 'The name of the preprocessing to use. If left '
'as `None`, then the model_name flag is used.')
tf.app.flags.DEFINE_float(
'moving_average_decay', None,
'The decay to use for the moving average.'
'If left as None, then moving averages are not used.')
tf.app.flags.DEFINE_integer(
'eval_image_size', None, 'Eval image size')
FLAGS = tf.app.flags.FLAGS
def main(_):
if not FLAGS.dataset_dir:
raise ValueError('You must supply the dataset directory with --dataset_dir')
#######################
# Config model_deploy #
#######################
tf.logging.set_verbosity(tf.logging.INFO)
with tf.Graph().as_default():
deploy_config = model_deploy.DeploymentConfig(
num_clones=FLAGS.num_clones,
clone_on_cpu=FLAGS.clone_on_cpu,
#replica_id=FLAGS.task,
num_replicas=FLAGS.worker_replicas,
num_ps_tasks=FLAGS.num_ps_tasks)
# Create global_step
with tf.device(deploy_config.variables_device()):
tf_global_step = slim.create_global_step()
######################
# Select the dataset #
######################
dataset = dataset_factory.get_dataset(
FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir)
####################
# Select the model #
####################
network_fn = nets_factory.get_network_fn(
FLAGS.model_name,
num_classes=(dataset.num_classes - FLAGS.labels_offset),
is_training=False)
##############################################################
# Create a dataset provider that loads data from the dataset #
##############################################################
with tf.device(deploy_config.inputs_device()):
provider = slim.dataset_data_provider.DatasetDataProvider(
dataset,
num_readers=FLAGS.num_readers,
shuffle=False,
common_queue_capacity=2 * FLAGS.batch_size,
common_queue_min=FLAGS.batch_size)
[image, label] = provider.get(['image', 'label'])
label -= FLAGS.labels_offset
#####################################
# Select the preprocessing function #
#####################################
preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
preprocessing_name,
is_training=False)
eval_image_size = FLAGS.eval_image_size or network_fn.default_image_size
image = image_preprocessing_fn(image, eval_image_size, eval_image_size)
images, labels = tf.train.batch(
[image, label],
batch_size=FLAGS.batch_size,
num_threads=FLAGS.num_preprocessing_threads,
capacity=5 * FLAGS.batch_size)
batch_queue = slim.prefetch_queue.prefetch_queue(
[images, labels], capacity=2 * deploy_config.num_clones)
####################
# Define the model #
####################
def clone_fn(batch_queue):
"""Allows data parallelism by creating multiple clones of network_fn."""
with tf.device(deploy_config.inputs_device()):
images, labels = batch_queue.dequeue()
logits, end_points = network_fn(images)
logits = tf.squeeze(logits)
#############################
# Specify the loss function #
#############################
if 'AuxLogits' in end_points:
tf.losses.mean_squared_error(
predictions=end_points['AuxLogits'], labels=labels, weights=0.4, scope='aux_loss')
tf.losses.mean_squared_error(
predictions=logits, labels=labels, weights=1.0)
return end_points
#clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
#first_clone_scope = deploy_config.clone_scope(0)
####################
# Define the model #
####################
logits, _ = network_fn(images)
if FLAGS.moving_average_decay:
variable_averages = tf.train.ExponentialMovingAverage(
FLAGS.moving_average_decay, tf_global_step)
variables_to_restore = variable_averages.variables_to_restore(
slim.get_model_variables())
variables_to_restore[tf_global_step.op.name] = tf_global_step
else:
variables_to_restore = slim.get_variables_to_restore()
logits = tf.squeeze(logits)
# Define the metrics:
predictions = logits
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
'Accuracy': tf.metrics.root_mean_squared_error(predictions, labels),
'Recall_5': slim.metrics.streaming_recall(
logits, labels),
})
# Print the summaries to screen.
print_ops = []
summary_ops = []
for name, value in names_to_values.items():
summary_name = 'eval/%s' % name
op = tf.summary.scalar(summary_name, value, collections=[])
op = tf.Print(op, [value], summary_name)
summary_ops.append(op)
print_ops.append(tf.Print(value, [value], summary_name))
tf.add_to_collection(tf.GraphKeys.SUMMARIES, op)
# TODO(sguada) use num_epochs=1
if FLAGS.max_num_batches:
num_batches = FLAGS.max_num_batches
else:
# This ensures that we make a single pass over all of the data.
num_batches = math.ceil(dataset.num_samples / float(FLAGS.batch_size))
if tf.gfile.IsDirectory(FLAGS.checkpoint_path):
if tf.train.latest_checkpoint(FLAGS.checkpoint_path):
checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path)
else:
checkpoint_path = FLAGS.checkpoint_path
eval_interval_secs = 6
tf.logging.info('Evaluating %s' % checkpoint_path)
slim.evaluation.evaluation_loop(
master=FLAGS.master,
checkpoint_dir=checkpoint_path,
logdir=FLAGS.eval_dir,
num_evals=num_batches,
eval_op=list(names_to_updates.values()) + print_ops,
variables_to_restore=variables_to_restore,
eval_interval_secs = eval_interval_secs )
if __name__ == '__main__':
tf.app.run()
I tried to use some code like Tensorflow tutorial as well:
# Creates a graph.
with tf.device('/gpu:2'):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))
I modified the code in this way:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import tensorflow as tf
from datasets import dataset_factory
from nets import nets_factory
from preprocessing import preprocessing_factory
slim = tf.contrib.slim
tf.app.flags.DEFINE_integer(
'batch_size', 32, 'The number of samples in each batch.')
tf.app.flags.DEFINE_integer(
'max_num_batches', None,
'Max number of batches to evaluate by default use all.')
tf.app.flags.DEFINE_string(
'master', '', 'The address of the TensorFlow master to use.')
tf.app.flags.DEFINE_string(
'checkpoint_path', '...',
'The directory where the model was written to or an absolute path to a '
'checkpoint file.')
tf.app.flags.DEFINE_string(
'eval_dir', '...',
'Directory where the results are saved to.')
tf.app.flags.DEFINE_integer(
'num_preprocessing_threads', 4,
'The number of threads used to create the batches.')
tf.app.flags.DEFINE_string(
'dataset_name', '...', 'The name of the dataset to load.')
tf.app.flags.DEFINE_string(
'dataset_split_name', 'validation', 'The name of the train/test split.')
tf.app.flags.DEFINE_string(
'dataset_dir', '...',
'The directory where the dataset files are stored.')
tf.app.flags.DEFINE_integer(
'labels_offset', 0,
'An offset for the labels in the dataset. This flag is primarily used to '
'evaluate the VGG and ResNet architectures which do not use a background '
'class for the ImageNet dataset.')
tf.app.flags.DEFINE_string(
'model_name', 'densenet161', 'The name of the architecture to evaluate.')
tf.app.flags.DEFINE_string(
'preprocessing_name', None, 'The name of the preprocessing to use. If left '
'as `None`, then the model_name flag is used.')
tf.app.flags.DEFINE_float(
'moving_average_decay', None,
'The decay to use for the moving average.'
'If left as None, then moving averages are not used.')
tf.app.flags.DEFINE_integer(
'eval_image_size', None, 'Eval image size')
FLAGS = tf.app.flags.FLAGS
# Initialize all global and local variables
init = tf.group(tf.global_variables_initializer(),
tf.local_variables_initializer())
def main(_):
if not FLAGS.dataset_dir:
raise ValueError('You must supply the dataset directory with --dataset_dir')
tf.logging.set_verbosity(tf.logging.INFO)
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
log_device_placement=True))
with tf.Graph().as_default(), tf.device('/gpu:0'):
sess.run(init)
tf_global_step = slim.get_or_create_global_step()
######################
# Select the dataset #
######################
dataset = dataset_factory.get_dataset(
FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir)
####################
# Select the model #
####################
network_fn = nets_factory.get_network_fn(
FLAGS.model_name,
num_classes=(dataset.num_classes - FLAGS.labels_offset),
is_training=False)
##############################################################
# Create a dataset provider that loads data from the dataset #
##############################################################
provider = slim.dataset_data_provider.DatasetDataProvider(
dataset,
shuffle=False,
common_queue_capacity=2 * FLAGS.batch_size,
common_queue_min=FLAGS.batch_size)
[image, label] = provider.get(['image', 'label'])
label -= FLAGS.labels_offset
#####################################
# Select the preprocessing function #
#####################################
preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
preprocessing_name,
is_training=False)
eval_image_size = FLAGS.eval_image_size or network_fn.default_image_size
image = image_preprocessing_fn(image, eval_image_size, eval_image_size)
images, labels = tf.train.batch(
[image, label],
batch_size=FLAGS.batch_size,
num_threads=FLAGS.num_preprocessing_threads,
capacity=5 * FLAGS.batch_size)
####################
# Define the model #
####################
logits, _ = network_fn(images)
if FLAGS.moving_average_decay:
variable_averages = tf.train.ExponentialMovingAverage(
FLAGS.moving_average_decay, tf_global_step)
variables_to_restore = variable_averages.variables_to_restore(
slim.get_model_variables())
variables_to_restore[tf_global_step.op.name] = tf_global_step
else:
variables_to_restore = slim.get_variables_to_restore()
logits = tf.squeeze(logits)
# Define the metrics:
predictions = logits
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
'Accuracy': tf.metrics.root_mean_squared_error(predictions, labels),
'Recall_5': slim.metrics.streaming_recall(
logits, labels),
})
# Print the summaries to screen.
print_ops = []
summary_ops = []
for name, value in names_to_values.items():
summary_name = 'eval/%s' % name
op = tf.summary.scalar(summary_name, value, collections=[])
op = tf.Print(op, [value], summary_name)
summary_ops.append(op)
print_ops.append(tf.Print(value, [value], summary_name))
tf.add_to_collection(tf.GraphKeys.SUMMARIES, op)
# TODO(sguada) use num_epochs=1
if FLAGS.max_num_batches:
num_batches = FLAGS.max_num_batches
else:
# This ensures that we make a single pass over all of the data.
num_batches = math.ceil(dataset.num_samples / float(FLAGS.batch_size))
if tf.gfile.IsDirectory(FLAGS.checkpoint_path):
if tf.train.latest_checkpoint(FLAGS.checkpoint_path):
checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path)
else:
checkpoint_path = FLAGS.checkpoint_path
#print(checkpoint_path)
eval_interval_secs = 6
tf.logging.info('Evaluating %s' % checkpoint_path)
slim.evaluation.evaluation_loop(
master=FLAGS.master,
checkpoint_dir=checkpoint_path,
logdir=FLAGS.eval_dir,
num_evals=num_batches,
eval_op=list(names_to_updates.values()) + print_ops,
variables_to_restore=variables_to_restore,
eval_interval_secs = eval_interval_secs )
if __name__ == '__main__':
tf.app.run()
When I run this code, I face this error:
Traceback (most recent call last):
File "/home/zgholami/test1/GZ_Project/GZ_DenseNet_TF-slim/eval_image_classifier.py", line 210, in <module>
tf.app.run()
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/zgholami/test1/GZ_Project/GZ_DenseNet_TF-slim/eval_image_classifier.py", line 206, in main
eval_interval_secs = 60 )
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/evaluation.py", line 296, in evaluation_loo
p
timeout=timeout)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/evaluation.py", line 447, in evalua
te_repeatedly
session_creator=session_creator, hooks=hooks) as session:
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 668, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 490, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 842, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 847, in _create_session
return self._sess_creator.create_session()
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 551, in create_session
self.tf_sess = self._session_creator.create_session()
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 425, in create_session
init_fn=self._scaffold.init_fn)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
config=config)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 189, in _restore_checkpoin
t
saver.restore(sess, checkpoint_filename_with_path)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1560, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'eval_step': Could not satisfy explicit device s
pecification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Const: GPU CPU
AssignAdd: CPU
VariableV2: CPU
Identity: GPU CPU
Assign: CPU
IsVariableInitialized: CPU
[[Node: eval_step = VariableV2[_class=["loc:@eval_step"], container="", dtype=DT_INT64, shape=[], shared_name="", _device="/device:GPU:0"]
()]]
Caused by op u'eval_step', defined at:
File "/home/zgholami/test1/GZ_Project/GZ_DenseNet_TF-slim/eval_image_classifier.py", line 210, in <module>
tf.app.run()
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/zgholami/test1/GZ_Project/GZ_DenseNet_TF-slim/eval_image_classifier.py", line 206, in main
eval_interval_secs = 60 )
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/evaluation.py", line 296, in evaluation_loo
p
timeout=timeout)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/evaluation.py", line 410, in evalua
te_repeatedly
eval_step = get_or_create_eval_step()
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/training/evaluation.py", line 57, in _get_or_create_eval_step
collections=[ops.GraphKeys.LOCAL_VARIABLES, ops.GraphKeys.EVAL_STEP])
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 725, in _get_single_variable
validate_shape=validate_shape)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 199, in __init__
expected_shape=expected_shape)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 283, in _init_from_args
name=name)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 131, in variable_op_v2
shared_name=shared_name)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 682, in _variable_v2
name=name)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/group/pawsey0245/zgholami/pyml/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'eval_step': Could not satisfy explicit device specification '
/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Const: GPU CPU
AssignAdd: CPU
VariableV2: CPU
Identity: GPU CPU
Assign: CPU
IsVariableInitialized: CPU
[[Node: eval_step = VariableV2[_class=["loc:@eval_step"], container="", dtype=DT_INT64, shape=[], shared_name="", _device="/device:GPU:0"]
()]]
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables_1/boolean_mask/Gather:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
Hi, I was curious why it's the case for this implementation (and also slim's resnet_v1) that fine-tuning a model with is_training=True and then running inference with is_training=True gives better results than inference with is_training=False?
I searched and this has been mentioned here: tensorflow/models#1288 & tensorflow/models#2138 (comment) & tensorflow/models#391
And the slim walkthrough does the same thing in the last cell (inference with is_training=True).
Hi Pudae, thanks a lot for this super-neat implementation. I am trying to load the model (without the pre-processing functions, etc) so that I can chop off the last fc layer and stick my own (to re-train on my data-set). However, I was having a bit of an issue loading the model without any of the functions attached:
def create_symbol(chkpt_dir=CHKPT_DIR):
with tf.Graph().as_default():
slim.get_or_create_global_step()
# Not possible to have channels first?
X = tf.placeholder(tf.float32,
shape=(None, 224, 224, 3))
logits, endpoints = densenet.densenet121(
X,
num_classes=1000,
is_training=True,
reuse=None)
print(logits)
print(endpoints)
variables_to_restore = slim.get_variables_to_restore()
sess = tf.Session()
saver = tf.train.Saver(variables_to_restore)
init_op = tf.group(
tf.global_variables_initializer(),
tf.local_variables_initializer())
checkpoint_path = os.path.join(chkpt_dir, 'tf-densenet121.ckpt')
sess.run(init_op)
saver.restore(sess, checkpoint_path)
return endpoints, sess
sym, sess = create_symbol()
Seems to complain about finding biases:
--------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1322 try:
-> 1323 return fn(*args)
1324 except errors.OpError as e:
/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1301 feed_dict, fetch_list, target_list,
-> 1302 status, run_metadata)
1303
/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
472 compat.as_text(c_api.TF_Message(self.status.status)),
--> 473 c_api.TF_GetCode(self.status.status))
474 # Delete the underlying status object from memory otherwise it stays alive
NotFoundError: Key densenet121/dense_block2/conv_block7/x1/Conv/biases not found in checkpoint
[[Node: save/RestoreV2_158 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_158/tensor_names, save/RestoreV2_158/shape_and_slices)]]
[[Node: save/RestoreV2_16/_1181 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2392_save/RestoreV2_16", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
<ipython-input-5-9d5aca7f5637> in <module>()
----> 1 sym, sess = create_symbol()
<ipython-input-4-e283472f7a13> in create_symbol(chkpt_dir)
25 checkpoint_path = os.path.join(chkpt_dir, 'tf-densenet121.ckpt')
26 sess.run(init_op)
---> 27 saver.restore(sess, checkpoint_path)
28 return network_fn, sess
/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
1664 if context.in_graph_mode():
1665 sess.run(self.saver_def.restore_op_name,
-> 1666 {self.saver_def.filename_tensor_name: save_path})
1667 else:
1668 self._build_eager(save_path, build_save=False, build_restore=True)
/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
887 try:
888 result = self._run(None, fetches, feed_dict, options_ptr,
--> 889 run_metadata_ptr)
890 if run_metadata:
891 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1118 if final_fetches or final_targets or (handle and feed_dict_tensor):
1119 results = self._do_run(handle, final_targets, final_fetches,
-> 1120 feed_dict_tensor, options, run_metadata)
1121 else:
1122 results = []
/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1315 if handle is None:
1316 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317 options, run_metadata)
1318 else:
1319 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)
/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1334 except KeyError:
1335 pass
-> 1336 raise type(e)(node_def, op, message)
1337
1338 def _extend_graph(self):
NotFoundError: Key densenet121/dense_block2/conv_block7/x1/Conv/biases not found in checkpoint
[[Node: save/RestoreV2_158 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_158/tensor_names, save/RestoreV2_158/shape_and_slices)]]
[[Node: save/RestoreV2_16/_1181 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2392_save/RestoreV2_16", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op 'save/RestoreV2_158', defined at:
File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 486, in start
self.io_loop.start()
File "/anaconda/envs/py35/lib/python3.5/site-packages/tornado/platform/asyncio.py", line 112, in start
self.asyncio_loop.run_forever()
File "/anaconda/envs/py35/lib/python3.5/asyncio/base_events.py", line 345, in run_forever
self._run_once()
File "/anaconda/envs/py35/lib/python3.5/asyncio/base_events.py", line 1312, in _run_once
handle._run()
File "/anaconda/envs/py35/lib/python3.5/asyncio/events.py", line 125, in _run
self._callback(*self._args)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tornado/ioloop.py", line 760, in _run_callback
ret = callback()
File "/anaconda/envs/py35/lib/python3.5/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 536, in <lambda>
self.io_loop.add_callback(lambda : self._handle_events(self.socket, 0))
File "/anaconda/envs/py35/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
self._handle_recv()
File "/anaconda/envs/py35/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
self._run_callback(callback, msg)
File "/anaconda/envs/py35/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
callback(*args, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/anaconda/envs/py35/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/anaconda/envs/py35/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
if self.run_code(code, result):
File "/anaconda/envs/py35/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-9d5aca7f5637>", line 1, in <module>
sym, sess = create_symbol()
File "<ipython-input-4-e283472f7a13>", line 19, in create_symbol
saver = tf.train.Saver(variables_to_restore)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1218, in __init__
self.build()
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1227, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1263, in _build
build_save=build_save, build_restore=build_restore)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 751, in _build_internal
restore_sequentially, reshape)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 427, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 267, in restore_op
[spec.tensor.dtype])[0])
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Key densenet121/dense_block2/conv_block7/x1/Conv/biases not found in checkpoint
[[Node: save/RestoreV2_158 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_158/tensor_names, save/RestoreV2_158/shape_and_slices)]]
[[Node: save/RestoreV2_16/_1181 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2392_save/RestoreV2_16", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
However, the end-points seem ok:
('densenet121/dense_block4', <tf.Tensor 'densenet121/dense_block4/conv_block16/concat:0' shape=(?, 7, 7, 1024) dtype=float32>), ('densenet121/logits', <tf.Tensor 'densenet121/logits/Relu:0' shape=(?, 1, 1, 1000) dtype=float32>), ('predictions', <tf.Tensor 'densenet121/predictions/Reshape_1:0' shape=(?, 1, 1, 1000) dtype=float32>)])
So that I can just extract: 'densenet121/dense_block4/conv_block16/concat:0' and add my own:
('densenet121/logits', <tf.Tensor 'densenet121/logits/Relu:0' shape=(?, 1, 1, 16) dtype=float32>), ('predictions', <tf.Tensor 'densenet121/predictions/Reshape_1:0' shape=(?, 1, 1, 16) dtype=float32>)])
Also was curious if there was a reason that shape is channels-last, since I thought channels-first is faster for cuDNN training?
Thanks
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.