Comments (32)
You are welcome. :)
I am using r0.10. It seems that it cannot find the key [name][0] in your data_dict. Have you downloaded the vgg16.npy file and put it in your tensorflow-vgg-master directory?
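In case it helps, data_dict is just the Python dict stored inside vgg16.npy, keyed by layer name, with [weights, biases] for each layer (assuming the usual layout of that file). A quick way to check what your copy actually contains:

```python
import numpy as np

# vgg16.npy stores a pickled dict: layer name -> [weights, biases]
# (on Python 3 / newer numpy you may need encoding='latin1' and allow_pickle=True)
data_dict = np.load("./vgg16.npy").item()

print(sorted(data_dict.keys()))        # expect 'conv1_1' ... 'fc8'
print(data_dict['conv1_1'][0].shape)   # conv1_1 filter weights
print(data_dict['conv1_1'][1].shape)   # conv1_1 biases
```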
@machrisaa Thanks for your reply. I tried the vgg16.npy file you provided and now it works! I wonder whether I can train the model myself instead of using the pre-trained one. Do you have any idea about that?
I haven't tried training the VGG from scratch, but I have re-trained the VGG19 to do other tasks by changing the last 2 fully connected layers. You can try modifying vgg19_trainable.py into a VGG16 version to do that.
@machrisaa Thanks for your reply. Could you please tell me what the comment on this line of code means? I don't quite understand how you calculate the number of incoming units as 25088. Thanks a lot!
Hi rylanchiu, basically 25088 = ((224 / (2 ** 5)) ** 2) * 512 means:
- 224 is the input size
- / (2 ** 5) because the 5 max_pool layers each halve the size
- ** 2 because it is a square, i.e. width * height
- * 512 because 512 is the number of channels in self.pool5
Therefore, 25088 = ((224 / (2 ** 5)) ** 2) * 512 is the total number of elements in the tensor self.pool5.
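As a quick sanity check, you can compute it directly:

```python
# 224 halved by 5 max-pool layers -> 7, and pool5 has 512 channels
print((224 // 2 ** 5) ** 2 * 512)  # 25088
```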
@machrisaa Thanks! Now I understand all the parts except the 5. Why is the size halved 5 times instead of 4? I am a newbie to CNNs, please forgive my stupid question. Thanks a lot!
Oh I see. There are 5 pooling layers! Thank you very much @machrisaa !
@rylanchiu no problem :)
Hi @machrisaa, sorry to disturb you again. I am now trying to run an image classifier on ImageNet. I picked just 5 classes, but in the first 10 iterations it doesn't converge at all and the cross entropy just hovers around 1.609. I wonder whether that is normal? And do you have any idea about it?
I think 10 iterations are far too few. (For 5 classes, a cross entropy of about 1.609 = ln(5) just means the network is still predicting uniform probabilities, i.e. it hasn't learned anything yet.) Usually, during a proper training run of a few epochs, which may be more than 100k iterations, the cost does not always decrease, because the network needs to adapt to the characteristics of many different images and that takes a lot of time.
If you would like to do some experiments to see the effect of gradient descent, you can try training on a single image repeatedly. For example, train on an image of a tiger over and over and you will see the probability of it being classified as a tiger increase.
@machrisaa Thanks for your reply. I did what you suggested and found that all the outputs are of the form [1, 0, 0, 0, 0], i.e. one (arbitrary) class is predicted as 1 and all the others as 0. Do you know the reason? Have you trained a model from scratch with vgg19_trainable.py? Thanks!
Apart from using the pre-trained data to train another model, I haven't tried to train the network from scratch. There are some rumours suggesting that it is quite hard to train the VGG from scratch and reach the quality of the pre-trained weights. Perhaps the reason is that it depends heavily on a good initialisation and a good set of hyperparameters. Unlike more advanced networks such as ResNet, it has no batch normalisation to reduce the sensitivity to initialisation and hyperparameters, so most likely that's why it is hard to train.
On the other hand, the simplicity of the VGG is why people choose it for tasks other than classification, such as style synthesis, super-resolution, etc. In these cases, the VGG is used as a cost function because of its good ability to extract the features of an image.
In your case, my suggestion is to try removing the last fc6, fc7, and fc8 layers and defining 1 or 2 fully connected layers whose final output size is 5. For example, like this:
```python
self.fc6_custom = self.fc_layer(self.pool5, 25088, 1024, "fc6_custom")
self.relu6 = tf.nn.relu(self.fc6_custom)
self.fc7_custom = self.fc_layer(self.relu6, 1024, 5, "fc7_custom")
self.prob = tf.nn.softmax(self.fc7_custom, name="prob")
```
And use the original pre-trained data as your initialisation, i.e. load the original npy data. In this case, because there are no stored variables whose names begin with fc6_custom and fc7_custom, new variables will be initialised in the get_var method, while the original variables in the conv layers will be preserved.
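(Roughly, the idea inside get_var is something like the sketch below; this is a paraphrase of the logic, not the exact code from vgg19_trainable.py.)

```python
# Paraphrased idea of get_var in vgg19_trainable.py (not the exact code):
# if the layer name exists in the loaded npy dict, reuse the pre-trained value;
# otherwise fall back to the freshly created initial value.
def get_var(self, initial_value, name, idx, var_name):
    if self.data_dict is not None and name in self.data_dict:
        value = self.data_dict[name][idx]    # pre-trained conv/fc weights
    else:
        value = initial_value                # new layers such as fc6_custom start from scratch
    var = tf.Variable(value, name=var_name)
    self.var_dict[(name, idx)] = var
    return var
```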
This is the method I used to retrain the VGG to analyse sound data by converting sound clips into spectrograms.
@machrisaa But can I add batch normalization myself? I have done that, but it seems useless...
I haven't tried that before. Would you like to share how you added the batch normalisation?
@machrisaa Sure. Here is the bn function:
```python
def batch_norm(self, x, name):
    """Batch normalization."""
    with tf.variable_scope(name):
        params_shape = [x.get_shape()[-1]]
        beta = tf.get_variable('beta', params_shape, tf.float32,
                               initializer=tf.constant_initializer(0.0, tf.float32))
        gamma = tf.get_variable('gamma', params_shape, tf.float32,
                                initializer=tf.constant_initializer(1.0, tf.float32))
        mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments')
        output = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 0.001)
        output.set_shape(x.get_shape())
        return output
```
And here is your conv_layer modified to use bn:
```python
def conv_layer(self, input_, in_channels, out_channels, name):
    fil_shape = [3, 3, in_channels, out_channels]
    with tf.variable_scope(name):
        if self.norm_mode == 'bn':
            input_ = self.batch_norm(input_, name)
        filters = tf.get_variable(name + '_filters',
                                  fil_shape,
                                  dtype=tf.float32,
                                  initializer=tf.random_normal_initializer(mean=0.0, stddev=0.001))  # self.get_conv_var(3, in_channels, out_channels, name)
        b = tf.get_variable(name + '_bias', [out_channels])
        conv = tf.nn.conv2d(input_, filters, [1, 1, 1, 1], padding='SAME')
        bias = tf.nn.bias_add(conv, b)
        output = tf.nn.relu(bias)
        return output
```
I am afraid your batch normalisation is not entirely correct. You can try using the batch_norm from tensorflow.contrib.layers instead. Also, I would usually put the batch_norm after several blocks of conv layers, e.g. after conv1_2, conv2_2, etc.
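For example, something along these lines (just a sketch: input_bgr stands for whatever the preprocessed input is called in your build method, and depending on your TensorFlow version is_training may have to be a Python bool rather than the train_mode placeholder):

```python
from tensorflow.contrib.layers import batch_norm

# normalise after a whole block of conv layers, not inside every conv_layer
self.conv1_1 = self.conv_layer(input_bgr, 3, 64, "conv1_1")
self.conv1_2 = self.conv_layer(self.conv1_1, 64, 64, "conv1_2")
self.conv1_2 = batch_norm(self.conv1_2, is_training=train_mode, scope="bn1")
self.pool1 = self.max_pool(self.conv1_2, "pool1")
```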
But it looks like that is not the problem. I have modified the test so that it runs without using any pre-trained data:
```python
img1 = utils.load_image("./test_data/tiger.jpeg")
img1_true_result = [1, 0, 0, 0, 0]  # 1-hot result

batch1 = img1.reshape((1, 224, 224, 3))

with tf.device('/cpu:0'):
    sess = tf.Session()

    images = tf.placeholder(tf.float32, [1, 224, 224, 3])
    true_out = tf.placeholder(tf.float32, [1, 5])
    train_mode = tf.placeholder(tf.bool)

    # no pre-trained data used:
    vgg = vgg19.Vgg19(None)
    vgg.build(images, train_mode)

    sess.run(tf.initialize_all_variables())

    # test classification
    prob = sess.run(vgg.prob, feed_dict={images: batch1, train_mode: False})
    print 'before train:', prob

    cost = tf.reduce_sum((vgg.prob - true_out) ** 2)
    train = tf.train.GradientDescentOptimizer(0.0001).minimize(cost)
    for i in xrange(100000):
        sess.run(train, feed_dict={images: batch1, true_out: [img1_true_result], train_mode: True})

        # test classification again, should have a higher probability about tiger
        prob = sess.run(vgg.prob, feed_dict={images: batch1, train_mode: False})
        print 'step %d:' % i, prob
```
Also, modify the VGG to return a result of size 5:

```python
self.fc8 = self.fc_layer(self.relu7, 4096, 5, "fc8")
```
And this is the result I got:
```
before train: [[ 0.2002476   0.1999044   0.20009962  0.19998285  0.19976552]]
step 0: [[ 0.20025404  0.19990279  0.20009799  0.19998124  0.19976392]]
step 1: [[ 0.20026048  0.19990119  0.2000964   0.19997966  0.19976233]]
step 2: [[ 0.20026691  0.1998996   0.20009479  0.19997805  0.19976074]]
step 3: [[ 0.20027333  0.19989797  0.20009315  0.19997641  0.19975911]]
step 4: [[ 0.20027977  0.19989637  0.20009156  0.19997482  0.19975752]]
step 5: [[ 0.20028618  0.19989474  0.20008992  0.1999732   0.19975589]]
step 6: [[ 0.20029263  0.19989315  0.20008834  0.19997162  0.19975431]]
step 7: [[ 0.20029905  0.19989154  0.20008671  0.19996999  0.19975269]]
step 8: [[ 0.20030549  0.19988994  0.2000851   0.19996837  0.19975108]]
step 9: [[ 0.20031191  0.19988832  0.20008348  0.19996676  0.19974948]]
step 10: [[ 0.20031837  0.19988672  0.20008188  0.19996516  0.19974789]]
```
You should get non-integral results instead of [1,0,0,0,0]. And in this example, you should see the first value increase at each step.
@machrisaa Could you please tell me what I have done wrong with batch normalization? Indeed, I can see the error decrease when feeding only one image, but when the number of images is about 10,000, it stops decreasing.
Your implementation doesn't include a moving average. Check the source for more information.
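The usual fix is to keep moving averages of the mean and variance during training and use those at inference time, roughly like this (a sketch only, not tested against this repo; it assumes is_training is a plain Python bool):

```python
def batch_norm(self, x, name, is_training, decay=0.99):
    """Batch normalisation with moving averages for inference (sketch)."""
    with tf.variable_scope(name):
        params_shape = [x.get_shape()[-1]]
        beta = tf.get_variable('beta', params_shape, tf.float32,
                               initializer=tf.constant_initializer(0.0))
        gamma = tf.get_variable('gamma', params_shape, tf.float32,
                                initializer=tf.constant_initializer(1.0))
        moving_mean = tf.get_variable('moving_mean', params_shape, tf.float32,
                                      initializer=tf.constant_initializer(0.0),
                                      trainable=False)
        moving_var = tf.get_variable('moving_var', params_shape, tf.float32,
                                     initializer=tf.constant_initializer(1.0),
                                     trainable=False)
        if is_training:
            # use the batch statistics and update the moving averages
            mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments')
            update_mean = tf.assign(moving_mean, decay * moving_mean + (1 - decay) * mean)
            update_var = tf.assign(moving_var, decay * moving_var + (1 - decay) * variance)
            with tf.control_dependencies([update_mean, update_var]):
                return tf.nn.batch_normalization(x, mean, variance, beta, gamma, 0.001)
        else:
            # use the accumulated statistics at test time
            return tf.nn.batch_normalization(x, moving_mean, moving_var, beta, gamma, 0.001)
```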
Do you mean 10,000 iterations? I am not sure why the cost stops decreasing, because I would expect this simple single-image test to reach almost 0 error.
@machrisaa No, I mean when I add 10,000 images, the cost stops decreasing
So that is a real training run. I suppose you have 10,000 images and all of them are tagged into 5 categories. Each image has a 1-hot true result, e.g. [0,1,0,0,0], and you feed one or more image/label pairs to the feed_dict in each iteration.
If everything is set up correctly, you still have to run the training for quite a long time in order to see the network converge. How many iterations have you run?
Would you like to share your training code so I can have a look?
Sure.
```python
import tensorflow as tf
import vgg19
import utils
import numpy as np
import os
import sys
import random

num_total_images = 0
num_classes = 0
num_batch_size = 20
path_dataset = "../../data/Imagenet_dataset/"
learning_rate = 0.001
mode = 'bn'  # assumed value: selects the batch-norm branch in the modified conv_layer

dataset_images = []
dataset_labels = []
test_paths_labels = []

for subdir in os.listdir(path_dataset):
    if subdir.startswith('.') or "test" == subdir:
        continue
    elif os.path.isfile(path_dataset + subdir):
        test_paths_labels.append(path_dataset + subdir)
        continue
    for image_file_name in os.listdir(path_dataset + subdir):
        if image_file_name.startswith('.'):
            continue
        image = path_dataset + subdir + '/' + image_file_name
        dataset_images.append(image)
        dataset_labels.append(subdir)  # for test
        num_total_images += 1

num_classes = len(set(dataset_labels))
text_classes = list(set(dataset_labels))
for cls_i in range(len(text_classes)):
    for i in range(len(dataset_labels)):
        if dataset_labels[i] == text_classes[cls_i]:
            dataset_labels[i] = [1 if j == cls_i else 0 for j in range(num_classes)]

# generate synset.txt (labels' text for printing)
with open("./synset.txt", "w") as f:
    for cls in text_classes:
        f.write(cls + "\n")

if __name__ == '__main__':
    sess = tf.InteractiveSession()

    images = tf.placeholder(tf.float32, [num_batch_size, 224, 224, 3])
    labels = tf.placeholder(tf.float32, [num_batch_size, num_classes])
    # test dataset
    train_mode = tf.placeholder(tf.bool)

    vgg = vgg19.Vgg19(num_batch_size, norm_mode=mode)
    vgg.build_net(images, train_mode)

    cost = tf.reduce_sum((vgg.prob - labels) ** 2)
    train = tf.train.AdamOptimizer(0.0001).minimize(cost)
    correct_prediction = tf.equal(tf.argmax(vgg.prob, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    sess.run(tf.initialize_all_variables())

    num_data_trained = 0
    # for loading images randomly
    dataset_toload = [i for i in range(len(dataset_images))]
    print("check", len(dataset_toload), len(dataset_images))
    random.seed()

    for i in range(1, 10000):
        # a batch of data
        print('iteration:', i)
        batch_images = []
        batch_labels = []
        # a random batch of index
        batch_rand = []
        for _ in range(num_batch_size):
            rand_chosen_ind = random.choice(dataset_toload)
            batch_rand.append(rand_chosen_ind)
            dataset_toload.remove(rand_chosen_ind)
        # construct a batch of training data (images & labels)
        for one_sample in batch_rand:
            # here we load real data
            image_file = utils.load_image(dataset_images[one_sample])
            batch_images.append(image_file)
            batch_labels.append(dataset_labels[one_sample])
        # convert list into array
        batch_images = np.array(batch_images)
        batch_labels = np.array(batch_labels)
        batch_images = batch_images.reshape((num_batch_size, 224, 224, 3))

        train_feed_dict = {
            images: batch_images,
            labels: batch_labels,
            train_mode: True
        }
        cost_val = cost.eval(feed_dict=train_feed_dict)
        print('cross entropy: ', cost_val)
        sess.run(train, feed_dict=train_feed_dict)
        if i % 10 == 0:
            with open('./cost.txt', 'a') as f:
                f.write(str(cost_val) + '\n')
```
@machrisaa Sorry that the code is quite messy; I haven't had time to refactor it.
http://www.cc.gatech.edu/~hays/compvision/proj6/ Why can the error decrease so fast in their case?
I think it is not actually very fast. They spent 30 epochs = 45,000 iterations to drop their objective function from 2.8 to 1.8. I think you can get a faster and better result in your case of classifying 5 categories.
Hi @machrisaa, I am now at iteration 1136 and the error still hasn't begun to decrease. Is that proof that my implementation is really wrong?
It should start to converge, although 1136 iterations are still far from enough. I saw that your batch size is 20; usually I would prefer a smaller batch size such as 8, but that should not cause any problem.
As I mentioned above, have you tried changing only the last fully connected layer and reusing the pre-trained data to train it? It should converge quite quickly. You can try this simplified case first to see whether there is anything wrong in your implementation.
@machrisaa I want to compare several optimization methods, so I need to train a network from scratch. I think the only option left for me is to train a shallower network. Do you have any suggestion?
Indeed, I have tried changing only the last 2 fully connected layers, but it doesn't work.
If comparing different optimisers is all you want, I would suggest working on the MNIST dataset and simplifying the network heavily, down to 2 or 3 conv layers and 1 or 2 fully connected layers at the end. You will save a lot of time and can get a result within 10k to 20k iterations with an error rate of less than 1%.
As I mentioned, VGG is not a trivial network and it requires a certain amount of fine-tuning of the hyper-parameters. For this kind of network, you may need to spend days training it and adjusting the learning rate manually after a number of iterations.
@machrisaa Do you have any recommendation for a simpler VGG-like network? Thanks a lot!
You may try the one in the Tensorflow tutorial Deep MNIST for Experts:
```python
import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

import tensorflow as tf
sess = tf.InteractiveSession()

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())

for i in range(500):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
```
@machrisaa I will give it a try! Thanks for your patience and all your help!