
Comments (32)

machrisaa avatar machrisaa commented on July 18, 2024 2

You are welcome. :)


machrisaa avatar machrisaa commented on July 18, 2024

I am using r0.10. It seems that it cannot find the key [name][0] in your data_dict. Have you downloaded the vgg16.npy file and put it in your tensorflow-vgg-master directory?


 avatar commented on July 18, 2024

@machrisaa Thanks for your reply. I tried the vgg16.npy file you provide and now it works! I wonder whether I can train the model myself instead of using the pretrained one. Do you have any idea about that?


machrisaa avatar machrisaa commented on July 18, 2024

I haven't tried training the VGG from scratch, but I have re-trained the VGG19 for other tasks by changing the last 2 fully connected layers. You can try adapting vgg19_trainable.py to VGG16 to do that.


 avatar commented on July 18, 2024

@machrisaa Thanks for your reply. Could you please tell me what the comment on this line of code means? I don't quite understand how you calculate the number of incoming units as 25088. Thanks a lot!


machrisaa avatar machrisaa commented on July 18, 2024

Hi rylanchiu, basically 25088 = ((224 / (2 ** 5)) ** 2) * 512 means:
224 is the input size;
/ (2 ** 5) because the max_pool layers halve the size 5 times;
** 2 because it is a square, i.e. width * height;
* 512 because 512 is the number of channels in self.pool5.
Therefore, 25088 = ((224 / (2 ** 5)) ** 2) * 512 is the total number of elements in the tensor self.pool5.
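
If it helps, you can check this arithmetic in a couple of lines of plain Python (just the numbers above, nothing repo-specific):

input_size = 224                          # input width/height
num_pools = 5                             # number of 2x2 max-pool layers before pool5
channels = 512                            # number of channels in pool5
side = input_size // (2 ** num_pools)     # 224 / 32 = 7
print(side * side * channels)             # 7 * 7 * 512 = 25088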


 avatar commented on July 18, 2024

@machrisaa Thanks! Now I understand all the parts except for the 5. Why is the size reduced 5 times instead of 4? I am a newbie to CNNs, please forgive my basic question. Thanks a lot!


 avatar commented on July 18, 2024

Oh I see. There are 5 pooling layers! Thank you very much @machrisaa !


machrisaa avatar machrisaa commented on July 18, 2024

@rylanchiu no problem :)


 avatar commented on July 18, 2024

Hi @machrisaa, sorry to disturb you again. I am now trying to run an image classifier on ImageNet. I picked just 5 classes, but in the first 10 iterations it doesn't converge at all, and the cross entropy just hovers around 1.609. Is that normal? Do you have any idea about it?


machrisaa avatar machrisaa commented on July 18, 2024

I think 10 iterations are far too few. During a proper training run of a few epochs, i.e. possibly more than 100k iterations, the cost does not always decrease, because the network needs to adapt to the characteristics of many different images and that takes a lot of time.

If you would like to do some experiments to see the effect of gradient descent, you can try training on a single image repeatedly. For example, train on an image of a tiger repeatedly and you should see the predicted probability of the tiger class increase.


 avatar commented on July 18, 2024

@machrisaa Thanks for your reply. I did what you suggested and found that all the output is in the form [1, 0, 0, 0, 0], i.e. one class (any of them) is predicted as 1 and the others as 0. Do you know what the reason could be? Have you trained a model from scratch with vgg19_trainable.py? Thanks!


machrisaa avatar machrisaa commented on July 18, 2024

Apart from using the pre-trained data to train another model, I haven't tried to train the network from scratch. There are rumours suggesting that it is quite hard to train VGG from scratch and reach the quality of the pre-trained result, perhaps because it depends heavily on a good initialisation and a good set of hyperparameters. Unlike some more advanced networks, such as ResNet, it has no batch normalisation to reduce the effect of the initialisation and hyperparameters, which is most likely why it is hard to train.

On the other hand, the simplicity of the VGG is why people choose it for tasks other than classification, such as style synthesis, super-resolution, etc. In these cases, the VGG is used as a cost function because of its good ability to extract the features of an image.

In your case, my suggestion is to remove the last fc6, fc7, and fc8 layers and define 1 or 2 fully-connected layers whose final output size is 5. For example:

self.fc6_custom = self.fc_layer(self.pool5, 25088, 1024, "fc6_custom")
self.relu6 = tf.nn.relu(self.fc6_custom)
self.fc7_custom = self.fc_layer(self.relu6, 1024, 5, "fc7_custom")
self.prob = tf.nn.softmax(self.fc7_custom, name="prob")

Then use the original pre-trained data as your initialisation, i.e. load the original npy data. Because there are no pre-trained variables whose names begin with fc6_custom or fc7_custom, new variables will be initialised in the get_var method, while the original variables in the conv layers will be preserved.
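
A minimal sketch of that setup (the module, class, and file names are assumed from this repo's vgg19_trainable.py and its test code; adjust them to your local copy):

import tensorflow as tf
import vgg19_trainable as vgg19

images = tf.placeholder(tf.float32, [1, 224, 224, 3])
train_mode = tf.placeholder(tf.bool)

# Load the original npy file as the initialisation. The new fc6_custom /
# fc7_custom variables are not present in it, so they fall back to fresh
# initialisation inside get_var, while the conv layers keep their
# pre-trained values.
vgg = vgg19.Vgg19('./vgg19.npy')
vgg.build(images, train_mode)

sess = tf.Session()
sess.run(tf.initialize_all_variables())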

This is the method I used to retrain the VGG to analyse sound data by converting sound clips into spectrograms.


 avatar commented on July 18, 2024

@machrisaa But can I add batch normalization myself? I have done that, but it doesn't seem to help...


machrisaa avatar machrisaa commented on July 18, 2024

I haven't tried that before. Would you like to share how you added the batch normalisation?


 avatar commented on July 18, 2024

@machrisaa Sure.

Here is the bn function:

def batch_norm(self, x, name):
    """Batch normalization."""
    with tf.variable_scope(name):
        params_shape = [x.get_shape()[-1]]

        beta = tf.get_variable('beta', params_shape, tf.float32, 
            initializer=tf.constant_initializer(0.0, tf.float32))

        gamma = tf.get_variable('gamma', params_shape, tf.float32, 
            initializer=tf.constant_initializer(1.0, tf.float32))

        mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments')

        output = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 0.001)
        output.set_shape(x.get_shape())
        return output          

And here is your conv_layer, modified to use bn:

def conv_layer(self, input_, in_channels, out_channels, name):
    fil_shape = [3, 3, in_channels, out_channels]

    with tf.variable_scope(name):
        if self.norm_mode == 'bn':
            input_ = self.batch_norm(input_, name)
        # originally: self.get_conv_var(3, in_channels, out_channels, name)
        filters = tf.get_variable(name + '_filters',
                                  fil_shape,
                                  dtype=tf.float32,
                                  initializer=tf.random_normal_initializer(mean=0.0, stddev=0.001))
        b = tf.get_variable(name+'_bias', [out_channels])
        conv = tf.nn.conv2d(input_, filters, [1, 1, 1, 1], padding='SAME')
        bias = tf.nn.bias_add(conv, b)
        output = tf.nn.relu(bias)
        return output


machrisaa avatar machrisaa commented on July 18, 2024

I am afraid your batch normalisation is not entirely correct. You can try using the batch_norm from tensorflow.contrib.layers. I usually put the batch_norm after several blocks of conv layers, e.g. after conv1_2, conv2_2, etc.
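
For reference, a rough sketch of a method you could drop into your class, wrapping the contrib version, which maintains the moving averages for you (the decay value and argument choices here are only an assumption, not tuned):

from tensorflow.contrib.layers import batch_norm

def batch_norm_layer(self, x, is_training, name):
    # is_training switches between batch statistics (training) and the
    # stored moving averages (inference); decay controls how fast the
    # moving averages are updated.
    return batch_norm(x, decay=0.9, scale=True, is_training=is_training,
                      updates_collections=None, scope=name)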

But it looks like that is not the problem. I have modified the test so that it uses no pre-trained data:

img1 = utils.load_image("./test_data/tiger.jpeg")
img1_true_result = [1, 0, 0, 0, 0]  # 1-hot result

batch1 = img1.reshape((1, 224, 224, 3))

with tf.device('/cpu:0'):
    sess = tf.Session()

    images = tf.placeholder(tf.float32, [1, 224, 224, 3])
    true_out = tf.placeholder(tf.float32, [1, 5])
    train_mode = tf.placeholder(tf.bool)

    # no pre-trained data used:
    vgg = vgg19.Vgg19(None)
    vgg.build(images, train_mode)

    sess.run(tf.initialize_all_variables())

    # test classification
    prob = sess.run(vgg.prob, feed_dict={images: batch1, train_mode: False})
    print 'before train:', prob

    cost = tf.reduce_sum((vgg.prob - true_out) ** 2)
    train = tf.train.GradientDescentOptimizer(0.0001).minimize(cost)

    for i in xrange(100000):
        sess.run(train, feed_dict={images: batch1, true_out: [img1_true_result], train_mode: True})

        # test classification again, should have a higher probability about tiger
        prob = sess.run(vgg.prob, feed_dict={images: batch1, train_mode: False})
        print 'step %d:' % i, prob

Also, modify the VGG to return a result of size 5:

self.fc8 = self.fc_layer(self.relu7, 4096, 5, "fc8")

And this is the result I got:

before train: [[ 0.2002476   0.1999044   0.20009962  0.19998285  0.19976552]]
step 0: [[ 0.20025404  0.19990279  0.20009799  0.19998124  0.19976392]]
step 1: [[ 0.20026048  0.19990119  0.2000964   0.19997966  0.19976233]]
step 2: [[ 0.20026691  0.1998996   0.20009479  0.19997805  0.19976074]]
step 3: [[ 0.20027333  0.19989797  0.20009315  0.19997641  0.19975911]]
step 4: [[ 0.20027977  0.19989637  0.20009156  0.19997482  0.19975752]]
step 5: [[ 0.20028618  0.19989474  0.20008992  0.1999732   0.19975589]]
step 6: [[ 0.20029263  0.19989315  0.20008834  0.19997162  0.19975431]]
step 7: [[ 0.20029905  0.19989154  0.20008671  0.19996999  0.19975269]]
step 8: [[ 0.20030549  0.19988994  0.2000851   0.19996837  0.19975108]]
step 9: [[ 0.20031191  0.19988832  0.20008348  0.19996676  0.19974948]]
step 10: [[ 0.20031837  0.19988672  0.20008188  0.19996516  0.19974789]]

You should get non-integral results instead of [1, 0, 0, 0, 0], and in this example you should see the first value increasing at each step.


 avatar commented on July 18, 2024

@machrisaa Could you please tell me what I have done wrong with batch normalization? I can indeed see the error decrease when feeding only one image, but when the number of images is about 10,000, it stops decreasing.


machrisaa avatar machrisaa commented on July 18, 2024

Your implementation doesn't include a moving average. Check the source for more information.

Do you mean 10,000 iterations? I am not sure why the cost stops decreasing, because I would expect this simple test to reach almost zero error.


 avatar commented on July 18, 2024

@machrisaa No, I mean that when I train on 10,000 images, the cost stops decreasing.


machrisaa avatar machrisaa commented on July 18, 2024

So that is real training. I assume you have 10,000 images, all tagged into 5 categories, each image has a 1-hot true result, e.g. [0, 1, 0, 0, 0], and you feed one or more image/label pairs into the feed_dict in each iteration.

If everything is set up correctly, you still have to run the training for quite a long time in order to see the network converge. How many iterations have you run?

Would you like to share your training code so I can have a look?


 avatar commented on July 18, 2024

Sure.

import tensorflow as tf

import vgg19
import utils
import numpy as np
import os
import sys
import random

num_total_images = 0
num_classes = 0
num_batch_size = 20
path_dataset = "../../data/Imagenet_dataset/"
learning_rate = 0.001

dataset_images = []
dataset_labels = []
test_paths_labels = []
for subdir in os.listdir(path_dataset):
    if subdir.startswith('.') or "test" == subdir:
        continue
    elif os.path.isfile(path_dataset + subdir):
        test_paths_labels.append(path_dataset + subdir)
        continue
    for image_file_name in os.listdir(path_dataset + subdir):
        if image_file_name.startswith('.'):
            continue
        image = path_dataset + subdir + '/' + image_file_name
        dataset_images.append(image)
        dataset_labels.append(subdir) # for test
        num_total_images += 1
num_classes = len(set(dataset_labels)) 
text_classes = list(set(dataset_labels)) 
for cls_i in range(len(text_classes)): 
    for i in range(len(dataset_labels)):
        if dataset_labels[i] == text_classes[cls_i]:
            dataset_labels[i] = [1 if j == cls_i else 0 for j in range(num_classes)]
# generate synset.txt (labels' text for printing)
with open("./synset.txt", "w") as f:
    for cls in text_classes:
        f.write(cls + "\n")


if __name__=='__main__':

    sess = tf.InteractiveSession()

    images = tf.placeholder(tf.float32, [num_batch_size, 224, 224, 3])
    labels = tf.placeholder(tf.float32, [num_batch_size, num_classes]) 
    # test dataset
    train_mode = tf.placeholder(tf.bool)

    vgg = vgg19.Vgg19(num_batch_size, norm_mode='bn')  # 'bn' matches the norm_mode check in conv_layer
    vgg.build_net(images, train_mode)
    cost = tf.reduce_sum((vgg.prob - labels) ** 2)  # sum-of-squares cost, not cross entropy
    train = tf.train.AdamOptimizer(0.0001).minimize(cost)
    correct_prediction = tf.equal(tf.argmax(vgg.prob, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    sess.run(tf.initialize_all_variables())

    num_data_trained = 0
    # for loading images randomly
    dataset_toload = [i for i in range(len(dataset_images))]
    print("check", len(dataset_toload), len(dataset_images))
    random.seed()
    for i in range(1, 10000):
        # a batch of data
        print ('iteration:', i)
        batch_images = []
        batch_labels = []
        # a random batch of index
        batch_rand = []
        for _ in range(num_batch_size):
            rand_chosen_ind = random.choice(dataset_toload)
            batch_rand.append(rand_chosen_ind)
            dataset_toload.remove(rand_chosen_ind)

        # construct a batch of training data (images & labels)
        for one_sample in batch_rand:
            # here we load real data
            image_file = utils.load_image(dataset_images[one_sample])
            batch_images.append(image_file)
            batch_labels.append(dataset_labels[one_sample])

        # convert list into array
        batch_images = np.array(batch_images)
        batch_labels = np.array(batch_labels)
        
        batch_images = batch_images.reshape((num_batch_size, 224, 224, 3))
        
        train_feed_dict = {
            images : batch_images,
            labels : batch_labels,
            train_mode : True
        }

        cost_val = cost.eval(feed_dict=train_feed_dict)
        print ('cost: ', cost_val)
        sess.run(train, feed_dict=train_feed_dict)
        if i % 10 == 0:
            with open('./cost.txt', 'a') as f:
                f.write(str(cost_val)+'\n')


 avatar commented on July 18, 2024

@machrisaa Sorry that the code is quite messy; I haven't had time to refactor it.


 avatar commented on July 18, 2024

http://www.cc.gatech.edu/~hays/compvision/proj6/ Why can the error decrease so fast in their case?


machrisaa avatar machrisaa commented on July 18, 2024

I think it is not actually very fast. They spent 30 epochs = 45,000 iterations to bring their objective function down from 2.8 to 1.8. I think you can get a faster and better result in your case of classifying 5 categories.


 avatar commented on July 18, 2024

Hi @machrisaa, I am now at iteration 1136, but the error still hasn't started to decrease. Is that proof that my implementation is really wrong?


machrisaa avatar machrisaa commented on July 18, 2024

It should start to converge, although 1136 iterations are still far from enough. I saw that your batch size is 20; I would usually prefer a smaller batch size, such as 8, but that should not cause any problem.

As I mentioned above, have you tried changing only the last fully-connected layer and reusing the pre-trained data to train it? It should converge quite quickly. You can try this simplified case first to see whether there is anything wrong in your implementation.


 avatar commented on July 18, 2024

@machrisaa I want to compare several optimization methods, so I need to train a network from scratch. I think the only option left to me is to train a shallower network. Do you have any suggestions?
Indeed, I have tried changing the last 2 fully connected layers, but it doesn't work.


machrisaa avatar machrisaa commented on July 18, 2024

If comparing different optimisers is all you want, I would suggest working on the MNIST dataset and simplifying the network drastically, to 2 or 3 conv layers and 1 or 2 fully connected layers at the end. You will save a lot of time and can get a result within 10k to 20k iterations with an error rate of less than 1%.

As I mentioned, VGG is not a trivial network and it requires a certain amount of fine-tuning of the hyper-parameters. For this kind of network, you may need to spend days training it and adjusting the learning rate manually every so many iterations.


 avatar commented on July 18, 2024

@machrisaa Do you have any recommendation for a simpler VGG-like network? Thanks a lot!


machrisaa avatar machrisaa commented on July 18, 2024

You may try the one in the TensorFlow tutorial Deep MNIST for Experts:

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

import tensorflow as tf

sess = tf.InteractiveSession()

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(500):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))


 avatar commented on July 18, 2024

@machrisaa I will give it a try! Thanks for your patience and help all along!

