Comments (7)

nalldrin commented on July 22, 2024

FYI, here's the code I used to do the preprocessing:

```python
# `preprocessing` is the package from the TF-Slim image models
# (tensorflow/models, research/slim), which must be on PYTHONPATH (TF 1.x).
from preprocessing import preprocessing_factory


def PreprocessImage(image, network='resnet_v1_101', image_size=299):
  # If resolution is larger than 224 we need to adjust some internal resizing
  # parameters for vgg preprocessing (which the factory also uses for resnet).
  if any(network.startswith(x) for x in ['resnet', 'vgg']):
    preprocessing_kwargs = {
        'resize_side_min': int(256 * image_size / 224),
        'resize_side_max': int(512 * image_size / 224)
    }
  else:
    preprocessing_kwargs = {}
  preprocessing_fn = preprocessing_factory.get_preprocessing(
      name=network, is_training=False)

  height = image_size
  width = image_size
  image = preprocessing_fn(image, height, width, **preprocessing_kwargs)
  image.set_shape([height, width, 3])
  return image
```

Note that there appears to be a small difference between the public version of the slim image preprocessing library and the internal version (which the meta graph is based on); I get results that are very close, but not exactly identical, to the meta graph:
3272 : 0.954818 : /m/068hy : Pet
1076 : 0.953186 : /m/01yrx : Cat
0708 : 0.893966 : /m/01l7qd : Whiskers
4755 : 0.890339 : /m/0jbk : Animal
2847 : 0.882459 : /m/04rky : Mammal
2036 : 0.777796 : /m/0307l : Felidae
3574 : 0.765511 : /m/07k6w8 : Small to medium-sized cats
4799 : 0.679017 : /m/0k0pj : Nose
1495 : 0.476687 : /m/02cqfm : Close-up
0036 : 0.385427 : /m/012c9l : Domestic short-haired cat

If you dig through the preprocessing factory, you'll see that it's VGG preprocessing that gets used for resnet_v1_101, btw. Also note the tweak I had to make to the kwargs... I just filed a bug with tf-slim about this.

-Neil
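For reference, the kwargs tweak above just scales the VGG resize bounds by image_size / 224 (the public vgg_preprocessing defaults are 256 and 512, chosen for 224x224 crops). A plain-Python sketch of that arithmetic; `vgg_resize_kwargs` is a hypothetical helper name, not a tf-slim function:

```python
def vgg_resize_kwargs(image_size):
    """Hypothetical helper mirroring the kwargs built in PreprocessImage.

    Scales the public vgg_preprocessing defaults (resize_side_min=256,
    resize_side_max=512, tuned for 224x224 crops) by image_size / 224.
    """
    return {
        'resize_side_min': int(256 * image_size / 224),
        'resize_side_max': int(512 * image_size / 224),
    }


for size in (224, 299):
    print(size, vgg_resize_kwargs(size))
```

At 299 this gives 341 and 683. With is_training=False, the public vgg_preprocessing only uses resize_side_min (the shorter image side is resized to it before a central crop), so 341 is the value that matters at inference time.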

nalldrin commented on July 22, 2024

Great to hear that it's working for you now! Sorry for the trouble. I just added a note to classify_oidv2.py to explain the image preprocessing details (for others that hit this issue in the future).

rkrasin commented on July 22, 2024

Hi @chenghuige,

I would say that there is not enough info in your question to answer it. Can you show us a script which doesn't work, so that it's possible to reproduce?

chenghuige commented on July 22, 2024

@rkrasin Thanks for the quick reply. I debugged it and found that the problem is due to preprocessing.
I used Inception preprocessing at 299 * 299, which gives wrong results.
I get more reasonable results using resnet_v1_101 preprocessing at 224 * 224: the class names and ranking are OK, but the scores differ from the demo code (I cannot set 299 * 299 when using the resnet_v1_101 preprocessing).

Using the meta graph:
3272: /m/068hy - Pet (score = 0.96)
1076: /m/01yrx - Cat (score = 0.95)
0708: /m/01l7qd - Whiskers (score = 0.90)

Using inception_v3 preprocessing at 299 * 299:
```python
preprocessing_fn = preprocessing_factory.get_preprocessing('inception_v3', False)
image = preprocessing_fn(image, 299, 299)
```

3621: /m/07s6nbt - Text (score = 0.69)
3886: /m/09q2t - Brown (score = 0.66)
2306: /m/03gq5hm - Font (score = 0.62)

Using resnet_v1_101 preprocessing at 224 * 224:
```python
preprocessing_fn = preprocessing_factory.get_preprocessing('resnet_v1_101', False)
image = preprocessing_fn(image, 224, 224)  # NOTE: setting this to 299 raises an error
```

3272: /m/068hy - Pet (score = 0.87)
1076: /m/01yrx - Cat (score = 0.68)
2847: /m/04rky - Mammal (score = 0.68)
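The error at 299 is consistent with how the public vgg_preprocessing works at eval time: it resizes the image so its shorter side equals resize_side_min (256 by default) and then takes a central crop, and a 299x299 crop cannot fit inside a 256-pixel side. A plain-Python sketch of just the shape arithmetic (not the TF ops), assuming a hypothetical 480x640 input; the helper names are illustrative, not slim APIs:

```python
def smallest_side_resize(height, width, smallest_side):
    """Approximate the aspect-preserving resize: scale so the shorter side
    equals smallest_side, rounding each dimension to the nearest integer."""
    scale = smallest_side / min(height, width)
    return round(height * scale), round(width * scale)


def can_central_crop(height, width, crop_h, crop_w):
    """A central crop only fits if the resized image is at least crop-sized."""
    return height >= crop_h and width >= crop_w


# Hypothetical 480x640 input image.
for resize_side, crop in [(256, 224), (256, 299), (341, 299)]:
    h, w = smallest_side_resize(480, 640, resize_side)
    print(resize_side, crop, (h, w), can_central_crop(h, w, crop, crop))
```

Raising resize_side_min to 341, as in the kwargs tweak in nalldrin's PreprocessImage, is what lets the 299 crop fit.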

What confused me is oidv2-resnet_v1_101.readme.txt, which says 'input preprocessing was used with image resolution 299x299'.
But it seems that at inference time only 224 can be used? What should I do if I want to fine-tune at 299 * 299?
On the slim site, for the ResNet V2 152 model, it says: " ^ ResNet V2 models use Inception pre-processing and input image size of 299 (use --preprocessing_name inception --eval_image_size 299 when using eval_image_classifier.py). Performance numbers for ResNet V2 models are reported on the ImageNet validation set." And I verified that using inception_v3 preprocessing at 299 * 299 works for that checkpoint.

I've posted the code below. Thanks for your attention.
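The two preprocessing functions also normalize pixel values differently, which is one plausible reason Inception preprocessing yields unrelated labels (Text, Brown, Font) with this checkpoint: slim's inception_preprocessing maps pixels to [-1, 1], while vgg_preprocessing (used for resnet_v1_*) keeps the [0, 255] scale and subtracts per-channel RGB means. A NumPy sketch of only the per-pixel normalization (no resizing or cropping); the function names are hypothetical:

```python
import numpy as np

# Per-channel RGB means subtracted by slim's vgg_preprocessing.
VGG_MEANS = np.array([123.68, 116.78, 103.94], dtype=np.float32)


def inception_style_normalize(pixels_uint8):
    """[0, 255] -> [0, 1] -> [-1, 1], mirroring inception_preprocessing."""
    x = pixels_uint8.astype(np.float32) / 255.0
    return (x - 0.5) * 2.0


def vgg_style_normalize(pixels_uint8):
    """Keep the [0, 255] scale and subtract per-channel means."""
    return pixels_uint8.astype(np.float32) - VGG_MEANS


# Three hypothetical RGB pixels: black, mid-gray, white.
pixels = np.array([[0, 0, 0], [128, 128, 128], [255, 255, 255]], dtype=np.uint8)
print(inception_style_normalize(pixels))
print(vgg_style_normalize(pixels))
```

A network trained on one of these input ranges sees essentially out-of-distribution values when fed the other.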

chenghuige commented on July 22, 2024
```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf  # TF 1.x API (tf.gfile, tf.train.Saver, sessions)

# nets_factory and preprocessing_factory come from the TF-Slim image models
# (tensorflow/models, research/slim), which must be on PYTHONPATH.
from nets import nets_factory
from preprocessing import preprocessing_factory


def read_image(image_path):
  # Read the raw JPEG bytes; binary mode ("rb") is required under Python 3.
  with tf.gfile.FastGFile(image_path, "rb") as f:
    encoded_image = f.read()
  return encoded_image


def LoadLabelMap(labelmap_path, dict_path):
  """Load index->mid and mid->display name maps.
  Args:
    labelmap_path: path to the file with the list of mids, describing
        predictions.
    dict_path: path to the dict.csv that translates from mids to display names.
  Returns:
    labelmap: an index to mid list
    label_dict: mid to display name dictionary
  """
  labelmap = [line.rstrip() for line in tf.gfile.GFile(labelmap_path)]
  label_dict = {}
  for line in tf.gfile.GFile(dict_path):
    # Each line is "mid,display name"; the name may be quoted.
    words = [word.strip(' "\n') for word in line.split(',', 1)]
    label_dict[words[0]] = words[1]
  return labelmap, label_dict


labelmap_path = './classes-trainable.txt'
dict_path = './class-descriptions.csv'
labelmap, label_dict = LoadLabelMap(labelmap_path, dict_path)
image_checkpoint = '/home/gezi/data/image_model_check_point/openimage/resnet101/oidv2-resnet_v1_101.ckpt'

# Decode the JPEG into a uint8 [height, width, 3] tensor.
image = read_image('./cat.jpg')
image = tf.image.decode_jpeg(image, channels=3)
# Inception preprocessing at 299x299 gave wrong labels with this checkpoint:
#preprocessing_fn = preprocessing_factory.get_preprocessing('inception_v3', False)
#image = preprocessing_fn(image, 299, 299)
preprocessing_fn = preprocessing_factory.get_preprocessing('resnet_v1_101', False)
image = preprocessing_fn(image, 224, 224)
image = tf.expand_dims(image, 0)  # add a batch dimension

# ResNet-101 with the 5000-class Open Images head, in inference mode.
num_classes = 5000
net_name = 'resnet_v1_101'
net_fn = nets_factory.get_network_fn(net_name, num_classes=num_classes, is_training=False)
logits, end_points = net_fn(image)
logits = tf.squeeze(logits, name='SpatialSqueeze')
# Multi-label model: score every class independently with a sigmoid.
predictions = tf.nn.sigmoid(logits, name='multi_predictions')
# Restore only the network's variables from the checkpoint.
variables_to_restore = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=net_name)
saver = tf.train.Saver(variables_to_restore)
sess = tf.InteractiveSession()
saver.restore(sess, image_checkpoint)

predictions_eval = sess.run(predictions)

top_k = predictions_eval.argsort()[::-1]  # indices sorted by score, descending
top_k = top_k[:10]
print('top_k', top_k)
for idx in top_k:
  mid = labelmap[idx]
  display_name = label_dict[mid]
  score = predictions_eval[idx]
  print('{:04d}: {} - {} (score = {:.2f})'.format(
      idx, mid, display_name, score))
```
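The script scores classes with `tf.nn.sigmoid` rather than a softmax, because Open Images labels are multi-label: each of the 5000 scores is independent and they need not sum to 1, which is how Pet and Cat can both score about 0.95. A NumPy sketch of the same scoring and top-k ranking on made-up logits; `top_k_multilabel` is a hypothetical helper:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def top_k_multilabel(logits, k):
    """Score each class independently with a sigmoid and return the
    k highest (index, score) pairs, best first."""
    scores = sigmoid(np.asarray(logits, dtype=np.float64))
    order = np.argsort(-scores, kind='stable')[:k]
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical logits for four classes.
logits = [3.0, -1.0, 2.5, 0.0]
print(top_k_multilabel(logits, 3))
print(sigmoid(np.array(logits)).sum())  # independent scores, not a distribution
```

`np.argsort(-scores)` gives the same descending order as `argsort()[::-1]` in the script, except possibly for the order of tied scores.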

rkrasin commented on July 22, 2024

@nalldrin can you please comment on the oidv2-resnet_v1_101.readme.txt and the statement about the resolution there?

chenghuige commented on July 22, 2024

@nalldrin Thanks! Now I can run my caption model using this checkpoint.
One more thing, maybe not important but interesting: when using slim models, I used to add
`image = tf.image.convert_image_dtype(image, dtype=tf.float32)` after `decode_jpeg` (I just tested, and with or without this line I get the same result).
But for this checkpoint, using the preprocessing code you wrote, I must remove this line; otherwise I get wrong results.
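This appears to match how the two public slim preprocessing modules handle dtypes: `tf.image.convert_image_dtype` turns uint8 [0, 255] into float32 [0, 1]; inception_preprocessing only converts when its input is not already float32, so adding the call beforehand changes nothing there. vgg_preprocessing, however, casts to float without rescaling and subtracts channel means between about 104 and 124, which assume a [0, 255] input. A NumPy sketch of that effect on a single white pixel; the helper names are hypothetical stand-ins for the TF ops:

```python
import numpy as np

# Per-channel RGB means subtracted by slim's vgg_preprocessing.
VGG_MEANS = np.array([123.68, 116.78, 103.94], dtype=np.float32)


def convert_to_float_like_tf(pixels_uint8):
    """Stand-in for tf.image.convert_image_dtype(uint8 -> float32): rescales to [0, 1]."""
    return pixels_uint8.astype(np.float32) / 255.0


def vgg_mean_subtract(image):
    """VGG-style mean subtraction, which assumes a [0, 255] input scale."""
    return image.astype(np.float32) - VGG_MEANS


white = np.array([[255, 255, 255]], dtype=np.uint8)
print(vgg_mean_subtract(white))                            # [0, 255] input, as expected
print(vgg_mean_subtract(convert_to_float_like_tf(white)))  # [0, 1] input: even white ends up strongly negative
```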
