I tried to use oidv2-resnet_v1_101.ckpt like other pretrain models on slim site. N

Build graph using slim resnet 101 and load oidv2-resnet_v1_101.ckpt produce wrong result (closed)

openimages commented on July 22, 2024

Build graph using slim resnet 101 and load oidv2-resnet_v1_101.ckpt produce wrong result

from dataset.

Comments (7)

nalldrin commented on July 22, 2024 2

FYI, here's the code I used to do the preprocessing:

import tensorflow as tf

import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim.python.slim.nets import inception_v3
from tensorflow.contrib.slim.python.slim.nets import resnet_v1
from preprocessing import preprocessing_factory

def PreprocessImage(image, network='resnet_v1_101', image_size=299):
  # If resolution is larger than 224 we need to adjust some internal resizing
  # parameters for vgg preprocessing.
  if any(network.startswith(x) for x in ['resnet', 'vgg']):
    preprocessing_kwargs = {
        'resize_side_min': int(256 * image_size / 224),
        'resize_side_max': int(512 * image_size / 224)
    }
  else:
    preprocessing_kwargs = {}
  preprocessing_fn = preprocessing_factory.get_preprocessing(
      name=network, is_training=False)

  height = image_size
  width = image_size
  image = preprocessing_fn(image, height, width, **preprocessing_kwargs)
  image.set_shape([height, width, 3])
  return image

Note that there appears to be some small difference between the public version of slim image processing library and the internal version (which the meta graph is based on); I get results that are very close, but not exactly identical to the metagraph:
3272 : 0.954818 : /m/068hy : Pet
1076 : 0.953186 : /m/01yrx : Cat
0708 : 0.893966 : /m/01l7qd : Whiskers
4755 : 0.890339 : /m/0jbk : Animal
2847 : 0.882459 : /m/04rky : Mammal
2036 : 0.777796 : /m/0307l : Felidae
3574 : 0.765511 : /m/07k6w8 : Small to medium-sized cats
4799 : 0.679017 : /m/0k0pj : Nose
1495 : 0.476687 : /m/02cqfm : Close-up
0036 : 0.385427 : /m/012c9l : Domestic short-haired cat

If you dig through the preprocessing factory, you'll see that vgg preprocessing is what actually gets used, by the way. Also note the tweak I had to make to the kwargs... I just filed a bug with tf-slim about this.

-Neil
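
A minimal sketch of why the kwargs tweak above matters, assuming that eval-time vgg preprocessing first resizes the image so its shorter side equals `resize_side_min` and then takes a central crop of the requested size (this is my reading of the public slim code, not something stated in this thread). The helper names and the 480x640 example image are hypothetical, and no TensorFlow is needed to run it.

```python
def scaled_resize_sides(image_size, base_crop=224, base_min=256, base_max=512):
    """Scale the default resize sides by image_size / 224, as in PreprocessImage."""
    side_min = int(base_min * image_size / base_crop)
    side_max = int(base_max * image_size / base_crop)
    return side_min, side_max


def crop_fits(height, width, resize_side, crop_size):
    """Check whether a central crop fits after an aspect-preserving resize.

    Assumption: the shorter image side is resized to `resize_side`,
    and the crop needs both resized sides to be at least `crop_size`.
    """
    scale = resize_side / min(height, width)
    new_height = round(height * scale)
    new_width = round(width * scale)
    return new_height >= crop_size and new_width >= crop_size


# Hypothetical 480x640 input image.
default_min, _ = scaled_resize_sides(224)   # the defaults: 256 and 512
scaled_min, scaled_max = scaled_resize_sides(299)

print(default_min, crop_fits(480, 640, default_min, 299))
print(scaled_min, scaled_max, crop_fits(480, 640, scaled_min, 299))
```

Under that assumption, the default shorter side of 256 is smaller than a 299 crop, which would be consistent with the error reported in this thread when requesting 299 without the tweak. Scaling by 299/224 gives 341 and 683, so the crop fits.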


nalldrin commented on July 22, 2024 1

Great to hear that it's working for you now! Sorry for the trouble. I just added a note to classify_oidv2.py to explain the image preprocessing details (for others that hit this issue in the future).


rkrasin commented on July 22, 2024

Hi @chenghuige,

I would say that there is not enough info in your question to answer it. Can you show us a script which doesn't work, so that it's possible to reproduce?


chenghuige commented on July 22, 2024

@rkrasin Thanks for the quick reply. I tried to debug and found that the problem is in the preprocessing.
Using inception preprocessing at 299 * 299 gives wrong results.
I get more reasonable results using resnet_v1_101 preprocessing at 224 * 224: the class names and ranking are OK, but the scores differ from the demo code (I cannot set 299 * 299 when using resnet_v1_101 preprocessing).

using meta graph:
3272: /m/068hy - Pet (score = 0.96)
1076: /m/01yrx - Cat (score = 0.95)
0708: /m/01l7qd - Whiskers (score = 0.90)

using inception v3 preprocess 299 * 299
preprocessing_fn = preprocessing_factory.get_preprocessing('inception_v3', False)
image = preprocessing_fn(image, 299, 299)

3621: /m/07s6nbt - Text (score = 0.69)
3886: /m/09q2t - Brown (score = 0.66)
2306: /m/03gq5hm - Font (score = 0.62)

using resnet_v1_101 preprocess 224 * 224
preprocessing_fn = preprocessing_factory.get_preprocessing('resnet_v1_101', False)
image = preprocessing_fn(image, 224, 224) # NOTE: setting this to 299 raises an error here

3272: /m/068hy - Pet (score = 0.87)
1076: /m/01yrx - Cat (score = 0.68)
2847: /m/04rky - Mammal (score = 0.68)

What confuses me is oidv2-resnet_v1_101.readme.txt.
It says 'input preprocessing was used with image resolution 299x299'.
But it seems that inference only works with 224? What should I do if I want to fine-tune using 299 * 299?
On the slim site, the note for the ResNet V2 152 model says: "ResNet V2 models use Inception pre-processing and input image size of 299 (use --preprocessing_name inception --eval_image_size 299 when using eval_image_classifier.py). Performance numbers for ResNet V2 models are reported on the ImageNet validation set." I verified that inception v3 preprocessing at 299 * 299 works for that checkpoint.

I have posted the code below. Thanks for your attention.


chenghuige commented on July 22, 2024

  from __future__ import absolute_import
  from __future__ import division
  from __future__ import print_function

  import tensorflow as tf

  from nets import nets_factory
  from preprocessing import preprocessing_factory

  def read_image(image_path):
    # Read the raw JPEG bytes; binary mode keeps them intact under Python 3.
    with tf.gfile.FastGFile(image_path, "rb") as f:
      encoded_image = f.read()
    return encoded_image

  def LoadLabelMap(labelmap_path, dict_path):
    """Load index->mid and mid->display name maps.
    Args:
      labelmap_path: path to the file with the list of mids, describing
          predictions.
      dict_path: path to the dict.csv that translates from mids to display names.
    Returns:
      labelmap: an index to mid list
      label_dict: mid to display name dictionary
    """
    labelmap = [line.rstrip() for line in tf.gfile.GFile(labelmap_path)]
    label_dict = {}
    for line in tf.gfile.GFile(dict_path):
      words = [word.strip(' "\n') for word in line.split(',', 1)]
      label_dict[words[0]] = words[1]
    return labelmap, label_dict


  labelmap_path = './classes-trainable.txt'
  dict_path = './class-descriptions.csv'
  labelmap, label_dict = LoadLabelMap(labelmap_path, dict_path)
  image_checkpoint = '/home/gezi/data/image_model_check_point/openimage/resnet101/oidv2-resnet_v1_101.ckpt'

  image = read_image('./cat.jpg')
  image = tf.image.decode_jpeg(image, channels=3)
  #preprocessing_fn = preprocessing_factory.get_preprocessing('inception_v3', False)  
  #image = preprocessing_fn(image, 299, 299)
  preprocessing_fn = preprocessing_factory.get_preprocessing('resnet_v1_101', False)  
  image = preprocessing_fn(image, 224, 224)
  image = tf.expand_dims(image, 0)

  num_classes = 5000
  net_name = 'resnet_v1_101'
  net_fn = nets_factory.get_network_fn(net_name, num_classes=num_classes, is_training=False)
  logits, end_points = net_fn(image)
  logits = tf.squeeze(logits, name='SpatialSqueeze')
  predictions = tf.nn.sigmoid(logits, name='multi_predictions')
  variables_to_restore = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=net_name)
  saver = tf.train.Saver(variables_to_restore)
  sess = tf.InteractiveSession()
  saver.restore(sess, image_checkpoint)

  predictions_eval = sess.run(predictions)

  top_k = predictions_eval.argsort()[::-1]  # indices sorted by score
  top_k = top_k[:10]
  print('top_k', top_k)
  for idx in top_k:
    mid = labelmap[idx]
    display_name = label_dict[mid]
    score = predictions_eval[idx]
    print('{:04d}: {} - {} (score = {:.2f})'.format(
        idx, mid, display_name, score))


rkrasin commented on July 22, 2024

@nalldrin can you please comment on the oidv2-resnet_v1_101.readme.txt and the statement about the resolution there?


chenghuige commented on July 22, 2024

@nalldrin Thanks! Now I can run the caption model using this checkpoint.
One more thing that may not be important but is interesting: when using slim models, I used to add
image = tf.image.convert_image_dtype(image, dtype=tf.float32) after decode_jpeg (I just tested, and the result is the same with or without it).
But for this checkpoint, using the preprocessing code you wrote, I must remove this line; otherwise I get wrong results.
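
A likely explanation for this, sketched in plain Python under stated assumptions: `tf.image.convert_image_dtype` rescales uint8 pixels from [0, 255] to [0, 1], while vgg-style preprocessing subtracts per-channel means that are expressed on the [0, 255] scale. The mean values below are the ones I believe the public slim vgg preprocessing uses; the example pixel is made up.

```python
# Assumed per-channel RGB means on the [0, 255] scale (vgg-style preprocessing).
VGG_MEANS = (123.68, 116.78, 103.94)


def subtract_means(pixel):
    """Subtract the per-channel means from one RGB pixel."""
    return tuple(c - m for c, m in zip(pixel, VGG_MEANS))


# Hypothetical pixel as decoded by decode_jpeg (uint8 range).
pixel_uint8 = (200, 150, 100)

# The same pixel after a [0, 255] -> [0, 1] rescale, as convert_image_dtype would do.
pixel_unit = tuple(c / 255.0 for c in pixel_uint8)

print(subtract_means(pixel_uint8))  # centered values, as the network expects
print(subtract_means(pixel_unit))   # every channel collapses to roughly -mean
```

With the rescaled input, every pixel ends up near the negated mean, so the network sees an almost constant image. Inception preprocessing, as far as I can tell, converts to float32 itself when the input is not already float32, which would explain why the extra line made no difference for the other slim checkpoints.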

