Comments (7)
FYI, here's the code I used to do the preprocessing:
import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim.python.slim.nets import inception_v3
from tensorflow.contrib.slim.python.slim.nets import resnet_v1
from preprocessing import preprocessing_factory
def PreprocessImage(image, network='resnet_v1_101', image_size=299):
    # If resolution is larger than 224 we need to adjust some internal resizing
    # parameters for vgg preprocessing.
    if any(network.startswith(x) for x in ['resnet', 'vgg']):
        preprocessing_kwargs = {
            'resize_side_min': int(256 * image_size / 224),
            'resize_side_max': int(512 * image_size / 224)
        }
    else:
        preprocessing_kwargs = {}
    preprocessing_fn = preprocessing_factory.get_preprocessing(
        name=network, is_training=False)
    height = image_size
    width = image_size
    image = preprocessing_fn(image, height, width, **preprocessing_kwargs)
    image.set_shape([height, width, 3])
    return image
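To make the kwargs adjustment above concrete, here is a minimal sketch of just the arithmetic, with no TensorFlow dependency. The helper name vgg_resize_sides and its parameter names are made up for illustration; 256 and 512 are the values used in the code above for a 224 crop. Presumably the unscaled value fails at 299 because the image gets resized so its smaller side is 256, which is then too small for a 299 x 299 central crop.

```python
def vgg_resize_sides(image_size, base_size=224, base_min=256, base_max=512):
    """Scale the resize-side values in proportion to the requested crop size.

    Hypothetical helper: it mirrors the int(256 * image_size / 224) and
    int(512 * image_size / 224) expressions from the snippet above.
    """
    return {
        'resize_side_min': int(base_min * image_size / base_size),
        'resize_side_max': int(base_max * image_size / base_size),
    }


# At the 224 crop the values are unchanged; at 299 they grow so that the
# resized image is still larger than the crop taken from it.
print(vgg_resize_sides(224))
print(vgg_resize_sides(299))
```

For image_size=299 this gives a minimum side of 341, which is comfortably above the 299 crop.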
Note that there appears to be some small difference between the public version of slim image processing library and the internal version (which the meta graph is based on); I get results that are very close, but not exactly identical to the metagraph:
3272 : 0.954818 : /m/068hy : Pet
1076 : 0.953186 : /m/01yrx : Cat
0708 : 0.893966 : /m/01l7qd : Whiskers
4755 : 0.890339 : /m/0jbk : Animal
2847 : 0.882459 : /m/04rky : Mammal
2036 : 0.777796 : /m/0307l : Felidae
3574 : 0.765511 : /m/07k6w8 : Small to medium-sized cats
4799 : 0.679017 : /m/0k0pj : Nose
1495 : 0.476687 : /m/02cqfm : Close-up
0036 : 0.385427 : /m/012c9l : Domestic short-haired cat
By the way, if you dig through the preprocessing factory, you'll see that it is vgg preprocessing that gets used here. Also note the tweak I had to make to the kwargs; I just filed a bug with tf-slim about this.
-Neil
from dataset.
Great to hear that it's working for you now! Sorry for the trouble. I just added a note to classify_oidv2.py to explain the image preprocessing details (for others that hit this issue in the future).
from dataset.
Hi @chenghuige,
I would say that there is not enough info in your question to answer it. Can you show us a script which doesn't work, so that it's possible to reproduce?
from dataset.
@rkrasin Thanks for the quick reply. I debugged it and found that the problem is in the preprocessing.
Using inception preprocessing at 299 * 299 gives wrong results.
I get more reasonable results using resnet_v1_101 preprocessing at 224 * 224: the class names and ranking are OK, but the scores differ from the demo code (I cannot use 299 * 299 with resnet_v1_101 preprocessing).
using meta graph:
3272: /m/068hy - Pet (score = 0.96)
1076: /m/01yrx - Cat (score = 0.95)
0708: /m/01l7qd - Whiskers (score = 0.90)
using inception_v3 preprocessing at 299 * 299:
preprocessing_fn = preprocessing_factory.get_preprocessing('inception_v3', False)
image = preprocessing_fn(image, 299, 299)
3621: /m/07s6nbt - Text (score = 0.69)
3886: /m/09q2t - Brown (score = 0.66)
2306: /m/03gq5hm - Font (score = 0.62)
using resnet_v1_101 preprocessing at 224 * 224:
preprocessing_fn = preprocessing_factory.get_preprocessing('resnet_v1_101', False)
image = preprocessing_fn(image, 224, 224) # NOTE: setting this to 299 raises an error here
3272: /m/068hy - Pet (score = 0.87)
1076: /m/01yrx - Cat (score = 0.68)
2847: /m/04rky - Mammal (score = 0.68)
What confuses me is oidv2-resnet_v1_101.readme.txt.
It says 'input preprocessing was used with image resolution 299x299'.
But it seems that inference only works at 224? What should I do if I want to fine-tune at 299 * 299?
On the slim site, for the resnet v2 152 model, it says: " ^ ResNet V2 models use Inception pre-processing and input image size of 299 (use --preprocessing_name inception --eval_image_size 299 when using eval_image_classifier.py). Performance numbers for ResNet V2 models are reported on the ImageNet validation set." I verified that inception v3 preprocessing at 299 * 299 works for that checkpoint.
I have posted the code below. Thanks for your attention.
from dataset.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys, os, math
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
from nets import nets_factory
from preprocessing import preprocessing_factory

flags = tf.app.flags
FLAGS = flags.FLAGS


def read_image(image_path):
    #with tf.device('/cpu:0'):
    # Read the still-encoded JPEG bytes; binary mode ("rb") avoids text decoding.
    with tf.gfile.FastGFile(image_path, "rb") as f:
        encoded_image = f.read()
    return encoded_image
def LoadLabelMap(labelmap_path, dict_path):
    """Load index->mid and mid->display name maps.

    Args:
      labelmap_path: path to the file with the list of mids, describing
        predictions.
      dict_path: path to the dict.csv that translates from mids to display names.
    Returns:
      labelmap: an index to mid list
      label_dict: mid to display name dictionary
    """
    labelmap = [line.rstrip() for line in tf.gfile.GFile(labelmap_path)]
    label_dict = {}
    for line in tf.gfile.GFile(dict_path):
        words = [word.strip(' "\n') for word in line.split(',', 1)]
        label_dict[words[0]] = words[1]
    return labelmap, label_dict
labelmap_path = './classes-trainable.txt'
dict_path = './class-descriptions.csv'
labelmap, label_dict = LoadLabelMap(labelmap_path, dict_path)
image_checkpoint = '/home/gezi/data/image_model_check_point/openimage/resnet101/oidv2-resnet_v1_101.ckpt'
image = read_image('./cat.jpg')
image = tf.image.decode_jpeg(image, channels=3)
#preprocessing_fn = preprocessing_factory.get_preprocessing('inception_v3', False)
#image = preprocessing_fn(image, 299, 299)
preprocessing_fn = preprocessing_factory.get_preprocessing('resnet_v1_101', False)
image = preprocessing_fn(image, 224, 224)
image = tf.expand_dims(image, 0)
num_classes = 5000
net_name = 'resnet_v1_101'
net_fn = nets_factory.get_network_fn(net_name, num_classes=num_classes, is_training=False)
logits, end_points = net_fn(image)
logits = tf.squeeze(logits, name='SpatialSqueeze')
predictions = tf.nn.sigmoid(logits, name='multi_predictions')
variables_to_restore = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=net_name)
saver = tf.train.Saver(variables_to_restore)
sess = tf.InteractiveSession()
saver.restore(sess, image_checkpoint)
predictions_eval = sess.run(predictions)
top_k = predictions_eval.argsort()[::-1] # indices sorted by score
top_k = top_k[:10]
print('top_k', top_k)
for idx in top_k:
    mid = labelmap[idx]
    display_name = label_dict[mid]
    score = predictions_eval[idx]
    print('{:04d}: {} - {} (score = {:.2f})'.format(
        idx, mid, display_name, score))
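The script above applies a sigmoid to the logits rather than a softmax, so each class is scored independently and the scores need not sum to 1. That is why Pet and Cat can both score around 0.95 for the same image. A minimal pure-Python sketch of the same top-k selection follows; the logit values and the helper name top_k_labels are made up for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, applied to each logit independently."""
    return 1.0 / (1.0 + math.exp(-x))


def top_k_labels(logits, k):
    """Return the k highest-scoring (index, score) pairs, best first."""
    scores = [sigmoid(x) for x in logits]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


# Made-up logits for four hypothetical classes.
logits = [-2.0, 3.0, 0.5, 2.9]
for idx, score in top_k_labels(logits, k=3):
    print('{:04d}: score = {:.2f}'.format(idx, score))
```

With a softmax the two strongest classes here would have to share the probability mass; with per-class sigmoids both stay high.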
from dataset.
@nalldrin can you please comment on the oidv2-resnet_v1_101.readme.txt and the statement about the resolution there?
from dataset.
@nalldrin Thanks! Now I can run the caption model using this checkpoint.
One more thing, which may not be important but is interesting: when using slim models, I used to add
image = tf.image.convert_image_dtype(image, dtype=tf.float32) after decode_jpeg (I tested with and without this line, and both produce the same result).
But for this checkpoint, using the preprocessing code you wrote, I must remove this line; otherwise I get wrong results.
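A likely explanation, sketched below without TensorFlow: tf.image.convert_image_dtype rescales uint8 pixels into the [0, 1] range, while slim's vgg preprocessing subtracts per-channel means that are on the 0-255 scale. Rescaling first therefore pushes every input far below zero. Slim's inception preprocessing, by contrast, performs that float conversion itself when it receives a non-float image, which would explain why the extra line made no difference for other slim models. The helper names and the sample pixel are made up for illustration; the mean values are the ones I recall from slim's vgg_preprocessing.py, so treat them as an assumption.

```python
# Per-channel means (RGB order) assumed to match slim's vgg_preprocessing.py.
VGG_MEANS = (123.68, 116.78, 103.94)


def vgg_mean_subtract(pixel):
    """Subtract the per-channel means, as vgg-style preprocessing does."""
    return tuple(c - m for c, m in zip(pixel, VGG_MEANS))


def to_unit_range(pixel):
    """Mimic convert_image_dtype from uint8 to float32: scale into [0, 1]."""
    return tuple(c / 255.0 for c in pixel)


pixel = (200, 150, 100)  # made-up uint8 RGB value

# What the network expects: means subtracted from 0-255 values.
expected = vgg_mean_subtract(pixel)

# What it gets if the image was rescaled to [0, 1] beforehand.
rescaled_first = vgg_mean_subtract(to_unit_range(pixel))

print('expected      :', expected)
print('rescaled first:', rescaled_first)
```

In the second case every channel ends up close to the negated mean regardless of image content, so the network sees an almost constant input.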
from dataset.