yolov3-tf2's Introduction

YoloV3 Implemented in TensorFlow 2.0


This repo provides a clean implementation of YoloV3 in TensorFlow 2.0 using all the best practices.

Key Features

  • TensorFlow 2.0
  • yolov3 with pre-trained weights
  • yolov3-tiny with pre-trained weights
  • Inference example
  • Transfer learning example
  • Eager mode training with tf.GradientTape
  • Graph mode training with model.fit
  • Functional model with tf.keras.layers
  • Input pipeline using tf.data
  • TensorFlow Serving
  • Vectorized transformations
  • GPU accelerated
  • Fully integrated with absl-py from abseil.io
  • Clean implementation
  • Following the best practices
  • MIT License

[demo images]

Usage

Installation

Conda (Recommended)

# Tensorflow CPU
conda env create -f conda-cpu.yml
conda activate yolov3-tf2-cpu

# Tensorflow GPU
conda env create -f conda-gpu.yml
conda activate yolov3-tf2-gpu

Pip

pip install -r requirements.txt

Nvidia Driver (For GPU)

# Ubuntu 18.04
sudo apt-add-repository -r ppa:graphics-drivers/ppa
sudo apt install nvidia-driver-430
# Windows/Other
https://www.nvidia.com/Download/index.aspx

Convert pre-trained Darknet weights

# yolov3
wget https://pjreddie.com/media/files/yolov3.weights -O data/yolov3.weights
python convert.py --weights ./data/yolov3.weights --output ./checkpoints/yolov3.tf

# yolov3-tiny
wget https://pjreddie.com/media/files/yolov3-tiny.weights -O data/yolov3-tiny.weights
python convert.py --weights ./data/yolov3-tiny.weights --output ./checkpoints/yolov3-tiny.tf --tiny

Detection

# yolov3
python detect.py --image ./data/meme.jpg

# yolov3-tiny
python detect.py --weights ./checkpoints/yolov3-tiny.tf --tiny --image ./data/street.jpg

# webcam
python detect_video.py --video 0

# video file
python detect_video.py --video path_to_file.mp4 --weights ./checkpoints/yolov3-tiny.tf --tiny

# video file with output
python detect_video.py --video path_to_file.mp4 --output ./output.avi

Training

I have created a complete tutorial on how to train from scratch using the VOC2012 dataset. See the documentation here: https://github.com/zzh8829/yolov3-tf2/blob/master/docs/training_voc.md

For customized training, you need to generate tfrecord files following the TensorFlow Object Detection API format. For example, you can use Microsoft VoTT to generate such a dataset. You can also use this script to create the Pascal VOC dataset.
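
As a rough sketch of the expected record layout (the feature keys below match the IMAGE_FEATURE_MAP parsed by this repo's dataset.py; full Object Detection API records carry additional keys, and box coordinates are normalized to [0, 1]):

import tensorflow as tf

def make_example(jpeg_bytes, xmins, ymins, xmaxs, ymaxs, class_names):
    # all box coordinates are fractions of the image width/height
    feature = {
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
        'image/object/bbox/xmin': tf.train.Feature(
            float_list=tf.train.FloatList(value=xmins)),
        'image/object/bbox/ymin': tf.train.Feature(
            float_list=tf.train.FloatList(value=ymins)),
        'image/object/bbox/xmax': tf.train.Feature(
            float_list=tf.train.FloatList(value=xmaxs)),
        'image/object/bbox/ymax': tf.train.Feature(
            float_list=tf.train.FloatList(value=ymaxs)),
        'image/object/class/text': tf.train.Feature(
            bytes_list=tf.train.BytesList(
                value=[n.encode('utf8') for n in class_names])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))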

Example command line arguments for training:

python train.py --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode eager_tf --transfer fine_tune

python train.py --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode fit --transfer none

python train.py --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 100 --mode fit --transfer no_output

python train.py --batch_size 8 --dataset ~/Data/voc2012.tfrecord --val_dataset ~/Data/voc2012_val.tfrecord --epochs 10 --mode eager_fit --transfer fine_tune --weights ./checkpoints/yolov3-tiny.tf --tiny

TensorFlow Serving

You can export the model for TF Serving:

python export_tfserving.py --output serving/yolov3/1/
# verify tfserving graph
saved_model_cli show --dir serving/yolov3/1/ --tag_set serve --signature_def serving_default

The inputs are preprocessed images (see dataset.transform_images).

The outputs are:

yolo_nms_0: bounding boxes
yolo_nms_1: scores
yolo_nms_2: classes
yolo_nms_3: numbers of valid detections
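
For illustration, here is a minimal sketch of loading the exported SavedModel and reading those four outputs. The image path and the input tensor name ('input') are assumptions; verify the name with saved_model_cli or structured_input_signature.

import tensorflow as tf

model = tf.saved_model.load('serving/yolov3/1/')
infer = model.signatures['serving_default']
print(infer.structured_input_signature)  # shows the expected input name/shape

img = tf.image.decode_jpeg(tf.io.read_file('data/street.jpg'), channels=3)
img = tf.image.resize(tf.expand_dims(img, 0), (416, 416)) / 255.0  # like transform_images

outputs = infer(input=img)  # signature functions take keyword arguments
boxes, scores = outputs['yolo_nms_0'], outputs['yolo_nms_1']
classes, nums = outputs['yolo_nms_2'], outputs['yolo_nms_3']
for i in range(int(nums[0])):
    print(int(classes[0][i]), float(scores[0][i]), boxes[0][i].numpy())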

Benchmark (No Training Yet)

Numbers are obtained with rough calculations from detect_video.py

Macbook Pro 13 (2.7GHz i5)

Detection     416x416   320x320   608x608
YoloV3        1000ms    500ms     1546ms
YoloV3-Tiny   100ms     58ms      208ms

Desktop PC (GTX 970)

Detection     416x416   320x320   608x608
YoloV3        74ms      57ms      129ms
YoloV3-Tiny   18ms      15ms      28ms

AWS g3.4xlarge (Tesla M60)

Detection     416x416   320x320   608x608
YoloV3        66ms      50ms      123ms
YoloV3-Tiny   15ms      10ms      24ms

RTX 2070 (credit to @AnaRhisT94)

Detection                            416x416
YoloV3 predict_on_batch              29-32ms
YoloV3 predict_on_batch + TensorRT   22-28ms

The Darknet version of YoloV3 at 416x416 takes 29ms on a Titan X. Considering the Titan X benchmarks at roughly double a Tesla M60, this implementation is pretty comparable performance-wise.

Implementation Details

Eager execution

Eager execution is a great addition for existing TensorFlow experts, but it is not very easy to use without some intermediate understanding of TensorFlow graphs. It is annoying when you accidentally use incompatible features like tensor.shape[0] or some sort of Python control flow that works fine in eager mode but totally breaks down when you try to compile the model to a graph.

model(x) vs. model.predict(x)

When calling model(x) directly, we are executing the graph in eager mode. For model.predict, tf actually compiles the graph on the first run and then executes it in graph mode. So if you are only running the model once, model(x) is faster since there is no compilation needed. Otherwise, model.predict or using an exported SavedModel graph is much faster (by about 2x). For non-real-time usage, model.predict_on_batch is even faster, as tested by @AnaRhisT94.
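
A quick sketch of measuring the difference yourself (timings will vary; model construction as in this repo):

import time
import tensorflow as tf
from yolov3_tf2.models import YoloV3

yolo = YoloV3(classes=80)
img = tf.random.uniform((1, 416, 416, 3))

t0 = time.time()
yolo(img)          # eager call: no graph compilation
print('model(x): %.0f ms' % (1000 * (time.time() - t0)))

yolo.predict(img)  # first call traces and compiles the graph
t0 = time.time()
yolo.predict(img)  # subsequent calls run the compiled graph
print('model.predict(x): %.0f ms' % (1000 * (time.time() - t0)))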

GradientTape

Extremely useful for debugging purposes: you can set breakpoints anywhere. You can also run all the Keras fitting functionality eagerly by passing the run_eagerly argument to model.compile. From my limited testing, all training methods, including GradientTape and keras fit, eager or not, yield similar performance. But graph mode is still preferred since it's a tiny bit more efficient.
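
For reference, a minimal sketch of an eager training step with GradientTape, in the spirit of the eager_tf mode in train.py (all names here are illustrative):

import tensorflow as tf

def train_step(model, optimizer, loss_fns, images, labels):
    with tf.GradientTape() as tape:
        outputs = model(images, training=True)
        # one loss function per output scale, summed into a single scalar
        total_loss = tf.reduce_sum(
            [loss_fn(label, output)
             for loss_fn, label, output in zip(loss_fns, labels, outputs)])
    grads = tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return total_loss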

@tf.function

@tf.function is very cool. It's like an in-between version of eager and graph. You can step through the function by disabling tf.function, then gain performance when you enable it in production. Important note: you should not pass any non-tensor parameter to a @tf.function; it will cause re-compilation (retracing) on every call with a new value. I am not sure what the best way around this is other than using globals.
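
A tiny sketch of that retracing pitfall; the print statement only executes while tracing, so you can watch the retraces happen:

import tensorflow as tf

@tf.function
def resize(img, size):
    print('tracing for size =', size)  # runs only during tracing
    return tf.image.resize(img, (size, size))

img = tf.zeros((1, 608, 608, 3))
resize(img, 416)  # traces
resize(img, 416)  # reuses the existing trace
resize(img, 320)  # retraces, because 320 is a Python int rather than a tensor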

absl.py (abseil)

Absolutely amazing. If you don't know it already, absl.py is officially used by internal projects at Google. It standardizes the application interface for Python and many other languages. After using it within Google, I was so excited to hear about abseil going open source. It includes many decades of best practices learned from creating large-scale applications. I literally have nothing bad to say about it; I strongly recommend absl.py to everybody.
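
For readers who have not used it, a minimal absl-py script in the same style as this repo's entry points (flag names borrowed from detect.py):

from absl import app, flags, logging

FLAGS = flags.FLAGS
flags.DEFINE_string('image', './data/girl.png', 'path to input image')
flags.DEFINE_integer('size', 416, 'image size')

def main(_argv):
    logging.info('processing %s at size %d', FLAGS.image, FLAGS.size)

if __name__ == '__main__':
    app.run(main)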

Loading pre-trained Darknet weights

This is very hard with the pure functional API because the layer ordering is different in tf.keras and Darknet. The clean solution here is creating sub-models in Keras. Keras is not able to save nested models in h5 format properly; the TF Checkpoint format is recommended since it's officially supported by TensorFlow.
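
A short sketch of round-tripping weights in TF Checkpoint format, the .tf prefix used throughout this repo's checkpoints/ directory (the path here is illustrative):

from yolov3_tf2.models import YoloV3

model = YoloV3(classes=80)
model.save_weights('./checkpoints/demo.tf')  # writes .index and .data-* files
model.load_weights('./checkpoints/demo.tf').expect_partial()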

tf.keras.layers.BatchNormalization

It doesn't work very well for transfer learning. There are many articles and GitHub issues about this all over the internet. I used a simple hack to make it behave better for transfer learning with small batches.
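
One common form of such a hack, sketched here under the assumption that frozen layers should always run in inference mode (not necessarily identical to this repo's version):

import tensorflow as tf

class FrozenBatchNorm(tf.keras.layers.BatchNormalization):
    # When the layer is frozen (trainable=False), force inference mode so the
    # stored moving mean/variance are used instead of small-batch statistics.
    def call(self, x, training=False):
        if training is None:
            training = tf.constant(False)
        training = tf.logical_and(training, self.trainable)
        return super().call(x, training=training)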

What is the output of transform_targets ???

I know it's very confusing, but the output is a tuple of shapes

(
  [N, 13, 13, 3, 6],
  [N, 26, 26, 3, 6],
  [N, 52, 52, 3, 6]
)

where N is the number of labels in the batch and the last dimension "6" represents [x, y, w, h, obj, class] of the bounding boxes.
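
Illustratively, one encoded label at a given grid cell and anchor can be unpacked like this (the tensor here is a dummy stand-in for the first element of the tuple):

import tensorflow as tf

y_out_13 = tf.zeros((4, 13, 13, 3, 6))  # [N, grid, grid, anchors, 6]
x, y, w, h, obj, cls = tf.unstack(y_out_13[0, 6, 6, 0])  # one cell, one anchor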

IOU and Score Threshold

The default threshold is 0.5 for both IOU and score; you can adjust them according to your needs by setting the --yolo_iou_threshold and --yolo_score_threshold flags.

Maximum number of boxes

By default there can be at most 100 bounding boxes per image; if for some reason you would like more boxes, you can use the --yolo_max_boxes flag.
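
For example, all three flags can be combined on a normal detection run (the values here are illustrative):

python detect.py --image ./data/meme.jpg --yolo_iou_threshold 0.6 --yolo_score_threshold 0.3 --yolo_max_boxes 200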

NAN Loss / Training Failed / Doesn't Converge

Many people, including me, have succeeded in training, so the code definitely works. @LongxingTan in #128 provided some of his insights, summarized here:

  1. For NaN loss, try to make the learning rate smaller.
  2. Double-check the format of your input data. Data labelled with VoTT and labelImg comes out differently, so make sure the input boxes are right, and check carefully that the format is x1/width, y1/height, x2/width, y2/height and NOT x1, y1, x2, y2 or x, y, w, h (see the sketch below).
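
A tiny sketch of check 2, with a hypothetical helper that normalizes absolute pixel corners into the expected fractions:

def normalize_box(x1, y1, x2, y2, width, height):
    # pixel corners -> x1/width, y1/height, x2/width, y2/height
    return x1 / width, y1 / height, x2 / width, y2 / height

print(normalize_box(50, 100, 150, 200, width=640, height=480))
# (0.078125, 0.2083..., 0.234375, 0.4166...)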

Make sure to visualize your custom dataset using this tool:

python tools/visualize_dataset.py --classes=./data/voc2012.names

It will output one random image from your dataset with its labels to output.jpg. Training definitely won't work if the rendered labels don't look correct.

Command Line Args Reference

convert.py:
  --output: path to output
    (default: './checkpoints/yolov3.tf')
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --weights: path to weights file
    (default: './data/yolov3.weights')
  --num_classes: number of classes in the model
    (default: '80')
    (an integer)

detect.py:
  --classes: path to classes file
    (default: './data/coco.names')
  --image: path to input image
    (default: './data/girl.png')
  --output: path to output image
    (default: './output.jpg')
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --weights: path to weights file
    (default: './checkpoints/yolov3.tf')
  --num_classes: number of classes in the model
    (default: '80')
    (an integer)

detect_video.py:
  --classes: path to classes file
    (default: './data/coco.names')
  --video: path to input video (use 0 for cam)
    (default: './data/video.mp4')
  --output: path to output video (remember to set right codec for given format. e.g. XVID for .avi)
    (default: None)
  --output_format: codec used in VideoWriter when saving video to file
    (default: 'XVID')
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --weights: path to weights file
    (default: './checkpoints/yolov3.tf')
  --num_classes: number of classes in the model
    (default: '80')
    (an integer)

train.py:
  --batch_size: batch size
    (default: '8')
    (an integer)
  --classes: path to classes file
    (default: './data/coco.names')
  --dataset: path to dataset
    (default: '')
  --epochs: number of epochs
    (default: '2')
    (an integer)
  --learning_rate: learning rate
    (default: '0.001')
    (a number)
  --mode: <fit|eager_fit|eager_tf>: fit: model.fit, eager_fit: model.fit(run_eagerly=True), eager_tf: custom GradientTape
    (default: 'fit')
  --num_classes: number of classes in the model
    (default: '80')
    (an integer)
  --size: image size
    (default: '416')
    (an integer)
  --[no]tiny: yolov3 or yolov3-tiny
    (default: 'false')
  --transfer: <none|darknet|no_output|frozen|fine_tune>: none: Training from scratch, darknet: Transfer darknet, no_output: Transfer all but output, frozen: Transfer and freeze all,
    fine_tune: Transfer all and freeze darknet only
    (default: 'none')
  --val_dataset: path to validation dataset
    (default: '')
  --weights: path to weights file
    (default: './checkpoints/yolov3.tf')

Change Log

October 1, 2019

  • Updated TensorFlow to the v2.0.0 release

References

It is pretty much impossible to implement this from the yolov3 paper alone. I had to reference the official repo (very hard to understand) and many unofficial repos (with many minor errors) to piece together the complete picture.

yolov3-tf2's People

Contributors

burnpiro, cypherix, dependabot[bot], edurenye, ehsanrahnama, friyin, jbutle55, jlomax-techshare, johntyty912, ktaebum, kuz-man, makra89, marcoleonhardt, maremoto, maxinho96, rajan780, t04glovern, victor30608, yichenj, zzh8829


yolov3-tf2's Issues

Issue with detect script

I am having trouble running the detect.py script. When I run it after training and loading weights from that training, I get this error:

Traceback (most recent call last):
File "detect.py", line 65, in
app.run(main)
File "C:\Users\venkav1\AppData\Local\Continuum\anaconda3\envs\tf-n\lib\site-packages\absl\app.py", line 300, in run
_run_main(main, args)
File "C:\Users\venkav1\AppData\Local\Continuum\anaconda3\envs\tf-n\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "detect.py", line 53, in main
for i in range(nums[0]):
TypeError: 'Tensor' object cannot be interpreted as an integer

When I print out nums, it gives this: <tf.Tensor 'yolov3/yolo_nms/combined_non_max_suppression/CombinedNonMaxSuppression:3' shape=(1,) dtype=int32>

Any idea on how to fix this?

Run in eager mode

I was trying to run detect.py eagerly, but when setting a breakpoint it never stops (used PyCharm).
For example, I tried to set a breakpoint at the beginning of the YoloV3 function. It stops at the definition call:

    if FLAGS.tiny:
        yolo = YoloV3Tiny()
    else:
        yolo = YoloV3()

But it does not stop at the prediction call:

boxes, scores, classes, nums = yolo(img)

Probably I am missing something here...

NaN's when training COCO dataset

Getting nan when training COCO dataset. Generated tf records using object detection's create_coco_tf_record script.
From your repo, followed the instructions to download weights and convert them. Ran training with the following command line:
python train.py --batch_size 8 --dataset $(DATA_PATH)/coco_train.record* --val_dataset $(DATA_PATH)/coco_val.record* --epochs 100 --mode eager_tf --transfer fine_tune

This is python3.0, tensorflow 2.0 gpu version.

[screenshot: nan loss output]

error while converting weights in new conda env

(yolov3-tf2) C:\Users\roman\ml\yolov3-tf2>python convert.py --weights ./data/yolov3-tiny.weights --output ./checkpoints/yolov3-tiny.tf --tiny
Traceback (most recent call last):
File "convert.py", line 4, in
from yolov3_tf2.models import YoloV3, YoloV3Tiny
File "C:\Users\roman\ml\yolov3-tf2\yolov3_tf2\models.py", line 2, in
import tensorflow as tf
File "C:\Users\roman\AppData\Roaming\Python\Python36\site-packages\tensorflow_init_.py", line 24, in
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "C:\Users\roman\AppData\Roaming\Python\Python36\site-packages\tensorflow\python_init_.py", line 52, in
from tensorflow.core.framework.graph_pb2 import *
File "C:\Users\roman\AppData\Roaming\Python\Python36\site-packages\tensorflow\core\framework\graph_pb2.py", line 6, in
from google.protobuf import descriptor as _descriptor
File "C:\Users\roman\AppData\Roaming\Python\Python36\site-packages\google\protobuf\descriptor.py", line 47, in
from google.protobuf.pyext import _message
ImportError: DLL load failed: Procedure not found

How to train or predict on rectangular image input?

Most of the images in my dataset are rectangular, with a width-to-height ratio of 16:9. How should I modify the 'transform_targets_for_output' function (or others) if training and predicting on rectangular images is desired? Thanks.

Can't start training.

I get the following error. I am not sure what I need to do to fix my tf record files.

2019-06-10 21:27:25.351566: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
2019-06-10 21:27:25.351650: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
	 [[{{node ParseSingleExample/ParseSingleExample}}]]
Traceback (most recent call last):
  File "train.py", line 178, in <module>
    app.run(main)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/absl/app.py", line 300, in run
2019-06-10 21:27:25.351867: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
    _run_main(main, args)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 173, in main
    validation_data=val_dataset)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 791, in fit
    initial_epoch=initial_epoch)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1515, in fit_generator
    steps_name='steps_per_epoch')
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 213, in model_iteration
    batch_data = _get_next_batch(generator, mode)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 355, in _get_next_batch
    generator_output = next(generator)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 556, in __next__
    return self.next()
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 585, in next
    return self._next_internal()
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 577, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1954, in iterator_get_next_sync
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image/key/sha256 (data type: string) is required but could not be found.
	 [[{{node ParseSingleExample/ParseSingleExample}}]] [Op:IteratorGetNextSync]

Training: invalid value encountered in less + nan's

python train.py --batch_size 8 --dataset=C:\...\platt.record --val_dataset=C:\...\platt_val.record --epochs 10 --mode eager_fit --transfer fine_tune --weights ./checkpoints/yolov3-tiny.tf --tiny

results in this output:

Epoch 1/10
2019-06-20 02:13:00.680170: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profile Session started.
2019-06-20 02:13:00.685371: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library cupti64_100.dll
      1/Unknown - 4s 4s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nanW0620 02:13:01.387073  9828 callbacks.py:236] Method (on_train_batch_end) is slow compared to the batch update (0.256449). Check your callbacks.
      7/Unknown - 6s 807ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nanC:\...\Anaconda3\envs\yolov3-tf2\lib\site-packages\tensorflow\python\keras\callbacks.py:1467: RuntimeWarning: invalid value encountered in less
  self.monitor_op = lambda a, b: np.less(a, b - self.min_delta)
C:\...\Anaconda3\envs\yolov3-tf2\lib\site-packages\tensorflow\python\keras\callbacks.py:979: RuntimeWarning: invalid value encountered in less
  if self.monitor_op(current - self.min_delta, self.best):

Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
7/7 [==============================] - 7s 1s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 2/10
6/7 [========================>.....] - ETA: 0s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan
Epoch 00002: saving model to checkpoints/yolov3_train_2.tf
7/7 [==============================] - 3s 394ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 3/10
6/7 [========================>.....] - ETA: 0s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan
Epoch 00003: saving model to checkpoints/yolov3_train_3.tf
7/7 [==============================] - 3s 396ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 00003: early stopping

What might be the cause of that? Also, there are other open issues regarding training, and I'm wondering if anyone was successful.

Potential Bug in Ground Truth Box Encoding

Hi,
You may have a bug in dataset.py line 36:
Instead of the line: idx, [box[0], box[1], box[2], box[3], 1, y_true[i][j][4]])
I think that it must be
[box[0], box[1], box[2]-box[0], box[3]-box[1], 1, y_true[i][j][4]])

Can you please confirm?

Thanks

Error training custom object detection

Traceback (most recent call last):
  File "train.py", line 175, in <module>
    app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 49, in main
    model = YoloV3(FLAGS.size, training=True)
  File "/github.com/zzh8829/yolov3-tf2/yolov3_tf2/models.py", line 210, in YoloV3
    x = YoloConv(128, name='yolo_conv_2')((x, x_36))
  File "/github.com/zzh8829/yolov3-tf2/yolov3_tf2/models.py", line 103, in yolo_conv
    x = Concatenate()([x, x_skip])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 594, in __call__
    self._maybe_build(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1713, in _maybe_build
    self.build(input_shapes)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/utils/tf_utils.py", line 290, in wrapper
    output_shape = fn(instance, input_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/layers/merge.py", line 392, in build
    'Got inputs shapes: %s' % (input_shape))
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 36, 36, 128), (None, 37, 37, 256)]

trouble with `import cv2` when running the demo scripts

When running the commands in your readme, I get an error when I call python convert.py:

$ python convert.py
Traceback (most recent call last):
  File "convert.py", line 4, in <module>
    from yolov3_tf2.models import YoloV3, YoloV3Tiny
  File "/current/working/directory/models.py", line 21, in <module>
    from .utils import broadcast_iou
  File "/current/working/directory/utils.py", line 4, in <module>
    import cv2
ImportError: /usr/lib/x86_64-linux-gnu/libcairo.so.2: undefined symbol: FT_Get_Var_Design_Coordinates

It also seems strange to me that the symbol actually does appear in the libcairo file:

$ grep FT_Get_Var_Design_Coordinates /usr/lib/x86_64-linux-gnu/libcairo.so.2
Binary file /usr/lib/x86_64-linux-gnu/libcairo.so.2 matches

The error does not originate in your yolov3-tf2 code, but it might be related to the dependencies. Could you please check the versions of the dependency packages you have installed? It might help for me to downgrade some of them.

I am working on Ubuntu 18.04.2 LTS, with python 3.6.0, pip3 version 9.0.1 and conda 4.6.11

Edit: I should add that I am testing this on a computer without a GPU before migrating to one with a GPU. In the meantime, I have substituted the python package tensorflow-gpu-2.0.0a0 for tensorflow-2.0.0a0

Edit: after some additional investigation, I suspect I have somehow messed up a combination of things installed with apt, pip3, python3 -m pip and conda. It might be helpful if you could share the output of your python3 -m pip freeze and conda list for a working installation. Then I can compare it with my system.

Error when initializing model using TF1.12

I just wanted to try your Keras version of the model in TF1.12 and added two lines to 'yolov3_tf2/models.py':

if __name__ == '__main__':
    model = YoloV3(training=True, size=418)

Things go well in TF2.0, however, when I run it in TF1.12, the following error occurred:


Traceback (most recent call last):
File "/Users/xxx/Code/GitOA/xxx/YoloV3/src/yolov3_tf2/models.py", line 312, in
model = YoloV3(training=True, size=418)
File "/Users/xxx/Code/GitOA/xxx/YoloV3/src/yolov3_tf2/models.py", line 207, in YoloV3
x = YoloConv(256, name='yolo_conv_1')((x, x_61))
File "/Users/xxx/Code/GitOA/xxx/YoloV3/src/yolov3_tf2/models.py", line 112, in yolo_conv
return Model(inputs, x, name=name)(x_in)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call
outputs = self.call(inputs, *args, **kwargs)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 815, in call
mask=masks)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1002, in _run_internal_graph
output_tensors = layer.call(computed_tensor, **kwargs)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in call
return self.conv_op(inp, filter)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in call
return self.call(inp, filter)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in call
name=self.name)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 529, in _apply_op_helper
(input_name, err))
ValueError: Tried to convert 'input' to a tensor and failed. Error: Dimension 1 in both shapes must be equal, but are 13 and 26. Shapes are [?,13,13,512] and [?,26,26,512].
From merging shape 0 with other shapes. for 'yolo_conv_1/conv2d_59/Conv2D/packed' (op: 'Pack') with input shapes: [?,13,13,512], [?,26,26,512].

I think it's because of the version of TensorFlow, but I'm not sure which specific part caused this. Does anybody know?

Cannot detect any objects

Hi, I tried training this model on COCO, but the loss does not converge. I saw you have successfully trained on VOC, so I tried VOC.

The loss is not as big as on COCO; after training it looks like this:

1906/1906 [==============================] - 481s 252ms/step - loss: 33.4387 - yolo_output_0_loss: 12.5275 - yolo_output_1_loss: 10.1783 - yolo_output_2_loss: 5.0296 - val_loss: 48.1587 - val_yolo_output_0_loss: 6.8133 - val_yolo_output_1_loss: 35.6657 - val_yolo_output_2_loss: 1.8716
Epoch 3/100
1905/1906 [============================>.] - ETA: 0s - loss: 31.7364 - yolo_output_0_loss: 12.3622 - yolo_output_1_loss: 10.4751 - yolo_output_2_loss: 4.9999      
Epoch 00003: saving model to checkpoints/yolov3_voc-3.tf
1906/1906 [==============================] - 633s 332ms/step - loss: 31.7288 - yolo_output_0_loss: 12.3598 - yolo_output_1_loss: 10.4729 - yolo_output_2_loss: 4.9976 - val_loss: 51.0017 - val_yolo_output_0_loss: 10.2503 - val_yolo_output_1_loss: 37.1377 - val_yolo_output_2_loss: 1.0919
Epoch 4/100
 205/1906 [==>...........................] - ETA: 7:58 - loss: 31.8537 - yolo_output_0_loss: 12.5202 - yolo_output_1_loss: 11.6964 - yolo_output_2_loss: 5.5369^CTraceback (most recent call last):

I ran detection, but got no results:

I0730 14:28:41.549190 139918602266368 demo_voc.py:31] weights loaded from ./checkpoints/yolov3_voc-3
I0730 14:28:42.425215 139918602266368 demo_voc.py:41] time: 0.8550496101379395
I0730 14:28:42.425319 139918602266368 demo_voc.py:43] detections:
box num:  tf.Tensor(0, shape=(), dtype=int32)

Do you have any idea why?

Transfer Learning for custom class number

How can we do any sort of transfer learning on our own dataset with a number of classes other than 80? In my case, training from scratch doesn't give great results. The darknet transfer mode seems to transfer even the yolo layers, where the number of classes has been taken into consideration.

Invalid argument: Expected image (JPEG, PNG, or GIF)-Error while training on custom dataset

Hi there,
I am currently trying to train on my custom dataset. I have created a .tfrecord file which looks reasonable to me. However, when I run train.py, the following error message occurs directly after Epoch 1/100 is printed:

2019-05-20 22:19:39.735116: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Expected image (JPEG, PNG, or GIF), got unknown format starting with '/9j/4AAQSkZJRgAB'
	 [[{{node DecodeJpeg}}]]
Traceback (most recent call last):
  File "C:/Users/Marcel/.../yolov3-tf2/train.py", line 184, in <module>
    app.run(main)
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\absl\app.py", line 300, in run
    _run_main(main, args)
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "C:/Users/Marcel/Desktop/Uni/6.Semester/Projektarbeit/YOLO/asia_repo/yolov3-tf2/train.py", line 176, in main
    validation_data=val_dataset)
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training.py", line 791, in fit
    initial_epoch=initial_epoch)
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1515, in fit_generator
    steps_name='steps_per_epoch')
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 213, in model_iteration
    batch_data = _get_next_batch(generator, mode)
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 355, in _get_next_batch
    generator_output = next(generator)
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 556, in __next__
    return self.next()
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 585, in next
    return self._next_internal()
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 577, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 1983, in iterator_get_next_sync
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected image (JPEG, PNG, or GIF), got unknown format starting with '/9j/4AAQSkZJRgAB'
	 [[{{node DecodeJpeg}}]] [Op:IteratorGetNextSync]

Comparing the given "start of the unknown data" (/9j/4AAQSkZJRgAB) with the .tfrecord file, it becomes clear that it is the start of the encoded image:
features { feature { key: "image/encoded" value { bytes_list { value: "/9j/4AAQSkZJRgABAQAAAQABAAD..." } } ... }

So I think my .tfrecord file is not the problem in this case, but rather that I am somewhere missing a decoding step for the encoded image. I also checked whether my files are in some way corrupted, but I am pretty sure they are fine. Google and Stack Overflow did not reveal the answer to my problem either. Thus, I am stuck and cannot think of another reason for this error.
Did anyone else experience the same problem and can help me find the source of this error?

Cannot convert custom darknet model

I'm trying to convert my darknet weights to tensorflow weights using the command
python convert.py --weights /path/to/weights --output ./checkpoints/yolo-obj.tf

And what I get is this error message:

File "convert.py", line 33, in <module>
    app.run(main)
  File "/home/raulberari/.conda/envs/yolov3-tf2/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/raulberari/.conda/envs/yolov3-tf2/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "convert.py", line 20, in main
    load_darknet_weights(yolo, FLAGS.weights, FLAGS.tiny)
  File "/home/raulberari/yolov3-tf2/yolov3_tf2/utils.py", line 66, in load_darknet_weights
    conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 42732 into shape (256,128,3,3)

This happens after
I0801 10:46:50.183817 139702532433664 utils.py:45] yolo_output_2/conv2d_73 bn

Does anyone have an explanation for this? I'm running this in the given env, yolov3-tf2 on an Ubuntu machine.

detect.py ERROR

python detect.py --weights ./checkpoints/yolov3-tiny.tf --tiny --image ./data/girl.png

W0706 03:01:45.838256 139645809293184 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-07-06 03:01:47.077830: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-07-06 03:01:47.078112: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x16d4a00 executing computations on platform Host. Devices:
2019-07-06 03:01:47.078146: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-07-06 03:01:47.158210: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
I0706 03:01:47.481413 139645809293184 detect.py:29] weights loaded
I0706 03:01:47.481828 139645809293184 detect.py:32] classes loaded
I0706 03:01:47.794163 139645809293184 detect.py:41] time: 0.30501627922058105
I0706 03:01:47.794389 139645809293184 detect.py:43] detections:
Traceback (most recent call last):
File "detect.py", line 56, in
app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "detect.py", line 44, in main
for i in range(nums[0]):
TypeError: 'Tensor' object cannot be interpreted as an integer

Loss decreases slowly

[screenshot: training loss output]

As you may notice, the loss seems abnormal after several epochs; the total loss does not seem to converge.

How to obtain multiple confidences for each bbox?

Thanks for the great work!
Can I obtain multiple (e.g., top-10) outputs for each bbox?
Current implementation returns only the class with the highest probability (e.g., dog 0.8 coordinates), but I wonder if I can obtain the results like:
dog 0.8 coordinatesA
cat 0.1 coordinatesA
dog 0.5 coordinatesB
cat 0.3 coordinatesB
horse 0.1 coordinatesB
...

Convert txt to tfrecord, training throws an error

I have a train.txt, like this:

# imagepath xmin,ymin,xmax,ymax,label ...
path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
path/to/img2.jpg 120,300,250,600,2
...

Then I convert this txt file to tfrecord with this code:

import os
import random
import tensorflow as tf

train_txt = './train.txt'
images_path = './JPEGImages'

def get_example(line):
    class_text = []
    xmin = []
    ymin = []
    xmax = []
    ymax = []

    line = line.split(' ')
    # ่ฏปๅ–ๅ›พ็‰‡
    image_path = line[0]
    with tf.io.gfile.GFile(image_path, 'rb') as fib:
        image_encoded = fib.read()

    # ่ฏปๅ–ๅๆ ‡ๅŠ็ฑปๅˆซ
    for item in line[1:]:
        item = item.split(',')
        xmin.append(float(item[0]))
        ymin.append(float(item[1]))
        xmax.append(float(item[2]))
        ymax.append(float(item[3]))
        if item[4] == '0':  # item[4] is still a string at this point
            class_text.append("0".encode('utf8'))
        else:
            class_text.append("1".encode('utf8'))

    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_encoded])),
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmin)),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmax)),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymin)),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymax)),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=class_text))
    }))

    return example

train_writer = tf.io.TFRecordWriter('./train.tfrecord')
val_writer = tf.io.TFRecordWriter('./val.tfrecord')
with open(train_txt, 'r') as f:
    lines = f.read().split('\n')
    random.shuffle(lines)
    # training data
    for line in lines[:5000]:
        if len(line) > 0:
            example = get_example(line)
            train_writer.write(example.SerializeToString())
    # validation data
    for line in lines[5000:]:
        if len(line) > 0:
            example = get_example(line)
            val_writer.write(example.SerializeToString())
train_writer.close()
val_writer.close()
print("finish!")

and in yolov3_tf2/dataset.py, I have revised it a little (lines 79 ~ 97):

# https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md#conversion-script-outline-conversion-script-outline
IMAGE_FEATURE_MAP = {
    # 'image/width': tf.io.FixedLenFeature([], tf.int64),
    # 'image/height': tf.io.FixedLenFeature([], tf.int64),
    # 'image/filename': tf.io.FixedLenFeature([], tf.string),
    # 'image/source_id': tf.io.FixedLenFeature([], tf.string),
    # 'image/key/sha256': tf.io.FixedLenFeature([], tf.string),
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    # 'image/format': tf.io.FixedLenFeature([], tf.string),
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
    'image/object/class/text': tf.io.VarLenFeature(tf.string),
    # 'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    # 'image/object/difficult': tf.io.VarLenFeature(tf.int64),
    # 'image/object/truncated': tf.io.VarLenFeature(tf.int64),
    # 'image/object/view': tf.io.VarLenFeature(tf.string),
}

when I begin to train, I get this error:

2019-07-26 09:28:12.864438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7134 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-07-26 09:28:27.838122: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at iterator_ops.cc:1055 : Invalid argument: Paddings must be non-negative: 0 -16
         [[{{node Pad}}]]
Traceback (most recent call last):
  File "train.py", line 177, in <module>
    app.run(main)
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 116, in main
    for batch, (images, labels) in enumerate(train_dataset):
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 586, in __next__
    return self.next()
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 623, in next
    return self._next_internal()
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 615, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2120, in iterator_get_next_sync
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -16
         [[{{node Pad}}]] [Op:IteratorGetNextSync]
Exception ignored in: <bound method _CheckpointRestoreCoordinator.__del__ of <tensorflow.python.training.tracking.util._CheckpointRestoreCoordinator object at 0x7f0c8052abe0>>
Traceback (most recent call last):
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/util.py", line 244, in __del__
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/util.py", line 93, in node_names
  File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/object_identity.py", line 76, in __getitem__
KeyError: (<tensorflow.python.training.tracking.object_identity._ObjectIdentityWrapper object at 0x7f0c8049a048>,)

Who can help me? Thanks!

Cannot convert to tflite, problem with lambda

Hi, I want to convert weights to tflite using tflite_convert --keras_model_file=yolov3-tiny.h5 --output_file=yolov3-tiny.tflite
It fails with the following traceback:

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0513 14:27:21.003883 140717950494528 deprecation.py:506] From /usr/local/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:97: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0513 14:27:21.004970 140717950494528 deprecation.py:506] From /usr/local/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:97: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py:820: UserWarning: yolov3_tf2.models is not loaded, but a Lambda layer uses it. It may cause errors.
, UserWarning)
Traceback (most recent call last):
File "/usr/local/bin/tflite_convert", line 11, in
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 448, in main
app.run(main=run_main, argv=sys.argv[:1])
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 444, in run_main
_convert_model(tflite_flags)
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 123, in _convert_model
converter = _get_toco_converter(flags)
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 110, in _get_toco_converter
return converter_fn(**converter_kwargs)
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 627, in from_keras_model_file
keras_model = _keras.models.load_model(model_file)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 215, in load_model
custom_objects=custom_objects)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/model_config.py", line 55, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 95, in deserialize
printable_module_name='layer')
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
list(custom_objects.items())))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1231, in from_config
process_layer(layer_data)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1215, in process_layer
layer = deserialize_layer(layer_data, custom_objects=custom_objects)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 95, in deserialize
printable_module_name='layer')
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
list(custom_objects.items())))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1241, in from_config
process_node(layer, node_data)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1197, in process_node
layer(flat_input_tensors[0], **kwargs)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 612, in call
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 768, in call
return self.function(inputs, **arguments)
File "/home/mba/GitHub/yolov3-tf2/yolov3_tf2/models.py", line 139, in
x = Lambda(lambda x: import tensorflow as tf; tf.reshape(x, (-1, tf.shape(x)[1], tf.shape(x)[2], anchors, classes + 5)))(x)
NameError: name 'tf' is not defined

I found similar problem here: https://stackoverflow.com/questions/54347963/tf-is-not-defined-on-load-model-using-lambda

How could I solve it?

Validation loss explodes after several epochs

[screenshot: training/validation loss curves]
Here is a picture of my training procedure. After several epochs of decreasing, the validation loss suddenly exploded. Did this happen to you when you were training?

Perform non maximum suppression for classes separately (request to extend code)

Right now the non maximum suppression is performed for all the bounding boxes for all the classes.

def yolo_nms(outputs, anchors, masks, classes):

Would it be possible to add a flag that allows you to carry out the non maximum suppression separately for different classes?

I have a use case where a large object can sometimes have a smaller object attached to it. The problem is that the bounding box of the smaller object (if it is present) is always suppressed by the bounding box of the larger object.

I have found a hacky solution that works for me, but I think that it would be useful to a have a general solution.
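
For reference, a hedged sketch of one way to run class-wise NMS in eager mode with tf.image.non_max_suppression; this is not code from the repo, and the names and defaults are illustrative:

import tensorflow as tf

def per_class_nms(boxes, scores, classes, max_per_class=100, iou_threshold=0.5):
    # boxes: (n, 4), scores: (n,), classes: (n,). Running NMS per class means
    # overlapping objects of different classes no longer suppress each other.
    keep = []
    for c in tf.unique(classes)[0]:
        idx = tf.reshape(tf.where(classes == c), [-1])
        selected = tf.image.non_max_suppression(
            tf.gather(boxes, idx), tf.gather(scores, idx),
            max_per_class, iou_threshold=iou_threshold)
        keep.append(tf.gather(idx, selected))
    return tf.concat(keep, axis=0)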

Python convert.py error

I want to know which version of cuDNN is required. I tried cuDNN 7.5.1, but it doesn't work.

~/yolov3-tf2$ python convert.py
/home/dhh/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
2019-07-09 17:46:52.049771: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-09 17:46:52.054104: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-09 17:46:52.116723: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1009] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-09 17:46:52.117718: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x564ac0e64a40 executing computations on platform CUDA. Devices:
2019-07-09 17:46:52.117733: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-07-09 17:46:52.119343: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-07-09 17:46:52.119700: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x564ac0ed0e50 executing computations on platform Host. Devices:
2019-07-09 17:46:52.119714: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): ,
2019-07-09 17:46:52.120057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1467] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.67GiB
2019-07-09 17:46:52.120088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1546] Adding visible gpu devices: 0
2019-07-09 17:46:52.120143: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-09 17:46:52.121039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1015] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-09 17:46:52.121049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0
2019-07-09 17:46:52.121069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1034] 0: N
2019-07-09 17:46:52.121172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1149] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3462 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Model: "yolov3"


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) [(None, None, None, 0


yolo_darknet (Model) ((None, None, None, 40620640 input_1[0][0]


yolo_conv_0 (Model) (None, None, None, 5 11024384 yolo_darknet[1][2]


yolo_conv_1 (Model) (None, None, None, 2 2957312 yolo_conv_0[1][0]
yolo_darknet[1][1]


yolo_conv_2 (Model) (None, None, None, 1 741376 yolo_conv_1[1][0]
yolo_darknet[1][0]


yolo_output_0 (Model) (None, None, None, 3 4984063 yolo_conv_0[1][0]


yolo_output_1 (Model) (None, None, None, 3 1312511 yolo_conv_1[1][0]


yolo_output_2 (Model) (None, None, None, 3 361471 yolo_conv_2[1][0]


yolo_boxes_0 (Lambda) ((None, None, None, 0 yolo_output_0[1][0]


yolo_boxes_1 (Lambda) ((None, None, None, 0 yolo_output_1[1][0]


yolo_boxes_2 (Lambda) ((None, None, None, 0 yolo_output_2[1][0]


yolo_nms (Lambda) ((None, 100, 4), (No 0 yolo_boxes_0[0][0]
yolo_boxes_0[0][1]
yolo_boxes_0[0][2]
yolo_boxes_1[0][0]
yolo_boxes_1[0][1]
yolo_boxes_1[0][2]
yolo_boxes_2[0][0]
yolo_boxes_2[0][1]
yolo_boxes_2[0][2]

Total params: 62,001,757
Trainable params: 61,949,149
Non-trainable params: 52,608


I0709 17:46:57.439600 139777120954112 convert.py:18] model created
I0709 17:46:57.441039 139777120954112 utils.py:45] yolo_darknet/conv2d bn
I0709 17:46:57.443896 139777120954112 utils.py:45] yolo_darknet/conv2d_1 bn
I0709 17:46:57.446532 139777120954112 utils.py:45] yolo_darknet/conv2d_2 bn
I0709 17:46:57.448883 139777120954112 utils.py:45] yolo_darknet/conv2d_3 bn
I0709 17:46:57.451326 139777120954112 utils.py:45] yolo_darknet/conv2d_4 bn
I0709 17:46:57.454463 139777120954112 utils.py:45] yolo_darknet/conv2d_5 bn
... (the same utils.py:45 log line repeats for every remaining layer: yolo_darknet/conv2d_6 through conv2d_51, yolo_conv_0/conv2d_52-56, yolo_output_0/conv2d_57 bn and conv2d_58 bias, yolo_conv_1/conv2d_59-64, yolo_output_1/conv2d_65 bn and conv2d_66 bias, yolo_conv_2/conv2d_67-72, yolo_output_2/conv2d_73 bn) ...
I0709 17:46:58.108677 139777120954112 utils.py:45] yolo_output_2/conv2d_74 bias
I0709 17:46:58.109469 139777120954112 convert.py:21] weights loaded
2019-07-09 17:46:58.121147: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-09 17:46:58.759909: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Loaded runtime CuDNN library: 7.3.1 but source was compiled with: 7.4.2. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2019-07-09 17:46:58.761417: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Loaded runtime CuDNN library: 7.3.1 but source was compiled with: 7.4.2. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
  File "convert.py", line 33, in <module>
    app.run(main)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "convert.py", line 24, in main
    output = yolo(img)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 660, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 870, in call
    return self._run_internal_graph(inputs, training=training, mask=mask)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1011, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 660, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 870, in call
    return self._run_internal_graph(inputs, training=training, mask=mask)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1011, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 660, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 196, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1078, in __call__
    return self.conv_op(inp, filter)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 634, in __call__
    return self.call(inp, filter)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 233, in __call__
    name=self.name)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1951, in conv2d
    name=name)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1031, in conv2d
    data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1130, in conv2d_eager_fallback
    ctx=_ctx, name=name)
  File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 66, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

labels all become zeros on COCO

Hi, following the dataset preprocessing, I have generated TFRecords for COCO and normalized the labeled boxes by the original image width and height. But the transformed labels give me this:

 <tf.Tensor: id=1600, shape=(3, 26, 26, 3, 6), dtype=float32, numpy=
array([[[[[0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.]],

         [[0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0.]],

         ...

         (elided: every remaining entry is also 0.)

         ...]]], dtype=float32)>

It's all zeros. Any suggestions?
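
A quick way to narrow this down, as a debugging sketch only: the file paths below are placeholders, and the loader/transform signatures follow the repo's dataset.py at the time of writing, so adjust to your checkout. If the boxes parsed from the TFRecord are still in pixel coordinates rather than normalized to [0, 1], transform_targets assigns nothing and every target grid comes out all zeros.

import tensorflow as tf
from yolov3_tf2.dataset import load_tfrecord_dataset, transform_targets
from yolov3_tf2.models import yolo_anchors, yolo_anchor_masks

# Placeholder paths; signatures taken from the repo's dataset.py (adjust if yours differ).
dataset = load_tfrecord_dataset('./data/coco_train.tfrecord', './data/coco.names', 416)
for x, y in dataset.take(1):
    # drop the zero padding and inspect the raw parsed boxes
    boxes = tf.boolean_mask(y, tf.reduce_any(y != 0, axis=-1))
    print('raw boxes (should already be in [0, 1]):', boxes.numpy())
    # count how many anchor cells actually got assigned per output scale
    for i, t in enumerate(transform_targets(y[tf.newaxis], yolo_anchors, yolo_anchor_masks, 416)):
        print('scale', i, 'non-zero target entries:', int(tf.math.count_nonzero(t)))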

Threshold option during detection

Hey @zzh8829, thanks for your code; it works great. I was thinking it would be a good idea to have a detection threshold flag, --thresh, for detect.py or convert.py. Do I get it right that iou_threshold and score_threshold in lines 190-192 of models.py are the only way to change the thresholds for prediction? Let me know; this flag, along with a num_classes flag, would be a great boost to the repo.
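
For reference, a sketch of one way to wire such flags through. The flag names are illustrative, and bbox and scores stand for the tensors models.py already builds just before the hard-coded literals the question mentions; this is not the repo's current code.

import tensorflow as tf
from absl import flags

FLAGS = flags.FLAGS
flags.DEFINE_float('yolo_iou_threshold', 0.5, 'IoU threshold for NMS')
flags.DEFINE_float('yolo_score_threshold', 0.5, 'score threshold for NMS')

def nms_with_flags(bbox, scores):
    # mirrors the combined NMS call in models.py, with the two literals replaced by flags
    return tf.image.combined_non_max_suppression(
        boxes=tf.reshape(bbox, (tf.shape(bbox)[0], -1, 1, 4)),
        scores=tf.reshape(scores, (tf.shape(scores)[0], -1, tf.shape(scores)[-1])),
        max_output_size_per_class=100,
        max_total_size=100,
        iou_threshold=FLAGS.yolo_iou_threshold,
        score_threshold=FLAGS.yolo_score_threshold)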

AttributeError: module 'tensorflow._api.v2.config' has no attribute 'gpu'

A YoloV3-Tiny graph that runs fine with TF 2.0/CPU-only crashes when run with TF 2.0/GPU, with the following error:

File "C:\Users\Rob\AppData\Local\conda\conda\envs\tf2-gpu\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "detect_video.py", line 26, in main
tf.config.gpu.set_per_process_memory_fraction(FLAGS.gpu_fraction)
AttributeError: module 'tensorflow._api.v2.config' has no attribute 'gpu'

Any idea?
Thanks,
Rob
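
For anyone hitting this: tf.config.gpu.set_per_process_memory_fraction existed only in early TF 2.0 pre-releases and was removed before the final release. The released TF 2.0 equivalents live under tf.config.experimental; a minimal sketch (the 2048 MB cap is just an example budget):

import tensorflow as tf

# Must run before any op touches the GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Either let the GPU allocation grow on demand...
    tf.config.experimental.set_memory_growth(gpus[0], True)
    # ...or instead cap it at a fixed amount (do not combine with memory growth):
    # tf.config.experimental.set_virtual_device_configuration(
    #     gpus[0],
    #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])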

Trying to decode format of 'outputs' for translation to c_api

Nice job on this project! It works great in Python on a model we custom-trained.
I am in the process of using the TF C API to port it to C/C++ for deployment. I understand the input tensor (the image, 1x416x416x3), but I am having a little trouble figuring out the format of the output tensor (for either YoloV3 or TinyYoloV3). Referencing the last lines in those functions:

outputs = Lambda(lambda x: yolo_nms(x, anchors, masks, classes),
                 name='yolo_nms')((boxes_0[:3], boxes_1[:3]))

Pre-NMS, the tensor shape would be similar to:

batch_size x 10647 x (num_classes + 5 bounding box attrs)

The number 10647 is equal to the sum 507 + 2028 + 8112, which are the numbers of possible objects detected on each scale (for full YoloV3). The five bounding box attributes stand for center_x, center_y, width, height, and confidence.

So, looking for confirmation: if I used YoloV3Tiny with, say, 12 classes and its two scales (not 3 like full YoloV3), the output tensor would hold

(507 + 2028) * (12 + 5) = 2535 * 17 = 43,095 values per image, i.e., a shape of batch_size x 2535 x 17 (507 and 2028 already include the 3 anchors per grid cell).

Note: I am also consulting the combined_non_max_suppression API, which is the last step in the pb file I am loading:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/image/combined_non_max_suppression
which defines the output as the following:

Returns:
  'nmsed_boxes': A [batch_size, max_detections, 4] float32 tensor containing the non-max suppressed boxes.
  'nmsed_scores': A [batch_size, max_detections] float32 tensor containing the scores for the boxes.
  'nmsed_classes': A [batch_size, max_detections] float32 tensor containing the class for boxes.
  'valid_detections': A [batch_size] int32 tensor indicating the number of valid detections per batch item. Only the top valid_detections[i] entries in nms_boxes[i], nms_scores[i] and nms_class[i] are valid. The rest of the entries are zero paddings.

Is this correct? Any help would be appreciated.

Thanks!
Rob
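
On the shapes: those four post-NMS tensors are fixed-size regardless of the pre-NMS box count, so from the C API you read four output tensors with exactly the documented shapes. A hedged sketch of consuming them in Python, mirroring the pattern in the repo's detect.py (paths are placeholders):

import tensorflow as tf
from yolov3_tf2.models import YoloV3

yolo = YoloV3(classes=80)
yolo.load_weights('./checkpoints/yolov3.tf')

img = tf.image.decode_image(open('./data/street.jpg', 'rb').read(), channels=3)
img = tf.expand_dims(tf.image.resize(img, (416, 416)) / 255, 0)

boxes, scores, classes, nums = yolo(img)
for i in range(nums[0]):  # only the first nums[0] entries per image are valid
    print(classes[0][i].numpy(), scores[0][i].numpy(), boxes[0][i].numpy())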

Cannot train at all: AttributeError: 'Tensor' object has no attribute 'numpy'

Detection works without any problems. When I try running the first training command as explained in the README (applied to my own train and test sets), I get the following error message.

(yolov3-tf2-master) C:\Users\Pawel Wocjan\Documents\ML\yolov3-tf2-master>python train.py --batch_size 8 --dataset "C:\Users\Pawel Wocjan\Documents\ML\yolov3-tf2-master\data\racoon_dataset\train.record" --val_dataset "C:\Users\Pawel Wocjan\Documents\ML\yolov3-tf2-master\data\racoon_dataset\test.record" --epochs 100 --mode eager_tf --transfer fine_tune
W0724 12:49:11.198809 5172 deprecation.py:506] From C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\ops\init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-07-24 12:49:13.840923: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
W0724 12:49:14.279574 5172 deprecation.py:323] From C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\autograph\impl\api.py:255: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Traceback (most recent call last):
  File "train.py", line 175, in <module>
    app.run(main)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\absl\app.py", line 300, in run
    _run_main(main, args)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 129, in main
    epoch, batch, total_loss.numpy(),
AttributeError: 'Tensor' object has no attribute 'numpy'
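
Worth noting: .numpy() only exists on eager tensors, so this error usually means eager execution is not actually on (a TF 1.x install, or something having called tf.compat.v1.disable_eager_execution()). A two-line sanity check before digging further:

import tensorflow as tf

print(tf.__version__)          # this repo expects a 2.x build
print(tf.executing_eagerly())  # must be True for total_loss.numpy() to work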

inference speed question

Thanks for a very clean implementation! I'm getting 0.6 s per predict() call on a 1080 Ti and am puzzled why it is so slow; with a similar implementation I am able to get over 30 fps.
Any ideas, please?
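
One thing worth ruling out first: the initial call pays one-time graph tracing and cuDNN autotuning costs, and model.predict() adds per-call host/device conversion overhead. A rough timing sketch (checkpoint path as in the README; numbers will vary):

import time
import tensorflow as tf
from yolov3_tf2.models import YoloV3

yolo = YoloV3(classes=80)
yolo.load_weights('./checkpoints/yolov3.tf')
img = tf.random.uniform((1, 416, 416, 3))

yolo(img)  # warm-up call absorbs tracing / autotune cost
start = time.time()
n = 50
for _ in range(n):
    yolo(img)  # call the model directly instead of predict()
print('avg per image: %.3f s' % ((time.time() - start) / n))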

NaN values at the very beginning of training and wrong tensor shapes

I have successfully used the code to train from scratch, and also with the darknet and no_output transfer options, on several custom datasets. I have written a script that produces the TFRecords for training and validation.

When I try to train on a new dataset, I get NaN values for the loss at the very beginning of training, and also an error message that some tensor indices do not fit the target shape. I am not sure what is wrong. I think I generate the TFRecords for training and validation correctly (eagle_train.record and eagle_test.record in the run below), because everything worked fine for the other datasets I tried previously.

I have also noticed that the indices that fail to index into shape [8, 13, 13, 3, 6] change each time I rerun the training command. In the run below the offending index is [2, 13, 5, 2], but in other runs I got indices such as [1, 15, 5, 2] and [7, 13, 5, 2], even though I have not changed the code at all.

Does anybody have an idea what could be the cause for this behavior? Thanks a lot for your help!

     95/Unknown - 57s 600ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan2019-08-12 14:01:02.722807: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at scatter_nd_op.cc:217 : Invalid argument: indices[2] = [2, 13, 5, 2] does not index into shape [8,13,13,3,6]

Here is the complete trace:

(yolov3-tf2-master) C:\Users\Pawel Wocjan\Documents\ML\yolov3-tf2-master>python train.py --dataset ./data/eagle_train.record --val_dataset ./data/eagle_test.record --transfer darknet --mode fit --epochs 2
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2019-08-12 13:59:48.546377: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-08-12 13:59:48.626458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2019-08-12 13:59:48.629067: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-12 13:59:48.630822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-12 13:59:48.633876: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-08-12 13:59:48.636530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2019-08-12 13:59:48.638655: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-12 13:59:48.640399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-12 13:59:49.208639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-12 13:59:49.211078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-08-12 13:59:49.212671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-08-12 13:59:49.214527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6280 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
W0812 14:00:03.033751  7376 deprecation.py:323] From C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\ops\array_ops.py:1340: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Epoch 1/2
2019-08-12 14:00:34.698461: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
2019-08-12 14:00:34.700999: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'cupti64_100.dll'; dlerror: cupti64_100.dll not found
2019-08-12 14:00:34.703512: W tensorflow/core/profiler/lib/profiler_session.cc:182] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
      1/Unknown - 29s 29s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan2019-08-12 14:00:34.991461: I tensorflow/core/platform/default/device_tracer.cc:641] Collecting 0 kernel records, 0 memcpy records.
2019-08-12 14:00:35.036669: E tensorflow/core/platform/default/device_tracer.cc:68] CUPTI error: CUPTI could not be loaded or symbol could not be found.
     95/Unknown - 57s 600ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan2019-08-12 14:01:02.722807: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at scatter_nd_op.cc:217 : Invalid argument: indices[2] = [2, 13, 5, 2] does not index into shape [8,13,13,3,6]
     96/Unknown - 57s 597ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan2019-08-12 14:01:02.768297: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at iterator_ops.cc:1055 : Invalid argument: [_Derived_]{{function_node __inference_transform_targets_for_output_14319_specialized_for_StatefulPartitionedCall_at___inference_Dataset_map_<lambda>_15243}} {{function_node __inference_transform_targets_for_output_14319_specialized_for_StatefulPartitionedCall_at___inference_Dataset_map_<lambda>_15243}} indices[2] = [2, 13, 5, 2] does not index into shape [8,13,13,3,6]
         [[{{node TensorScatterUpdate}}]]
         [[StatefulPartitionedCall]]
Traceback (most recent call last):
  File "train.py", line 193, in <module>
    app.run(main)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\absl\app.py", line 300, in run
    _run_main(main, args)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 188, in main
    validation_data=val_dataset)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\keras\engine\training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 694, in fit
    steps_name='steps_per_epoch')
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 220, in model_iteration
    batch_data = _get_next_batch(generator, mode)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 362, in _get_next_batch
    generator_output = next(generator)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 586, in __next__
    return self.next()
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 623, in next
    return self._next_internal()
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 615, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2150, in iterator_get_next_sync
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: [_Derived_]{{function_node __inference_transform_targets_for_output_14319_specialized_for_StatefulPartitionedCall_at___inference_Dataset_map_<lambda>_15243}} {{function_node __inference_transform_targets_for_output_14319_specialized_for_StatefulPartitionedCall_at___inference_Dataset_map_<lambda>_15243}} indices[2] = [2, 13, 5, 2] does not index into shape [8,13,13,3,6]
         [[{{node TensorScatterUpdate}}]]
         [[StatefulPartitionedCall]] [Op:IteratorGetNextSync]
Exception ignored in: <bound method _CheckpointRestoreCoordinator.__del__ of <tensorflow.python.training.tracking.util._CheckpointRestoreCoordinator object at 0x00000284600F3E10>>
Traceback (most recent call last):
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\training\tracking\util.py", line 244, in __del__
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\training\tracking\util.py", line 93, in node_names
  File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\training\tracking\object_identity.py", line 76, in __getitem__
KeyError: (<tensorflow.python.training.tracking.object_identity._ObjectIdentityWrapper object at 0x00000282E5DA0F60>,)
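
For what it's worth, a scatter index of [2, 13, 5, 2] into shape [8, 13, 13, 3, 6] means a grid coordinate of 13 on a 13-wide grid, i.e., a box whose normalized center lands at or beyond 1.0. A hedged sketch that scans a TFRecord for out-of-range coordinates, using the standard TF Object Detection API feature keys the repo's dataset.py also expects (file path taken from the report):

import tensorflow as tf

features = {
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
}
# flag any normalized coordinate outside [0, 1]
for i, record in enumerate(tf.data.TFRecordDataset('./data/eagle_train.record')):
    example = tf.io.parse_single_example(record, features)
    for key, value in example.items():
        coords = tf.sparse.to_dense(value)
        if bool(tf.reduce_any((coords < 0.0) | (coords > 1.0))):
            print('record', i, key, coords.numpy())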

detecting all 0 when model diverges even slightly

I don't understand whether the model should output 0 for everything when it diverges even slightly: I get detections only for the first 2 epochs, where train and val loss are almost the same, but even one epoch later I get all zeros. Is this expected behaviour, or am I doing something wrong?

Does someone happen to know why this is the case?

Cannot train on COCO due to coco.names file

I've fixed the bug and (lightly) tested it, but I do not have push permissions!

It comes down to the COCO names having different spellings than the matching VOC names.

If you make the change, don't forget to keep the old names file and point to it in train.py ;-)
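
A quick way to verify such a mismatch (file names illustrative): compare the two .names files entry by entry, since a single different spelling shifts or breaks every class id that depends on it.

# file names illustrative; compare two class-name files line by line
old = open('data/coco.names').read().splitlines()
new = open('data/voc2012.names').read().splitlines()
for i, (a, b) in enumerate(zip(old, new)):
    if a != b:
        print('class id %d: %r != %r' % (i, a, b))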

tensorflow serving not working

It gives an error when making a request to the server, saying the binary and the op versions are different. Does this model work when loaded by TensorFlow Serving?
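
That message usually indicates a version mismatch: the tensorflow/serving binary is older than the TensorFlow build that exported the SavedModel, so it lacks some ops. Once the versions line up, a minimal REST request looks like this (a sketch; it assumes the model was exported to serving/yolov3/1/ as in the README and is served under the name yolov3):

import json
import numpy as np
import requests

img = np.random.rand(1, 416, 416, 3).tolist()  # replace with a real preprocessed image
resp = requests.post('http://localhost:8501/v1/models/yolov3:predict',
                     data=json.dumps({'instances': img}))
print(resp.json().keys())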

question!

Hi, why have you done this?

def YoloLoss(anchors, classes=80, ignore_thresh=0.5):
    def yolo_loss(y_true, y_pred):
        ...
    return yolo_loss
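
The reason for the nesting: Keras only ever calls a loss with the fixed signature loss(y_true, y_pred), so any extra parameters (anchors, class count, ignore threshold) have to be captured by an enclosing factory function. A minimal illustration of the same pattern (names hypothetical):

import tensorflow as tf

def make_weighted_mse(weight):
    # Keras will call weighted_mse(y_true, y_pred); weight is captured by the closure.
    def weighted_mse(y_true, y_pred):
        return weight * tf.reduce_mean(tf.square(y_true - y_pred))
    return weighted_mse

model_loss = make_weighted_mse(0.5)  # same shape of API as YoloLoss(anchors, ...)
# model.compile(optimizer='adam', loss=model_loss)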

Benchmark eager vs. graph?

Thanks for sharing this work! It's definitely interesting to see an example of what TF 2.0 is going to be to work with.

I have one question: in your readme you mention:

From my limited testing, GradientTape is definitely a bit slower than the normal graph mode.

Could you expand on this? Maybe share some numbers?

I'm interested because the promise of eager execution has always been imperative programming (for an easier workflow) without losing too much performance. If it turns out, however, that for practical purposes it's not feasible to train in eager mode, one would have to maintain separate training loops, as you've done in train.py. It seems to me this would be detrimental to the maintainability of TF2 repositories. Do you have a view on this?
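
A hedged micro-benchmark of exactly this trade-off: the identical GradientTape step timed once as plain eager code and once wrapped in tf.function (which is roughly what the fit mode buys you). The model and data here are toy stand-ins, not the YOLO training step:

import time
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(256, activation='relu'),
                             tf.keras.layers.Dense(10)])
opt = tf.keras.optimizers.Adam()
x = tf.random.uniform((64, 128))
y = tf.random.uniform((64,), maxval=10, dtype=tf.int32)

def step(x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
            y, model(x), from_logits=True))
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

graph_step = tf.function(step)  # compiled version of the same step
for fn, name in [(step, 'eager'), (graph_step, 'graph')]:
    fn(x, y)  # warm-up / trace
    t0 = time.time()
    for _ in range(100):
        fn(x, y)
    print(name, '%.4f s/step' % ((time.time() - t0) / 100))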

Metrics : how to track the model?

Hello, I wanted to know how you track the model's accuracy.
I only see loss tracking; is it possible to follow an accuracy metric on the validation dataset?
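
The repo only logs the losses. One common pattern is a Keras callback that runs the inference-mode model on validation data each epoch; the skeleton below only shows where such code would hook in (computing real mAP still needs prediction/ground-truth matching, e.g. via an external COCO-eval script), and eval_model is assumed to be a YoloV3(training=False) instance sharing weights with the training model:

import tensorflow as tf

class DetectionEval(tf.keras.callbacks.Callback):
    def __init__(self, eval_model, val_data):
        super().__init__()
        self.eval_model = eval_model  # inference-mode model with NMS outputs
        self.val_data = val_data      # dataset of (images, targets)

    def on_epoch_end(self, epoch, logs=None):
        for x, _ in self.val_data.take(1):
            boxes, scores, classes, nums = self.eval_model(x)
            print('epoch %d: %d detections on first val image' % (epoch, int(nums[0])))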

Customized learning rate function

Hello, and thanks for your work!
I wanted to know whether it is possible to use a custom learning-rate schedule. I usually use cyclical learning rates (CLR) for image classification, and I wanted to know whether it is possible to implement that in your code.
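
It should drop in without touching the training loop: TF 2.0 optimizers accept a tf.keras.optimizers.schedules.LearningRateSchedule. A sketch of a triangular CLR schedule (hyperparameters illustrative), which you could pass to the optimizer in train.py in place of the fixed rate:

import tensorflow as tf

class TriangularCLR(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, base_lr=1e-4, max_lr=1e-3, step_size=2000):
        self.base_lr, self.max_lr, self.step_size = base_lr, max_lr, step_size

    def __call__(self, step):
        # triangular cycle: lr rises from base_lr to max_lr and back every 2*step_size steps
        step = tf.cast(step, tf.float32)
        cycle = tf.floor(1 + step / (2 * self.step_size))
        x = tf.abs(step / self.step_size - 2 * cycle + 1)
        return self.base_lr + (self.max_lr - self.base_lr) * tf.maximum(0.0, 1 - x)

optimizer = tf.keras.optimizers.Adam(learning_rate=TriangularCLR())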

true Finetuning

Hi, I think you have done a very nice job here, congrats. I would just add the option to fine-tune the network on another dataset, because as it is, you assume people want to fine-tune on the same COCO dataset.

I made some minor modifications to make this happen:

  • Add a parameter with the number of classes (which could be inferred from data/dataset.names)
  • Freeze the darknet backbone and generate new output branches with the right number of classes.
  • Propagate the parameter through the anchor and mask creation.

I will try to open a PR with those changes so more people can benefit from them (a sketch of the idea follows). Best regards!
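
A sketch of the core of that change, mirroring what train.py's --transfer darknet path does at the time of writing (class count and paths illustrative): build the model with the new class count, copy only the backbone weights from a pretrained 80-class model, and freeze the backbone.

from yolov3_tf2.models import YoloV3
from yolov3_tf2.utils import freeze_all  # repo helper that sets trainable=False recursively

NUM_CLASSES = 20  # e.g. len(open('data/voc2012.names').readlines())
model = YoloV3(416, training=True, classes=NUM_CLASSES)
pretrained = YoloV3(416, training=True, classes=80)
pretrained.load_weights('./checkpoints/yolov3.tf')

# copy and freeze only the darknet backbone; the output heads stay trainable
model.get_layer('yolo_darknet').set_weights(
    pretrained.get_layer('yolo_darknet').get_weights())
freeze_all(model.get_layer('yolo_darknet'))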

How do you resume training?

What specific flags should I pass to train.py in order to resume training from a checkpoint? I'm not sure whether --transfer none, fine_tune, freeze, or darknet fits the criteria for resuming, as they all involve freezing or discarding some parts of the weights.
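
train.py has no dedicated resume flag, so one workable approach (a sketch, assuming you kept the default ModelCheckpoint output in ./checkpoints) is to load the newest saved checkpoint into the freshly built training model before calling fit, with no layers frozen:

import tensorflow as tf
from yolov3_tf2.models import YoloV3

model = YoloV3(416, training=True, classes=80)
latest = tf.train.latest_checkpoint('./checkpoints')  # newest yolov3_train_{epoch}.tf
if latest:
    model.load_weights(latest)
    print('resuming from', latest)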

resized image has wrong aspect ratio

It seems that in https://github.com/zzh8829/yolov3-tf2/blob/eb30bd48ac1354a329a0763b2a8fe57364c5a272/yolov3_tf2/dataset.py you simply resize the original image to shape (416, 416):

def transform_images(x_train, size):
    x_train = tf.image.resize(x_train, (size, size))
    x_train = x_train / 255
    return x_train

However, I think it can cause image distortion and make the anchors meaningless. In darknet, the author uses the letterbox() method to keep the image aspect ratio by padding.

Have you compared the results of different resizing methods?
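
For comparison, a letterbox-style variant of the quoted function using tf.image.resize_with_pad, which preserves the aspect ratio and zero-pads the remainder. This is a sketch, not the repo's method, and the box labels would need the matching scale/offset applied as well:

import tensorflow as tf

def transform_images_letterbox(x_train, size):
    # resize while preserving aspect ratio, padding the remainder with zeros
    x_train = tf.image.resize_with_pad(x_train, size, size)
    x_train = x_train / 255
    return x_train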
