jakeret / tf_unet

Generic U-Net Tensorflow implementation for image segmentation

License: GNU General Public License v3.0

Python 97.86% Makefile 1.98% Shell 0.16%

deep-learning image-segmentation neural-network tensorflow

tf_unet's Issues

usage help, training files

I started as per this link,

I got an error at this Python command: data_provider = image_util.ImageDataProvider("fishes/train/*.tif")
AssertionError: No training files

How do I get these training files? Is there any other link that lists the prerequisites? Please help.

Regards
Gopi. J

How to train the Unet for three-dimensional matrices (custom dataset)

I want to train the U-Net with NIfTI data. I loaded the training data and the corresponding labels into my workspace and obtained two 3-dimensional matrices: train_data.shape = (512, 512, 200) and train_label.shape = (512, 512, 200). Both have data type int16.
I have no idea how to train the U-Net with these data.
Would it be possible for you to tell me how to build a data_provider for the following training call?
path = trainer.train(data_provider, output_path, training_iters=32, epochs=100)

Looking forward to your reply.
Best wishes to you.
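
One possible way to prepare such volumes, sketched here under assumptions: treat every slice along the last axis as one 2-D training image, move the slice axis to the front, and append a channel axis so that the arrays have the layout [n, ny, nx, channels]. The helper name volume_to_slices and the small random stand-in volume are hypothetical. Arrays in this layout are the kind of input that an in-memory provider such as image_util.SimpleDataProvider (used in another issue on this page) is given, but the exact signature should be checked against the installed version.

```python
import numpy as np


def volume_to_slices(volume):
    """Turn a (ny, nx, nz) volume into a stack of 2-D slices shaped (nz, ny, nx, 1)."""
    slices = np.transpose(volume, (2, 0, 1))           # slice axis first
    return slices[..., np.newaxis].astype(np.float32)  # add a channel axis


# Small stand-in volume; the volumes in the question are (512, 512, 200) int16.
rng = np.random.RandomState(0)
train_data = rng.randint(0, 1000, size=(64, 64, 4)).astype(np.int16)
train_label = (train_data > 500).astype(np.int16)

data = volume_to_slices(train_data)
label = volume_to_slices(train_label)
print(data.shape, label.shape)
```

Each entry data[i] is then one grayscale image and label[i] its mask, so a 200-slice volume yields 200 training samples.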

Is there any setting that should be considered when using this code?

Hi
I am using this code right now.
Is there any setting that should be considered when using this code,
e.g. the range of the input pixel values or the numbering of the ground-truth labels?
I used this code,
but the result is very bad: every output value is below 0.5 and is therefore treated as 0, so
the whole output is shown as black.
How can I improve the result?
Please help!

About dice loss

Hi
I am new to this area. I am using the Dice loss function for image segmentation. I found different formulations, so I got confused. Can you please explain why there is a "1 -" in the loss?

For example: if, for batch size 1, my network gives the values top = 0.9 and bottom = 0.9, then the reduce mean is 0.9 and the loss is 0.1 as per the equation below.

loss = 1 - tf.reduce_mean(2 * intersection/ (union))

But I found some code that just uses
loss = np.sum(2*intersection/(union))
So can you please clear up this confusion?
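
A short worked example may help with the two forms. The Dice coefficient itself is a similarity score: it is 1 for a perfect overlap and 0 for no overlap. An optimizer minimises its objective, so "1 - dice" turns the score into a loss that is 0 when the overlap is perfect. Code that uses the bare coefficient usually either reports it as a metric or negates it before minimising. The sketch below is plain Python with hypothetical names; "union" follows the naming in the question and is simply the sum of both masks, and eps is an assumed small constant that avoids division by zero.

```python
def dice_coefficient(prediction, target, eps=1e-5):
    """Dice score of two flat lists of per-pixel values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(prediction, target))
    union = sum(prediction) + sum(target)  # "union" as named in the question
    return 2.0 * intersection / (union + eps)


target = [1.0, 1.0, 0.0, 0.0]
perfect = [1.0, 1.0, 0.0, 0.0]   # identical to the target
poor = [0.0, 0.0, 1.0, 1.0]      # no overlap at all

loss_perfect = 1.0 - dice_coefficient(perfect, target)
loss_poor = 1.0 - dice_coefficient(poor, target)
print(loss_perfect, loss_poor)
```

With this convention the loss moves towards 0 as the segmentation improves, which is the direction a minimiser needs.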

regularization on biases

    if regularizer is not None:
        regularizers = sum([tf.nn.l2_loss(variable) for variable in self.variables])
        loss += (regularizer * regularizers)

It seems that you apply regularization to the biases as well, since self.variables includes the biases:

    variables = []
    for w1,w2 in weights:
        variables.append(w1)
        variables.append(w2)

    for b1,b2 in biases:
        variables.append(b1)
        variables.append(b2)
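
If the biases are meant to be excluded, one option is to build the penalty from the weight tensors only. The following framework-free sketch shows the difference between the two sums. The lists are hypothetical stand-ins for the variables, and l2_loss mirrors the definition of tf.nn.l2_loss (sum of squares divided by 2).

```python
def l2_loss(values):
    """Same definition as tf.nn.l2_loss: sum(v ** 2) / 2."""
    return sum(v * v for v in values) / 2.0


# Hypothetical stand-ins for the flattened weight and bias variables.
weights = [[0.5, -1.0], [2.0]]
biases = [[0.1], [0.3]]

reg_all = sum(l2_loss(v) for v in weights + biases)   # penalises biases too
reg_weights_only = sum(l2_loss(v) for v in weights)   # biases left out
print(reg_all, reg_weights_only)
```

In the snippet quoted above, this would correspond to summing over the weights only instead of over self.variables.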

Fails to import image_util

Hello,

Importing image_util failed as shown below:

/tf_unet$ python
Python 3.5.2 (default, Sep 10 2016, 08:21:44) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tf_unet
>>> from tf_unet import unet, util, image_util
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'image_util'

Additional information: tf_unet was imported from a Python 3 virtual environment with:

>>> print(tf_unet.__version__)
0.1.0
>>> import tensorflow
>>> print(tensorflow.__version__)
0.11.0rc1

Model saving not working

I am using Tensorflow 0.12 along with the latest master branch. I have trained the U-Net on around 1000 patches, and the results on the prediction images are encouraging. But when I save the model and restore it in another session, the output is similar to that of a freshly initialized model. Please let me know if anybody has managed to run a save and restore operation and whether any specific changes are needed.

Prediction is shifted

I see that the predictions are shifted when I overlay them on top of the image. Is there any bug related to this?

Weighted Cross Entropy is wrong

Please check the second answer in this StackOverflow post.

ValueError: could not broadcast input array from shape (500,333,3) into shape (375,500,3)

The error is:

Traceback (most recent call last):
  File "my_unet.py", line 12, in <module>
    path = trainer.train(data_provider, "./unet_trained", training_iters=20, epochs=10, display_step=2)
  File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 399, in train
    test_x, test_y = data_provider(self.verification_batch_size)
  File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/image_util.py", line 98, in __call__
    X[i] = train_data
ValueError: could not broadcast input array from shape (500,333,3) into shape (375,500,3)

I have verified that every color image has the same dimensions as its corresponding mask.

numpy version: 1.12.1
tensorflow version: 1.0.1
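
The message says that an image of shape (500,333,3) was written into a batch buffer sized for (375,500,3). That suggests the images differ in size from one another, even if each image matches its own mask. A quick check along these lines can confirm it before training. The helper name find_shape_mismatches is hypothetical, and the zero arrays stand in for the loaded images.

```python
import numpy as np


def find_shape_mismatches(images):
    """Return the indices of all images whose shape differs from the first one."""
    reference = images[0].shape
    return [i for i, img in enumerate(images) if img.shape != reference]


# Stand-ins for loaded images, using the two shapes from the error message.
images = [
    np.zeros((375, 500, 3)),
    np.zeros((500, 333, 3)),
    np.zeros((375, 500, 3)),
]
mismatches = find_shape_mismatches(images)
print(mismatches)
```

If the list is not empty, resizing or cropping all images (and their masks) to one common size before training is one way to make them stackable into a single batch array.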

NiiImageDataProvider: Generic data provider for nifti images.

Thanks for sharing the tf_unet code; it is excellent work.
I want to train the U-Net with custom NIfTI image data. Each dataset is a three-dimensional volume.

I failed to use the SimpleDataProvider and ImageDataProvider classes,
so I tried to write a 'NiiImageDataProvider' to read NIfTI data and train the U-Net.
The NiiImageDataProvider class code follows. It correctly reads the NIfTI data from a local path.
However, when training the U-Net, I get an error in the __call__ function.
I don't know how to modify this function. I would appreciate it a lot if anyone could help me fix it.
Here are my data and code:

import nibabel as nb
class NiiImageDataProvider(BaseDataProvider):
    """
    This is an introduction to NiiImageDataProvider (similar to ImageDataProvider provided by jakeret)
    Generic data provider for nifti images.
    Assumes that the data images and label images are stored in the same folder
    and that the labels have a different file prefix 
    e.g. 'demo/data-1.nii' and 'demo/data-label-1.nii'

    Usage:
    data_provider = image_util.NiiImageDataProvider("../tf_unet-master/demo/*.nii")

    :param nii_path: a glob search pattern to find all data and label images
    :param a_min: (optional) min value used for clipping
    :param a_max: (optional) max value used for clipping
    :param Nii_data_prefix: prefix pattern for the data images. Default 'data-*.nii'
    :param Nii_Label_prefix: prefix pattern for the label images. Default 'data-label-*.nii'

    """

    channels = 1
    n_class = 2
    def __init__(self, nii_path, a_min=None, a_max=None, Nii_data_prefix="data", Nii_Label_prefix='data-label'):
        super(NiiImageDataProvider, self).__init__(a_min, a_max)
        self.Nii_data_prefix = Nii_data_prefix
        self.Nii_Label_prefix = Nii_Label_prefix
        self.file_idx = -1

        self.nii_data_files = self._find_niidata_files(nii_path)
        print(nii_path)

        assert len(self.nii_data_files) > 0, "No nii training data"
        print("Number of files used: %s" % len(self.nii_data_files))

        img = self._load_nii_file(self.nii_data_files[0])
        img = img.get_data() # get 3-dimension image data in nifti image
        self.channels = 1 if len(img.shape) == 3 else img.shape[-1]

    def _find_niidata_files(self, nii_path):
        all_niifiles = glob.glob(nii_path)
        return [name for name in all_niifiles if not self.Nii_Label_prefix in name]

    def _load_nii_file(self, path):
        return nb.load(path)

    def _cylce_nii_file(self):
        # Advance to the next file and wrap around after the last one
        self.file_idx += 1
        if self.file_idx >= len(self.nii_data_files):
            self.file_idx = 0

    def _next_data(self):
        self._cylce_nii_file()
        nii_image_name = self.nii_data_files[self.file_idx]
        nii_label_name = nii_image_name.replace(self.Nii_data_prefix, self.Nii_Label_prefix)

        nii_image = self._load_nii_file(nii_image_name)
        nii_image = nii_image.get_data()
        nii_image = nii_image.transpose(2, 0, 1)
        nz = nii_image.shape[0]
        nii_image = nii_image.reshape(nz, 512, 512, 1)

        nii_label = self._load_nii_file(nii_label_name)
        nii_label = nii_label.get_data()
        nii_label[nii_label == 2] = 1
        nii_label = nii_label.transpose(2, 0, 1)
        nii_label = nii_label.reshape(nz, 512, 512, 1)


        return nii_image, nii_label

Test code:

from tf_unet import unet
from tf_unet import image_util
# read data
data_provider = image_util.NiiImageDataProvider("X:/tf_unet-master/test_nii_provider/*.nii")
# set parameter
net = unet.Unet(channels=1, n_class=3, layers=3, features_root=16)
trainer = unet.Trainer(net, optimizer="momentum", opt_kwargs=dict(momentum=0.2))
# train
path = trainer.train(data_provider, "./seg_3dim_data_trained_0509", training_iters=5, epochs=1, display_step=2)

Error:

Traceback (most recent call last):
  File "G:/Tensorflow/tf_unet-master/test_nii_provider/test_nii_provider.py", line 9, in <module>
    path = trainer.train(data_provider, "./seg_3dim_data_trained_0509", training_iters=5, epochs=1, display_step=2)
  File "G:\Tensorflow\tf_unet-master\tf_unet\unet.py", line 403, in train
    test_x, test_y = data_provider(self.verification_batch_size)
  File "G:\Tensorflow\tf_unet-master\tf_unet\image_util.py", line 98, in __call__
    train_data, labels = self._load_data_and_label()
  File "G:\Tensorflow\tf_unet-master\tf_unet\image_util.py", line 58, in _load_data_and_label
    return train_data.reshape(1, ny, nx, self.channels), labels.reshape(1, ny, nx, self.n_class),
ValueError: cannot reshape array of size 19398656 into shape (1,74,512,1)
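
A possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the array size 19398656 equals 74 * 512 * 512, so _next_data appears to hand a whole 74-slice volume to the base class, which expects a single 2-D image and takes the first two axes (74 and 512) as ny and nx. Returning one slice per call would avoid that. The generator below sketches the slicing step only; iter_slices and the small stand-in volume are hypothetical.

```python
import numpy as np

# The size in the error message matches a full 74-slice volume of 512 x 512 pixels.
assert 74 * 512 * 512 == 19398656


def iter_slices(volume):
    """Yield one 2-D (ny, nx) slice at a time from a (nz, ny, nx, 1) volume."""
    for z in range(volume.shape[0]):
        yield volume[z, :, :, 0]


# Small stand-in volume; the real one would be shaped (74, 512, 512, 1).
volume = np.arange(3 * 4 * 5).reshape(3, 4, 5, 1)
slices = list(iter_slices(volume))
print(len(slices), slices[0].shape)

# A single slice reshapes cleanly into the (1, ny, nx, channels) layout.
one_sample = slices[1].reshape(1, 4, 5, 1)
```

The labels would need the same per-slice treatment so that each image slice stays paired with its mask slice.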

Conceptual Question

Hi,

In the toy problem, if I remove the output's second class (i.e. the output map's NOT class),
will the U-Net give similar results?

The reason I am asking is that I am trying to implement a U-Net for a 6-class problem with a strong class imbalance towards the 'background' class. I am therefore considering removing the background class from the output map (making it a 5-class problem). In this scenario, the pixels belonging to the background class will have 0s in all 5 remaining classes. Since the Dice coefficient does not take true negatives into account in the cost calculation, this trick should work out fine.
I am using the Dice coefficient with the momentum optimizer and have tried various layers and features_root combinations.

But with this setup, the network is not learning. After some training, the most frequent class becomes white and all other classes become black. The weight histograms do not change at all as training continues, the gradients decrease slowly, and the Dice loss remains almost constant.

I have been experimenting with this for the last 15-20 days. Any help regarding this would make my day. :)

(@jakeret - Since this issue is not related to technical problems in tf_unet, I will close it after some time if there is no response.)

Issue with groundtruth shape?

Hi,
I ran into an issue regarding a tensor dimension:

ValueError: Cannot feed value of shape (4, 80, 82, 2) for Tensor 'Placeholder_1:0', which has shape '(?, ?, ?, 4)'

The dataset consists of pairs of greyscale / groundtruth images; both images have shape (80,82,1). The groundtruth contains pixels of value:

0:background
1,2 : objects
3 : pixels belonging to overlapping parts of the objects

The following code (modified from https://tf-unet.readthedocs.io/en/latest/usage.html) illustrates the nature of the data and the way the code fails:

notebook on gist

Thanks for your advice.
JP
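
The error indicates that the network was built for 4 classes while the labels being fed have only 2 channels. One way to obtain a 4-channel groundtruth from a single-channel image with the values 0 to 3 is one-hot encoding, sketched below with NumPy. The function name one_hot and the tiny label array are hypothetical, and the label values must be integers.

```python
import numpy as np


def one_hot(label, n_class):
    """Map an integer label image (ny, nx) to a one-hot array (ny, nx, n_class)."""
    return np.eye(n_class, dtype=np.float32)[label]


# Tiny stand-in for an (80, 82) groundtruth image with the values 0..3.
label = np.array([[0, 1],
                  [2, 3]])
encoded = one_hot(label, 4)
print(encoded.shape)
```

Each pixel then carries a 1 in the channel of its class and 0 elsewhere, so a batch of such labels has a last dimension of 4, as the placeholder expects.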

How to load custom dataset for training and validating?

Hi,
my own dataset directory structure is:

data
    |--<label1>
            |--<image001.jpg>
            |--<image001.png> (mask image for image001.jpg)
            |--<image002.jpg>
            |--<image002.png>
            |......
    |--<label2>
            |--<image001.jpg>
            |--<image001.png>
            |......
    |......

I also split the dataset into train and validating sets:

data
    |--train
          |--<label1>
                  |--<image001.jpg>
                  |--<image001.png>
                  |......
          |--<label2>
                  |--<image001.jpg>
                  |--<image001.png>
                  |......
          |......
    |--val
         |--<label1>
                 |--<image002.jpg>
                 |--<image002.png>
                 |......
         |--<label2>
                 |--<image002.jpg>
                 |--<image002.png>
                 |......

May I ask how to load this dataset for training and validating?
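
Whichever data provider is used in the end, the first step is usually to pair every image with its mask. The sketch below walks a directory tree like the one above and pairs each .jpg with the .png of the same name in the same folder. It uses only the standard library; the function name pair_images_and_masks is hypothetical, and the temporary files merely stand in for a real dataset.

```python
import os
import tempfile


def pair_images_and_masks(root):
    """Return (image_path, mask_path) tuples for every .jpg that has a .png mask."""
    pairs = []
    for dirpath, _dirnames, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            if ext.lower() != ".jpg":
                continue
            mask = os.path.join(dirpath, stem + ".png")
            if os.path.exists(mask):
                pairs.append((os.path.join(dirpath, name), mask))
    return pairs


# Build a tiny stand-in tree: label1 has a mask, label2 is missing one.
with tempfile.TemporaryDirectory() as root:
    for rel in ["train/label1/image001.jpg",
                "train/label1/image001.png",
                "train/label2/image001.jpg"]:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    pairs = pair_images_and_masks(os.path.join(root, "train"))
    relative_pairs = [(os.path.relpath(i, root), os.path.relpath(m, root))
                      for i, m in pairs]

print(relative_pairs)
```

Calling the function once on the train folder and once on the val folder gives two separate lists that can be loaded into arrays for training and validation.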

Dice loss operating on logits rather than predictions

In calculating the dice loss, we should be taking the softmax of the logits before using it to calculate the loss function.
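
The effect can be illustrated with a small NumPy sketch. It assumes a Dice loss of the form 1 - 2 * intersection / union with a small eps in the denominator; the function names and the example values are hypothetical. Logits are unbounded, so feeding them in directly can push the loss outside the range [0, 1], whereas softmax probabilities keep it inside.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def dice_loss(prediction, target, eps=1e-5):
    intersection = np.sum(prediction * target)
    union = eps + np.sum(prediction) + np.sum(target)
    return 1.0 - 2.0 * intersection / union


# Shape (n, ny, nx, classes) = (1, 1, 2, 2): two pixels, two classes.
logits = np.array([[[[4.0, -4.0], [-3.0, 3.0]]]])
target = np.array([[[[1.0, 0.0], [0.0, 1.0]]]])

probs = softmax(logits)
loss_on_probs = dice_loss(probs, target)
loss_on_logits = dice_loss(logits, target)
print(loss_on_probs, loss_on_logits)
```

Both pixels are classified correctly here, yet only the probability-based loss is close to 0; the logit-based value is negative and has no meaning as an overlap measure.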

Compatibility with Tensorflow 1.0

tf_unet ran into multiple issues after I upgraded to Tensorflow 1.0

Are there any plans to update the package?

Multi-GPU training

Have you guys figured out how to incorporate Multi-GPU training as explained in the Cifar-10 tutorial?

regarding the size of input masking image and definition of "in_size" and "size" for offset

Hello Joel,

Thank you very much for sharing your code, which is very well written.

I have several questions, would you mind sharing your thoughts on them?

1. In your implementation, the input mask training data has to be of shape row*column*2. In my use case, the mask training data is of shape row*column*1. Do I have to transform it into the form row*column*2? Is there any reason why you specify the mask data that way?
2. In create_conv_net, you define in_size=1000 and size=in_size. The value of size changes during the convolution, pooling, deconvolution and unpooling operations, and create_conv_net then returns in_size-size as the offset, which is used to compute px and py. This is copied from the program: "returns prediction: The unet prediction Shape [n, px, py, labels] (px=nx-self.offset/2)".
I don't understand why in_size is set to 1000, and why we need this offset. It looks like unpooling and deconvolution can resize the output map to the size of the original image. In particular, conv2d should allow us to specify the shape of the output map.
3. In the training process, you use
test_x, test_y = data_provider(4)
pred_shape = self.store_prediction(sess, test_x, test_y, "_init")
What is the reason for generating a batch of 4 at the very beginning? Are there any considerations here?

Thank you very much for your help.
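
On the offset question, a likely explanation (assuming the convolutions are unpadded, as in the original U-Net design) is that every 3x3 convolution trims one pixel from each border, so the output map ends up smaller than the input and the labels have to be cropped by that difference. The sketch below traces the size through such a network; unet_output_size is a hypothetical helper, not the library's own code. Its result agrees with shapes reported elsewhere on this page, where a 128x128 input with layers=4 produced a 36x36 prediction.

```python
def unet_output_size(in_size, layers=3, filter_size=3, pool_size=2):
    """Trace the spatial size through a U-Net with unpadded convolutions."""
    size = in_size
    for layer in range(layers):
        size -= 2 * (filter_size - 1)      # two unpadded convolutions
        if layer < layers - 1:
            size //= pool_size             # max pooling (not after the last layer)
    for _ in range(layers - 1):
        size *= pool_size                  # up-convolution doubles the size
        size -= 2 * (filter_size - 1)      # two unpadded convolutions
    return size


out_128 = unet_output_size(128, layers=4)
offset_128 = 128 - out_128
offset_1000 = 1000 - unet_output_size(1000, layers=3)
print(out_128, offset_128, offset_1000)
```

Under these assumptions the offset depends only on the architecture for sizes that divide evenly at every pooling step, which may be why a fixed reference size such as 1000 is enough to compute it once.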

Segmentation with small dataset

Thanks for your work.

I am trying to apply U-Net to a small dataset of 16 patches (it is really small, but I need to understand whether I am making any mistakes). Specifically, I prepared a binary mask to define what is positive (1) and negative (0).

Unfortunately, I found two problems:

1- If I select a patch without positive elements, the network normalised the prediction probability.
2- Typical negative elements (with completely different colours) are evaluated as positive.

I started from the launcher file in the scripts folder and changed it to evaluate pictures.

Finally, I test on the same pictures that I used for training.

Do you have any suggestions?

Thanks,

Giovanni

Error in combine_img_prediction

Hi,
I have trouble running my code:

import numpy as np
from tf_unet import unet, util, image_util

# Import data
print('Loading dataset...\n')
X_data = np.load(DATASET_FOLDER+"X_data.npy")
y_data = np.load(DATASET_FOLDER+"y_data.npy")
X_test = np.load(DATASET_FOLDER+"X_test.npy")
y_test = np.load(DATASET_FOLDER+"y_test.npy")

print("TRAIN data shape: ", X_data.shape)
print("TRAIN labels shape", y_data.shape)
print("TEST data shape: ", X_test.shape)
print("TEST labels shape: ", y_test.shape)

X_data = np.float32(X_data)
y_data = np.float32(y_data)
X_test = np.float32(X_test)
y_test = np.float32(y_test)

training_iters = 20
epochs = 100
dropout = 0.75 # Dropout, probability to keep units
display_step = 2
restore = False
 
data_provider = image_util.SimpleDataProvider(X_data, y_data, channels=2, n_class=1)

net = unet.Unet(channels=2, n_class=1, layers=4, features_root=64, cost="dice_coefficient")
    
trainer = unet.Trainer(net, optimizer="adam")
path = trainer.train(data_provider, "./unet_trained", training_iters=training_iters, epochs=epochs, dropout=dropout, display_step=display_step, restore=restore)
     
prediction = net.predict(path, X_test)
     
print("Testing error rate: {:.2f}%".format(unet.error_rate(prediction, util.crop_to_shape(y_test, prediction.shape))))

The error is:


Loading dataset...

TRAIN data shape:  (1560, 128, 128, 2)
TRAIN labels shape (1560, 128, 128)
TEST data shape:  (120, 128, 128, 2)
TEST labels shape:  (120, 128, 128)
2017-06-23 15:07:05,594 Layers 4, features 64, filter size 3x3, pool size: 2x2
2017-06-23 15:07:07,878 Removing '/home/stefano/Dropbox/DeepWave/prediction'
2017-06-23 15:07:07,878 Removing '/home/stefano/Dropbox/DeepWave/unet_trained'
2017-06-23 15:07:07,878 Allocating '/home/stefano/Dropbox/DeepWave/prediction'
2017-06-23 15:07:07,879 Allocating '/home/stefano/Dropbox/DeepWave/unet_trained'
2017-06-23 15:07:07.879575: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 15:07:07.879602: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 15:07:07.879615: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 15:07:10,201 Verification error= 0.0%, loss= -0.0000
Traceback (most recent call last):
  File "Unet.py", line 45, in <module>
    path = trainer.train(data_provider, "./unet_trained", training_iters=training_iters, epochs=epochs, dropout=dropout, display_step=display_step, restore=restore)
  File "./tf_unet/unet.py", line 404, in train
    pred_shape = self.store_prediction(sess, test_x, test_y, "_init")
  File "./tf_unet/unet.py", line 457, in store_prediction
    img = util.combine_img_prediction(batch_x, batch_y, prediction)
  File "/home/stefano/Dropbox/DeepWave/tf_unet/util.py", line 104, in combine_img_prediction
    to_rgb(crop_to_shape(gt[..., 1], pred.shape).reshape(-1, ny, 1)), 
IndexError: index 1 is out of bounds for axis 3 with size 1

The combine_img_prediction function has the following argument shapes:
(4, 128, 128, 1) --> gt
(4, 128, 128, 2) --> data
(4, 36, 36, 1) --> pred

My datasets have the following shapes:
TRAIN data shape: (1560, 128, 128, 2)
TRAIN labels shape (1560, 128, 128)
TEST data shape: (120, 128, 128, 2)
TEST labels shape: (120, 128, 128)

How can I solve the issue?
Thank you! 👍

EDIT: Sorry, n_class obviously had to be 2. I corrected that error, but now I get:

Traceback (most recent call last):
  File "Unet.py", line 43, in <module>
    path = trainer.train(data_provider, "./unet_trained", training_iters=training_iters, epochs=epochs, dropout=dropout, display_step=display_step, restore=restore)
  File "./tf_unet/unet.py", line 403, in train
    test_x, test_y = data_provider(self.verification_batch_size)
  File "./tf_unet/image_util.py", line 89, in __call__
    train_data, labels = self._load_data_and_label()
  File "./tf_unet/image_util.py", line 50, in _load_data_and_label
    labels = self._process_labels(label)
  File "./tf_unet/image_util.py", line 65, in _process_labels
    labels[..., 0] = ~label
TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
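
The second traceback fails at labels[..., 0] = ~label. The script above converts y_data to float32, and NumPy's bitwise invert operator is not defined for floating-point arrays, which would explain this TypeError. A minimal sketch of the problem and one possible workaround, casting the mask to bool before inverting, is shown below with a tiny stand-in label array.

```python
import numpy as np

label = np.array([[0.0, 1.0],
                  [1.0, 0.0]], dtype=np.float32)

# Bitwise invert is not defined for floating-point arrays.
try:
    ~label
    invert_failed = False
except TypeError:
    invert_failed = True

# Casting to bool first makes the inversion well defined.
mask = label.astype(bool)
labels = np.zeros(mask.shape + (2,), dtype=np.float32)
labels[..., 1] = mask     # foreground channel
labels[..., 0] = ~mask    # background channel
print(invert_failed, labels.shape)
```

Keeping the label array as bool instead of converting it with np.float32 before it reaches the data provider would be another way to sidestep the error.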

Error messages when running demo_toy_problem.ipynb

Hi Jakeret,

When running demo_toy_problem.ipynb and executing the line path = trainer.train(generator, "./unet_trained", training_iters=20, epochs=100, display_step=2), I get the following error messages. Would you mind sharing any insights into what causes the problem?

Verification error= 16.7%, loss= 0.8326
Start optimization
Traceback (most recent call last):
  File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 480, in _process_fetches
    allow_operation=True)
  File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2301, in as_graph_element
    % (type(obj).__name__, types_str))
TypeError: Can not convert a list into a Tensor or Operation.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 30, in <module>
    path = trainer.train(generator, "./unet_trained", training_iters=20, epochs=100, display_step=2)
  File "/home/user/test/u-net/ver3/unet.py", line 364, in train
    self.net.keep_prob: dropout})
  File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 340, in run
    run_metadata_ptr)
  File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 523, in _run
    processed_fetches = self._process_fetches(fetches)
  File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 493, in _process_fetches
    % (subfetch, fetch, type(subfetch), str(e)))
TypeError: Fetch argument [<tf.Tensor 'gradients/Conv2D_grad/Conv2DBackpropFilter:0' shape=(3, 3, 1, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_1_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_2_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_3_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_4_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Tensor 'gradients/Conv2D_5_grad/Conv2DBackpropFilter:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Tensor 'gradients/Conv2D_6_grad/Conv2DBackpropFilter:0' shape=(3, 3, 64, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_7_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_8_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_9_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 16) dtype=float32>, <tf.Tensor 'gradients/add_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_1_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_2_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_3_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_4_grad/Reshape_1:0' shape=(64,) dtype=float32>, <tf.Tensor 'gradients/add_5_grad/Reshape_1:0' shape=(64,) dtype=float32>, <tf.Tensor 'gradients/add_7_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_8_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_10_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_11_grad/Reshape_1:0' shape=(16,) dtype=float32>] of [<tf.Tensor 'gradients/Conv2D_grad/Conv2DBackpropFilter:0' shape=(3, 3, 1, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_1_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_2_grad/Conv2DBackpropFilter:0' shape=(3, 
3, 16, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_3_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_4_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Tensor 'gradients/Conv2D_5_grad/Conv2DBackpropFilter:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Tensor 'gradients/Conv2D_6_grad/Conv2DBackpropFilter:0' shape=(3, 3, 64, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_7_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_8_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_9_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 16) dtype=float32>, <tf.Tensor 'gradients/add_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_1_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_2_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_3_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_4_grad/Reshape_1:0' shape=(64,) dtype=float32>, <tf.Tensor 'gradients/add_5_grad/Reshape_1:0' shape=(64,) dtype=float32>, <tf.Tensor 'gradients/add_7_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_8_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_10_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_11_grad/Reshape_1:0' shape=(16,) dtype=float32>] has invalid type <class 'list'>, must be a string or Tensor. (Can not convert a list into a Tensor or Operation.)

Error when training with Dice Coefficient

Hi,

I get an error when I try training with the Dice coefficient as the cost function. I noticed there was a new commit on this a couple of days ago, so I suspect it is some bug in the code. Would you know roughly where this might be?

InvalidArgumentError Traceback (most recent call last)
in ()
----> 1 path = trainer.train(generator, "./unet_trained", training_iters=100, epochs=100, display_step=5)

/home/proj/tf_unet/tf_unet/unet.pyc in train(self, data_provider, output_path, training_iters, epochs, dropout, display_step, restore)
424
425 if step % display_step == 0:
--> 426 self.output_minibatch_stats(sess, summary_writer, step, batch_x, util.crop_to_shape(batch_y, pred_shape))
427
428 total_loss += loss

/home/proj/tf_unet/tf_unet/unet.pyc in output_minibatch_stats(self, sess, summary_writer, step, batch_x, batch_y)
467 feed_dict={self.net.x: batch_x,
468 self.net.y: batch_y,
--> 469 self.net.keep_prob: 1.})
470 summary_writer.add_summary(summary_str, step)
471 summary_writer.flush()

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
764 try:
765 result = self._run(None, fetches, feed_dict, options_ptr,
--> 766 run_metadata_ptr)
767 if run_metadata:
768 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
962 if final_fetches or final_targets:
963 results = self._do_run(handle, final_targets, final_fetches,
--> 964 feed_dict_string, options, run_metadata)
965 else:
966 results = []

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1012 if handle is None:
1013 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1014 target_list, options, run_metadata)
1015 else:
1016 return self._do_call(_prun_fn, self._session, handle, feed_dict,

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1032 except KeyError:
1033 pass
-> 1034 raise type(e)(node_def, op, message)
1035
1036 def _extend_graph(self):

InvalidArgumentError: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_37/read)]]

Caused by op u'norm_grads', defined at:
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/ipykernel/main.py", line 3, in
app.launch_new_instance()
File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
path = trainer.train(generator, "./unet_trained", training_iters=100, epochs=100, display_step=5)
File "/home/proj/tf_unet/tf_unet/unet.py", line 389, in train
init = self._initialize(training_iters, output_path, restore)
File "/home/proj/tf_unet/tf_unet/unet.py", line 342, in _initialize
tf.summary.histogram('norm_grads', self.norm_gradients_node)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 205, in histogram
tag=scope.rstrip('/'), values=values, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_37/read)]]

Error with example code

I am having trouble running the code provided in the usage section (http://tf-unet.readthedocs.io/en/latest/usage.html)

I created a folder named images and placed my .jpg files in that folder. Here is my code:

from tf_unet import unet, util, image_util

#preparing data loading
data_provider = image_util.ImageDataProvider("images/*.jpg")

#setup & training
net = unet.Unet(layers=3, features_root=64, channels=1, n_class=2)
trainer = unet.Trainer(net)

output_path = "train/"

path = trainer.train(data_provider, output_path, training_iters=32, epochs=100)

#verification
#...

prediction = net.predict(path, data)

unet.error_rate(prediction, util.crop_to_shape(label, prediction.shape))

img = util.combine_img_prediction(data, label, prediction)
util.save_image(img, "prediction.jpg")

And here is the error message:

[aaydin@sn-nvda3 unet_test]$ python test.py 
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Number of files used: 100
2017-03-07 07:54:16,681 Layers 3, features 64, filter size 3x3, pool size: 2x2
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    path = trainer.train(data_provider, output_path, training_iters=32, epochs=100)
  File "/gpfs/home/aaydin/tf_unet/tf_unet/unet.py", line 386, in train
    init = self._initialize(training_iters, output_path, restore)
  File "/gpfs/home/aaydin/tf_unet/tf_unet/unet.py", line 345, in _initialize
    self.optimizer = self._get_optimizer(training_iters, global_step)
  File "/gpfs/home/aaydin/tf_unet/tf_unet/unet.py", line 321, in _get_optimizer
    **self.opt_kwargs).minimize(self.net.cost, 
TypeError: __init__() takes at least 3 arguments (2 given)

It looks like this is not related to my code. It should have something to do with tf-unet.
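
One possible cause, offered as an assumption rather than a confirmed diagnosis: `tf.train.MomentumOptimizer` requires a `momentum` argument, and the traceback shows the trainer forwarding `**self.opt_kwargs` to the optimizer constructor. With empty `opt_kwargs`, the constructor would receive only the learning rate. The sketch below reproduces that pattern with a hypothetical stand-in class (`FakeMomentumOptimizer` and `build_optimizer` are illustrative names, not part of tf_unet or TensorFlow).

```python
class FakeMomentumOptimizer(object):
    """Hypothetical stand-in: momentum is a required argument, as in tf.train.MomentumOptimizer."""

    def __init__(self, learning_rate, momentum):
        self.learning_rate = learning_rate
        self.momentum = momentum


def build_optimizer(learning_rate, opt_kwargs):
    # Same forwarding pattern as in the traceback: opt_kwargs is unpacked
    # straight into the optimizer constructor.
    return FakeMomentumOptimizer(learning_rate=learning_rate, **opt_kwargs)


def fails_without_momentum():
    # Returns True when an empty opt_kwargs triggers the TypeError.
    try:
        build_optimizer(0.2, {})
    except TypeError:
        return True
    return False
```

If this is indeed the cause, passing `opt_kwargs=dict(momentum=0.2)` to `unet.Trainer`, as another report in this tracker does, may avoid the error.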

unet.Unet class

Hello,
I don't see any documentation for the parameters
"features_root" and "layers"

many thanks for updating
Peter

Why are there two weight initializers?

In layers.py, lines 24 to 29, these two functions do exactly the same thing. Why do both exist?
def weight_variable(shape, stddev=0.1):
    initial = tf.truncated_normal(shape, stddev=stddev)
    return tf.Variable(initial)

def weight_variable_devonc(shape, stddev=0.1):
    return tf.Variable(tf.truncated_normal(shape, stddev=stddev))

Keras 2

Hi, and congratulations on this promising project. Have you considered porting it to Keras 2, so that people with little machine-learning background can more easily understand and tweak your U-Net implementation?

Keras 2 will also be supported for years and lets users run their models on multiple backends (Theano, TensorFlow, JavaScript, Scala, Java, etc.). See here for more details: https://blog.keras.io/introducing-keras-2.html

That would be really nice !

Missing conv3d and deconv3d functions

tensorflow/models.py script tries to import conv3d and deconv3d wrappers from tf_unet/layers that don't exist.

Weighted Cross Entropy not working on high imbalance data

I am trying to implement binary segmentation on highly imbalanced data. The true-positive rate I get is always zero and the false-negative rate is always 100%. I have modified the error_rate() function to print these values as follows.

def error_rate(predictions, labels):

    y_ = labels[...,1]
    tp = 0
    fn = 0

    Np = 0
    Nn = 0

    for i in range(y_.shape[0]):
        for j in range(y_.shape[1]):
            for k in range(y_.shape[2]):
                if y_[i][j][k]:
                    Np += 1
                    if predictions[i][j][k][1] > predictions[i][j][k][0]:
                        tp += 1
                else:
                    Nn += 1
                    if predictions[i][j][k][1] < predictions[i][j][k][0]:
                        fn += 1

    if Np == 0:
        tp = -1
    else:
        tp = tp/Np

    return tp*100, (fn/Nn*100)

I am using cross-entropy loss and I have tried with various class weights [0.01, 0.99], [1, 100]. Kindly provide some insight into what might be going wrong. The dataset I am using has sliced CBCT images and every image has labels of both classes present.
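
Two details of the function above may be worth checking before looking at the loss. The counter named `fn` is incremented when the label is negative and the prediction is negative, which is a true negative rather than a false negative. Also, if the script runs under Python 2, `tp/Np` on two integers truncates to 0 unless every positive pixel is found. A minimal sketch of the counts with explicit float division follows; `confusion_rates` is a hypothetical helper that works on flat sequences of booleans, not part of tf_unet.

```python
def confusion_rates(pred_positive, label_positive):
    """Return true-positive, false-negative and true-negative rates.

    Both arguments are flat sequences of booleans, one entry per pixel.
    """
    tp = fn = tn = fp = 0
    for pred, label in zip(pred_positive, label_positive):
        if label and pred:
            tp += 1
        elif label and not pred:
            fn += 1
        elif not label and pred:
            fp += 1
        else:
            tn += 1
    n_pos = tp + fn
    n_neg = tn + fp
    nan = float("nan")
    return {
        # float() forces true division under Python 2 as well.
        "tpr": float(tp) / n_pos if n_pos else nan,
        "fnr": float(fn) / n_pos if n_pos else nan,
        "tnr": float(tn) / n_neg if n_neg else nan,
    }
```

For the arrays in the question, the inputs could be built as `predictions[..., 1] > predictions[..., 0]` and `labels[..., 1]`, flattened.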

Failed to run the demo_toy_problem with tensorflow 1.0.1

Hi,

The demo_toy_problem.ipynb was run from a Python 2 virtualenv with TensorFlow 1.0.1 (GPU version) installed. When running cell 7, I got the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-778ffd89a317> in <module>()
----> 1 net = unet.Unet(channels=generator.channels, n_class=generator.n_class, layers=3, features_root=16)

/home/jeanpat/DeepFISH-Github_projects/tf_unet/tf_unet/unet.py in __init__(self, nx, ny, channels, n_class, add_regularizers, class_weights, **kwargs)
    187         self.keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)
    188 
--> 189         logits, self.variables, self.offset = create_conv_net(self.x, self.keep_prob, channels, n_class, **kwargs)
    190 
    191         if class_weights is not None:

/home/jeanpat/DeepFISH-Github_projects/tf_unet/tf_unet/unet.py in create_conv_net(x, keep_prob, channels, n_class, layers, features_root, filter_size, pool_size, summaries)
     55     nx = tf.shape(x)[1]
     56     ny = tf.shape(x)[2]
---> 57     x_image = tf.reshape(x, tf.pack([-1,nx,ny,channels]))
     58     in_node = x_image
     59     batch_size = tf.shape(x_image)[0]

AttributeError: module 'tensorflow' has no attribute 'pack'

Is there some solution to fix the issue?
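
`tf.pack` was renamed to `tf.stack` in TensorFlow 1.0, which would explain the missing attribute. A minimal sketch of a lookup that works with either name follows; `resolve_stack` is a hypothetical helper, not part of tf_unet or TensorFlow.

```python
def resolve_stack(tf_module):
    """Return tf.stack when it exists (TensorFlow >= 1.0), else the older tf.pack."""
    stack = getattr(tf_module, "stack", None)
    if stack is not None:
        return stack
    return getattr(tf_module, "pack")

# Possible use at the failing line (assumes `import tensorflow as tf`):
#   x_image = tf.reshape(x, resolve_stack(tf)([-1, nx, ny, channels]))
```

Updating to a tf_unet revision that already targets TensorFlow 1.x, if one is available, would be the simpler route.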

how to save, restore and use the trained model?

Dear all
Please give an example of how to save, load, and use a trained model with this code.
Thanks

Training loss doesn't converge for custom dataset

My custom training dataset has 5000 color images and 5000 corresponding mask images.

2017-04-21 21:21:12,678 Start optimization
2017-04-21 21:21:14,525 Iter 0, Minibatch Loss= 0.6760, Training Accuracy= 0.6090, Minibatch error= 39.1%
2017-04-21 21:21:15,047 Iter 2, Minibatch Loss= 0.7317, Training Accuracy= 0.4318, Minibatch error= 56.8%
2017-04-21 21:21:15,608 Iter 4, Minibatch Loss= 0.5663, Training Accuracy= 0.7503, Minibatch error= 25.0%
2017-04-21 21:21:16,575 Iter 6, Minibatch Loss= 0.5671, Training Accuracy= 0.7178, Minibatch error= 28.2%
2017-04-21 21:21:17,157 Iter 8, Minibatch Loss= 0.3097, Training Accuracy= 0.8937, Minibatch error= 10.6%
2017-04-21 21:21:17,694 Iter 10, Minibatch Loss= 0.5114, Training Accuracy= 0.7774, Minibatch error= 22.3%
2017-04-21 21:21:18,233 Iter 12, Minibatch Loss= 0.6332, Training Accuracy= 0.6583, Minibatch error= 34.2%
2017-04-21 21:21:18,710 Iter 14, Minibatch Loss= 0.5695, Training Accuracy= 0.7293, Minibatch error= 27.1%
2017-04-21 21:21:19,249 Iter 16, Minibatch Loss= 0.4922, Training Accuracy= 0.8320, Minibatch error= 16.8%
2017-04-21 21:21:19,754 Iter 18, Minibatch Loss= 0.5962, Training Accuracy= 0.7211, Minibatch error= 27.9%
2017-04-21 21:21:19,977 Epoch 0, Average loss: 0.5595, learning rate: 0.2000
2017-04-21 21:21:20,200 Verification error= 15.3%, loss= 0.4723

2017-04-21 21:27:59,767 Epoch 48, Average loss: 0.4393, learning rate: 0.0171
2017-04-21 21:27:59,993 Verification error= 15.3%, loss= 0.4511
2017-04-21 21:28:01,688 Iter 980, Minibatch Loss= 0.6371, Training Accuracy= 0.7971, Minibatch error= 20.3%
2017-04-21 21:28:02,623 Iter 982, Minibatch Loss= 0.3423, Training Accuracy= 0.9056, Minibatch error= 9.4%
2017-04-21 21:28:03,648 Iter 984, Minibatch Loss= 0.6891, Training Accuracy= 0.6723, Minibatch error= 32.8%
2017-04-21 21:28:04,706 Iter 986, Minibatch Loss= 0.4940, Training Accuracy= 0.7985, Minibatch error= 20.1%
2017-04-21 21:28:05,809 Iter 988, Minibatch Loss= 0.3383, Training Accuracy= 0.9188, Minibatch error= 8.1%
2017-04-21 21:28:06,813 Iter 990, Minibatch Loss= 0.4692, Training Accuracy= 0.7797, Minibatch error= 22.0%
2017-04-21 21:28:07,792 Iter 992, Minibatch Loss= 0.7902, Training Accuracy= 0.5315, Minibatch error= 46.8%
2017-04-21 21:28:08,937 Iter 994, Minibatch Loss= 0.6003, Training Accuracy= 0.7040, Minibatch error= 29.6%
2017-04-21 21:28:09,864 Iter 996, Minibatch Loss= 0.4520, Training Accuracy= 0.7768, Minibatch error= 22.3%
2017-04-21 21:28:10,906 Iter 998, Minibatch Loss= 0.6925, Training Accuracy= 0.7355, Minibatch error= 26.5%
2017-04-21 21:28:11,301 Epoch 49, Average loss: 0.5205, learning rate: 0.0162
2017-04-21 21:28:11,525 Verification error= 15.3%, loss= 0.4540
2017-04-21 21:28:12,691 Optimization Finished!

I use this code for custom dataset training:

from tf_unet import image_util
from tf_unet import unet
from tf_unet import util

search_path = 'data/train/*.jpg'
data_provider = image_util.ImageDataProvider(search_path, data_suffix='.jpg', mask_suffix='.png')

net = unet.Unet(channels=data_provider.channels, n_class=data_provider.n_class, layers=3, features_root=32)

trainer = unet.Trainer(net, optimizer="momentum", opt_kwargs=dict(momentum=0.2))

path = trainer.train(data_provider, "./unet_trained", training_iters=20, epochs=50, display_step=2)
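
A side observation on the log above: the logged learning rates are consistent with an exponential decay of 0.95 per epoch from an initial 0.2, the momentum defaults quoted elsewhere in this tracker. The sketch below reproduces the logged values under that assumption; `decayed_learning_rate` is a hypothetical helper, and this does not by itself explain the lack of convergence.

```python
def decayed_learning_rate(epoch, initial=0.2, decay_rate=0.95):
    """Learning rate after `epoch` epochs, assuming one decay step per epoch."""
    return initial * decay_rate ** epoch


for epoch in (0, 48, 49):
    print(epoch, round(decayed_learning_rate(epoch), 4))
# 0 0.2
# 48 0.0171
# 49 0.0162
```

By epoch 48 the step size is below a tenth of its starting value, so later epochs change the weights much less than early ones.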

Format of input?

I have a NumPy (.npy) file holding the data of 564 images, with shape [564, 420, 580], and a second one of the same shape, [564, 420, 580], for their masks.

I am saving these numpy objects in a h5py file and then passing it like:

dataset_file = h5py.File(source_path, 'r')
generator = Generator(1, dataset_file)

but I get the following error:
AttributeError: 'int' object has no attribute 'encode'
at line 41 of radio_util.py:
with h5py.File(self.files[self.file_idx], "r") as fp

Please help me...

Several observations during the empirical study

Hi Joel,

I have run multiple experiments concurrently, and have several observations. Thanks for your insight.

When the batch size is set to 20 or higher, the learning rate (using momentum) only starts to decrease after more than 10 epochs; it stays constant at the very beginning. This is what I described in the other thread. Right now, I can see the learning rate decreasing as expected after epoch 10.
When the batch size is set to 2 or 4, the learning rate starts to decrease within the first few epochs.
I am not sure how to explain this behavior.
The competition uses the so-called Dice coefficient, which differs from the loss function you are using. Do you have any specific considerations regarding this?
I have been trying to test the Adam optimizer.

It will work if I call it as trainer = unet.Trainer(net, optimizer="adam", opt_kwargs=dict(learning_rate=0.0015))

However, if I instead call it as
trainer = unet.Trainer(net, optimizer="adam", opt_kwargs=dict(momentum=0.0015))
it gives the following error message:

Traceback (most recent call last):
File "launcher.py", line 54, in
path = trainer.train(generator, "/data/unet_trained", training_iters=1406, epochs=100, display_step=100)
File "/test/u-net/ver6/unet.py", line 341, in train
init = self._initialize(training_iters, output_path, restore)
File "/test/u-net/ver6/unet.py", line 298, in _initialize
self.optimizer = self._get_optimizer(training_iters, global_step)
File "/test/u-net/ver6/unet.py", line 281, in _get_optimizer
**self.opt_kwargs).minimize(self.net.cost,
TypeError: __init__() got an unexpected keyword argument 'momentum'

After reading your code, it seems to me that the key used in opt_kwargs should make no difference, whether it is "momentum" or "learning_rate", because, as shown in the following, you always pop "learning_rate" from opt_kwargs. I may not fully understand how opt_kwargs works.

if self.optimizer == "momentum":
    learning_rate = self.opt_kwargs.pop("learning_rate", 0.2)
    decay_rate = self.opt_kwargs.pop("decay_rate", 0.95)

elif self.optimizer == "adam":
    learning_rate = self.opt_kwargs.pop("learning_rate", 0.001)
    self.learning_rate_node = tf.Variable(learning_rate)
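
A sketch of how the `pop` calls above interact with the `**self.opt_kwargs` forwarding seen in the traceback: `pop` removes only the named key, so any other key stays in the dictionary and reaches the optimizer constructor. `split_opt_kwargs` and `FakeAdam` are hypothetical illustrations; the `FakeAdam` signature is modeled on `tf.train.AdamOptimizer`, which has no `momentum` parameter.

```python
def split_opt_kwargs(opt_kwargs, default_learning_rate):
    """Pop the learning rate and return it with the keys that remain."""
    remaining = dict(opt_kwargs)  # copy, so the caller's dict is untouched
    learning_rate = remaining.pop("learning_rate", default_learning_rate)
    return learning_rate, remaining


class FakeAdam(object):
    """Hypothetical stand-in with Adam-style keyword arguments."""

    def __init__(self, learning_rate, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.learning_rate = learning_rate


def adam_accepts(opt_kwargs):
    # True if the leftover keys are valid constructor arguments.
    learning_rate, remaining = split_opt_kwargs(opt_kwargs, 0.001)
    try:
        FakeAdam(learning_rate=learning_rate, **remaining)
    except TypeError:
        return False
    return True
```

With `dict(learning_rate=0.0015)` nothing is left over after the pop, while with `dict(momentum=0.0015)` the default learning rate is used and `momentum` is passed on to a constructor that does not accept it.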

Add support for Keras ImageDataGenerator

Keras provides a nice API for loading and also transforming training and validation data. Maybe with a few tweaks this could be supported by tf_unet.

Example:

train_datagen = image.ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True
)

train_generator = train_datagen.flow_from_directory(
  './data/food-101/train/',
  target_size=target_size,
  batch_size=32,
)

Afterwards the generator can be iterated indefinitely e.g. data, mask = next(train_generator).

regarding the usage of gradients and self.net.gradients_nodes

Hi Joel,

The previous thread has become so long. I just closed it. Thanks for your help as always.

I am re-running the code with batch size 2, which some blogs report gives better results. At the same time, I am trying to normalize the images as you suggested.

I am also still confused by the line of code that performs the optimization. My understanding is that you include gradients and self.net.gradients_node for debugging purposes.

_, loss, lr, gradients = sess.run((self.optimizer, self.net.cost, self.learning_rate_node, self.net.gradients_node), feed_dict={self.net.x: batch_x, self.net.y: util.crop_to_shape(batch_y, pred_shape), self.net.keep_prob: dropout})

In other words, if I only want to run the training process, I can remove these two items and rewrite your code as shown below. Is my understanding right? Specifically, I am not clear about how the gradients are used here. Thanks.

_, loss, lr= sess.run((self.optimizer, self.net.cost, self.learning_rate_node), feed_dict={self.net.x: batch_x, self.net.y: util.crop_to_shape(batch_y, pred_shape), self.net.keep_prob: dropout})

Tensorflow Unet training error with TensorFlow 1.1.0

2017-05-08 10:23:35,963 Verification error= 76.2%, loss= -0.4966
2017-05-08 10:23:36,797 Start optimization
2017-05-08 10:23:39,794 Iter 0, Minibatch Loss= -0.5287, Training Accuracy= 0.6946, Minibatch error= 30.5%
2017-05-08 10:23:41,902 Iter 2, Minibatch Loss= -0.5188, Training Accuracy= 0.5191, Minibatch error= 48.1%
2017-05-08 10:23:43,797 Iter 4, Minibatch Loss= nan, Training Accuracy= 0.2501, Minibatch error= 22.1%
2017-05-08 10:23:44.476436: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.476717: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477076: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477192: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477491: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477643: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477992: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.478101: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.478401: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.478537: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.478833: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.479264: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.479623: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.479750: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.480048: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.480512: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.480872: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.480996: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481190: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481314: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481385: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481639: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481704: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481769: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
Traceback (most recent call last):
File "train_cub_unet.py", line 12, in
path = trainer.train(data_provider, './unet_trained', training_iters=32, epochs=100, display_step=2)
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 430, in train
self.output_minibatch_stats(sess, summary_writer, step, batch_x, util.crop_to_shape(batch_y, pred_shape))
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 473, in output_minibatch_stats
self.net.keep_prob: 1.})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]

Caused by op u'norm_grads', defined at:
File "train_cub_unet.py", line 12, in
path = trainer.train(data_provider, './unet_trained', training_iters=32, epochs=100, display_step=2)
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 390, in train
init = self._initialize(training_iters, output_path, restore)
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 342, in _initialize
tf.summary.histogram('norm_grads', self.norm_gradients_node)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 209, in histogram
tag=scope.rstrip('/'), values=values, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]

In addition to the training error, I also notice the loss is negative (for example, Minibatch Loss= -0.5287)

Data (dir+files) erased completely

Hi,
From a Jupyter notebook, the following code completely erased the contents of the directory that output_path points to (~100 000 images, oops!):

output_path = os.path.join('..','DeepFISH','dataset','LowRes','train')
path = trainer.train(data_provider, output_path, training_iters=10, epochs=5)

The directory contained three directories (grey, groundtruth, UnetData).
Where should output_path point?

Thanks
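
The report suggests that the trainer clears `output_path` before writing to it, so pointing it at a dedicated, empty directory seems the safer choice. A minimal guard under that assumption follows; `is_safe_output_dir` is a hypothetical helper, not part of tf_unet.

```python
import os


def is_safe_output_dir(path):
    """True if `path` does not exist yet or is an existing empty directory."""
    if not os.path.exists(path):
        return True
    return os.path.isdir(path) and not os.listdir(path)
```

Calling this before `trainer.train(...)` and refusing to continue when it returns False would keep training output separate from the dataset.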

size of output images don't match with original paper

Joel,

I am currently using your U-Net implementation with my own dataset (500 x 376 images, following issue #6).
I have several questions, if you have time to discuss them.

In the U-Net paper, the proposed architecture has 5 layers with 3x3 convolution filters and 64 features.
If I compute the output image size from these settings for my own dataset, I get a theoretical output size of 308 x 184. Your code computes an offset, which here should equal 192. But when I run your code, I get an output size of 308 x 180.

Similarly, when I use a different convolution filter size with a different network depth (for instance 3 layers), the output size does not match the theoretical size.

I started fixing this by changing the offset definition to account for the number of border pixels with a variable convolution filter size. So far, it looks to me as if your code only works with 3x3 convolutions and the offset needs to be recomputed. Maybe I am wrong or I missed a step.
For instance, in unet.py you have written:

line 95 and 130 : size -=4
line 99 or 129 : size *= 2

I think these lines are correct only for 2x2 max-pooling and 3x3 convolutions.

I am trying to understand how and why the network produces this gap between the theoretical and the observed size.

I'm working on Python 3.5 and TensorFlow 0.12 GPU.

I'd be happy to share with you !
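
One possible explanation for the 184 versus 180 gap, sketched under stated assumptions: unpadded convolutions, two convolutions per level, and max-pooling that rounds odd sizes down. `unet_output_size` is a hypothetical helper, not the offset computation in unet.py.

```python
def unet_output_size(size, layers=5, filter_size=3, pool_size=2):
    """Output edge length of a U-Net with unpadded ('valid') convolutions.

    Assumes two convolutions per level and floor-rounding max-pooling.
    """
    border = 2 * (filter_size - 1)  # pixels lost per level (two convolutions)
    # Contracting path: convolve on every level, pool between levels.
    for level in range(layers):
        size -= border
        if level < layers - 1:
            size //= pool_size  # an odd size is rounded down here
    # Expanding path: upsample, then convolve, once per pooling step.
    for _ in range(layers - 1):
        size = size * pool_size - border
    return size


print(unet_output_size(500), unet_output_size(376))  # 308 180
```

Under these assumptions the sketch gives 308 and 180 for a 500 x 376 image, matching the observed output, and 388 for the 572-pixel input of the original paper. A fixed offset can only be exact when every intermediate size is divisible by the pool size.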

Gradients are zero after model is restored

Hello,

Does anyone else have the problem that the gradients are initialized to zero after restoring the model? I would like to load the weights from the last saved point of training instead. Can someone help, please?

Best regards, Amelia.

Typeerror in demo

Hi Jakeret,
I am very interested in applying U-Net in my research, and it was great to find that you have already implemented it in TensorFlow.
I've installed TensorFlow on Kubuntu Linux. The TensorFlow version is 1.0.1 and the Keras version used with it is 1.2.2.

I wanted to try your demo_toy_problem.
However, I encountered a TypeError when executing the net = unet.Unet line:

TypeError Traceback (most recent call last)
in ()
----> 1 net = unet.Unet(channels=generator.channels, n_class=generator.n_class, layers=3, features_root=16)

/home/davince/lib/python2.7/site-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.pyc in init(self, channels, n_class, cost, cost_kwargs, **kwargs)
187 self.keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)
188
--> 189 logits, self.variables, self.offset = create_conv_net(self.x, self.keep_prob, channels, n_class, **kwargs)
190
191 self.cost = self._get_cost(logits, cost, cost_kwargs)

/home/davince/lib/python2.7/site-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.pyc in create_conv_net(x, keep_prob, channels, n_class, layers, features_root, filter_size, pool_size, summaries)
109 bd = bias_variable([features//2])
110 h_deconv = tf.nn.relu(deconv2d(in_node, wd, pool_size) + bd)
--> 111 h_deconv_concat = crop_and_concat(dw_h_convs[layer], h_deconv)
112 deconv[layer] = h_deconv_concat
113

/home/davince/lib/python2.7/site-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/layers.pyc in crop_and_concat(x1, x2)
52 size = [-1, x2_shape[1], x2_shape[2], -1]
53 x1_crop = tf.slice(x1, offsets, size)
---> 54 return tf.concat(3, [x1_crop, x2])
55
56 def pixel_wise_softmax(output_map):

/home/davince/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.pyc in concat(values, axis, name)
1027 ops.convert_to_tensor(axis,
1028 name="concat_dim",
-> 1029 dtype=dtypes.int32).get_shape(
1030 ).assert_is_compatible_with(tensor_shape.scalar())
1031 return identity(values[0], name=scope)

/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in convert_to_tensor(value, dtype, name, preferred_dtype)
635 name=name,
636 preferred_dtype=preferred_dtype,
--> 637 as_ref=False)
638
639

/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
700
701 if ret is None:
--> 702 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
703
704 if ret is NotImplemented:

/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.pyc in _constant_tensor_conversion_function(v, dtype, name, as_ref)
108 as_ref=False):
109 _ = as_ref
--> 110 return constant(v, dtype=dtype, name=name)
111
112

/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.pyc in constant(value, dtype, shape, name, verify_shape)
97 tensor_value = attr_value_pb2.AttrValue()
98 tensor_value.tensor.CopyFrom(
---> 99 tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
100 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
101 const_tensor = g.create_op(

/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.pyc in make_tensor_proto(values, dtype, shape, verify_shape)
365 nparray = np.empty(shape, dtype=np_dt)
366 else:
--> 367 _AssertCompatible(values, dtype)
368 nparray = np.array(values, dtype=np_dt)
369 # check to them.

/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.pyc in _AssertCompatible(values, dtype)
300 else:
301 raise TypeError("Expected %s, got %s of type '%s' instead." %
--> 302 (dtype.name, repr(mismatch), type(mismatch).name))
303
304

TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Do you know what could cause this problem? Is it because I installed a newer version of TensorFlow?

thank you.
Best,
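
The traceback shows `tf.concat(3, [x1_crop, x2])`, the argument order used before TensorFlow 1.0; from 1.0 on, `tf.concat` takes `(values, axis)`. So a newer TensorFlow combined with an older tf_unet build is a plausible cause. A minimal sketch of a version-aware call follows; `concat_compat` is a hypothetical helper, not part of tf_unet.

```python
def concat_compat(tf_module, values, axis):
    """Call tf.concat with the argument order of the installed version.

    Assumption: releases before 1.0 expect (axis, values), while 1.0 and
    later expect (values, axis).
    """
    major = int(tf_module.__version__.split(".")[0])
    if major >= 1:
        return tf_module.concat(values, axis)
    return tf_module.concat(axis, values)

# Possible use in crop_and_concat (assumes `import tensorflow as tf`):
#   return concat_compat(tf, [x1_crop, x2], 3)
```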

Questions about UNet architecture and Dice loss

Hello,

I have two questions about the UNet architecture and Dice loss function:

In the U-Net paper, there is no ReLU layer after the final convolution. In experiments with my own U-Net implementation, such a ReLU doesn't help. I don't know whether the ReLU in line 136 of unet.py helps or not.
IMHO, the Dice loss can be regarded as # of True Positives / (# of Positives + # of False Positives), but the Dice loss in line 231 of unet.py counts both positives and negatives, if I understand the code correctly.

Please correct me if I am wrong. Thanks.
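
For reference, the standard definition of the Dice coefficient for a predicted set A and a ground-truth set B, written in terms of confusion counts. This is an illustration of the textbook formula, not a statement about what line 231 of unet.py computes.

```latex
\[
\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}
                    = \frac{2\,TP}{2\,TP + FP + FN}
\]
```

True negatives do not appear in this expression. If "# of Positives" in the question means ground-truth positives (TP + FN), the ratio TP / (TP + FN + FP) is the Jaccard index J, which is related to Dice by Dice = 2J / (1 + J).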

batch_size can't be bigger than 1?

Hi, the default batch_size for training is 1 in line 305 of unet.py. This confuses me because I thought a bigger batch_size is better (I usually use a batch_size of 64 in image classification).
I tested batch_size 2, 3, and 5 and ran into a "GPU out of memory" problem. Is that why you use batch_size = 1, or are there other reasons why batch_size has to be 1? If batch_size can be bigger than 1, are there any ways to reduce the number of parameters to save GPU memory?
I'm new to this field; thank you so much for sharing.
:)

No error backpropagation through skip connections

Hi,

tf.image.extract_glimpse() is used in the implementation of the crop_and_concat() function.

tf.image.extract_glimpse() does not allow for backpropagation of gradients (if it was the only connection between two nodes the gradient returned by TensorFlow would be None) and the gradients with respect to the segmentation error for all layers before a skip connection will therefore be off.

tf.slice() should be used instead.
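
A rough sketch of what a tf.slice()-based center crop needs: the `begin` and `size` arguments for cropping the larger feature map down to the smaller one before concatenation. This assumes NHWC tensors with static shapes; `center_crop_args` is a hypothetical helper, not a function from tf_unet.

```python
def center_crop_args(large_shape, small_shape):
    """Return (begin, size) lists for a tf.slice-style center crop.

    Both shapes are (batch, height, width, channels). Batch and channel
    dimensions are kept whole; a size of -1 means "all remaining elements"
    in tf.slice.
    """
    off_h = (large_shape[1] - small_shape[1]) // 2
    off_w = (large_shape[2] - small_shape[2]) // 2
    begin = [0, off_h, off_w, 0]
    size = [-1, small_shape[1], small_shape[2], -1]
    return begin, size


# Example: crop a 122x122 encoder map to match a 114x114 decoder map.
begin, size = center_crop_args((1, 122, 122, 64), (1, 114, 114, 64))
print(begin, size)
# With TensorFlow available, the crop and concat would then look like:
#   cropped = tf.slice(encoder_map, begin, size)
#   merged = tf.concat([cropped, decoder_map], axis=3)  # argument order differs in old TF versions
```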

tf.nn.conv2d_transpose?

Hi,

Thank you for the great work!

Reading the original U-Net paper and comparing it with your implementation, it looks like you have used tf.nn.conv2d_transpose to implement "up-sampling followed by convolution". But the TensorFlow documentation says that this function is "the transpose (gradient) of conv2d". I am wondering if this is the right operation to use here.
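
To make the question concrete, here is a 1-D toy version of a transposed convolution in pure Python. It is only an illustration of the operation, not the library's implementation: each input value is scattered into the output, scaled by the kernel, at positions spaced `stride` apart. With stride 2 and a kernel of ones it reduces to nearest-neighbour up-sampling, so a learned kernel can be read as a learned up-sampling.

```python
def conv1d_transpose(x, kernel, stride):
    """1-D transposed convolution without padding.

    Output length is (len(x) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            # Scatter each input value into the output, scaled by the kernel.
            out[i * stride + k] += value * weight
    return out


# Stride 2 with a kernel of ones doubles the length by repeating each value.
upsampled = conv1d_transpose([1, 2, 3], [1, 1], stride=2)
print(upsampled)
```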

Weights before softmax error in the weighted loss function

Hi,

In the implementation of the weighted loss function, the weights are applied to the logits before the softmax activation function. For a two-class problem, the result is that the bigger value after the softmax will increase and the smaller value will decrease. In other words, the network will look more confident in its predictions. If the weight was large and the prediction was wrong, the gradients will also be larger, though not necessarily by the expected amount. If the prediction was right, however, the gradients will be smaller than they would have been otherwise.

To ensure correct scaling, the weights should be applied after the call to tf.nn.softmax_cross_entropy_with_logits() and before the call to tf.reduce_mean().
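
The proposed ordering can be sketched in pure Python, standing in for the TensorFlow calls: compute the unweighted per-pixel cross-entropy first, multiply each pixel's loss by the weight of its true class, then average. The function names and the per-class weight list are assumptions made for this illustration.

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy of softmax(logits) against a one-hot label (one pixel)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum) for z, y in zip(logits, one_hot))


def weighted_mean_loss(batch_logits, batch_labels, class_weights):
    """Weight each per-pixel loss by its true class weight, then take the mean."""
    losses = []
    for logits, one_hot in zip(batch_logits, batch_labels):
        loss = softmax_cross_entropy(logits, one_hot)        # logits left untouched
        weight = sum(w * y for w, y in zip(class_weights, one_hot))
        losses.append(weight * loss)                          # weight applied after the loss
    return sum(losses) / len(losses)


# Two pixels with uninformative logits; class 0 is weighted twice as much.
loss = weighted_mean_loss([[0.0, 0.0], [0.0, 0.0]],
                          [[1, 0], [0, 1]],
                          class_weights=[2.0, 1.0])
print(loss)
```

Because the logits are never rescaled, the softmax probabilities stay the same and each pixel's gradient is scaled by exactly its weight.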

Error with using a custom dataset and image_util.ImageDataProvider()

Hey!

Thank you for your help and for the updates. Unfortunately, we're still having trouble using a custom dataset and we're hoping you can help us.

The main error produced is this:

....
tensorflow.python.framework.errors.InvalidArgumentError: logits and labels must be same size: logits_size=[709520,2] labels_size=[710500,2]
	 [[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_29, Reshape_30)]]
....

We're working with RGB images of size 767x1022 and using the data provider like this (everything here seems to be working as expected):
data_provider = image_util.ImageDataProvider('reduced_segmentation_dataset/*', data_suffix=".jpg", mask_suffix='_mask.png')

The other thing we noticed is that when we call path = trainer.train(data_provider, "./skin_trained", training_iters=10, epochs=4, display_step=2), it defaults to calling test_x, test_y = data_provider(4) on line 367 of master/tf_unet/unet.py. If we call data_provider(1) instead, we seem to get the results we expect and bypass that problem, but the error above still prevents a full run.

Do you have any ideas why we're having this mismatch? I'd be happy to provide more information as needed.
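
One reading that is consistent with the numbers in the error message, offered as a guess rather than a confirmed diagnosis: 709520 = 724 * 980 and 710500 = 725 * 980. If the network output is 724x980 and the labels are cropped symmetrically with an integer offset of (input - output) // 2 on each side, an odd difference (767 - 724 = 43) leaves one extra row. Both the output size and the cropping scheme are assumptions here, and `symmetric_crop_length` is a hypothetical helper.

```python
def symmetric_crop_length(full, target):
    """Length left after removing the same integer offset from both ends."""
    offset = (full - target) // 2
    return full - 2 * offset


# Assumed network output size: 724 x 980 (724 * 980 == 709520, the logits_size).
label_height = symmetric_crop_length(767, 724)    # odd difference of 43
label_width = symmetric_crop_length(1022, 980)    # even difference of 42
print(label_height, label_width, label_height * label_width)
```

Under these assumptions, input dimensions whose difference to the output size is even on both axes would not show the off-by-one.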

Optimization of last convolutional layer

Hello!

I've just noticed that "weight" and "bias" from the last convolutional layer are not included in "variables". Aren't they supposed to be optimized as well? Is there any reason for not including them?

Best regards, Amelia.

prediction.shape different from input.shape

Here is my code:

from tf_unet import unet, util, image_util
data_provider = image_util.ImageDataProvider("images3/*.png", data_suffix='.png', mask_suffix='_mask.png')
net = unet.Unet(layers=2, features_root=128, channels=3, n_class=2)
trainer = unet.Trainer(net)
path = trainer.train(data_provider, "train/", dropout=0.5, training_iters=32, epochs=1, display_step=16)
x_test = a._load_file("images4/0393_1.png")
x_test = a._process_data([x_test])
prediction = net.predict(path, x_test)

prediction.shape returns (1, 240, 240, 2), while x_test.shape returns (1, 256, 256, 3), so the predicted image is smaller than the input.

With layers=1, the prediction shape is 252x252.
With layers=5, I get a prediction of something like 96x96.
It is not a scaled-down version of the image; the output just covers the center of the image instead.

Is it a bug or a feature? How can I get a full shape while using more layers?
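
The reported sizes match what unpadded ("valid") convolutions would produce, as in the original U-Net paper. The sketch below is a back-of-the-envelope calculation that assumes two 3x3 valid convolutions per layer, 2x2 pooling between layers on the way down, and a 2x up-convolution followed by two more 3x3 valid convolutions per layer on the way up. The function is hypothetical and only reproduces the arithmetic for the sizes reported above.

```python
def unet_output_size(input_size, layers, filter_size=3, pool_size=2):
    """Spatial output size of a U-Net built from unpadded convolutions."""
    shrink = 2 * (filter_size - 1)  # two valid convolutions per layer
    size = input_size
    # Contracting path: convolutions in every layer, pooling between layers.
    for layer in range(layers):
        size -= shrink
        if layer < layers - 1:
            size //= pool_size
    # Expanding path: up-convolution, then two valid convolutions.
    for _ in range(layers - 1):
        size = size * pool_size - shrink
    return size


print(unet_output_size(256, layers=1))  # 252
print(unet_output_size(256, layers=2))  # 240
```

Under this reading, the missing border is the part of the image for which the network has no full context, so a full-size prediction needs a padded input or padded convolutions.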

Segmenting RGB Images?

I'm trying to use this to segment RGB images, but I keep getting an error that suggests the input needs to be grayscale:

ValueError: could not broadcast input array from shape (300,500,3) into shape (300,500)

So I converted all my labeled images to grayscale and it started to work, but then it crashes again during optimization with the same error.

Can I take an RGB image, have it labelled in grayscale, and then run it with this library or do I have to do something else?

Thanks!
