jakeret / tf_unet
Generic U-Net Tensorflow implementation for image segmentation
License: GNU General Public License v3.0
Hi
I started by following this link.
I got an error at this Python command: data_provider = image_util.ImageDataProvider("fishes/train/*.tif")
AssertionError: No training files
How do I get these training files? Is there any other link listing the prerequisites? Please help.
Regards
Gopi. J
How to use tensorboard in this code?
I want to train the unet with NIfTI data. I load the training data and the corresponding labels into the workspace and obtain two 3-dimensional arrays: train_data.shape = (512, 512, 200) and train_label.shape = (512, 512, 200). Both have data type int16.
I have no idea how to train the unet with these data.
Could you tell me how to build a data_provider for the following training call?
path = trainer.train(data_provider, output_path, training_iters=32, epochs=100)
Looking forward to your reply.
Best wishes to you.
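One possible way to prepare such volumes is sketched here, under the assumption that the provider expects data shaped [n, X, Y, channels] and labels shaped [n, X, Y, classes]. The helper name volume_to_slices and the small stand-in arrays are hypothetical; real data would come from the NIfTI files.

```python
import numpy as np

def volume_to_slices(volume, labels, n_class=2):
    """Turn (H, W, D) volume and label arrays into per-slice training arrays.

    Hypothetical helper: assumes integer class labels in 0..n_class-1.
    """
    # Move the slice axis to the front: (H, W, D) -> (D, H, W).
    data = np.transpose(volume, (2, 0, 1)).astype(np.float32)
    lab = np.transpose(labels, (2, 0, 1))
    # Add a trailing channel axis for single-channel images.
    data = data[..., np.newaxis]
    # One-hot encode the integer labels into n_class boolean planes.
    one_hot = np.stack([lab == c for c in range(n_class)], axis=-1)
    return data, one_hot

# Small stand-in arrays instead of the real (512, 512, 200) volumes.
train_data = np.zeros((64, 64, 5), dtype=np.int16)
train_label = np.zeros((64, 64, 5), dtype=np.int16)
train_label[10:20, 10:20, 2] = 1

data, one_hot = volume_to_slices(train_data, train_label)
print(data.shape, one_hot.shape)
```

The resulting arrays could then be handed to an array-based provider such as SimpleDataProvider, provided its expected layout matches the assumption above.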
Hi
I am using this code right now.
Is there any setting that should be considered when using it,
e.g. the range of the input pixel values or the numbering of the ground-truth labels?
I used this code,
but the result is very bad: every output value is below 0.5 and is therefore treated as 0, so
the whole output is shown as black.
How can I improve the result?
Please help!
Hi
I am new to this area. I am using the dice loss function for image segmentation. I found different formulations and got confused. Can you please explain why there is a 1 - in the loss?
For example: if my network, for batch size 1, gives me the values top = 0.9 and bottom = 0.9, then the reduce mean is 0.9 and the loss is 0.1 as per the equation below.
loss = 1 - tf.reduce_mean(2 * intersection/ (union))
But I found some code that just uses
loss =np.sum(2*intersection/(union))
So can you please clear up this confusion?
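A small worked example may help here. The Dice coefficient is a similarity score between 0 (no overlap) and 1 (perfect overlap), so code that minimizes a loss usually uses 1 - dice (or -dice), while code that only reports the score uses the coefficient directly. The function name and the eps term are assumptions for illustration, not taken from tf_unet.

```python
def dice_coefficient(pred, target, eps=1e-5):
    """Dice score for two flat lists of per-pixel values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    # The denominator is the sum of both "sizes", often loosely called union.
    denominator = sum(pred) + sum(target)
    return 2.0 * intersection / (denominator + eps)

pred = [0.9, 0.1, 0.8, 0.0]
target = [1.0, 0.0, 1.0, 0.0]

dice = dice_coefficient(pred, target)  # higher is better
loss = 1.0 - dice                      # lower is better, suitable for a minimizer
print(dice, loss)
```

Both forms describe the same quantity; they differ only in whether the value is meant to be maximized or minimized.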
if regularizer is not None:
    regularizers = sum([tf.nn.l2_loss(variable) for variable in self.variables])
    loss += (regularizer * regularizers)
It seems that you apply regularization to the biases as well, since self.variables includes them:
variables = []
for w1,w2 in weights:
    variables.append(w1)
    variables.append(w2)
for b1,b2 in biases:
    variables.append(b1)
    variables.append(b2)
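To make the difference concrete, here is a NumPy sketch that compares an L2 penalty over all variables with one restricted to the weights. The arrays are made-up stand-ins, and l2_loss mirrors the tf.nn.l2_loss convention of sum(v ** 2) / 2; whether biases should be regularized is a design choice, not something this sketch settles.

```python
import numpy as np

def l2_loss(values):
    """Same convention as tf.nn.l2_loss: sum(v ** 2) / 2."""
    return 0.5 * float(np.sum(np.square(values)))

# Hypothetical stand-ins for the network's weight and bias tensors.
weights = [np.full((3, 3), 0.1), np.full((3, 3), 0.2)]
biases = [np.full(3, 1.0), np.full(3, 2.0)]

# Penalty over everything, as in the snippet above.
penalty_all = sum(l2_loss(v) for v in weights + biases)
# Penalty over the weights only, leaving the biases unregularized.
penalty_weights_only = sum(l2_loss(w) for w in weights)

print(penalty_all, penalty_weights_only)
```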
Hello,
Importing image_util failed as shown below
/tf_unet$ python
Python 3.5.2 (default, Sep 10 2016, 08:21:44)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tf_unet
>>> from tf_unet import unet, util, image_util
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'image_util'
Additional information: tf_unet was imported from a Python 3 virtual environment with:
>>> print(tf_unet.__version__)
0.1.0
>>> import tensorflow
>>> print(tensorflow.__version__)
0.11.0rc1
I am using Tensorflow 0.12 along with the latest master branch. I have trained the u-net on around 1000 patches and the results on the prediction images are encouraging. But when I save the model and restore it in another session, the output is similar to that of a freshly initialized model. Please let me know if anybody has managed to run a save and restore operation and whether any specific changes are needed.
I see that the predictions are shifted when I overlay them on top of the image. Is there any bug related to this?
Please check the second answer in this StackOverflow post.
The error is:
Traceback (most recent call last):
File "my_unet.py", line 12, in
path = trainer.train(data_provider, "./unet_trained", training_iters=20, epochs=10, display_step=2)
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 399, in train
test_x, test_y = data_provider(self.verification_batch_size)
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/image_util.py", line 98, in call
X[i] = train_data
ValueError: could not broadcast input array from shape (500,333,3) into shape (375,500,3)
I have validated that the dimension of color image and corresponding mask is same for all.
numpy version: 1.12.1
tensorflow version: 1.0.1
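The two shapes in the error, (500,333,3) and (375,500,3), suggest that the images in the folder differ in size from one another, even if each image matches its own mask. A sketch of one way to check this and bring every image to a common size by center cropping follows; center_crop is a hypothetical helper, and the masks would need exactly the same crop.

```python
import numpy as np

def center_crop(image, height, width):
    """Crop an (H, W, C) array around its center to (height, width, C)."""
    top = (image.shape[0] - height) // 2
    left = (image.shape[1] - width) // 2
    return image[top:top + height, left:left + width]

# Stand-ins with the two shapes from the traceback.
images = [np.zeros((375, 500, 3)), np.zeros((500, 333, 3))]

# More than one distinct shape means the images cannot share one batch array.
shapes = {img.shape for img in images}
print("distinct shapes:", shapes)

# Crop everything to the smallest height and width found in the set.
target_h = min(img.shape[0] for img in images)
target_w = min(img.shape[1] for img in images)
cropped = [center_crop(img, target_h, target_w) for img in images]
batch = np.stack(cropped)
print(batch.shape)
```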
Thanks for sharing the tf_unet code; it is excellent work.
I want to train the unet with custom NIfTI image data. Each data set is a three-dimensional volume.
I failed to use the SimpleDataProvider and ImageDataProvider classes.
So I tried to write a 'NiiImageDataProvider' to read NIfTI data and train the unet.
Below is the NiiImageDataProvider class code. It correctly reads the NIfTI data from a local path.
However, when training the unet, I get an error in the __call__ function.
I don't know how to modify this function. I would appreciate it a lot if anyone could help me fix it.
Here are my data and code
import nibabel as nb
class NiiImageDataProvider(BaseDataProvider):
    """
    This is an introduction to NiiImageDataProvider (similar to ImageDataProvider provided by jakeret)
    Generic data provider for nifti images.
    Assumes that the data images and label images are stored in the same folder
    and that the labels have a different file prefix
    e.g. 'demo/data-1.nii' and 'demo/data-label-1.nii'
    Usage:
    data_provider = image_util.NiiImageDataProvider("../tf_unet-master/demo/*.nii")
    :param nii_path: a glob search pattern to find all data and label images
    :param a_min: (optional) min value used for clipping
    :param a_max: (optional) max value used for clipping
    :param Nii_data_prefix: prefix pattern for the data images. Default 'data-*.nii'
    :param Nii_Label_prefix: prefix pattern for the label images. Default 'data-label-*.nii'
    """
    channels = 1
    n_class = 2
    def __init__(self, nii_path, a_min=None, a_max=None, Nii_data_prefix="data", Nii_Label_prefix='data-label'):
        super(NiiImageDataProvider, self).__init__(a_min, a_max)
        self.Nii_data_prefix = Nii_data_prefix
        self.Nii_Label_prefix = Nii_Label_prefix
        self.file_idx = -1
        self.nii_data_files = self._find_niidata_files(nii_path)
        print(nii_path)
        assert len(self.nii_data_files) > 0, "No nii training data"
        print("Number of files used: %s" % len(self.nii_data_files))
        img = self._load_nii_file(self.nii_data_files[0])
        img = img.get_data() # get 3-dimension image data in nifti image
        self.channels = 1 if len(img.shape) == 3 else img.shape[-1]
    def _find_niidata_files(self, nii_path):
        all_niifiles = glob.glob(nii_path)
        return [name for name in all_niifiles if not self.Nii_Label_prefix in name]
    def _load_nii_file(self, path):
        return nb.load(path)
    def _cylce_nii_file(self):
        self.file_idx += 1
        if self.file_idx >= len(self.nii_data_files):
            self.file_idx = 0  # wrap around to the first file
    def _next_data(self):
        self._cylce_nii_file()
        nii_image_name = self.nii_data_files[self.file_idx]
        nii_label_name = nii_image_name.replace(self.Nii_data_prefix, self.Nii_Label_prefix)
        nii_image = self._load_nii_file(nii_image_name)
        nii_image = nii_image.get_data()
        nii_image = nii_image.transpose(2, 0, 1)
        nz = nii_image.shape[0]
        nii_image = nii_image.reshape(nz, 512, 512, 1)
        nii_label = self._load_nii_file(nii_label_name)
        nii_label = nii_label.get_data()
        nii_label[nii_label == 2] = 1
        nii_label = nii_label.transpose(2, 0, 1)
        nii_label = nii_label.reshape(nz, 512, 512, 1)
        return nii_image, nii_label
Test code:
from tf_unet import unet
from tf_unet import image_util
# read data
data_provider = image_util.NiiImageDataProvider("X:/tf_unet-master/test_nii_provider/*.nii")
# set parameter
net = unet.Unet(channels=1, n_class=3, layers=3, features_root=16)
trainer = unet.Trainer(net, optimizer="momentum", opt_kwargs=dict(momentum=0.2))
# train
path = trainer.train(data_provider, "./seg_3dim_data_trained_0509", training_iters=5, epochs=1, display_step=2)
Error:
Traceback (most recent call last):
File "G:/Tensorflow/tf_unet-master/test_nii_provider/test_nii_provider.py", line 9, in <module>
path = trainer.train(data_provider, "./seg_3dim_data_trained_0509", training_iters=5, epochs=1, display_step=2)
File "G:\Tensorflow\tf_unet-master\tf_unet\unet.py", line 403, in train
test_x, test_y = data_provider(self.verification_batch_size)
File "G:\Tensorflow\tf_unet-master\tf_unet\image_util.py", line 98, in __call__
train_data, labels = self._load_data_and_label()
File "G:\Tensorflow\tf_unet-master\tf_unet\image_util.py", line 58, in _load_data_and_label
return train_data.reshape(1, ny, nx, self.channels), labels.reshape(1, ny, nx, self.n_class),
ValueError: cannot reshape array of size 19398656 into shape (1,74,512,1)
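The size in the error message equals 74 x 512 x 512, which suggests that a whole volume of 74 slices reached a reshape that expects a single image of shape (1, ny, nx, channels). A sketch of handing out one slice at a time instead follows, assuming the base provider wants one 2-D image per call; iter_slices and the small stand-in volume are hypothetical.

```python
import numpy as np

# The element count from the traceback matches a full 74-slice volume.
n_slices, height, width = 74, 512, 512
assert n_slices * height * width == 19398656

# Small stand-in for a transposed volume of shape (slices, height, width, 1).
volume = np.zeros((4, 8, 8, 1), dtype=np.int16)

def iter_slices(volume):
    """Yield one (height, width, channels) slice at a time."""
    for index in range(volume.shape[0]):
        yield volume[index]

first = next(iter_slices(volume))
# A single slice reshapes cleanly to (1, ny, nx, channels).
single = first.reshape(1, first.shape[0], first.shape[1], 1)
print(first.shape, single.shape)
```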
Hi,
In the toy problem, if I remove output's 2nd class ( i.e. the output map's NOT class),
then will the Unet give similar results?
The reason I am asking this is - I am trying to implement unet for a 6 class problem and there is a high class imbalance towards the 'background' class. So, now I am considering removing the background class from output map (thus, making it a 5 class problem). In this scenario, the pixels corresponding to background class will have 0s in all the other 5 classes. Since, the dice coeff. does not take into account True Negatives for cost calculation, this trick should work out fine.
I am using Dice coeff with Momentum Optimizer and have tried various layers, feature_root combinations.
But, with this setup, the network is not learning. After some training, the most frequent class becomes white and all other classes become black. The weight histograms are not changing at all as training continues. The gradients are decreasing slowly. And Dice loss is remaining almost constant.
I have been experimenting with this for last 15-20 days. Any help regarding this would make my day. :)
(@jakeret - Since, this issue is not related to technical problems in tf_unet, I will close it after some time if there is no response.)
Hi,
I met an issue regarding a tensor dimension:
ValueError: Cannot feed value of shape (4, 80, 82, 2) for Tensor 'Placeholder_1:0', which has shape '(?, ?, ?, 4)'
The dataset consists of pairs of greyscale / groundtruth images; both images have the shape (80,82,1). The groundtruth contains pixels of value:
The following code (modified from https://tf-unet.readthedocs.io/en/latest/usage.html) illustrates the nature of the data and the way the code fails:
Thanks for your advice.
JP
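The message says the label placeholder expects 4 channels in the last axis while the fed array has 2, so the labels were apparently encoded for two classes. A sketch of one-hot encoding a label image into 4 class planes follows, assuming the ground truth holds integer class indices 0 to 3 (the actual pixel values are not shown in the post).

```python
import numpy as np

def one_hot(labels, n_class):
    """Map integer labels of shape (...) to floats of shape (..., n_class)."""
    return np.eye(n_class, dtype=np.float32)[labels]

# Hypothetical ground truth: a batch of 4 label images of size 80 x 82.
rng = np.random.default_rng(0)
gt = rng.integers(0, 4, size=(4, 80, 82))

encoded = one_hot(gt, 4)
print(encoded.shape)
```

If the ground-truth images store other grey values (for example 0, 85, 170, 255), they would first have to be mapped to consecutive indices.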
Hi,
my own dataset directory structure is:
data
|--<label1>
|--<image001.jpg>
|--<image001.png> (mask image for image001.jpg)
|--<image002.jpg>
|--<image002.png>
|......
|--<label2>
|--<image001.jpg>
|--<image001.png>
|......
|......
I also split the dataset into train and validating sets:
data
|--train
|--<label1>
|--<image001.jpg>
|--<image001.png>
|......
|--<label2>
|--<image001.jpg>
|--<image001.png>
|......
|......
|--val
|--<label1>
|--<image002.jpg>
|--<image002.png>
|......
|--<label2>
|--<image002.jpg>
|--<image002.png>
|......
May I ask how to load this dataset for training and validating?
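For a layout like this, one option is to collect the image and mask pairs per split before feeding them to a provider. The sketch below uses only the standard library; pair_images_and_masks is a hypothetical helper, it assumes each mask shares its image's file name with a .png suffix, and the temporary directory stands in for the real data folder.

```python
import tempfile
from pathlib import Path

def pair_images_and_masks(root):
    """Return sorted (image, mask) path pairs found anywhere below root."""
    pairs = []
    for image in sorted(Path(root).rglob("*.jpg")):
        mask = image.with_suffix(".png")  # same name, mask extension
        if mask.exists():
            pairs.append((image, mask))
    return pairs

with tempfile.TemporaryDirectory() as tmp:
    # Recreate a tiny version of the train/val layout described above.
    for rel in ["train/label1/image001.jpg", "train/label1/image001.png",
                "train/label2/image001.jpg", "train/label2/image001.png",
                "val/label1/image002.jpg", "val/label1/image002.png"]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    train_pairs = pair_images_and_masks(Path(tmp, "train"))
    val_pairs = pair_images_and_masks(Path(tmp, "val"))
    train_names = [(i.parent.name, i.name, m.name) for i, m in train_pairs]

print(train_names)
```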
In calculating the dice loss, we should be taking the softmax of the logits before using it to calculate the loss function.
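To illustrate the point, here is a NumPy sketch in which the logits pass through a softmax before the Dice loss is computed, so the overlap is measured on probabilities in [0, 1] rather than on unbounded scores. The function names and the eps term are assumptions for illustration and do not reproduce the tf_unet code.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def dice_loss(logits, one_hot_labels, eps=1e-5):
    probs = softmax(logits)  # turn raw scores into per-pixel probabilities
    intersection = np.sum(probs * one_hot_labels)
    denominator = np.sum(probs) + np.sum(one_hot_labels)
    return 1.0 - 2.0 * intersection / (denominator + eps)

# Two pixels, two classes: confident and correct predictions.
logits = np.array([[[5.0, -5.0], [-5.0, 5.0]]])
labels = np.array([[[1.0, 0.0], [0.0, 1.0]]])

loss = dice_loss(logits, labels)
wrong_loss = dice_loss(-logits, labels)  # confident but wrong
print(loss, wrong_loss)
```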
tf_unet ran into multiple issues after I upgraded to Tensorflow 1.0
Are there any plans to update the package?
Have you guys figured out how to incorporate Multi-GPU training as explained in the Cifar-10 tutorial?
Hello Joel,
Thank you very much for sharing your code, which is very well written.
I have several questions, would you mind sharing your thoughts on them?
row*column*2.
For my use case, the input mask training data set is of shape row*column*1. Do I have to transform it into the form row*column*2? Is there any reason why you specify the mask data set that way?
in_size=1000, and size=in_size. The value of size is changed during the convolution, pooling, deconv and unpooling operations. Then create_conv_net returns in_size-size as offset, which is used to compute px and py. This is copied from the program: returns prediction: The unet prediction Shape [n, px, py, labels] (px=nx-self.offset/2)
test_x, test_y = data_provider(4)
pred_shape = self.store_prediction(sess, test_x, test_y, "_init")
Thank you very much for your help.
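The size bookkeeping can be sketched in plain Python. This assumes two unpadded 3x3 convolutions per layer (each removing 2 pixels), 2x2 pooling that rounds down, and an upsampling step that doubles the size; unet_output_size is a hypothetical helper, not the library function. With 4 layers and a 128-pixel input it gives 36, which agrees with the (4, 36, 36, 1) prediction shape reported elsewhere on this page.

```python
def unet_output_size(in_size, layers, filter_size=3, pool_size=2):
    """Spatial output size of a U-Net built from unpadded convolutions."""
    shrink = 2 * (filter_size - 1)  # two valid convolutions per layer
    size = in_size
    # Contracting path: convolve, then pool on every layer except the last.
    for layer in range(layers):
        size -= shrink
        if layer < layers - 1:
            size //= pool_size
    # Expanding path: upsample, then convolve again.
    for _ in range(layers - 1):
        size = size * pool_size - shrink
    return size

out = unet_output_size(128, layers=4)
offset = 128 - out  # pixels lost between input and prediction
print(out, offset)
```

This is why the labels have to be cropped to the prediction shape before they are compared with the output.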
Thanks for your work.
I am trying to apply u-net to a small dataset of 16 patches (it is really small but I need to understand if I am doing some mistakes). Specifically, I prepared a binary mask to define what is positive (1) and negative (0).
Unfortunately, I found two problems:
1- If I select a patch without positive elements, the network normalised the prediction probability.
2- Typical negative elements (with completely different colours) are evaluated as positive.
I started from the launcher file in the script folder and changed it to evaluate pictures.
Finally, I test on the same pictures that were used for training.
Do you have any suggestion?
Thanks,
Giovanni
Hi,
I have trouble running my code:
# Import data
print('Loading dataset...\n')
X_data = np.load(DATASET_FOLDER+"X_data.npy")
y_data = np.load(DATASET_FOLDER+"y_data.npy")
X_test = np.load(DATASET_FOLDER+"X_test.npy")
y_test = np.load(DATASET_FOLDER+"y_test.npy")
print("TRAIN data shape: ", X_data.shape)
print("TRAIN labels shape", y_data.shape)
print("TEST data shape: ", X_test.shape)
print("TEST labels shape: ", y_test.shape)
X_data = np.float32(X_data)
y_data = np.float32(y_data)
X_test = np.float32(X_test)
y_test = np.float32(y_test)
training_iters = 20
epochs = 100
dropout = 0.75 # Dropout, probability to keep units
display_step = 2
restore = False
data_provider = image_util.SimpleDataProvider(X_data, y_data, channels=2, n_class=1)
net = unet.Unet(channels=2, n_class=1, layers=4, features_root=64, cost="dice_coefficient")
trainer = unet.Trainer(net, optimizer="adam")
path = trainer.train(data_provider, "./unet_trained", training_iters=training_iters, epochs=epochs, dropout=dropout, display_step=display_step, restore=restore)
prediction = net.predict(path, X_test)
print("Testing error rate: {:.2f}%".format(unet.error_rate(prediction, util.crop_to_shape(y_test, prediction.shape))))
The error is:
Loading dataset...
TRAIN data shape: (1560, 128, 128, 2)
TRAIN labels shape (1560, 128, 128)
TEST data shape: (120, 128, 128, 2)
TEST labels shape: (120, 128, 128)
2017-06-23 15:07:05,594 Layers 4, features 64, filter size 3x3, pool size: 2x2
2017-06-23 15:07:07,878 Removing '/home/stefano/Dropbox/DeepWave/prediction'
2017-06-23 15:07:07,878 Removing '/home/stefano/Dropbox/DeepWave/unet_trained'
2017-06-23 15:07:07,878 Allocating '/home/stefano/Dropbox/DeepWave/prediction'
2017-06-23 15:07:07,879 Allocating '/home/stefano/Dropbox/DeepWave/unet_trained'
2017-06-23 15:07:07.879575: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 15:07:07.879602: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 15:07:07.879615: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 15:07:10,201 Verification error= 0.0%, loss= -0.0000
Traceback (most recent call last):
File "Unet.py", line 45, in <module>
path = trainer.train(data_provider, "./unet_trained", training_iters=training_iters, epochs=epochs, dropout=dropout, display_step=display_step, restore=restore)
File "./tf_unet/unet.py", line 404, in train
pred_shape = self.store_prediction(sess, test_x, test_y, "_init")
File "./tf_unet/unet.py", line 457, in store_prediction
img = util.combine_img_prediction(batch_x, batch_y, prediction)
File "/home/stefano/Dropbox/DeepWave/tf_unet/util.py", line 104, in combine_img_prediction
to_rgb(crop_to_shape(gt[..., 1], pred.shape).reshape(-1, ny, 1)),
IndexError: index 1 is out of bounds for axis 3 with size 1
The combine_img_prediction function receives arguments with the following shapes:
(4, 128, 128, 1) --> gt
(4, 128, 128, 2) --> data
(4, 36, 36, 1) --> pred
My datasets have the following shapes:
TRAIN data shape: (1560, 128, 128, 2)
TRAIN labels shape (1560, 128, 128)
TEST data shape: (120, 128, 128, 2)
TEST labels shape: (120, 128, 128)
How can I solve the issue?
Thank you! 👍
EDIT: sorry.. obviously n_class was 2. I corrected the error... but now i have:
Traceback (most recent call last):
File "Unet.py", line 43, in <module>
path = trainer.train(data_provider, "./unet_trained", training_iters=training_iters, epochs=epochs, dropout=dropout, display_step=display_step, restore=restore)
File "./tf_unet/unet.py", line 403, in train
test_x, test_y = data_provider(self.verification_batch_size)
File "./tf_unet/image_util.py", line 89, in __call__
train_data, labels = self._load_data_and_label()
File "./tf_unet/image_util.py", line 50, in _load_data_and_label
labels = self._process_labels(label)
File "./tf_unet/image_util.py", line 65, in _process_labels
labels[..., 0] = ~label
TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
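The ~ operator is a bitwise invert, which NumPy defines for boolean and integer arrays but not for floats, and the script above casts y_data to float32. A sketch that reproduces the error and builds the two label planes from a boolean mask instead follows; the tiny array is made up for illustration.

```python
import numpy as np

label = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=np.float32)

# Inverting a float array raises the TypeError seen in the traceback.
try:
    ~label
    invert_failed = False
except TypeError:
    invert_failed = True

# Casting to bool first makes the invert well defined.
mask = label.astype(bool)
labels = np.zeros(label.shape + (2,), dtype=np.float32)
labels[..., 1] = mask    # foreground plane
labels[..., 0] = ~mask   # background plane
print(invert_failed, labels.shape)
```

Keeping the label array boolean (for example by not casting y_data to float32) would likely avoid the error without touching the library.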
Hi Jakeret,
When running demo_toy_problem.ipynb and executing the line path = trainer.train(generator, "./unet_trained", training_iters=20, epochs=100, display_step=2), I got the following error messages. Do you mind sharing any insight into what causes the problem?
Verification error= 16.7%, loss= 0.8326
Start optimization
Traceback (most recent call last):
File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 480, in _process_fetches
allow_operation=True)
File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2301, in as_graph_element
% (type(obj).name, types_str))
TypeError: Can not convert a list into a Tensor or Operation.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 30, in
path = trainer.train(generator, "./unet_trained", training_iters=20, epochs=100, display_step=2)
File "/home/user/test/u-net/ver3/unet.py", line 364, in train
self.net.keep_prob: dropout})
File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 523, in _run
processed_fetches = self._process_fetches(fetches)
File "/test/tfw/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 493, in _process_fetches
% (subfetch, fetch, type(subfetch), str(e)))
TypeError: Fetch argument [<tf.Tensor 'gradients/Conv2D_grad/Conv2DBackpropFilter:0' shape=(3, 3, 1, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_1_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_2_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_3_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_4_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Tensor 'gradients/Conv2D_5_grad/Conv2DBackpropFilter:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Tensor 'gradients/Conv2D_6_grad/Conv2DBackpropFilter:0' shape=(3, 3, 64, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_7_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_8_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_9_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 16) dtype=float32>, <tf.Tensor 'gradients/add_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_1_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_2_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_3_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_4_grad/Reshape_1:0' shape=(64,) dtype=float32>, <tf.Tensor 'gradients/add_5_grad/Reshape_1:0' shape=(64,) dtype=float32>, <tf.Tensor 'gradients/add_7_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_8_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_10_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_11_grad/Reshape_1:0' shape=(16,) dtype=float32>] of [<tf.Tensor 'gradients/Conv2D_grad/Conv2DBackpropFilter:0' shape=(3, 3, 1, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_1_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_2_grad/Conv2DBackpropFilter:0' shape=(3, 
3, 16, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_3_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_4_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Tensor 'gradients/Conv2D_5_grad/Conv2DBackpropFilter:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Tensor 'gradients/Conv2D_6_grad/Conv2DBackpropFilter:0' shape=(3, 3, 64, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_7_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 32) dtype=float32>, <tf.Tensor 'gradients/Conv2D_8_grad/Conv2DBackpropFilter:0' shape=(3, 3, 32, 16) dtype=float32>, <tf.Tensor 'gradients/Conv2D_9_grad/Conv2DBackpropFilter:0' shape=(3, 3, 16, 16) dtype=float32>, <tf.Tensor 'gradients/add_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_1_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_2_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_3_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_4_grad/Reshape_1:0' shape=(64,) dtype=float32>, <tf.Tensor 'gradients/add_5_grad/Reshape_1:0' shape=(64,) dtype=float32>, <tf.Tensor 'gradients/add_7_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_8_grad/Reshape_1:0' shape=(32,) dtype=float32>, <tf.Tensor 'gradients/add_10_grad/Reshape_1:0' shape=(16,) dtype=float32>, <tf.Tensor 'gradients/add_11_grad/Reshape_1:0' shape=(16,) dtype=float32>] has invalid type <class 'list'>, must be a string or Tensor. (Can not convert a list into a Tensor or Operation.)
Hi,
I get an error when I try training with the dice coefficient as the cost function. I noticed there was a new commit on this a couple of days ago, so I suspect it's some bug in the code. Would you know roughly where this might be?
InvalidArgumentError Traceback (most recent call last)
in ()
----> 1 path = trainer.train(generator, "./unet_trained", training_iters=100, epochs=100, display_step=5)
/home/proj/tf_unet/tf_unet/unet.pyc in train(self, data_provider, output_path, training_iters, epochs, dropout, display_step, restore)
424
425 if step % display_step == 0:
--> 426 self.output_minibatch_stats(sess, summary_writer, step, batch_x, util.crop_to_shape(batch_y, pred_shape))
427
428 total_loss += loss
/home/proj/tf_unet/tf_unet/unet.pyc in output_minibatch_stats(self, sess, summary_writer, step, batch_x, batch_y)
467 feed_dict={self.net.x: batch_x,
468 self.net.y: batch_y,
--> 469 self.net.keep_prob: 1.})
470 summary_writer.add_summary(summary_str, step)
471 summary_writer.flush()
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
764 try:
765 result = self._run(None, fetches, feed_dict, options_ptr,
--> 766 run_metadata_ptr)
767 if run_metadata:
768 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
962 if final_fetches or final_targets:
963 results = self._do_run(handle, final_targets, final_fetches,
--> 964 feed_dict_string, options, run_metadata)
965 else:
966 results = []
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1012 if handle is None:
1013 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1014 target_list, options, run_metadata)
1015 else:
1016 return self._do_call(_prun_fn, self._session, handle, feed_dict,
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1032 except KeyError:
1033 pass
-> 1034 raise type(e)(node_def, op, message)
1035
1036 def _extend_graph(self):
InvalidArgumentError: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_37/read)]]
Caused by op u'norm_grads', defined at:
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/ipykernel/main.py", line 3, in
app.launch_new_instance()
File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
path = trainer.train(generator, "./unet_trained", training_iters=100, epochs=100, display_step=5)
File "/home/proj/tf_unet/tf_unet/unet.py", line 389, in train
init = self._initialize(training_iters, output_path, restore)
File "/home/proj/tf_unet/tf_unet/unet.py", line 342, in _initialize
tf.summary.histogram('norm_grads', self.norm_gradients_node)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 205, in histogram
tag=scope.rstrip('/'), values=values, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in init
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_37/read)]]
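The traceback does not show where the NaN originates. One possible source with a Dice cost, offered only as a hypothesis, is a 0/0 division when both the prediction and the ground truth are empty for a batch. The sketch below shows that case in NumPy and how a small eps in the denominator avoids it; the function and the eps value are illustrative assumptions.

```python
import numpy as np

def dice(pred, target, eps=0.0):
    """Dice score; with eps=0 an empty prediction and target give 0/0."""
    intersection = np.sum(pred * target)
    denominator = np.sum(pred) + np.sum(target)
    # Silence the warnings so the NaN result can be inspected directly.
    with np.errstate(invalid="ignore", divide="ignore"):
        return 2.0 * intersection / (denominator + eps)

empty = np.zeros((4, 4), dtype=np.float32)

unsafe = dice(empty, empty)          # 0 / 0 -> nan
safe = dice(empty, empty, eps=1e-5)  # 0 / 1e-5 -> 0.0
print(unsafe, safe)
```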
I am having trouble running the code provided in the usage section (http://tf-unet.readthedocs.io/en/latest/usage.html)
I created a folder named images and placed my .jpg files in that folder. Here is my code:
from tf_unet import unet, util, image_util
#preparing data loading
data_provider = image_util.ImageDataProvider("images/*.jpg")
#setup & training
net = unet.Unet(layers=3, features_root=64, channels=1, n_class=2)
trainer = unet.Trainer(net)
output_path = "train/"
path = trainer.train(data_provider, output_path, training_iters=32, epochs=100)
#verification
#...
prediction = net.predict(path, data)
unet.error_rate(prediction, util.crop_to_shape(label, prediction.shape))
img = util.combine_img_prediction(data, label, prediction)
util.save_image(img, "prediction.jpg")
And here is the error message:
[aaydin@sn-nvda3 unet_test]$ python test.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Number of files used: 100
2017-03-07 07:54:16,681 Layers 3, features 64, filter size 3x3, pool size: 2x2
Traceback (most recent call last):
File "test.py", line 12, in <module>
path = trainer.train(data_provider, output_path, training_iters=32, epochs=100)
File "/gpfs/home/aaydin/tf_unet/tf_unet/unet.py", line 386, in train
init = self._initialize(training_iters, output_path, restore)
File "/gpfs/home/aaydin/tf_unet/tf_unet/unet.py", line 345, in _initialize
self.optimizer = self._get_optimizer(training_iters, global_step)
File "/gpfs/home/aaydin/tf_unet/tf_unet/unet.py", line 321, in _get_optimizer
**self.opt_kwargs).minimize(self.net.cost,
TypeError: __init__() takes at least 3 arguments (2 given)
It looks like this is not related to my code. It should have something to do with tf-unet.
Hello,
I don't see any documentation for the parameters
"features_root" and "layers"
many thanks for updating
Peter
Within layers, lines 24 to 29: these two functions are exactly the same, so why do they both exist?
def weight_variable(shape, stddev=0.1):
    initial = tf.truncated_normal(shape, stddev=stddev)
    return tf.Variable(initial)
def weight_variable_devonc(shape, stddev=0.1):
    return tf.Variable(tf.truncated_normal(shape, stddev=stddev))
Hi, and congrats on this promising project. Did you consider porting it to Keras 2, so people with little background in machine learning can easily understand and tweak your UNet implementation?
Also, Keras 2 will be supported for years and allows users to run their models on multiple backends (such as Theano, Tensorflow, JavaScript, Scala, Java, etc.). See here for more details: https://blog.keras.io/introducing-keras-2.html
That would be really nice !
tensorflow/models.py script tries to import conv3d and deconv3d wrappers from tf_unet/layers that don't exist.
I am trying to implement binary segmentation on highly imbalanced data. The true-positive rate I get is always zero and the false-negative rate is always 100%. I have modified the error_rate() function to print these values as follows.
def error_rate(predictions, labels):
    y_ = labels[...,1]
    tp = 0
    fn = 0
    Np = 0
    Nn = 0
    for i in range(y_.shape[0]):
        for j in range(y_.shape[1]):
            for k in range(y_.shape[2]):
                if y_[i][j][k]:
                    Np += 1
                    if predictions[i][j][k][1] > predictions[i][j][k][0]:
                        tp += 1
                else:
                    Nn += 1
                    if predictions[i][j][k][1] < predictions[i][j][k][0]:
                        fn += 1
    if Np == 0:
        tp = -1
    else:
        tp = tp/Np
    return tp*100, (fn/Nn*100)
I am using cross-entropy loss and I have tried with various class weights [0.01, 0.99], [1, 100]. Kindly provide some insight into what might be going wrong. The dataset I am using has sliced CBCT images and every image has labels of both classes present.
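Two details of the posted function may explain the numbers, assuming the indentation was as it appears: `fn` is incremented for negative pixels that are predicted negative (these are true negatives), and under Python 2 the integer divisions `tp/Np` and `fn/Nn` truncate to 0 or 1. A vectorized sketch of the two rates follows; the name `class_rates` is hypothetical, and the function assumes one-hot labels and two-channel predictions of identical shape.

```python
import numpy as np

def class_rates(predictions, labels):
    """Return (true-positive rate, false-negative rate) in percent for class 1."""
    truth = labels[..., 1].astype(bool)                    # pixels labelled foreground
    predicted = predictions[..., 1] > predictions[..., 0]  # pixels predicted foreground
    n_pos = truth.sum()
    if n_pos == 0:
        # No foreground pixels: both rates are undefined.
        return float("nan"), float("nan")
    tp = np.logical_and(truth, predicted).sum()
    fn = np.logical_and(truth, ~predicted).sum()
    # Explicit floats so the ratios are not truncated by Python 2 integer division.
    return 100.0 * float(tp) / float(n_pos), 100.0 * float(fn) / float(n_pos)
```

Both rates are measured against the number of positive pixels, so they sum to 100%.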
Hi,
The demo_toy_problem.ipynb was run from a python 2 virtualenv. tensorflow 1.01 is installed (gpu version). When running cell 7, I got the following error:
AttributeError Traceback (most recent call last)
<ipython-input-7-778ffd89a317> in <module>()
----> 1 net = unet.Unet(channels=generator.channels, n_class=generator.n_class, layers=3, features_root=16)
/home/jeanpat/DeepFISH-Github_projects/tf_unet/tf_unet/unet.py in __init__(self, nx, ny, channels, n_class, add_regularizers, class_weights, **kwargs)
187 self.keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)
188
--> 189 logits, self.variables, self.offset = create_conv_net(self.x, self.keep_prob, channels, n_class, **kwargs)
190
191 if class_weights is not None:
/home/jeanpat/DeepFISH-Github_projects/tf_unet/tf_unet/unet.py in create_conv_net(x, keep_prob, channels, n_class, layers, features_root, filter_size, pool_size, summaries)
55 nx = tf.shape(x)[1]
56 ny = tf.shape(x)[2]
---> 57 x_image = tf.reshape(x, tf.pack([-1,nx,ny,channels]))
58 in_node = x_image
59 batch_size = tf.shape(x_image)[0]
AttributeError: module 'tensorflow' has no attribute 'pack'
Is there some solution to fix the issue?
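In TensorFlow 1.0, `tf.pack` was renamed to `tf.stack`, so replacing the call on line 57 of unet.py is the usual fix. If the same code has to run on both older and newer installations, a lookup helper along these lines could be used. This is a minimal sketch: `first_available` is a hypothetical helper, and the demonstration uses a stand-in namespace instead of TensorFlow itself.

```python
from types import SimpleNamespace

def first_available(module, *names):
    """Return the first attribute of `module` that exists among `names`."""
    for name in names:
        fn = getattr(module, name, None)
        if fn is not None:
            return fn
    raise AttributeError("none of %s found on %r" % (", ".join(names), module))

# Stand-in for a TensorFlow >= 1.0 module, which has `stack` but no `pack`.
fake_tf = SimpleNamespace(stack=lambda values: ("stacked", list(values)))

stack = first_available(fake_tf, "stack", "pack")
result = stack([-1, 64, 64, 1])
```

With the real library this would read `stack = first_available(tf, "stack", "pack")`, followed by `tf.reshape(x, stack([-1, nx, ny, channels]))`.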
Dear all
Could you please give an example of how to save, load and use a trained model with this code?
Thanks
My custom training dataset has 5000 color images and 5000 corresponding mask images.
2017-04-21 21:21:12,678 Start optimization
2017-04-21 21:21:14,525 Iter 0, Minibatch Loss= 0.6760, Training Accuracy= 0.6090, Minibatch error= 39.1%
2017-04-21 21:21:15,047 Iter 2, Minibatch Loss= 0.7317, Training Accuracy= 0.4318, Minibatch error= 56.8%
2017-04-21 21:21:15,608 Iter 4, Minibatch Loss= 0.5663, Training Accuracy= 0.7503, Minibatch error= 25.0%
2017-04-21 21:21:16,575 Iter 6, Minibatch Loss= 0.5671, Training Accuracy= 0.7178, Minibatch error= 28.2%
2017-04-21 21:21:17,157 Iter 8, Minibatch Loss= 0.3097, Training Accuracy= 0.8937, Minibatch error= 10.6%
2017-04-21 21:21:17,694 Iter 10, Minibatch Loss= 0.5114, Training Accuracy= 0.7774, Minibatch error= 22.3%
2017-04-21 21:21:18,233 Iter 12, Minibatch Loss= 0.6332, Training Accuracy= 0.6583, Minibatch error= 34.2%
2017-04-21 21:21:18,710 Iter 14, Minibatch Loss= 0.5695, Training Accuracy= 0.7293, Minibatch error= 27.1%
2017-04-21 21:21:19,249 Iter 16, Minibatch Loss= 0.4922, Training Accuracy= 0.8320, Minibatch error= 16.8%
2017-04-21 21:21:19,754 Iter 18, Minibatch Loss= 0.5962, Training Accuracy= 0.7211, Minibatch error= 27.9%
2017-04-21 21:21:19,977 Epoch 0, Average loss: 0.5595, learning rate: 0.2000
2017-04-21 21:21:20,200 Verification error= 15.3%, loss= 0.4723
2017-04-21 21:27:59,767 Epoch 48, Average loss: 0.4393, learning rate: 0.0171
2017-04-21 21:27:59,993 Verification error= 15.3%, loss= 0.4511
2017-04-21 21:28:01,688 Iter 980, Minibatch Loss= 0.6371, Training Accuracy= 0.7971, Minibatch error= 20.3%
2017-04-21 21:28:02,623 Iter 982, Minibatch Loss= 0.3423, Training Accuracy= 0.9056, Minibatch error= 9.4%
2017-04-21 21:28:03,648 Iter 984, Minibatch Loss= 0.6891, Training Accuracy= 0.6723, Minibatch error= 32.8%
2017-04-21 21:28:04,706 Iter 986, Minibatch Loss= 0.4940, Training Accuracy= 0.7985, Minibatch error= 20.1%
2017-04-21 21:28:05,809 Iter 988, Minibatch Loss= 0.3383, Training Accuracy= 0.9188, Minibatch error= 8.1%
2017-04-21 21:28:06,813 Iter 990, Minibatch Loss= 0.4692, Training Accuracy= 0.7797, Minibatch error= 22.0%
2017-04-21 21:28:07,792 Iter 992, Minibatch Loss= 0.7902, Training Accuracy= 0.5315, Minibatch error= 46.8%
2017-04-21 21:28:08,937 Iter 994, Minibatch Loss= 0.6003, Training Accuracy= 0.7040, Minibatch error= 29.6%
2017-04-21 21:28:09,864 Iter 996, Minibatch Loss= 0.4520, Training Accuracy= 0.7768, Minibatch error= 22.3%
2017-04-21 21:28:10,906 Iter 998, Minibatch Loss= 0.6925, Training Accuracy= 0.7355, Minibatch error= 26.5%
2017-04-21 21:28:11,301 Epoch 49, Average loss: 0.5205, learning rate: 0.0162
2017-04-21 21:28:11,525 Verification error= 15.3%, loss= 0.4540
2017-04-21 21:28:12,691 Optimization Finished!
I use this code for custom dataset training:
from tf_unet import image_util
from tf_unet import unet
from tf_unet import util
search_path = 'data/train/*.jpg'
data_provider = image_util.ImageDataProvider(search_path, data_suffix='.jpg', mask_suffix='.png')
net = unet.Unet(channels=data_provider.channels, n_class=data_provider.n_class, layers=3, features_root=32)
trainer = unet.Trainer(net, optimizer="momentum", opt_kwargs=dict(momentum=0.2))
path = trainer.train(data_provider, "./unet_trained", training_iters=20, epochs=50, display_step=2)
I have a numpy (.npy) file having data of 564 image files, shape is [564, 420, 580] and same for their masks, [564, 420, 580].
I am saving these numpy objects in a h5py file and then passing it like:
dataset_file = h5py.File(source_path, 'r')
generator = Generator(1, dataset_file)
but I get the following error:
AttributeError: 'int' object has no attribute 'encode'
at line: 41 of radio_util.py:
with h5py.File(self.files[self.file_idx], "r") as fp
Please help me...
Hi Joel,
I have run multiple experiments concurrently and have several observations. Thanks for your insight.
When the batch size is set to 20 or higher, the learning rate (using momentum) starts to decrease only after more than 10 epochs; it stays constant at the very beginning. This is what I described in the other thread. Right now, I can see the learning rate decreasing as expected after epoch 10.
When the batch size is set to 2 or 4, the learning rate starts to decrease within the first few epochs.
I am not sure how to explain this behavior.
In the competition, they use the so-called Dice coefficient, which differs from the loss function you are using. Do you have any specific reason for this choice?
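For reference, the Dice coefficient is commonly defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The sketch below computes it with NumPy; the function names and the `eps` smoothing term are illustrative choices, not taken from tf_unet or from the competition code.

```python
import numpy as np

def dice_coefficient(prediction, target, eps=1e-5):
    """Soft Dice coefficient between two masks with values in [0, 1]."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(prediction * target)
    denominator = np.sum(prediction) + np.sum(target)
    # eps avoids a division by zero when both masks are empty.
    return 2.0 * intersection / (denominator + eps)

def dice_loss(prediction, target):
    """Loss that decreases as the overlap grows."""
    return 1.0 - dice_coefficient(prediction, target)
```

Identical masks give a coefficient close to 1 (loss close to 0), and disjoint masks give 0 (loss 1).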
I have been trying to test the adam optimizer.
It will work if I call it as trainer = unet.Trainer(net, optimizer="adam", opt_kwargs=dict(learning_rate=0.0015))
However, it fails if I call it as
trainer = unet.Trainer(net, optimizer="adam", opt_kwargs=dict(momentum=0.0015))
The error message is:
Traceback (most recent call last):
File "launcher.py", line 54, in
path = trainer.train(generator, "/data/unet_trained", training_iters=1406, epochs=100, display_step=100)
File "/test/u-net/ver6/unet.py", line 341, in train
init = self._initialize(training_iters, output_path, restore)
File "/test/u-net/ver6/unet.py", line 298, in _initialize
self.optimizer = self._get_optimizer(training_iters, global_step)
File "/test/u-net/ver6/unet.py", line 281, in _get_optimizer
**self.opt_kwargs).minimize(self.net.cost,
TypeError: __init__() got an unexpected keyword argument 'momentum'
After reading your code, it seems to me that the key used in opt_kwargs should have no effect, regardless of whether it is "momentum" or "learning_rate", because, as shown
in the following, you always pop "learning_rate" from opt_kwargs. I think I may not be clear about how opt_kwargs works.
if self.optimizer == "momentum":
    learning_rate = self.opt_kwargs.pop("learning_rate", 0.2)
    decay_rate = self.opt_kwargs.pop("decay_rate", 0.95)
elif self.optimizer == "adam":
    learning_rate = self.opt_kwargs.pop("learning_rate", 0.001)
    self.learning_rate_node = tf.Variable(learning_rate)
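The traceback shows that whatever remains in `opt_kwargs` is forwarded to the optimizer constructor via `**self.opt_kwargs`. `pop` only removes the keys it names, so an unknown key such as `momentum` still reaches the Adam constructor, which does not accept it. The sketch below mimics that mechanism with a stand-in class; `FakeAdam` and `build_optimizer` are hypothetical names, not part of tf_unet or TensorFlow.

```python
class FakeAdam(object):
    """Stand-in for an optimizer that accepts learning_rate and beta1 only."""
    def __init__(self, learning_rate=0.001, beta1=0.9):
        self.learning_rate = learning_rate
        self.beta1 = beta1

def build_optimizer(optimizer_cls, opt_kwargs, default_lr):
    kwargs = dict(opt_kwargs)                            # copy, leave the caller's dict intact
    learning_rate = kwargs.pop("learning_rate", default_lr)
    # Every key that was not popped is passed on to the constructor.
    return optimizer_cls(learning_rate=learning_rate, **kwargs)

ok = build_optimizer(FakeAdam, {"learning_rate": 0.0015}, 0.001)

try:
    build_optimizer(FakeAdam, {"momentum": 0.0015}, 0.001)
    error = None
except TypeError as exc:
    error = str(exc)                                     # mentions the unexpected 'momentum'
```

In this model, keys that the chosen optimizer understands (for example `beta1` here) pass through unchanged, while anything else raises the same kind of `TypeError` as in the report.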
Keras provides a nice API for loading and also transforming training and validation data. Maybe with a few tweaks this could be supported by tf_unet.
Example:
train_datagen = image.ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True
)
train_generator = train_datagen.flow_from_directory(
    './data/food-101/train/',
    target_size=target_size,
    batch_size=32,
)
Afterwards the generator can be iterated indefinitely, e.g. data, mask = next(train_generator).
Hi Joel,
The previous thread had become very long, so I just closed it. Thanks for your help as always.
I am re-running the code with batch size = 2, which some blogs report gives better results. At the same time, I am trying to normalize the images as you suggested.
Besides, I am still confused by the line of code that performs the optimization. My understanding is that you include gradients and self.net.gradients_node for debugging purposes.
_, loss, lr, gradients = sess.run((self.optimizer, self.net.cost, self.learning_rate_node, self.net.gradients_node), feed_dict={self.net.x: batch_x, self.net.y: util.crop_to_shape(batch_y, pred_shape), self.net.keep_prob: dropout})
In other words, if I only want to run the training process, I can remove these two items and rewrite your code as shown below. Is my understanding right? Specifically, I am not very clear about the use of gradients here. Thanks.
_, loss, lr= sess.run((self.optimizer, self.net.cost, self.learning_rate_node), feed_dict={self.net.x: batch_x, self.net.y: util.crop_to_shape(batch_y, pred_shape), self.net.keep_prob: dropout})
2017-05-08 10:23:35,963 Verification error= 76.2%, loss= -0.4966
2017-05-08 10:23:36,797 Start optimization
2017-05-08 10:23:39,794 Iter 0, Minibatch Loss= -0.5287, Training Accuracy= 0.6946, Minibatch error= 30.5%
2017-05-08 10:23:41,902 Iter 2, Minibatch Loss= -0.5188, Training Accuracy= 0.5191, Minibatch error= 48.1%
2017-05-08 10:23:43,797 Iter 4, Minibatch Loss= nan, Training Accuracy= 0.2501, Minibatch error= 22.1%
2017-05-08 10:23:44.476436: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.476717: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477076: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477192: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477491: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477643: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.477992: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.478101: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.478401: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.478537: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.478833: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.479264: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.479623: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.479750: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.480048: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.480512: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.480872: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.480996: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481190: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481314: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481385: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481639: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481704: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
2017-05-08 10:23:44.481769: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
Traceback (most recent call last):
File "train_cub_unet.py", line 12, in
path = trainer.train(data_provider, './unet_trained', training_iters=32, epochs=100, display_step=2)
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 430, in train
self.output_minibatch_stats(sess, summary_writer, step, batch_x, util.crop_to_shape(batch_y, pred_shape))
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 473, in output_minibatch_stats
self.net.keep_prob: 1.})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
Caused by op u'norm_grads', defined at:
File "train_cub_unet.py", line 12, in
path = trainer.train(data_provider, './unet_trained', training_iters=32, epochs=100, display_step=2)
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 390, in train
init = self._initialize(training_iters, output_path, restore)
File "/usr/local/lib/python2.7/dist-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.py", line 342, in _initialize
tf.summary.histogram('norm_grads', self.norm_gradients_node)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 209, in histogram
tag=scope.rstrip('/'), values=values, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Nan in summary histogram for: norm_grads
[[Node: norm_grads = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](norm_grads/tag, Variable_27/read/_85)]]
In addition to the training error, I also notice that the loss is negative (for example, Minibatch Loss= -0.5287).
Hi,
From a Jupyter notebook, the following code completely erased the contents of the directory that output_path points to (~100 000 images, oops!):
output_path = os.path.join('..','DeepFISH','dataset','LowRes','train')
path = trainer.train(data_provider, output_path, training_iters=10, epochs=5)
The directory contained three subdirectories (grey, groundtruth, UnetData).
Where should output_path point?
Thanks
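The report suggests that `trainer.train` clears `output_path` before writing to it. Assuming that is the case, one precaution is to give the trainer a dedicated directory and to refuse any location that already holds files. The sketch below uses a hypothetical helper, `fresh_output_dir`, and a temporary directory purely for demonstration.

```python
import os
import tempfile

def fresh_output_dir(parent, name="unet_trained"):
    """Return an empty directory under `parent`, refusing to reuse a non-empty one."""
    path = os.path.join(parent, name)
    if os.path.isdir(path) and os.listdir(path):
        raise ValueError("refusing to use non-empty directory: %s" % path)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

workdir = tempfile.mkdtemp()              # demonstration only
output_path = fresh_output_dir(workdir)   # safe to hand to the trainer
```

The resulting path can then be passed as the second argument of `trainer.train`, keeping the image data in a separate location.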
Joel,
I am currently working on your UNet architecture with my own dataset (images of 500 x 376, as in your GitHub issue #6).
I have several questions. Do you have time to discuss them?
In the U-Net paper, the proposed architecture has 5 layers with 3x3 convolution filters and 64 features.
If I compute the size of the output image from these settings and my own dataset, I get a theoretical output size of 308 x 184. In your code, you compute an offset which corresponds to the "real" offset, equal to 192. But when I use your code I get a size of 308 x 180.
In the same way, when I use a different convolution filter size with a different network depth (for instance 3 layers), the output size is not equal to the theoretical size.
I started fixing this problem by changing the offset definition to take into account the number of border pixels for a variable convolution filter size. So far, it looks to me as if your code only works with 3x3 convolutions and the offset needs to be recomputed. Maybe I am wrong or I missed a step.
For instance, you have written in unet.py:
line 95 and 130 : size -=4
line 99 or 129 : size *= 2
I think these lines are correct only if the max-pooling is 2x2 and the convolution is 3x3.
I am trying to understand how and why the network produces this gap between the theoretical and the observed size.
I'm working with Python 3.5 and TensorFlow 0.12 GPU.
I'd be happy to discuss this with you!
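One way to explore the size question is to trace the spatial size level by level. The sketch below models a U-Net with two unpadded convolutions per level, pooling that floors odd sizes, and upsampling by the pooling factor. It is a model of the architecture as described in this thread rather than a copy of the tf_unet code, and `unet_output_size` is a hypothetical name.

```python
def unet_output_size(size, layers=5, filter_size=3, pool_size=2):
    """Spatial output size of a valid-convolution U-Net for one input dimension."""
    border = 2 * (filter_size - 1)      # pixels lost by two unpadded convolutions
    for level in range(layers):         # contracting path
        size -= border
        if level < layers - 1:
            size //= pool_size          # pooling floors odd sizes
    for _ in range(layers - 1):         # expanding path
        size = size * pool_size - border
    return size

width_out = unet_output_size(500)       # 308
height_out = unet_output_size(376)      # 180
```

Under these assumptions the function reproduces the observed 308 x 180. In the 376-pixel dimension the size becomes odd before the third pooling step (87 is floored to 43), which an offset that ignores flooring would not account for; this may explain at least part of the gap.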
Hi Jakeret,
I am very interested in applying U-Net in my research, and it's great to find that you have already implemented it with tensorflow.
I've installed tensorflow on Kubuntu Linux. The tensorflow version is 1.0.1 and the Keras version used by tensorflow is 1.2.2.
I wanted to try your demo_toy_problem.
However, I encountered a TypeError during execution of the line net = unet.Unet
TypeError Traceback (most recent call last)
in ()
----> 1 net = unet.Unet(channels=generator.channels, n_class=generator.n_class, layers=3, features_root=16)
/home/davince/lib/python2.7/site-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.pyc in __init__(self, channels, n_class, cost, cost_kwargs, **kwargs)
187 self.keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)
188
--> 189 logits, self.variables, self.offset = create_conv_net(self.x, self.keep_prob, channels, n_class, **kwargs)
190
191 self.cost = self._get_cost(logits, cost, cost_kwargs)
/home/davince/lib/python2.7/site-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/unet.pyc in create_conv_net(x, keep_prob, channels, n_class, layers, features_root, filter_size, pool_size, summaries)
109 bd = bias_variable([features//2])
110 h_deconv = tf.nn.relu(deconv2d(in_node, wd, pool_size) + bd)
--> 111 h_deconv_concat = crop_and_concat(dw_h_convs[layer], h_deconv)
112 deconv[layer] = h_deconv_concat
113
/home/davince/lib/python2.7/site-packages/tf_unet-0.1.0-py2.7.egg/tf_unet/layers.pyc in crop_and_concat(x1, x2)
52 size = [-1, x2_shape[1], x2_shape[2], -1]
53 x1_crop = tf.slice(x1, offsets, size)
---> 54 return tf.concat(3, [x1_crop, x2])
55
56 def pixel_wise_softmax(output_map):
/home/davince/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.pyc in concat(values, axis, name)
1027 ops.convert_to_tensor(axis,
1028 name="concat_dim",
-> 1029 dtype=dtypes.int32).get_shape(
1030 ).assert_is_compatible_with(tensor_shape.scalar())
1031 return identity(values[0], name=scope)
/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in convert_to_tensor(value, dtype, name, preferred_dtype)
635 name=name,
636 preferred_dtype=preferred_dtype,
--> 637 as_ref=False)
638
639
/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
700
701 if ret is None:
--> 702 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
703
704 if ret is NotImplemented:
/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.pyc in _constant_tensor_conversion_function(v, dtype, name, as_ref)
108 as_ref=False):
109 _ = as_ref
--> 110 return constant(v, dtype=dtype, name=name)
111
112
/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.pyc in constant(value, dtype, shape, name, verify_shape)
97 tensor_value = attr_value_pb2.AttrValue()
98 tensor_value.tensor.CopyFrom(
---> 99 tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
100 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
101 const_tensor = g.create_op(
/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.pyc in make_tensor_proto(values, dtype, shape, verify_shape)
365 nparray = np.empty(shape, dtype=np_dt)
366 else:
--> 367 _AssertCompatible(values, dtype)
368 nparray = np.array(values, dtype=np_dt)
369 # check to them.
/home/davince/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.pyc in _AssertCompatible(values, dtype)
300 else:
301 raise TypeError("Expected %s, got %s of type '%s' instead." %
--> 302 (dtype.name, repr(mismatch), type(mismatch).__name__))
303
304
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
Do you know what could cause this problem? Is it because I installed a newer version of tensorflow?
thank you.
Best,
Hello,
I have two questions about the UNet architecture and Dice loss function:
# of True Positives / (# of Positives + # of False Positives)
, but the Dice loss in line 231 of unet.py counts both positives and negatives, if I understand the code correctly. Please correct me if I am wrong. Thanks.
Hi, your default batch_size for training is 1 in line 305 of unet.py. This confuses me because I thought a bigger batch_size is better (I usually use 64 as the batch_size in image classification).
I tested batch_size 2, 3 and 5, and ran into a "GPU out of memory" problem. Is that the reason why you use batch_size = 1? Or are there other reasons why batch_size has to be 1? If batch_size can be bigger than 1, is there any way to reduce the number of parameters to save GPU memory?
I'm new to this field. Thank you so much for sharing your work.
:)
Hi,
tf.image.extract_glimpse() is used in the implementation of the crop_and_concat() function.
tf.image.extract_glimpse() does not allow for backpropagation of gradients (if it was the only connection between two nodes, the gradient returned by TensorFlow would be None), and the gradients with respect to the segmentation error for all layers before a skip connection will therefore be off.
tf.slice() should be used instead.
Hi,
Thank you for the great work!
Reading the original U-Net paper and comparing it with your implementation, it looks like you have used tf.nn.conv2d_transpose to implement "up-sampling followed by convolution". But the tensorflow documentation says that this function is "the transpose (gradient) of conv2d". I am wondering if this is the right operation to use here.
Hi,
in the implementation of the weighted loss function the weights are applied to the logits before the softmax activation function. The result for a two class problem is that the bigger value after the application of the softmax function will increase, the smaller value will decrease. In other words, the network will look more confident in its predictions. If the weight was large and the prediction was wrong the gradients will also be larger though not necessarily by the expected amount. If the prediction was right, however, the gradients will be smaller than they would have been otherwise.
To ensure correct scaling, the weights should be applied after the call to tf.nn.softmax_cross_entropy_with_logits() and before the call to tf.reduce_mean().
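The ordering proposed here (cross-entropy first, then the class weight, then the mean) can be illustrated outside TensorFlow. The NumPy sketch below is a stand-in for the graph operations and not the tf_unet implementation; `weighted_cross_entropy` is a hypothetical name, and labels are assumed to be one-hot along the last axis.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Mean of per-pixel softmax cross-entropy, scaled by the weight of the true class."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Numerically stable log-softmax over the class axis; logits stay unweighted.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    per_pixel_loss = -(labels * log_probs).sum(axis=-1)
    # Weight of each pixel is the weight of its ground-truth class.
    per_pixel_weight = (labels * np.asarray(class_weights, dtype=float)).sum(axis=-1)
    return float(np.mean(per_pixel_weight * per_pixel_loss))
```

Because the logits are untouched, the predicted probabilities are the same as in the unweighted case; only the contribution of each pixel to the mean changes.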
Hey!
Thank you for your help and for the updates. Unfortunately, we're still having trouble using a custom dataset and we're hoping you can help us.
The main error produced is this:
....
tensorflow.python.framework.errors.InvalidArgumentError: logits and labels must be same size: logits_size=[709520,2] labels_size=[710500,2]
[[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_29, Reshape_30)]]
....
We're working with rgb images of size (767x1022) and using the data provider function like this (where everything here seems to be working as expected);
data_provider = image_util.ImageDataProvider('reduced_segmentation_dataset/*', data_suffix=".jpg", mask_suffix='_mask.png')
The other thing we noticed is that when we call path = trainer.train(data_provider, "./skin_trained", training_iters=10, epochs=4, display_step=2), it defaults to calling test_x, test_y = data_provider(4) on line 367 of master/tf_unet/unet.py.
If we call data_provider(1) we seem to get the results we expect and bypass the error, but the above error is still preventing us from a full run.
Do you have any ideas why we're having this mismatch? I'd be happy to provide more information as needed.
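The two sizes in the message factor as 710500 = 725 x 980 and 709520 = 724 x 980, so the labels appear to be exactly one row or column larger than the logits. One plausible cause, assuming the labels are cropped with a symmetric slice such as `data[:, off:-off]`, is an odd size difference: the symmetric slice then keeps one pixel too many. The sketch below contrasts that with a crop using explicit end indices; `center_crop` is a hypothetical helper, not the library function.

```python
import numpy as np

def center_crop(data, target_shape):
    """Crop a (batch, ny, nx, channels) array to target (ny, nx) around the center."""
    off_y = (data.shape[1] - target_shape[0]) // 2
    off_x = (data.shape[2] - target_shape[1]) // 2
    # Explicit end indices give the exact target size even for odd differences.
    return data[:, off_y:off_y + target_shape[0], off_x:off_x + target_shape[1]]

labels = np.zeros((4, 9, 7, 2))
target = (6, 5)                       # difference of 3 rows (odd) and 2 columns (even)

symmetric = labels[:, 1:-1, 1:-1]     # offsets (9 - 6) // 2 = 1 and (7 - 5) // 2 = 1
exact = center_crop(labels, target)
```

Here `symmetric.shape` is `(4, 7, 5, 2)`, one row more than requested, while `exact.shape` is `(4, 6, 5, 2)`.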
Hello!
I've just noticed that "weight" and "bias" from the last convolutional layer are not included in "variables", aren't they supposed to be optimized as well? Is there any reason for not including them?
Best regards, Amelia.
Here is my code
from tf_unet import unet, util, image_util
data_provider = image_util.ImageDataProvider("images3/*.png", data_suffix='.png', mask_suffix='_mask.png')
net = unet.Unet(layers=2, features_root=128, channels=3, n_class=2) #
trainer = unet.Trainer(net)
path = trainer.train(data_provider, "train/", dropout=0.5, training_iters=32, epochs=1, display_step=16)
x_test = a._load_file("images4/0393_1.png")
x_test = a._process_data([x_test])
prediction = net.predict(path, x_test)
prediction.shape returns (1, 240, 240, 2), while x_test.shape returns (1, 256, 256, 3), so the prediction image is smaller.
With layers=1, the prediction shape is 252x252.
With layers=5, I get a prediction of something like 96x96.
It is not a scaled-down version of the image; it is just the center of the image cut out.
Is it a bug or a feature? How can I get the full shape while using more layers?
I'm trying to use this to segment RGB images, but I keep getting an error that suggests the input needs to be grayscale:
ValueError: could not broadcast input array from shape (300,500,3) into shape (300,500)
So I converted all my labeled images to grayscale and it started to work, but during optimization it crashes again with the same error.
Can I take an RGB image, have it labelled in grayscale, and then run it with this library, or do I have to do something else?
Thanks!