
nasunet's Introduction

NAS-Unet: Neural Architecture Search for Medical Image Segmentation

In this paper, we design three types of primitive operation sets in the search space to automatically find two cell architectures, DownSC and UpSC, for semantic image segmentation, especially medical image segmentation. The architectures of DownSC and UpSC are updated simultaneously by a differentiable architecture strategy during the search stage. We demonstrate the good segmentation results of the proposed method on the Promise12, Chaos, and ultrasound nerve datasets, which were collected by Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and ultrasound, respectively.

[Figure 1]

Requirements

  • Ubuntu 14.04/16.04 or Windows 10 (Windows 7 may also work)
  • python 3.7
  • torch >= 1.0
  • torchvision >= 0.2.1
  • tqdm
  • numpy
  • pydicom (for the Chaos dataset)
  • SimpleITK (for the Promise12 dataset)
  • Pillow
  • scipy
  • scikit-image
  • adabound
  • PyYAML
  • opencv-python
  • tensorboardX
  • matplotlib (optional)
  • pydensecrf (optional)
  • pygraphviz (optional)
  • graphviz (optional)

TODO:

  • Compute non-topology-related operations in DownSC and UpSC in parallel
  • Optimize multiple objectives by adding hardware metrics (latency), GPU cost, and network parameters
  • Support multiple GPUs when updating the architecture parameters (2019/04/28)
  • Extend the search strategy to a flexible backbone network
  • Merge this work into CNASV (a search-train prototype for computer vision: CNASV)

Usage

pip3 install -r requirements.txt

Notes

  • 1. Display Cell Architecture
    • If you use Windows 10 and want to display the cell architecture as a graph, you need to install pygraphviz and add $workdir$\3rd_tools\graphviz-2.38\bin to your environment PATH. Here $workdir$ is your custom working directory, such as E:\workspace\NasUnet.
    • If you use Ubuntu, install graphviz with: sudo apt-get install graphviz libgraphviz-dev pkg-config

After that, install pygraphviz: pip install pygraphviz

  • 2. If you use Windows 10, you also need to add the directory containing nvidia-smi to your environment PATH, because we automatically choose the GPU device with the most free memory to run on. A minimal sketch of this selection follows.
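A minimal sketch of such a selection, assuming nvidia-smi is on the PATH (illustrative only; this is not the repository's own utility):

# Hedged sketch: pick the GPU with the most free memory via nvidia-smi.
import subprocess

def max_free_gpu():
    # query free memory (MiB) for every GPU, one value per line
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.free',
         '--format=csv,noheader,nounits'])
    free = [int(v) for v in out.decode().strip().splitlines()]
    return max(range(len(free)), key=lambda i: free[i])

# usage: device = torch.device('cuda:%d' % max_free_gpu())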

  • 3. If you want to use multiple GPUs during the training phase, make sure the batch size divides evenly across the GPUs (this may be a PyTorch bug). For example, if you have 3 GPUs, the batch size needs to be 3, 6, 9, ..., 3×M.

  • 4. When you hit CUDA OOM problems, the following tricks will help:

    • A. Set a lower init_channels in the configuration file, such as 16, 32, or 48.
    • B. Set a lower batch_size, such as 2, 4, 6, or 8.
    • C. When you use a large image size, such as 480, 512, or 720, the initial channels and batch size may need to be much smaller.

Search the architecture

cd experiment
# search on pascal voc2012
python train.py --config ../configs/nas_unet/nas_unet_voc.yml

Evaluate the architecture on medical image datasets

  • Train on the promise12 dataset with NasUnet:
python train.py --config ../configs/nas_unet/nas_unet_promise12.yml --model nasunet
  • If you want to fine-tune the model:
python train.py --config ../configs/nas_unet/nas_unet_promise12.yml --model nasunet --ft
  • To use multiple GPUs:

Edit configs/nas_unet/nas_unet_promise12.yml:

training:
    geno_type: NASUNET
    init_channels: 32
    depth: 5
    epoch: 200
    batch_size: 6
    report_freq: 10
    n_workers: 2
    multi_gpus: True # must be set to True for multiple GPUs

and then

python train.py --config ../configs/nas_unet/nas_unet_promise12.yml --model nasunet --ft

We will then use all GPU devices for training.

In both the search and train stages, if you run on a single GPU, we will find the GPU with the most free memory and move the model to it. So if you have N GPU devices, you can run N instances without manually setting device IDs.

The final architectures of DownSC and UpSC searched on Pascal VOC 2012:

[Figure 2]

Customize your dataset

  • First, normalize the custom dataset. You need its mean and std; see util/dataset/calc_mean_std (a sketch follows this list).

  • Second, add a CustomDataset in util/dataset/CustomDataset.py.

  • Finally, edit util/dataset/__init__.py: add your CustomDataset and replace dir = '/train_tiny_data/imgseg/' with dir = '/your custom dataset root path/'.
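A minimal sketch of the normalization statistics and dataset class described above, assuming a folder of PNG images; the paths, file layout, and transform hook are illustrative, not the repository's exact API:

# Hedged sketch: per-channel mean/std plus a bare-bones segmentation dataset.
import numpy as np
from glob import glob
from PIL import Image
from torch.utils.data import Dataset

def calc_mean_std(image_dir):
    # stack all pixels and compute per-channel statistics in [0, 1]
    arrays = [np.atleast_3d(np.asarray(Image.open(p), dtype=np.float64)) / 255.0
              for p in glob(image_dir + '/*.png')]
    pixels = np.concatenate([a.reshape(-1, a.shape[-1]) for a in arrays])
    return pixels.mean(axis=0), pixels.std(axis=0)

class CustomDataset(Dataset):
    def __init__(self, root, transform=None):
        # assumes paired files under root/images and root/masks
        self.images = sorted(glob(root + '/images/*.png'))
        self.masks = sorted(glob(root + '/masks/*.png'))
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = Image.open(self.images[idx]).convert('RGB')
        mask = Image.open(self.masks[idx])
        if self.transform is not None:
            img, mask = self.transform(img, mask)
        return img, mask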

Citation

If you use this code in your research, please cite our paper.

@ARTICLE{8681706,
  author={Y. {Weng} and T. {Zhou} and Y. {Li} and X. {Qiu}},
  journal={IEEE Access},
  title={NAS-Unet: Neural Architecture Search for Medical Image Segmentation},
  year={2019},
  volume={7},
  pages={44247-44257},
  keywords={Computer architecture;Image segmentation;Magnetic resonance imaging;Medical diagnostic imaging;Task analysis;Microprocessors;Medical image segmentation;convolutional neural architecture search;deep learning},
  doi={10.1109/ACCESS.2019.2908991},
  ISSN={2169-3536},
}


nasunet's Issues

About the valid and test datasets in experiment/test.py

test.py is meant to measure the mIOU of the final network.
In it, trainset, valset, and testset all point to the same dataset directory, i.e. the data are identical. Isn't it theoretically wrong to then reuse the training data as the test data?
Looking forward to an answer!

Searching Space Problem

Thank you for your nice work. I read your paper and the corresponding code, and I found that your work searches the cell parameters (alpha) over 3 kinds of cells; however, is the stacking order of these cells manually designed?

CRF Parameters

Hello, I'd like to ask what the parameters of the addPairwiseGaussian function in the CRF mean. Any guidance would be appreciated, thanks.
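For context, a hedged sketch of standard pydensecrf usage (this is not the repository's code): in addPairwiseGaussian, sxy is the spatial standard deviation of the location-only smoothness kernel in pixels, and compat is the label-compatibility weight, i.e. how strongly disagreeing neighbors are penalized.

# Hedged sketch of standard pydensecrf usage (not this repository's code).
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(probs, image, n_iters=5):
    # probs: (n_labels, H, W) softmax output; image: (H, W, 3) uint8
    n_labels, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, n_labels)
    d.setUnaryEnergy(unary_from_softmax(probs))
    d.addPairwiseGaussian(sxy=3, compat=3)    # location-only smoothing kernel
    d.addPairwiseBilateral(sxy=80, srgb=13,   # location + color kernel
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(n_iters)
    return np.argmax(q, axis=0).reshape(h, w)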

About the computation of batch_intersection_union

Hi, nice work!
Regarding the IOU computation in metrics.py, is there an error here:

import numpy as np
import torch

def batch_intersection_union(output, target, nclass):
    predict = torch.max(output, 1)[1]
    mini = 1
    maxi = nclass-1
    nbins = nclass-1
    # label is: 0, 1, 2, ..., nclass-1
    # Note: 0 is background
    predict = predict.cpu().numpy().astype('int64') + 1
    target = target.cpu().numpy().astype('int64') + 1

    predict = predict * (target > 0).astype(predict.dtype)
    intersection = predict * (predict == target)

    # areas of intersection and union
    area_inter, _ = np.histogram(intersection, bins=nbins, range=(mini, maxi))
    area_pred, _ = np.histogram(predict, bins=nbins, range=(mini, maxi))
    area_lab, _ = np.histogram(target, bins=nbins, range=(mini, maxi))
    area_union = area_pred + area_lab - area_inter
    assert (area_inter <= area_union).all(), \
        "Intersection area should be smaller than Union area"
    return area_inter, area_union

What this computes is not IOU. I think the main problem is the '+ 1'.
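For reference, a hedged sketch that mirrors the widely used gluon-cv-style implementation this metric appears to derive from (not necessarily the authors' intended semantics): after the +1 shift, the histograms run over the full class range 1..nclass rather than 1..nclass-1.

# Hedged reference sketch following the common gluon-cv-style computation.
import numpy as np
import torch

def batch_intersection_union_ref(output, target, nclass):
    predict = torch.max(output, 1)[1]
    mini, maxi, nbins = 1, nclass, nclass  # cover all shifted labels 1..nclass
    predict = predict.cpu().numpy().astype('int64') + 1
    target = target.cpu().numpy().astype('int64') + 1
    predict = predict * (target > 0).astype(predict.dtype)
    intersection = predict * (predict == target)
    area_inter, _ = np.histogram(intersection, bins=nbins, range=(mini, maxi))
    area_pred, _ = np.histogram(predict, bins=nbins, range=(mini, maxi))
    area_lab, _ = np.histogram(target, bins=nbins, range=(mini, maxi))
    area_union = area_pred + area_lab - area_inter
    return area_inter, area_union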

Is experiment/train.py really searching architectures or just training a chosen architecture?

In the README.md your instruction is to run experiment/train.py with a specified configuration file in order to search for architectures on the dataset specified in that configuration file.

However, experiment/train.py does not seem to do any searching over architectures at all, and it seems to me that it only trains the architecture specified by 'training | geno_type' in the configuration file and models/geno_searched.py.

The architecture search seems to be actually performed by experiment/search_cell.py and it seems to me that you put the best-performing models that came out of this search into the file models/geno_searched.py.

Is my understanding correct about experiment/train.py, experiment/search_cell.py, and models/geno_searched.py?

A little question about the script prim_ops_set.py

Hello, thanks for your work!
I ran into a question when running your script.
In the script prim_ops_set.py, line 183:
according to my understanding, the first step of a depthwise conv should keep in_channels and out_channels the same, so I'm confused now.

Dataset question

How is the accuracy when running on Pascal VOC? My accuracy seems low, so I'd like to ask.

RuntimeError: CUDA out of memory

Hi,

First of all, thanks for sharing your code, which is a great benefit for us in experimenting with more ideas.

However, on a GTX 1080 8GB card, I have encountered a runtime CUDA OOM error
when executing the following:
python3 train.py --config ../configs/nas_unet/nas_unet_voc.yml
Even reducing the batch_size to 1 does not help.

Could you give some suggestions to resolve this?

Thank you !

Dataset

Hi there,

Thank you for your work. I was wondering if you could tell us a bit about how the dataset should be structured in order to train a model on our own data (masks, patches, should they be in the same folder, how should they be named, etc.). I'm trying to figure this out from the BaseDataset class and all the .py files in /utils/datasets, but I am a bit confused.
Thank you very much.

Problems when reproducing

Hello, when reproducing this code on a GTX 1080 Ti, likewise searching the network architecture on the Pascal dataset, the search finished after only 12 hours. Compared with the day and a half you report in the paper, does that mean this search did not finish training? (I did not touch the code; I followed the README steps exactly.)

When I wanted to validate the network, I experimented with the promise12 dataset and added a test function to train.py, but I cannot see the resulting test images; printing the pixels directly gives all zeros. Moreover, the computed DSC is only about 0.70, far from the results in the paper. On top of that, I cannot view the original images directly either. How did you inspect the dataset images in your experiments?

Both multi-GPU training and search hang

Hello, I downloaded your code. NasUnet runs fine on the Promise12 dataset on a single GPU, but with multiple GPUs it hangs at 0%.
I am using three GTX 1060 cards.
My settings are as follows:

model:
    arch: nasunet
data:
    dataset: promise12
    train_split: train_aug
    split: val
    img_rows: 'same'
    img_cols: 'same'
searching:
    init_channels: 16
    depth: 4
    epoch: 300
    batch_size: 6
    report_freq: 20
    n_workers: 2
    alpha_begin: 10
    max_patience: 40
    gpu: True
    multi_gpus: True
    sharing_normal: True
    double_down_channel: False
    meta_node_num: 3
    grad_clip: 5
    train_portion: 0.5
    model_optimizer:
        name: 'sgd'
        lr: 0.025
        weight_decay: 3.0e-4
    arch_optimizer:
        name: 'adam'
        lr: 3.0e-4
        weight_decay: 5.0e-3
    loss:
        name: 'dice_loss'
        size_average: False
        aux_weight:
    resume:
training:
    geno_type: NASUNET
    init_channels: 32
    depth: 5
    epoch: 200
    batch_size: 6
    report_freq: 10
    n_workers: 2
    multi_gpus: True
    double_down_channel: False
    grad_clip: 3
    max_patience: 100
    model_optimizer:
        name: 'adam'
        lr: 1.0e-3
        weight_decay: 5.0e-4
    loss:
        name: 'dice_loss'
        aux_weight: 0
    backbone:
    lr_schedule:
        name: 'cos'
        T_max: 100
    resume:

Actually searching for a neural architecture

@tianbaochou
Thanks for your hard work,

1- The README doesn't show how to use search_cell.py in order to search for a new architecture.
2- How many images are enough to search for a new architecture?
3- Does NasUnet search using crops of the ground-truth images, or downscaled images?

requirements are not complete

@tianbaochou Thank you for your hard work,

The requirements.txt file that you provided is not complete and does not include all the packages required to run NasUnet properly.
Can you upload an environment.yml, which you can get by running this command:

conda env export > environment.yml

NameError: name 'CitySegmentation' is not defined

When trying to train on pascal voc2012, an error message appears:

(nasunet) home@home-lnx:~/Desktop/programs/NasUnet/experiment$ python train.py --config ../configs/nas_unet/nas_unet_voc.yml
Traceback (most recent call last):
  File "train.py", line 14, in <module>
    from util.datasets import get_dataset
  File "../util/datasets/__init__.py", line 21, in <module>
    'citys': CitySegmentation,
NameError: name 'CitySegmentation' is not defined

How do you do testing?

I have tried to run
python train.py --config ../configs/nas_unet/nas_unet_voc.yml

I tried running it for a short 100 epochs on 2 GPUs, and the run was OK. It generated some files in the logs folder. I have the following queries:

  1. There was nothing in the saved_val_images folder. Is it supposed to be so?
  2. I have gotten checkpoint.pth.tar and model_best.pth.tar, and I can load the weights. How do I test them? Just run it again like the training?

Thank you.

edit #1: 2019-09-11
I am also having difficulty finding the trained structure; I mean, where do I look for the architecture after searching? I could load the saved model parameters into a model rebuilt from the initial training configurations.

Does your code only work for IN_CHANNELS equal to 1?

When I try to run the code on CamVid, the following error appears.
Does the target have to be 3-dimensional, meaning the target must have a single channel?

I also notice that your paper shows NAS-Unet gets better performance on chaos, ultrasound_nerve, and promise12, which all have only one channel.

File "C:\Users\f00335061\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\functional.py", line 1792, in nll_loss
ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: invalid argument 3: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4 at c:\a\w\1\s\windows\pytorch\aten\src\thnn\generic/SpatialClassNLLCriterion.c:59

SegmentationMetric problem

In SegmentationMetric:

# Remove classes from unlabeled pixels in gt image. 
# We should not penalize detections in unlabeled portions of the image.

How should I understand "unlabeled pixels in the gt image"?

Thanks !

How to search a new architecture on my own dataset?

Thank you for publishing the code. The whole work is attractive.
I read your code and paper and have a question: why is the search command the same as the train command?

search strategy: python train.py --config ../configs/nas_unet/nas_unet_promise12.yml;
train strategy: python train.py --config ../configs/nas_unet/nas_unet_promise12.yml --model nasunet,

Should the search strategy be "python search_cell.py --config ../configs/nas_unet/nas_unet_voc.yml"?

No util.loader module

Hi,
I want to run the code, but there is no util.loader module in the code, and the error is:
from util.loader import get_loader
ModuleNotFoundError: No module named 'util.loader'

RuntimeError: CUDA error: device-side assert triggered

I cropped the data from 110 to 96 and that error message disappeared, but a new error occurs:

/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [65,0,0] Assertion t >= 0 && t < n_classes failed.
(the same assertion repeats for threads [66,0,0], [80,0,0], [81,0,0], [83,0,0], and [84,0,0])

Traceback (most recent call last):
  File "search_cell_sim.py", line 312, in <module>
    search_network.run()
  File "search_cell_sim.py", line 212, in run
    self.train()
  File "search_cell_sim.py", line 266, in train
    self.train_loss_meter.update(train_loss.item())

RuntimeError: CUDA error: device-side assert triggered

The results of different models on the NERVE dataset show unusual similarity (different from the results in the paper)

Thank you for your inspiring work.
I tried your code on the NERVE dataset, but the results of Unet and NasUnet are very strange and similar.
I ran

python train.py --config='../../config/baseline/nerve.yml' --model='unet'

to train Unet on the NERVE dataset, and the result is
[image]

python train.py --config='../../config/nas_unet/nas_unet_nerve.yml' --model='nasunet'

to train NasUnet, and the result is
[image]

It seems the DSC of both models is about 0.77 and the mIOU of both is 0.983.
However, the result in your paper is
[image]

Why is the performance of Unet better than expected while the performance of NasUnet is worse than expected?
Why are the performances of the two models so similar; am I doing something wrong?
Looking forward to your answer.

Using my own images, the error "RuntimeError: The size of tensor a (12) must match the size of tensor b (13) at non-singleton dimension 3" appears

Traceback (most recent call last):
  File "search_cell_sim.py", line 312, in <module>
    search_network.run()
  File "search_cell_sim.py", line 212, in run
    self.train()
  File "search_cell_sim.py", line 262, in train
    predicts = self.model(input)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "../search/backbone/nas_unet_search.py", line 181, in forward
    return self.net(x, weights1_down, weights1_up, weights2_down, weights2_up)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "../search/backbone/nas_unet_search.py", line 68, in forward
    s0, s1 = s1, cell(s0, s1, weights1_down, weights2_down)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "../search/backbone/cell.py", line 82, in forward
    tmp_list += [self._ops[offset+j](h, weight1[offset+j], weight2[offset+j])]
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "../search/backbone/cell.py", line 32, in forward
    rst = sum(w * op(x) for w, op in zip(weights2, self._ops))
RuntimeError: The size of tensor a (12) must match the size of tensor b (13) at non-singleton dimension 3
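A common cause of this mismatch is an input size that is not divisible by 2**depth, so downsampled and upsampled feature maps differ by one pixel when they are combined in a cell. A hedged workaround sketch (not the repository's code) is to pad inputs up to the next multiple:

# Hedged sketch: pad H and W to a multiple of 2**depth so encoder and
# decoder feature maps align (not the repository's code).
import torch.nn.functional as F

def pad_to_multiple(x, multiple=16):
    h, w = x.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    return F.pad(x, (0, pad_w, 0, pad_h))  # pad right and bottom edges

# usage: x = pad_to_multiple(x, multiple=2 ** depth)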
