xiaoyufenfei / efficient-segmentation-networks

Lightweight models for real-time semantic segmentation in PyTorch (including SQNet, LinkNet, SegNet, UNet, ENet, ERFNet, EDANet, ESPNet, ESPNetv2, LEDNet, ESNet, FSSNet, CGNet, DABNet, Fast-SCNN, ContextNet, FPENet, etc.)

License: MIT License

Python 99.66% Shell 0.34%

pytorch real-time-semantic-segmentation camvid cityscapes computer-vision scene-understanding segmentation efficient-segmentation-networks semantic-segmentation-models image-segmentation

efficient-segmentation-networks's Issues

This repository now supports two datasets: cityscapes and camvid, cityscapes is not included

Process finished with exit code 1

Fast SCNN

why ValueError: num_samples should be a positive integer value, but got num_samples=0
I am using the Cityscapes dataset, but the loader cannot find any samples, even though the directory path is correct.

Inference Speed of ENet

Hello, great work!
I trained the provided ENet and ERFNet models.
I am facing a problem: when testing the models, ENet's inference speed is very low, even lower than ERFNet's. I don't understand why; this shouldn't be the case.

unet+backbone

Dear sir,
Is UNet with a backbone (such as ResNet-50) supported?

ENet prediction become worse after 100 epoch

I use this project to train ENet on the Cityscapes dataset (640×480). The loss decreases normally, but the predictions become worse after training for more than 50 epochs...

From left to right: 10 epochs, 50 epochs, 80 epochs.
The training loss vs. epochs looks fine:

Can anyone give me some advice? Thanks!

init problem

cityscapes data set processing

After downloading the Cityscapes dataset and putting it in the corresponding folder, the following error is displayed during training: NotImplementedError: This repository now supports two datasets: cityscapes and camvid, cityscapes is not included. What should I do about it?

run predict error

Namespace(batch_size=1, checkpoint='', classes=11, cuda=True, dataset='camvid', gpus='0', model='ENet', num_workers=2, save_seg_dir='./server/camvid/predict/ENet')
=====> use gpu id: '0'
find file: ./dataset/inform/camvid_inform.pkl
length of Validation set: 233
=====> beginning testing
test set length: 233
Traceback (most recent call last):
File "/home/Downloads/Efficient-Segmentation-Networks/predict.py", line 119, in
test_model(args)
File "/home/Downloads/Efficient-Segmentation-Networks/predict.py", line 102, in test_model
predict(args, testLoader, model)
File "/home/Downloads/Efficient-Segmentation-Networks/predict.py", line 44, in predict
for i, (input, size, name) in enumerate(test_loader):
ValueError: too many values to unpack (expected 3)
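
The error above means the loader yielded more items per batch than the three names the loop unpacks. A minimal sketch of a tolerant unpacking helper follows. The helper name unpack_batch and the assumed 4-item layout (image, label, size, name) are illustrative assumptions, not something confirmed by the log.

```python
def unpack_batch(batch):
    """Return (image, size, name) from a 3-item or 4-item batch.

    Assumption (not confirmed by the traceback): a 4-item batch is laid
    out as (image, label, size, name), and the label is not needed for
    prediction.
    """
    if len(batch) == 3:
        image, size, name = batch
    elif len(batch) == 4:
        image, _label, size, name = batch
    else:
        raise ValueError(f"unexpected batch length: {len(batch)}")
    return image, size, name


# Usage: placeholder strings stand in for real tensors.
print(unpack_batch(("img", "lbl", (360, 480), "0001")))
```

Printing len(batch) for the first batch is a quick way to confirm which layout the loader really produces before relying on this assumption.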

FPENet is difficult to reproduce

Hi, this is a good project.
I tried it, and the overall installation and training were simple and straightforward. I experimented with FPENet, but the final result differed considerably from the original one. Specifically:
I set the hyperparameters to match the original:

model = FPENet
dataset = cityscapes
input_size = 512, 1024
classes = 19
train_type = train
max_epochs = 400
lr_schedule = poly
The loss used was CrossEntropyLoss2d:
That is, on line 133 of train.py, criteria = CrossEntropyLoss2dLabelSmooth(weight=weight, ignore_label=ignore_label) was changed to use the CrossEntropyLoss2d function. Other settings remained unchanged. Training ran for 9 h on one RTX 2080 Ti GPU.

But in the end, mIoU is only 46% on the val set, whereas the original reports 60–70% on the test set, and I don't think there should be such a big gap between val and test. Checking the output, I noticed that train.py printed 0.12M model parameters, while the original model has 0.4M. At first I thought the model was wrong, but after checking the paper I felt that your implementation was correct. Then I ran torchsummary on the model and it reported 0.44M, so I don't know what went wrong.

Maybe FPENet itself is difficult to reproduce? (This is common with AI papers.)
Has anyone used this project to reproduce a model and roughly match the original results? Could you share the parameters and strategies you used?

How to add new class to this project

I have a question. I want to add three categories (water, bridge, and others) to the Cityscapes dataset. Should I assign the "others" class to ignore_label, or add it as a new class "others"? In other words, should I set classes to 21 or 22? Thank you.
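
The two options in the question differ only in how raw label ids are mapped to training ids: with 19 existing classes, sending "others" to the ignore label gives 19 + 2 = 21 trained classes, while keeping it as its own class gives 19 + 3 = 22. A minimal NumPy sketch of the first option follows. The raw ids 40 and 41, the ignore value 255, and the function name remap_labels are assumptions chosen for illustration.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed ignore value; use whatever the loss is configured with


def remap_labels(label_map, id_to_train_id, ignore_label=IGNORE_LABEL):
    """Map raw label ids to training ids; unmapped ids become ignore_label.

    label_map must be an integer array with values in [0, 255].
    """
    lookup = np.full(256, ignore_label, dtype=np.uint8)
    for raw_id, train_id in id_to_train_id.items():
        lookup[raw_id] = train_id
    return lookup[label_map]


# Hypothetical mapping: two existing classes plus water (19) and bridge (20).
mapping = {7: 0, 8: 1, 40: 19, 41: 20}
raw = np.array([[7, 40], [99, 8]], dtype=np.uint8)
print(remap_labels(raw, mapping))
```

Under this mapping, every pixel whose raw id is not listed (such as 99 here) is excluded from the loss rather than trained as a class of its own.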

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Traceback (most recent call last):
File "D:/Code/CodePycharm/Deep_Learning/segment/Efficient-Segmentation-Networks-master/train.py", line 409, in
train_model(args)
File "D:/Code/CodePycharm/Deep_Learning/segment/Efficient-Segmentation-Networks-master/train.py", line 222, in train_model
lossTr, lr = train(args, trainLoader, model, criteria, optimizer, epoch)
File "D:/Code/CodePycharm/Deep_Learning/segment/Efficient-Segmentation-Networks-master/train.py", line 336, in train
loss.backward()
File "D:\Environment\Anaconda\envs\lmffnet\lib\site-packages\torch\tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "D:\Environment\Anaconda\envs\lmffnet\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Process finished with exit code 1

Can you help me, please?

About metric mistake: recall and precision

Efficient-Segmentation-Networks/utils/metric/metric.py

Lines 43 to 56 in 0f0c32e

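    # Pii is the number of correct predictions; Pij and Pji are interpreted as false positives and false negatives respectively, although both are sums of false positives and false negatives
    def recall(self):  # among the pixels predicted as correct, the number confirmed to be correct
        recall = 0.0
        for i in range(self.nclass):
            recall += self.M[i, i] / np.sum(self.M[:, i])
        return recall / self.nclass

    def accuracy(self):  # correctly segmented pixels divided by total pixels
        accuracy = 0.0
        for i in range(self.nclass):
            accuracy += self.M[i, i] / np.sum(self.M[i, :])
        return accuracy / self.nclass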

These are reversed.

    # Key to understanding the confusion-matrix code: rows are ground truth, columns are predictions
    # Pii is the number of correct predictions; Pij = FN, Pji = FP
    # Each column sum is the number of samples predicted as that class = TP + FP; precision = TP / (TP + FP), the fraction of predicted positives that are truly positive
    # Each row sum is the number of ground-truth samples of that class = TP + FN; recall = TP / (TP + FN), the fraction of actual positives that are found

    def recall(self):
        recall = 0.0
        class_recall = []
        for i in range(self.nclass):
            recall_i = self.M[i, i] / np.sum(self.M[i, :])
            recall += recall_i
            class_recall.append(recall_i)
        return recall / self.nclass, class_recall

    def accuracy(self):
        accuracy = 0.0
        class_accuracy = []
        for i in range(self.nclass):
            accuracy_i = self.M[i, i] / np.sum(self.M[:, i])
            accuracy += accuracy_i
            class_accuracy.append(accuracy_i)
        return accuracy / self.nclass, class_accuracy

So, change

Efficient-Segmentation-Networks/utils/metric/metric.py

Line 47 in 0f0c32e

recall += self.M[i, i] / np.sum(self.M[:, i])

and

Efficient-Segmentation-Networks/utils/metric/metric.py

Line 54 in 0f0c32e

accuracy += self.M[i, i] / np.sum(self.M[i, :])
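
The row/column convention described in this issue can be checked on a small matrix. The sketch below is a standalone NumPy illustration, assuming rows are ground truth and columns are predictions. The function name per_class_precision_recall is hypothetical and is not part of the repository's metric.py.

```python
import numpy as np


def per_class_precision_recall(confusion):
    """Per-class precision and recall from a confusion matrix.

    Assumes confusion[i, j] counts pixels of true class i predicted as j.
    Classes with an empty row or column get 0.0 instead of NaN.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    true_positive = np.diag(confusion)
    predicted = confusion.sum(axis=0)  # column sums: TP + FP
    actual = confusion.sum(axis=1)     # row sums: TP + FN
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(predicted > 0, true_positive / predicted, 0.0)
        recall = np.where(actual > 0, true_positive / actual, 0.0)
    return precision, recall


# Two classes: 4 pixels truly class 0 (3 found), 6 truly class 1 (4 found).
precision, recall = per_class_precision_recall([[3, 1], [2, 4]])
print(precision, recall)
```

Returning 0.0 for a class that never appears is a design choice made here to avoid NaN in the mean; skipping such classes when averaging is an equally reasonable alternative.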

Where are the models and code?

mobilenetv3+LR-ASPP

Thanks for your excellent work!

Will you implement MobileNetV3 + LR-ASPP?

(-215:Assertion failed) !ssize.empty() in function 'resize'

/home/ll/anaconda3/envs/t1.2/bin/python /media/ll/L/Efficient-Segmentation-Networks/train.py
1.2
=====> input size:(512, 1024)
Namespace(batch_size=4, classes=19, cuda=True, dataset='cityscapes', gpus='0', input_size='512,1024', logFile='log.txt', lr=0.0005, lr_schedule='warmpoly', max_epochs=1000, model='FastSCNN', num_cycles=1, num_workers=4, optim='adam', poly_exp=0.9, random_mirror=True, random_scale=True, resume='', savedir='./checkpoint/', train_type='trainval', use_focal=True, use_label_smoothing=False, use_lovaszsoftmax=False, use_ohem=False, warmup_factor=0.3333333333333333, warmup_iters=500)
=====> use gpu id: '0'
=====> set Global Seed: 1234
=====> building network
=====> computing network parameters and FLOPs
the number of parameters: 1138051 ==> 1.14 M
find file: ./dataset/inform/cityscapes_inform.pkl
length of dataset: 122
length of dataset: 59
=====> Dataset statistics
data['classWeights']: [ 1.4705521 9.505282 10.492059 10.492059 10.492059 10.492059
10.492059 10.492059 10.492059 10.492059 10.492059 10.492059
10.492059 10.492059 10.492059 10.492059 10.492059 10.492059
5.131664 ]
mean and std: [72.3924 82.90902 73.158325] [45.319206 46.15292 44.91484 ]
single GPU for training
=====> beginning training
=====> the number of iterations per epoch: 30
Traceback (most recent call last):
File "/media/ll/L/Efficient-Segmentation-Networks/train.py", line 401, in
train_model(args)
File "/media/ll/L/Efficient-Segmentation-Networks/train.py", line 218, in train_model
lossTr, lr = train(args, trainLoader, model, criteria, optimizer, epoch)
File "/media/ll/L/Efficient-Segmentation-Networks/train.py", line 301, in train
for iteration, batch in enumerate(train_loader, 0):
File "/home/ll/anaconda3/envs/t1.2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/home/ll/anaconda3/envs/t1.2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/ll/anaconda3/envs/t1.2/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ll/anaconda3/envs/t1.2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/ll/anaconda3/envs/t1.2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ll/anaconda3/envs/t1.2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/media/ll/L/Efficient-Segmentation-Networks/dataset/cityscapes.py", line 64, in __getitem__
label = cv2.resize(label, None, fx=f_scale, fy=f_scale, interpolation=cv2.INTER_NEAREST)
cv2.error: OpenCV(4.5.1) /tmp/pip-req-build-7m_g9lbm/opencv/modules/imgproc/src/resize.cpp:4051: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

Process finished with exit code 1
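
One common cause of this assertion is that cv2.imread returned None for a path that does not exist or cannot be read, so cv2.resize receives an empty input. A minimal sketch for checking a dataset list file before training follows. It assumes each line holds whitespace-separated paths relative to a dataset root; the function name find_missing_files and the example paths are hypothetical.

```python
import os


def find_missing_files(list_path, root):
    """Return (line_number, full_path) for every listed file that is absent.

    Assumption: each non-empty line of list_path holds one or more
    whitespace-separated paths (for example image and label) relative to root.
    """
    missing = []
    with open(list_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            for relative in line.split():
                full_path = os.path.join(root, relative.lstrip("/"))
                if not os.path.isfile(full_path):
                    missing.append((line_number, full_path))
    return missing


# Usage (hypothetical paths):
# for line_number, path in find_missing_files("train_list.txt", "./dataset/cityscapes"):
#     print(f"line {line_number}: missing {path}")
```

An empty result only rules out missing files; a file that exists but is corrupt would still need to be opened to be detected.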

Error in training -------ZMQError: Address already in use

Hi,
Every time I train a model, this error occurs after some epochs:

Traceback (most recent call last):
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/site-packages/spyder_kernels/console/main.py", line 11, in
start.main()
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/site-packages/spyder_kernels/console/start.py", line 306, in main
kernel.initialize()
File "", line 2, in initialize
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/site-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 567, in initialize
self.init_sockets()
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 271, in init_sockets
self.shell_port = self._bind_socket(self.shell_socket, self.shell_port)
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 218, in _bind_socket
return self._try_bind_socket(s, port)
File "/home/thu-microe/anaconda3/envs/tensorflow/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 194, in _try_bind_socket
s.bind("tcp://%s:%i" % (self.ip, port))
File "zmq/backend/cython/socket.pyx", line 547, in zmq.backend.cython.socket.Socket.bind
File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use.

Can you help? Thanks.

How to test

I can't find checkpoint['model']. How do I test?

Training with cityscapes dataset and ohem loss

Hello, I have been using your nice suite for a couple of days. I am running on two RTX GPUs (24 GB), and there is a problem with the OHEM loss:
python3 train.py --dataset cityscapes --use_ohem --gpus 0,1 --batch_size 32 --num_worker 8

1.4 --> Pytorch version (same error for 1.1, though)

=====> input size:(512, 1024)
Namespace(batch_size=32, classes=19, cuda=True, dataset='cityscapes', gpus='0,1', input_size='512,1024', logFile='log.txt', lr=0.0005, lr_schedule='warmpoly', max_epochs=1000, model='ENet', num_cycles=1, num_workers=8, optim='adam', poly_exp=0.9, random_mirror=True, random_scale=True, resume='', savedir='./checkpoint/', train_type='trainval', use_focal=False, use_label_smoothing=False, use_lovaszsoftmax=False, use_ohem=True, warmup_factor=0.3333333333333333, warmup_iters=500)
=====> use gpu id: '0,1'
=====> set Global Seed:  1234
=====> building network
=====> computing network parameters and FLOPs
the number of parameters: 360422 ==> 0.36 M
find file:  ./dataset/inform/cityscapes_inform.pkl
length of dataset:  3475
length of dataset:  500
=====> Dataset statistics
data['classWeights']:  [ 1.4705521  9.505282  10.492059  10.492059  10.492059  10.492059
 10.492059  10.492059  10.492059  10.492059  10.492059  10.492059
 10.492059  10.492059  10.492059  10.492059  10.492059  10.492059
  5.131664 ]
mean and std:  [72.3924   82.90902  73.158325] [45.319206 46.15292  44.91484 ]
w/ class balance
torch.cuda.device_count()= 2
=====> beginning training
=====> the number of iterations per epoch:  108
Traceback (most recent call last):
  File "train.py", line 398, in <module>
    train_model(args)
  File "train.py", line 215, in train_model
    lossTr, lr = train(args, trainLoader, model, criteria, optimizer, epoch)
  File "train.py", line 327, in train
    loss = criterion(output, labels)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/git/Dummy-Efficient-Segmentation-Networks/utils/losses/loss.py", line 192, in forward
    prob = prob.masked_fill_(1 - valid_mask, 1)     #
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 394, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with

No error occurs if I use focal loss (FocalLoss2d)...

Plan to support instance segmentation?

Thanks for creating a great repo, organizing many segmentation models/datasets in one place!

I'm wondering if you're planning to add support for instance segmentation or panoptic segmentation. The Cityscapes dataset provides instance IDs, although CamVid doesn't. Any thoughts or pointers would be appreciated.

issues with ContextNet and FastSCNN models

I trained 7 of the models, and all of them reached more than 80% validation mIoU with the default settings (CamVid dataset, 1000 epochs). But when I tested the epoch-1000 checkpoints with test.py, I got these mIoUs: CGNet_0.651, ContextNet_0.060, DABNet_0.652, EDANet_0.288, ENet_0.590, ERFNet_0.672, FastSCNN_0.011.

So I'm wondering why there is such a significant difference between the validation and test scores,
and why the ContextNet and FastSCNN checkpoints appear untrained.

I have Python 3.7 and PyTorch 1.4.
I hope we can solve the testing issues soon.

results of each model?

Hi, it's a wonderful repository. But I am curious whether you have trained any of the models and, if so, whether you could share their results (on Cityscapes or CamVid).

ESNet

Has anyone tried using the ESNet model? I got poor training results, only 53%!

RecursionError: maximum recursion depth exceeded while calling a Python object

When training reaches epoch 18, it reports this error:

Traceback (most recent call last):
  File "train.py", line 396, in <module>
    train_model(args)
  File "train.py", line 213, in train_model
    lossTr, lr = train(args, trainLoader, model, criteria, optimizer, epoch)
  File "train.py", line 328, in train
    optimizer.step()
  File "/home/lgq/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 36, in wrapper
    return func(*args, **kwargs)
  File "/home/lgq/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 36, in wrapper
    return func(*args, **kwargs)
  File "/home/lgq/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 36, in wrapper
    return func(*args, **kwargs)
  [Previous line repeated 991 more times]
  File "/home/lgq/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/optim/adam.py", line 69, in step
    state = self.state[p]
  File "/home/lgq/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/tensor.py", line 393, in __hash__
    return id(self)
RecursionError: maximum recursion depth exceeded while calling a Python object

I found that WarmupPolyLR is constructed in every iteration, so every iteration calls WarmupPolyLR; this may be what causes the error.
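
The repeated "in wrapper" frames in the traceback are consistent with that suspicion: if constructing a scheduler wraps optimizer.step, constructing one per iteration stacks one more wrapper each time. The toy model below illustrates only this mechanism. ToyOptimizer and ToyScheduler are hypothetical stand-ins, not PyTorch's classes.

```python
class ToyOptimizer:
    """Stand-in for an optimizer; step() only counts calls."""

    def __init__(self):
        self.calls = 0

    def step(self):
        self.calls += 1


class ToyScheduler:
    """Hypothetical scheduler that wraps optimizer.step when constructed."""

    def __init__(self, optimizer):
        inner = optimizer.step

        def wrapper(*args, **kwargs):
            return inner(*args, **kwargs)

        # Record how many wrappers are stacked on the original method.
        wrapper.depth = getattr(inner, "depth", 0) + 1
        optimizer.step = wrapper


def nesting_after(iterations, rebuild_every_iteration):
    """Return the wrapper depth of optimizer.step after a training loop."""
    optimizer = ToyOptimizer()
    if not rebuild_every_iteration:
        ToyScheduler(optimizer)  # built once, before the loop
    for _ in range(iterations):
        if rebuild_every_iteration:
            ToyScheduler(optimizer)  # built again on every iteration
        optimizer.step()
    return optimizer.step.depth


print(nesting_after(50, rebuild_every_iteration=True))
print(nesting_after(50, rebuild_every_iteration=False))
```

In this toy model each call to step() passes through every stacked wrapper, so the call depth grows with the number of iterations until Python's recursion limit is reached. Building the scheduler once before the loop keeps the depth constant.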

ESNet: Error after 161 iterations

Like this:
=====> epoch[161/300] iter: (526/1329) cur_lr: 0.000251 loss: 0.445 time:0.67
=====> epoch[161/300] iter: (527/1329) cur_lr: 0.000251 loss: 0.196 time:0.58
Traceback (most recent call last):
File "D:/my/torch-gpu/Efficient-Segmentation-Networks/Efficient-Segmentation-Networks/train.py", line 398, in
train_model(args)
File "D:/my/torch-gpu/Efficient-Segmentation-Networks/Efficient-Segmentation-Networks/train.py", line 215, in train_model
lossTr, lr = train(args, trainLoader, model, criteria, optimizer, epoch)
File "D:/my/torch-gpu/Efficient-Segmentation-Networks/Efficient-Segmentation-Networks/train.py", line 329, in train
loss.backward()
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\torch\tensor.py", line 107, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\torch\autograd\__init__.py", line 87, in backward
grad_tensors = make_grads(tensors, grad_tensors)
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\torch\autograd\__init__.py", line 28, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
