naoto0804 / pytorch-inpainting-with-partial-conv

Unofficial pytorch implementation of 'Image Inpainting for Irregular Holes Using Partial Convolutions' [Liu+, ECCV2018]

License: MIT License

Topics: pytorch, inpainting, cnn

pytorch-inpainting-with-partial-conv's Introduction

The official implementation has been released by the authors.

Note that this is an ongoing re-implementation and I cannot fully reproduce the results. Suggestions and PRs are welcome!

This is an unofficial pytorch implementation of the paper Image Inpainting for Irregular Holes Using Partial Convolutions [Liu+, arXiv2018].

Requirements

  • Python 3.6+
  • Pytorch 0.4.1+
pip install -r requirements.txt

Usage

Preprocess

  • Download Places2 and place it somewhere. The dataset should contain data_large, val_large, and test_large as subdirectories. Don't forget to specify the dataset root with --root ROOT when using train.py or test.py (an example layout is sketched below).

  • Generate masks by following [1] (saved under ./masks by default). Note that the mask generation method differs from the original work.

python generate_data.py
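
For reference, here is one possible on-disk layout after these steps. It is only an illustration, assuming the dataset was extracted to <ROOT>; the only requirements stated above are the three subdirectories and the default ./masks output of generate_data.py.

<ROOT>/data_large/   # training images
<ROOT>/val_large/    # validation images
<ROOT>/test_large/   # test images
./masks/             # irregular masks written by generate_data.py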

Train

CUDA_VISIBLE_DEVICES=<gpu_id> python train.py

Fine-tune

CUDA_VISIBLE_DEVICES=<gpu_id> python train.py --finetune --resume <checkpoint_name>

Test

CUDA_VISIBLE_DEVICES=<gpu_id> python test.py --snapshot <snapshot_path>

Results

Here are some results on the test set after training for 500,000 iterations and fine-tuning (freezing BN in the encoder) for another 500,000 iterations. The model is available here, but I cannot guarantee its quality. (Top to bottom: input, mask, image generated by the network, image combined with the original non-masked regions, ground truth)

References

pytorch-inpainting-with-partial-conv's People

Contributors

naoto0804, thehamsta, violetamenendez


pytorch-inpainting-with-partial-conv's Issues

Slight scaling issue in PartialConv function

Hi, thanks for your great project. I just wanted to point out a potential issue with the implementation of the PartialConv function here, which is easily spotted if you run the following:

import torch
import torch.nn as nn

from net import PartialConv  # PartialConv from this repo's net.py

size = (1, 1, 10, 10)
X = torch.ones(size)  # > Input layer
Y = torch.ones(size)  # > Mask layer (= all elements are good to go)
convH0 = torch.nn.Conv2d(1, 1, 3, 1, 1, bias=False)
with torch.no_grad():  # > Manually set the weights of the convolution kernel
    convH0.weight = nn.Parameter(torch.FloatTensor([[[[ 0.2273,  0.1403, -1.0889],
                                                      [-0.0351, -0.2992,  0.2029],
                                                      [ 0.0280,  0.2878,  0.5101]]]]))
output0 = convH0(X)  # > Result from standard convolution kernel
PConv = PartialConv(1, 1, 3, 1, 1, bias=False)
with torch.no_grad():  # > Set weights of PConv layer equal to conv. layer
    PConv.input_conv.weight = nn.Parameter(torch.FloatTensor([[[[ 0.2273,  0.1403, -1.0889],
                                                                [-0.0351, -0.2992,  0.2029],
                                                                [ 0.0280,  0.2878,  0.5101]]]]))
output1, mask1 = PConv(X, Y)  # > Result from partial convolution layer

I would expect the result of both operations to be the same. However, output1 = output0 / 9! The cause of the error lies in the following line:

output_pre = (output - output_bias) / mask_sum + output_bias

where 'mask_sum' is a tensor mostly filled with the value 9. In the original paper, that corresponds to sum(M) in the denominator. But what is missing is the sum(1) numerator, which should cancel this factor of 9 again. I think it can be fixed by computing the following in the __init__ of PartialConv:

self.sumI = kernel_size**2*in_channels

and then in the forward call you compute

output_pre = (output - output_bias) * self.sumI / mask_sum + output_bias

These changes [assuming a square kernel -- otherwise I suppose you could compute self.sumI by multiplying the shape of the weights or something like that] also correctly fix the results in case holes are present. That is, it would then be fully in line with the original paper.
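
Pulling this together, a minimal sketch of the changed forward line, assuming the repo keeps the feature convolution as self.input_conv (an nn.Conv2d) as in the current net.py, and computing sum(1) from the convolution's own attributes so it also covers non-square kernels:

# sketch only; output, output_bias, and mask_sum are the existing names in forward
window_size = (self.input_conv.in_channels
               * self.input_conv.kernel_size[0]
               * self.input_conv.kernel_size[1])  # sum(1) over one window
output_pre = (output - output_bias) * window_size / mask_sum + output_bias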

I don't know how big the effect will be on training, but it could be non-zero.

Oops. I only just now see that this is the same as issue #44 ! Well, this time with some more background then.

Pre-trained Model

Is the pre-trained model trained on the Places2 or ImageNet dataset?

This is because in net.py a pre-trained model (VGG16) is loaded, and it is trained on ImageNet.

vgg16 = models.vgg16(pretrained=True)

And there is a python file called Places2.py, so I'm not sure which dataset the 100000.pth model is trained on.

Thank you for your time.

test input size

I have image masks with different sizes, but I find that I can't set the test input size arbitrarily.

How to load Places2 dataset?

How do I access the Places2 dataset in this code? Should I download it myself? If I download it in advance, where should it be placed? Thanks!

test.py uses Places2 Class incorrectly

Bug:

Places2-Class Signature in places.py

class Places2(torch.utils.data.Dataset):
    def __init__(self, img_root, mask_root, img_transform, mask_transform, split='train'): ....

How Places2 is called in test.py:

dataset_val = Places2(args.root, img_transform, mask_transform, 'val')

This call misses the mask_root argument.

Suggested Fix:
Either use a specific value in the image to create a mask, such as:

mask = np.zeros_like(img)
black_pixels_mask = np.all(img == [0, 0, 0], axis=-1)

or add a mask_root argument to the args (see the sketch below).
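
A minimal sketch of the first option, assuming the input is an RGB numpy array of shape (H, W, 3) and that the model expects a float mask with 0 for holes and 1 for valid pixels (the convention used elsewhere in this repo):

import numpy as np
import torch

def mask_from_black_pixels(img):
    # True where the pixel is pure black, i.e. treated as a hole
    black = np.all(img == [0, 0, 0], axis=-1)
    # 0 = hole, 1 = valid, replicated across 3 channels -> (3, H, W) float tensor
    mask = np.where(black, 0.0, 1.0).astype(np.float32)
    mask = np.repeat(mask[None, :, :], 3, axis=0)
    return torch.from_numpy(mask)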

Results

Can you share some of your training results, e.g. at 5,000 iterations, 10,000 iterations, and so on?

There is a patch in the image corner when training

I train on my own images, and I find that there is a patch like a QR-code pattern in the image corners.
It sometimes disappears when the image is not complex, and it is not always in the same location but can appear in any of the four corners.
(images)

Is there a wrong understanding of total variation?

I find this does not conform to the original paper's method. I think the sum of the absolute values should go into Loss(tv), and the TV loss is not the global difference over the whole picture; it should only cover the area around the holes (P is the region of 1-pixel dilation of the hole region).

def total_variation_loss(image):
    # shift one pixel and get difference (for both x and y direction)
    loss = torch.mean(torch.abs(image[:, :, :, :-1] - image[:, :, :, 1:])) + \
        torch.mean(torch.abs(image[:, :, :-1, :] - image[:, :, 1:, :]))
    return loss
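
For comparison, a minimal sketch of a TV loss restricted to P, computed by dilating the hole region by one pixel with a max-pool; this assumes the mask convention used in this repo (1 = valid, 0 = hole) and uses a mean instead of the paper's normalization, so it is an illustration rather than the exact formula:

import torch
import torch.nn.functional as F

def total_variation_loss_around_holes(image, mask):
    hole = 1 - mask
    # 1-pixel dilation of the hole region (P in the paper)
    dilated = F.max_pool2d(hole, kernel_size=3, stride=1, padding=1)
    dx = torch.abs(image[:, :, :, 1:] - image[:, :, :, :-1]) * dilated[:, :, :, 1:]
    dy = torch.abs(image[:, :, 1:, :] - image[:, :, :-1, :]) * dilated[:, :, 1:, :]
    return torch.mean(dx) + torch.mean(dy)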

About Model Size

Thanks for your code. I have a question: is it possible to compress the model (it is >393 MB now) to make it run faster?

When I run train.py

Traceback (most recent call last):
File "C:/Users/xxx/Downloads/pytorch-inpainting-with-partial-conv-master/train.py", line 85, in
num_workers=args.n_threads))
File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
return _DataLoaderIter(self)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 297, in __init__
self._put_indices()
File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 345, in _put_indices
indices = next(self.sample_iter, None)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\sampler.py", line 138, in __iter__
for idx in self.sampler:
File "C:/Users/xxx/Downloads/pytorch-inpainting-with-partial-conv-master/train.py", line 34, in loop
yield order[i]
IndexError: index 0 is out of bounds for axis 0 with size 0

Inquiry about LICENSE

I would like to know the license of pytorch-inpainting-with-partial-conv.
Will it be the MIT license like your other repositories? If there are any restrictions, I would like to know them before using this code.

Thank you for your great work.

Bad examples

@naoto0804 Hi, I really appreciate your great work. When I test my images with your pretrained model, I get bad results:
(images)
How are your results? Can you help me?

Pretrained model issue

Hi,

I tried the pretrained model you shared here and ran it using the test.py code. However, there are many artifacts in the results. Could you point out where the problem is?

Regarding the mask update step

Thanks for sharing your work.
I have a question regarding how you update the mask; I really don't get it from the paper.
Your help is really appreciated.
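
For reference, a minimal sketch of the update rule as this implementation does it in net.py: after each partial convolution, an output location is marked valid (1) if its receptive field contained at least one valid input pixel, and stays a hole (0) otherwise. The window sums are computed with a fixed all-ones convolution:

import torch
import torch.nn as nn

mask = torch.ones(1, 3, 8, 8)
mask[:, :, 2:5, 2:5] = 0                      # a 3x3 hole

mask_conv = nn.Conv2d(3, 3, kernel_size=3, stride=1, padding=1, bias=False)
nn.init.constant_(mask_conv.weight, 1.0)      # fixed all-ones kernel, never trained
for p in mask_conv.parameters():
    p.requires_grad = False

with torch.no_grad():
    mask_sum = mask_conv(mask)                # sum(M) over each sliding window
    new_mask = (mask_sum > 0).float()         # updated mask with a shrunken hole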

Is the test.py script wrong?

Do I have to provide the corresponding masks when testing?
I find no mask dataset root in test.py.
In test.py:
dataset_val = Places2(args.root, img_transform, mask_transform, 'val') (no mask root)
In train.py:
dataset_val = Places2(args.root, args.mask_root, img_tf, mask_tf, 'val') (with mask root)
So, how exactly should 'test' be run?

A problem in net.py

Thanks for your awesome reproduction work!

However, I think there is an inadvertent error in net.py.

output_pre = (output - output_bias) / mask_sum + output_bias

I think it should be as follows:

output_pre = (output - output_bias) / mask_sum * (self.kernel_size * self.kernel_size * self.in_channels) + output_bias

test

image

When I use your code for testing, why does the output image show mask edges and artifacts in more than one place, such as the regions marked with red circles?

How to output all results on the test set of Places2?

Hi, I have tested the test.py file and it works. Now I am trying to test on all images in Places2 (in fact, only 50 images for the test), with evaluation.py modified as follows (default 8 -> len(dataset)):

import os

import torch
from torchvision.utils import make_grid
from torchvision.utils import save_image

from util.image import unnormalize


def evaluate(model, dataset, device, filename):
    image, mask, gt = zip(*[dataset[i] for i in range(len(dataset))])
    image = torch.stack(image)
    mask = torch.stack(mask)
    gt = torch.stack(gt)
    with torch.no_grad():
        output, _ = model(image.to(device), mask.to(device))
    output = output.to(torch.device('cpu'))
    output_comp = mask * image + (1 - mask) * output
    output_comp = unnormalize(output_comp)
    for j in range(len(dataset)):
        each_output_comp = output_comp[j]
        # save_result_dir and file_name are assumed to be defined elsewhere in my code
        absolut_file_name = os.path.join(save_result_dir, file_name[j])
        save_image(each_output_comp, absolut_file_name)

but it outputs errors:

Traceback (most recent call last):
  File "test.py", line 42, in <module>
    evaluate(model, dataset_val, device, save_result_dir)
  File "/home/user/FG/code/pytorch-inpainting-with-partial-conv/evaluation.py", line 15, in evaluate
    output, _ = model(image.to(device), mask.to(device))
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/FG/code/pytorch-inpainting-with-partial-conv/net.py", line 180, in forward
    h, h_mask = getattr(self, dec_l_key)(h, h_mask)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/FG/code/pytorch-inpainting-with-partial-conv/net.py", line 117, in forward
    h, h_mask = self.conv(input, input_mask)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/FG/code/pytorch-inpainting-with-partial-conv/net.py", line 87, in forward
    output_pre = (output - output_bias) / mask_sum + output_bias
RuntimeError: CUDA out of memory. Tried to allocate 640.00 MiB (GPU 0; 10.73 GiB total capacity; 9.23 GiB already allocated; 399.62 MiB free; 370.56 MiB cached)

Please help me. How can I solve this so that I can test several thousand images? Thank you.
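
One way around the OOM is to run the model in small batches instead of stacking the entire set into one tensor. A minimal sketch, assuming the dataset returns (image, mask, gt) tuples as above and that save_result_dir is a directory you define; the file names here are just a running index, not the original names:

import os

import torch
from torch.utils.data import DataLoader
from torchvision.utils import save_image

from util.image import unnormalize

def evaluate_all(model, dataset, device, save_result_dir, batch_size=8):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    model.eval()
    idx = 0
    for image, mask, gt in loader:
        with torch.no_grad():
            output, _ = model(image.to(device), mask.to(device))
        output = output.cpu()
        output_comp = mask * image + (1 - mask) * output
        output_comp = unnormalize(output_comp)
        for j in range(output_comp.size(0)):
            save_image(output_comp[j],
                       os.path.join(save_result_dir, '{:06d}.png'.format(idx)))
            idx += 1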

finetune lr parameter overwritten on resuming (last fix seems insufficient)

args.resume, [('model', model)], [('optimizer', optimizer)])

When you resume the model for finetuning, you overwrite the new optimizer with the old one, so the smaller lr is not applied. You would need something like this:

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

BTW, finetuning with bn=False and a smaller lr gives nice results in my experiments with a similar UNet architecture. I suggest trying it if you haven't already.

Running on CPU

Is it possible to use CPU for training instead of CUDA GPU?
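
In principle yes: PyTorch falls back to the CPU if you select the device accordingly and move the model and every input tensor to it, instead of hard-coding 'cuda'. A minimal, self-contained sketch (the Conv2d is only a stand-in for the inpainting network; the actual train.py/test.py would need the same .to(device) treatment wherever tensors are created):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Conv2d(3, 3, 3, padding=1).to(device)   # stand-in model
image = torch.randn(1, 3, 256, 256).to(device)
output = model(image)
print(output.device)

Training on CPU will of course be much slower than on a GPU.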

where is the mask_root?

There is a parser.add_argument for 'mask_root' in train.py, but I did not find that directory in the cloned files. Is it needed?

Blurry problem in training

Hi @naoto0804, thanks for your helpful project. I'm opening this issue to ask whether you have met the problem of blurry artifacts during training. It seems that the results at 300k iterations are still blurry. Could you give some hints about the intermediate results during training?

Pytorch version issue & Pretrained weight

Hello,

First of all, thanks for sharing your code.

I have a couple of questions regarding the pytorch version, and a request for the pretrained weights.

Due to a dependency problem with other code, I am currently using pytorch 0.4.0 instead of 0.4.1, which is noted as the required version for this repo.

However, I found that replacing the F.interpolate function with F.upsample makes everything work. Is it okay to use this repo that way?
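
Expressed as code, that workaround could be a small compatibility shim like the following (an assumption about how it was done, not part of the repo); F.upsample in PyTorch 0.4.0 takes the same size/scale_factor/mode/align_corners arguments that this repo passes to F.interpolate:

import torch.nn.functional as F

# PyTorch 0.4.0 has no F.interpolate yet; alias it to F.upsample
if not hasattr(F, 'interpolate'):
    F.interpolate = F.upsample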

Moreover, could you please share your pretrained weights?

Looking forward to your reply.

Thanks.

About resizing the mask

Thanks for your sharing! @naoto0804
I found that you generate 512 * 512 masks and resize them to 256 * 256 using transforms.Resize(size=256).
However, the default downsampling method is bilinear interpolation, which causes the resized mask to contain not only 0 and 1 but also other values in [0, 1]. The resized mask is therefore not exactly the mask defined in the original paper.
As a result, when I use a 256 * 256 mask (without resizing, i.e. containing only 0 and 1) to test the 1000000.pth pre-trained model, the result is not good.
To avoid this, resizing the mask with nearest-neighbor downsampling when training the network might work.
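
A minimal sketch of that suggestion, using torchvision's Resize with nearest-neighbor interpolation so the resized mask stays strictly binary:

from PIL import Image
from torchvision import transforms

mask_transform = transforms.Compose([
    transforms.Resize(size=256, interpolation=Image.NEAREST),  # keeps values in {0, 1}
    transforms.ToTensor(),
])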

train.py error

After installing torch 0.4 with python 3.6, I ran train.py. At the beginning, the error was:
(image)
So I added a copy of interpolate, like below, into functional.py:

def interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None):
    r"""Down/up samples the input to either the given :attr:`size` or the given
    :attr:`scale_factor`

    The algorithm used for interpolation is determined by :attr:`mode`.

    Currently temporal, spatial and volumetric sampling are supported, i.e.
    expected inputs are 3-D, 4-D or 5-D in shape.

    The input dimensions are interpreted in the form:
    mini-batch x channels x [optional depth] x [optional height] x width.

    The modes available for resizing are: nearest, linear (3D-only),
    bilinear (4D-only), trilinear (5D-only), area

    Args:
        input (Tensor): the input tensor
        size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]):
            output spatial size.
        scale_factor (float or Tuple[float]): multiplier for spatial size. Has to match input size if it is a tuple.
        mode (string): algorithm used for upsampling:
            'nearest' | 'linear' | 'bilinear' | 'trilinear' | 'area'. Default: 'nearest'
        align_corners (bool, optional): if True, the corner pixels of the input
            and output tensors are aligned, and thus preserving the values at
            those pixels. This only has effect when :attr:`mode` is linear,
            bilinear, or trilinear. Default: False

    .. warning::
        With align_corners = True, the linearly interpolating modes
        (linear, bilinear, and trilinear) don't proportionally align the
        output and input pixels, and thus the output values can depend on the
        input size. This was the default behavior for these modes up to version
        0.3.1. Since then, the default behavior is align_corners = False.
        See :class:`~torch.nn.Upsample` for concrete examples on how this
        affects the outputs.
    """
    from numbers import Integral
    from .modules.utils import _ntuple

    def _check_size_scale_factor(dim):
        if size is None and scale_factor is None:
            raise ValueError('either size or scale_factor should be defined')
        if size is not None and scale_factor is not None:
            raise ValueError('only one of size or scale_factor should be defined')
        if scale_factor is not None and isinstance(scale_factor, tuple) \
                and len(scale_factor) != dim:
            raise ValueError('scale_factor shape must match input shape. '
                             'Input is {}D, scale_factor size is {}'.format(dim, len(scale_factor)))

    def _output_size(dim):
        _check_size_scale_factor(dim)
        if size is not None:
            return size
        scale_factors = _ntuple(dim)(scale_factor)
        # math.floor might return float in py2.7
        return [int(math.floor(input.size(i + 2) * scale_factors[i])) for i in range(dim)]

    if mode in ('nearest', 'area'):
        if align_corners is not None:
            raise ValueError("align_corners option can only be set with the "
                             "interpolating modes: linear | bilinear | trilinear")
    else:
        if align_corners is None:
            warnings.warn("Default upsampling behavior when mode={} is changed "
                          "to align_corners=False since 0.4.0. Please specify "
                          "align_corners=True if the old behavior is desired. "
                          "See the documentation of nn.Upsample for details.".format(mode))
            align_corners = False

    if input.dim() == 3 and mode == 'nearest':
        return torch._C._nn.upsample_nearest1d(input, _output_size(1))
    elif input.dim() == 4 and mode == 'nearest':
        return torch._C._nn.upsample_nearest2d(input, _output_size(2))
    elif input.dim() == 5 and mode == 'nearest':
        return torch._C._nn.upsample_nearest3d(input, _output_size(3))
    elif input.dim() == 3 and mode == 'area':
        return adaptive_avg_pool1d(input, _output_size(1))
    elif input.dim() == 4 and mode == 'area':
        return adaptive_avg_pool2d(input, _output_size(2))
    elif input.dim() == 5 and mode == 'area':
        return adaptive_avg_pool3d(input, _output_size(3))
    elif input.dim() == 3 and mode == 'linear':
        return torch._C._nn.upsample_linear1d(input, _output_size(1), align_corners)
    elif input.dim() == 3 and mode == 'bilinear':
        raise NotImplementedError("Got 3D input, but bilinear mode needs 4D input")
    elif input.dim() == 3 and mode == 'trilinear':
        raise NotImplementedError("Got 3D input, but trilinear mode needs 5D input")
    elif input.dim() == 4 and mode == 'linear':
        raise NotImplementedError("Got 4D input, but linear mode needs 3D input")
    elif input.dim() == 4 and mode == 'bilinear':
        return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
    elif input.dim() == 4 and mode == 'trilinear':
        raise NotImplementedError("Got 4D input, but trilinear mode needs 5D input")
    elif input.dim() == 5 and mode == 'linear':
        raise NotImplementedError("Got 5D input, but linear mode needs 3D input")
    elif input.dim() == 5 and mode == 'bilinear':
        raise NotImplementedError("Got 5D input, but bilinear mode needs 4D input")
    elif input.dim() == 5 and mode == 'trilinear':
        return torch._C._nn.upsample_trilinear3d(input, _output_size(3), align_corners)
    else:
        raise NotImplementedError("Input Error: Only 3D, 4D and 5D input Tensors supported"
                                  " (got {}D) for the modes: nearest | linear | bilinear | trilinear"
                                  " (got {})".format(input.dim(), mode))
Now the error is: (see image). Could you help me?

Pretrained model for testing issue

(result image)
Hi
I tested the pretrained model you shared online using test.py, but the results are very different from yours (as you can see above). There are many artifacts in the masked region. Could you please help me figure this out? Maybe I missed something during the implementation?

Thank you very much!

Problem while using net.py

When I tried to use your model and loss function in a project of mine, I encountered this issue:
RuntimeError: Sizes of tensors must match except in dimension 1. Got 10 and 9 in dimension 2 (The offending index is 1)
at the line h = torch.cat([h, h_dict[enc_h_key]], dim=1)
The sizes of the tensors, in order, were:
torch.Size([32, 512, 10, 10])
torch.Size([32, 512, 9, 9])
Can you help me fix it? Thanks in advance.

Accuracy Function

Hello

I'm wondering if you have used an accuracy function for determining accuracy during training, because I couldn't find any metric function that tells how good the output is.

Also, does the code run over the whole dataset in each iteration (epoch) or does it pick 8 images randomly each iteration? (considering batch size=8)

Thank you in advance!

How to test this code on my own dataset?

Hi, I want to know how to use this code to test on my own dataset. I tried to run the command "python test.py --snapshot ./1000000.pth --root ./dataset", where "./dataset" is my own dataset (.jpg files). But it outputs this error:
Traceback (most recent call last):
File "test.py", line 32, in
dataset_val = Places2(args.root, img_transform, mask_transform, 'val')
File "/home/user/FG/code/pytorch-inpainting-with-partial-conv/places2.py", line 21, in __init__
self.mask_paths = glob('{:s}/*.jpg'.format(mask_root))
TypeError: unsupported format string passed to Compose.__format__

How can I solve this? Thank you very much.

Why are the generated masks different from those in the original paper?

Thanks for your work. I noticed that the masks generated by the code are different from those in the original paper.
The masks generated in the paper are as follows:
(image)

The masks generated by the code are as follows:
(image)

Do you follow the method of the paper, or do you use your own way of generating masks?

Misunderstanding about the UNet architecture in your work

Thanks for your awesome reproduction work! While reading your code, I became a little curious about the number of UNet layers. According to your code in net.py, you use a 14-layer UNet:

layer_size = 7
self.layer_size = layer_size
self.enc_1 = PCBActiv(input_channels, 64, bn=False, sample='down-7')
self.enc_2 = PCBActiv(64, 128, sample='down-5')
self.enc_3 = PCBActiv(128, 256, sample='down-5')
self.enc_4 = PCBActiv(256, 512, sample='down-3')
for i in range(4, self.layer_size):
    name = 'enc_{:d}'.format(i + 1)
    setattr(self, name, PCBActiv(512, 512, sample='down-3'))

It seems a little different from the paper, since the paper uses 16 layers in total, with 8 layers each in the encoder and decoder.
I am wondering whether this is a trick of yours for training 256*256 images, or just an inadvertent error. Thank you for your time.

Suggestions: Efficient Partial Convolution

Firstly, I would like to thank you for your implementation.

Below is my version. The main improvement is in using PyTorch's masked_fill_. I guess this is the fastest method without creating a customized C++ function.

import random

import torch
import torch.nn as nn


class PartialConv(nn.Module):
    # reference:
    # Image Inpainting for Irregular Holes Using Partial Convolutions
    # http://masc.cs.gmu.edu/wiki/partialconv/show?time=2018-05-24+21%3A41%3A10
    # https://github.com/naoto0804/pytorch-inpainting-with-partial-conv/blob/master/net.py
    # https://github.com/SeitaroShinagawa/chainer-partial_convolution_image_inpainting/blob/master/common/net.py
    # mask is binary, 0 is holes; 1 is not
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, bias=True):
        super(PartialConv, self).__init__()
        random.seed(0)
        self.feature_conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                                      padding, dilation, groups, bias)
        nn.init.kaiming_normal_(self.feature_conv.weight)

        self.mask_conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                                   padding, dilation, groups, bias=False)
        torch.nn.init.constant_(self.mask_conv.weight, 1.0)

        for param in self.mask_conv.parameters():
            param.requires_grad = False

    def forward(self, args):
        x, mask = args
        output = self.feature_conv(x * mask)
        if self.feature_conv.bias is not None:
            output_bias = self.feature_conv.bias.view(1, -1, 1, 1).expand_as(output)
        else: 
            output_bias = torch.zeros_like(output)

        with torch.no_grad():
            output_mask = self.mask_conv(mask)  # mask sums

        no_update_holes = output_mask == 0
        # because those values won't be used, assign an easy value to compute with
        mask_sum = output_mask.masked_fill_(no_update_holes, 1.0)

        output_pre = (output - output_bias) / mask_sum + output_bias
        output = output_pre.masked_fill_(no_update_holes, 0.0)

        new_mask = torch.ones_like(output)
        new_mask = new_mask.masked_fill_(no_update_holes, 0.0)
        return output, new_mask

Benchmark:

Your code:
Runtime: 1.7311532497406006
Memory increment on a forward pass: 125.9 MiB

My code:
Runtime: 0.3832552433013916
Memory increment on a forward pass: 57.1 MiB

Output feature difference: 0.0
Mask output difference: 0.0

Codes for the benchmark

import time
from memory_profiler import profile
import torch
from torch import nn
import random
from torch.nn import functional as F


def proftime(func):
    def timed(*args, **kw):
        ts = time.time()
        result = func(*args, **kw)
        te = time.time()
        print(f"Runtime: {te-ts}")
        return result

    return timed


class PConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        random.seed(0)
        self.conv2d = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        nn.init.kaiming_normal_(self.conv2d.weight)
        self.mask2d = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.mask2d.weight.data.fill_(1.0)
        self.mask2d.bias.data.fill_(0.0)

        # mask is not updated
        for param in self.mask2d.parameters():
            param.requires_grad = False

    @profile
    @proftime
    def forward(self, input, input_mask):
        # http://masc.cs.gmu.edu/wiki/partialconv
        # C(X) = W^T * X + b, C(0) = b, D(M) = 1 * M + 0 = sum(M)
        # W^T* (M .* X) / sum(M) + b = [C(M .* X) – C(0)] / D(M) + C(0)

        input_0 = input.new_zeros(input.size())

        output = F.conv2d(
            input * input_mask, self.conv2d.weight, self.conv2d.bias,
            self.conv2d.stride, self.conv2d.padding, self.conv2d.dilation,
            self.conv2d.groups)

        output_0 = F.conv2d(input_0, self.conv2d.weight, self.conv2d.bias,
                            self.conv2d.stride, self.conv2d.padding,
                            self.conv2d.dilation, self.conv2d.groups)

        with torch.no_grad():
            output_mask = F.conv2d(
                input_mask, self.mask2d.weight, self.mask2d.bias,
                self.mask2d.stride, self.mask2d.padding, self.mask2d.dilation,
                self.mask2d.groups)

        n_z_ind = (output_mask != 0.0)
        z_ind = (output_mask == 0.0)  # skip all the computation

        output[n_z_ind] = \
            (output[n_z_ind] - output_0[n_z_ind]) / output_mask[n_z_ind] + \
            output_0[n_z_ind]
        output[z_ind] = 0.0

        output_mask[n_z_ind] = 1.0
        output_mask[z_ind] = 0.0

        return output, output_mask


class PartialConv(nn.Module):
    # reference:
    # Image Inpainting for Irregular Holes Using Partial Convolutions
    # http://masc.cs.gmu.edu/wiki/partialconv/show?time=2018-05-24+21%3A41%3A10
    # https://github.com/naoto0804/pytorch-inpainting-with-partial-conv/blob/master/net.py
    # https://github.com/SeitaroShinagawa/chainer-partial_convolution_image_inpainting/blob/master/common/net.py
    # mask is binary, 0 is holes; 1 is not
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, bias=True):
        super(PartialConv, self).__init__()
        random.seed(0)
        self.feature_conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                                      padding, dilation, groups, bias)
        nn.init.kaiming_normal_(self.feature_conv.weight)

        self.mask_conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                                   padding, dilation, groups, bias=False)
        torch.nn.init.constant_(self.mask_conv.weight, 1.0)

        for param in self.mask_conv.parameters():
            param.requires_grad = False

    @profile
    @proftime
    def forward(self, args):
        x, mask = args
        output = self.feature_conv(x * mask)
        if self.feature_conv.bias is not None:
            output_bias = self.feature_conv.bias.view(1, -1, 1, 1).expand_as(output)
        else:
            output_bias = torch.zeros_like(output)

        with torch.no_grad():
            output_mask = self.mask_conv(mask)  # mask sums

        no_update_holes = output_mask == 0
        # because those values won't be used , assign a easy value to compute
        mask_sum = output_mask.masked_fill_(no_update_holes, 1.0)

        output_pre = (output - output_bias) / mask_sum + output_bias
        output = output_pre.masked_fill_(no_update_holes, 0.0)

        new_mask = torch.ones_like(output)
        new_mask = new_mask.masked_fill_(no_update_holes, 0.0)
        return output, new_mask

# Your method
model1 = PConv2d(in_ch=256, out_ch=256, kernel_size=3, stride=1, padding=1)

# My method
model2 = PartialConv(in_channels=256, out_channels=256, kernel_size=3, stride=1,
                     padding=1, dilation=1, groups=1, bias=True)

# make sure all learnable convolutions share the same weights
model2.feature_conv.weight.data.copy_(model1.conv2d.weight.data)
model2.feature_conv.bias.data.copy_(model1.conv2d.bias.data)
random.seed(0)
x1 = torch.randn(1, 256, 64, 64)
x2 = x1.clone()
mask1 = torch.ones_like(x1)
mask1[:, :, 25:50, 25:50] = 0
mask2 = mask1.clone()
y1 = model1.forward(x1, mask1)
y2 = model2.forward((x2, mask2))

print(f"Output feature output difference {torch.sum(y2[0] - y1[0])}")
print(f'Mask output difference {torch.sum(y2[1] - y1[1])}')

Some comments:

I see you are using batch norm after the partial convolution. I would suggest disabling the bias term in any convolution placed right before batch norm, since batch norm already includes a bias (beta) term that offsets the convolution's bias.
In addition, I prefer in-place batch norm, which can save around 20%-40% of memory usage while maintaining fast computation.
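
A minimal sketch of the bias suggestion (not the repo's actual block definition): the BatchNorm beta already provides a learnable offset, so the convolution feeding it can drop its own bias:

import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
)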

In your training script, the default learning rate is 2e-4. I highly recommend using a cyclical learning rate and PyTorch's implementation of it. I am using a 0.04-0.08 learning rate range. If you are able to train with a large batch size, the learning rate can move within [0.1, 1] or even larger, which is called super-convergence.
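
A minimal sketch of a cyclical learning rate using torch.optim.lr_scheduler.CyclicLR (available in PyTorch 1.1+, i.e. newer than the 0.4.x this repo targets); the 0.04-0.08 range is the one quoted above, and the model/optimizer here are only stand-ins:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.04, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.04, max_lr=0.08, step_size_up=2000)

for step in range(10):          # training-loop sketch
    optimizer.step()
    scheduler.step()            # cycle the lr once per iteration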

Personal ad:

I am using partial convolution to create a manga inpainting tool: image segmentation figures out text locations, and then inpainting repairs the background & color. Suggestions and comments are very welcome.

Can I just run test.py without downloading the Places2 dataset?

Can I just download the trained model file without downloading the dataset? The dataset is really too big!! I just want to test the results (on some random pictures I found on the internet); can that work?

I downloaded the 1000000.pth file you trained and placed it at ./snapshots/default/1000000.pth,
then I changed test.py to parser.add_argument('--snapshot', type=str, default='./snapshots/default/1000000.pth').

I made a folder structure like this (without any pictures or dataset):
(screenshot error3)

I also ran generate_data.py and got the following:
(screenshot error4)

Then I ran "python test.py" and got the following errors:
(screenshot error1)

And I also found the same problem as #24: the number of input parameters does not match.

In test.py it is used like dataset_val = Places2(args.root, img_transform, mask_transform, 'val'),
while in places2.py:

(screenshot error2)

Can you please let me know whether there is a way to just test the model with some random pictures downloaded from the internet?

About the requirements.txt

Thanks for sharing your work.
Whenever I run pip install -r requirements.txt, I get the following error:

Could not find a version that satisfies the requirement pkg-resources==0.0.0 (from -r requirements.txt (line 3)) (from versions: )
No matching distribution found for pkg-resources==0.0.0 (from -r requirements.txt (line 3))

Can you help? Thanks.

Blurry results

Hi, thanks for your project.
I've trained your model on a face dataset.
After 40K iterations, the result is still blurry.
image

Here is the loss log
image
image

I'm wondering whether that is normal. How long will it take to get good results?
