microsoft / styleswin

[CVPR 2022] StyleSwin: Transformer-based GAN for High-resolution Image Generation

Home Page: https://arxiv.org/abs/2112.10762

License: MIT License

Python 86.20% C++ 1.62% Cuda 12.18%
computer-vision deep-learning deep-neural-networks pytorch generative-adversarial-network gans image-generation image-synthesis styleswin transformer

styleswin's Introduction

StyleSwin



This repo is the official implementation of "StyleSwin: Transformer-based GAN for High-resolution Image Generation" (CVPR 2022).

By Bowen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang and Baining Guo.

Abstract

Despite their tantalizing success in a broad range of vision tasks, transformers have not yet demonstrated on-par ability with ConvNets in high-resolution image generative modeling. In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. To this end, we believe that local attention is crucial to strike a balance between computational efficiency and modeling capacity. Hence, the proposed generator adopts the Swin transformer in a style-based architecture. To achieve a larger receptive field, we propose double attention, which simultaneously leverages the context of the local and the shifted windows, leading to improved generation quality. Moreover, we show that offering the knowledge of the absolute position that has been lost in window-based transformers greatly benefits the generation quality. The proposed StyleSwin is scalable to high resolutions, with both the coarse geometry and fine structures benefiting from the strong expressivity of transformers. However, blocking artifacts occur during high-resolution synthesis because performing the local attention in a block-wise manner may break the spatial coherency. To solve this, we empirically investigate various solutions, among which we find that employing a wavelet discriminator to examine the spectral discrepancy effectively suppresses the artifacts. Extensive experiments show the superiority over prior transformer-based GANs, especially on high resolutions, e.g., 1024x1024. StyleSwin, without complex training strategies, excels over StyleGAN on CelebA-HQ 1024x1024 and achieves on-par performance on FFHQ 1024x1024, demonstrating the promise of using transformers for high-resolution image generation.

Quantitative Results

| Dataset | Resolution | FID | Pretrained Model |
| --- | --- | --- | --- |
| FFHQ | 256x256 | 2.81 | Google Drive / Azure Storage |
| LSUN Church | 256x256 | 2.95 | Google Drive / Azure Storage |
| CelebA-HQ | 256x256 | 3.25 | Google Drive / Azure Storage |
| FFHQ | 1024x1024 | 5.07 | Google Drive / Azure Storage |
| CelebA-HQ | 1024x1024 | 4.43 | Google Drive / Azure Storage |

Requirements

To install the dependencies:

python -m pip install -r requirements.txt

Generating image samples with pretrained model

To generate 50k image samples of resolution 1024 and evaluate the FID score (12,500 batches x batch size 4 = 50,000 samples):

python -m torch.distributed.launch --nproc_per_node=1 train_styleswin.py --sample_path /path_to_save_generated_samples --size 1024 --ckpt /path/to/checkpoint --eval --val_num_batches 12500 --val_batch_size 4 --eval_gt_path /path_to_real_images_50k

To generate 50k image samples of resolution 256 and evaluate the FID score (12,500 batches x batch size 4 = 50,000 samples):

python -m torch.distributed.launch --nproc_per_node=1 train_styleswin.py --sample_path /path_to_save_generated_samples --size 256 --G_channel_multiplier 2 --ckpt /path/to/checkpoint --eval --val_num_batches 12500 --val_batch_size 4 --eval_gt_path /path_to_real_images_50k

Training

Data preparing

When training on FFHQ and CelebA-HQ, we use ImageFolder datasets. The expected data structure is:

FFHQ
├── images
│  ├── 000001.png
│  ├── ...

When training on LSUN Church, please follow stylegan2-pytorch to create an LMDB dataset first. After this, the data structure looks like:

LSUN Church
├── data.mdb
└── lock.mdb
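For illustration, the FFHQ/CelebA-HQ layout above is readable with torchvision's ImageFolder, where the single images/ subfolder acts as the one (unused) class directory. A minimal sketch, assuming a placeholder transform rather than the repo's actual training pipeline:

from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),                      # placeholder resolution
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map pixels to [-1, 1]
])

# "FFHQ" is the dataset root; every file under FFHQ/images/ becomes one sample
dataset = datasets.ImageFolder("FFHQ", transform=transform)
image, _ = dataset[0]  # the class label is ignored for GAN training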

FFHQ-1024

To train a new model of FFHQ-1024 from scratch:

python -m torch.distributed.launch --nproc_per_node=8 train_styleswin.py --batch 2 --path /path_to_ffhq_1024 --checkpoint_path /tmp --sample_path /tmp --size 1024 --D_lr 0.0002 --D_sn --ttur --eval_gt_path /path_to_ffhq_real_images_50k --lr_decay --lr_decay_start_steps 600000

CelebA-HQ 1024

To train a new model of CelebA-HQ 1024 from scratch:

python -m torch.distributed.launch --nproc_per_node=8 train_styleswin.py --batch 2 --path /path_to_celebahq_1024 --checkpoint_path /tmp --sample_path /tmp --size 1024 --D_lr 0.0002 --D_sn --ttur --eval_gt_path /path_to_celebahq_real_images_50k

FFHQ-256

To train a new model of FFHQ-256 from scratch:

python -m torch.distributed.launch --nproc_per_node=8 train_styleswin.py --batch 4 --path /path_to_ffhq_256 --checkpoint_path /tmp --sample_path /tmp --size 256 --G_channel_multiplier 2 --bcr --D_lr 0.0002 --D_sn --ttur --eval_gt_path /path_to_ffhq_real_images_50k --lr_decay --lr_decay_start_steps 775000 --iter 1000000

CelebA-HQ 256

To train a new model of CelebA-HQ 256 from scratch:

python -m torch.distributed.launch --nproc_per_node=8 train_styleswin.py --batch 4 --path /path_to_celebahq_256 --checkpoint_path /tmp --sample_path /tmp --size 256 --G_channel_multiplier 2 --bcr --r1 5 --D_lr 0.0002 --D_sn --ttur --eval_gt_path /path_to_celebahq_real_images_50k --lr_decay --lr_decay_start_steps 500000

LSUN Church 256

To train a new model of LSUN Church 256 from scratch:

python -m torch.distributed.launch --nproc_per_node=8 train_styleswin.py --batch 4 --path /path_to_lsun_church_256 --checkpoint_path /tmp --sample_path /tmp --size 256 --G_channel_multiplier 2 --use_flip --r1 5 --lmdb --D_lr 0.0002 --D_sn --ttur --eval_gt_path /path_to_lsun_church_real_images_50k --lr_decay --lr_decay_start_steps 1300000 --iter 1500000

Notice: When training on 16 GB GPUs, you can add --use_checkpoint to save GPU memory.
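For readers unfamiliar with the flag: --use_checkpoint presumably enables gradient checkpointing, which trades extra compute for memory by recomputing activations during the backward pass. A generic PyTorch illustration of the mechanism (not the repo's own code):

import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.GELU(), torch.nn.Linear(512, 512)
)
x = torch.randn(4, 512, requires_grad=True)

# Activations inside `block` are not stored during the forward pass; they are
# recomputed on backward, cutting peak memory at the cost of a second forward
# pass through the block.
y = checkpoint(block, x)
y.sum().backward()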

Qualitative Results

Image samples of FFHQ-1024 generated by StyleSwin:

Image samples of CelebA-HQ 1024 generated by StyleSwin:

Latent code interpolation examples of FFHQ-1024 between the left-most and the right-most images:

Citing StyleSwin

@misc{zhang2021styleswin,
      title={StyleSwin: Transformer-based GAN for High-resolution Image Generation}, 
      author={Bowen Zhang and Shuyang Gu and Bo Zhang and Jianmin Bao and Dong Chen and Fang Wen and Yong Wang and Baining Guo},
      year={2021},
      eprint={2112.10762},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Responsible AI Considerations

Our work does not directly modify existing images in ways that might alter the identity or expression of the people in them, and we discourage the use of our work in such applications, as it is not designed for them. We have quantitatively verified that the proposed method does not show evident disparity across gender and age, as the model mostly follows the dataset distribution; however, we encourage additional care if you intend to use the system on certain demographic groups. We also encourage the use of fair and representative data when training on customized data. We caution that the high-resolution images produced by our model may potentially be misused for impersonating humans, and viable solutions to avoid this include adding tags or watermarks when distributing the generated photos.

Acknowledgements

This code borrows heavily from stylegan2-pytorch and Swin-Transformer. We also thank the contributors of Positional Encoding in GANs, DiffAug, StudioGAN, and GIQA.

Maintenance

This is the codebase for our research work. Please open a GitHub issue if you need any help. If you have questions regarding the technical details, feel free to contact [email protected] or [email protected].

License

The code and the pretrained models in this repository are released under the MIT license, as specified by the LICENSE file.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

styleswin's People

Contributors

ak391, foreverfancy, microsoftopensource, zhangmozhe


styleswin's Issues

Hello

Hello, after I set up the environment with Python 3.7 by running the requirements install command, I keep getting an arch_list[-1] += '+PTX' error. What could be the cause?

pre_training and fine-tune

Hi, thank you so much for sharing your code!
I have a question. I'd like to transfer the model to medical images.
But I don't know whether I should retrain directly from scratch, or load the pretrained model and fine-tune it?
What do you think about it? Looking forward to your reply!

Doubts about the networks parameters and FLOPs

Hi, Fancy! Thanks for your excellent work.

StyleSwin synthesizes a 1024x1024 image with 40.86M params and 50.90B FLOPs, as shown in Table 6 of the paper.

But I reproduced the results by running:

from thop import profile
flops, params = profile(generator, (noise,))             # noise: torch.Size([1, 512])
print('flops: ', flops / 1000000000, 'params: ', params / 1000000)
flops, params = profile(discriminator, (real_img,))  # real_img: torch.Size([1, 3, 1024, 1024])
print('flops: ', flops / 1000000000, 'params: ', params / 1000000)

The generator params are 28.28M with 47.36B FLOPs.
The discriminator params are 27.73M with 50.19B FLOPs.

I don't know where the problem is. Looking forward to your reply!

cuda memory

I am training StyleSwin on FFHQ at 1024 resolution, but I got this error:

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 3; 23.65 GiB total capacity; 19.49 GiB already allocated; 474.00 MiB free; 22.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I'm using 4 RTX GPUs with 24 GB each, and no other programs are running. The batch size is set to 2. Why is this not enough to train StyleSwin?

Betas

Hello, I forgot to ask in the previous issue, but what is the intuition behind beta1=0.0 and beta2=0.99? I've seen it in a couple more projects (such as CIPS), and I always wondered how they came up with these values (usually, GANs use beta1=0.5 and beta2=0.999). Is there some property of these values that helps training, or are these just the betas that seemed to work best?
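For reference, a minimal sketch of the configuration in question (g and d are placeholder modules; the learning rates are illustrative, not necessarily the repo's defaults):

import torch
import torch.nn as nn

g = nn.Linear(512, 512)  # placeholder generator
d = nn.Linear(512, 1)    # placeholder discriminator

# beta1 = 0.0 turns off first-moment momentum, which is often preferred in GAN
# training because the gradient field is non-stationary and momentum can carry
# stale directions; beta2 = 0.99 shortens the second-moment averaging window
# relative to the 0.999 default, so the per-parameter scaling adapts faster.
# These values follow StyleGAN2's setup rather than the (0.5, 0.999) choice
# common in earlier GANs.
g_optim = torch.optim.Adam(g.parameters(), lr=2e-4, betas=(0.0, 0.99))
d_optim = torch.optim.Adam(d.parameters(), lr=2e-4, betas=(0.0, 0.99))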

Training time

How long should I expect training to take at 256x256 resolution? I only have 1 GPU, if that helps.

About Automatic Mixed Precision

Thank you for the awesome research and code release!
Is there any reason you don't use PyTorch's automatic mixed precision package?
Did it lower the model's performance when you used it?

Query: How to save the generated samples

Thank you, authors, for sharing the interesting work you have done in deep generative modeling. I have a rather small doubt regarding the command to generate samples.

python -m torch.distributed.launch --nproc_per_node=1 train_styleswin.py --sample_path /path_to_save_generated_samples --size 256 --G_channel_multiplier 2 --ckpt /path/to/checkpoint --eval --val_num_batches 12500 --val_batch_size 4 --eval_gt_path /path_to_real_images_50k

Here, I wanted to know: what should path_to_save_generated_samples be?

I am getting the following error message:

Traceback (most recent call last):
  File "train_styleswin.py", line 382, in <module>
    os.mkdir(args.sample_path)
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/Repositories/StyleSwin/StyleSwin_generated_samples/samples'

I made a directory called StyleSwin_generated_samples in the cloned repository for saving the samples.
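A plausible cause, judging from the traceback alone (an assumption, not a maintainer answer): os.mkdir creates only the final path component and raises FileNotFoundError when any parent directory is missing. Creating the parents first avoids this:

import os

# Unlike os.mkdir, os.makedirs creates all intermediate directories, and
# exist_ok=True tolerates the directory already existing.
os.makedirs("/path/to/sample_path", exist_ok=True)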

Training Artifacts

I am training with my own custom faces dataset, but I am getting some artifacts during training. Is this normal, and will it eventually disappear with longer training, or have I done something wrong?

Query: 64x64 Datasets

Thank you, authors, for sharing the interesting work you have done in deep generative modeling. I have a rather small doubt regarding changing the architecture to train on smaller 64x64 or 32x32 datasets. It would be great if you could guide me.

Query: Samples in grid

Thank you for the project. If one has saved the generated images in a directory, how can they be arranged in an N x M grid for better analysis?
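One possible approach with torchvision (a sketch assuming the samples were saved as individual image files; the directory name and grid shape are placeholders):

import os
import torch
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

image_dir = "generated_samples"  # placeholder directory of saved images
n_rows, n_cols = 4, 8            # desired N x M grid (placeholders)

to_tensor = transforms.ToTensor()
files = sorted(os.listdir(image_dir))[:n_rows * n_cols]
batch = torch.stack([
    to_tensor(Image.open(os.path.join(image_dir, f)).convert("RGB"))
    for f in files
])

# save_image tiles the batch into a single grid image, n_cols images per row
save_image(batch, "grid.png", nrow=n_cols)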

Effect of equalized learning rate in generator architecture

Hi, thanks for this great work!

In the generator code, the mapping network and AdaIN use EqualLinear from StyleGAN2, while the transformer block uses nn.Linear.

I know this configuration may follow the original implementations of the mapping network and attention block,
but I wonder whether this component affects image generation performance.

E.g., the FID when using EqualLinear in the qkv of the attention block.

Do you have any idea of the effects of the equalized learning rate in the transformer block?

Thanks,

suppress artifacts

Hi, Fancy! Thanks for your excellent work.
I would like to ask which part of the code uses the Haar wavelet to suppress artifacts?

Size mismatch

/home/jirib/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
/home/jirib/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1634272068185/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
load model: samples000001.pt
Traceback (most recent call last):
File "/home/jirib/Desktop/StyleSwin-main/train_styleswin.py", line 410, in <module>
generator.load_state_dict(ckpt["g"], strict=False)
File "/home/jirib/anaconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Generator:
size mismatch for to_rgbs.5.conv.weight: copying a param with shape torch.Size([3, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 32, 1, 1]).

Hello, I modified the generator and discriminator, then trained the model (was trying something out). When I then go on to load the model, this error message pops up.

run in colab

After installing the libraries in Google Colab and using my own images, I encountered this message.

/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py:188: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1906, in _run_ninja_build
env=env)
File "/usr/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "train_styleswin.py", line 27, in
from models.discriminator import Discriminator
File "/content/StyleSwin/models/discriminator.py", line 7, in
from op import FusedLeakyReLU, upfirdn2d
File "/content/StyleSwin/op/init.py", line 4, in
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/content/StyleSwin/op/fused_act.py", line 19, in
os.path.join(module_path, "fused_bias_act_kernel.cu"),
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1296, in load
keep_intermediates=keep_intermediates)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1518, in _jit_compile
is_standalone=is_standalone)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1626, in _write_ninja_file_and_build_library
error_prefix=f"Error building extension '{name}'")
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /content/StyleSwin/op/fused_bias_act.cpp -o fused_bias_act.o
FAILED: fused_bias_act.o
c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.7/dist-packages/torch/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.7/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.7/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /content/StyleSwin/op/fused_bias_act.cpp -o fused_bias_act.o
In file included from /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/Device.h:4,
from /usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /usr/local/lib/python3.7/dist-packages/torch/include/torch/extension.h:6,
from /content/StyleSwin/op/fused_bias_act.cpp:4:
/usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
12 | #include <Python.h>
| ^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

Traceback (most recent call last):
File "train_styleswin.py", line 27, in <module>
from models.discriminator import Discriminator
File "/content/StyleSwin/models/discriminator.py", line 7, in <module>
from op import FusedLeakyReLU, upfirdn2d
File "/content/StyleSwin/op/__init__.py", line 4, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/content/StyleSwin/op/fused_act.py", line 19, in <module>
os.path.join(module_path, "fused_bias_act_kernel.cu"),
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1296, in load
keep_intermediates=keep_intermediates)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
ImportError: /root/.cache/torch_extensions/py37_cu117/fused/fused.so: cannot open shared object file: No such file or directory
(the same ImportError traceback is repeated for each of the remaining worker processes)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7343) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 195, in <module>
main()
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 756, in run
)(*cmd_args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train_styleswin.py FAILED

Failures:
[1]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 7344)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
(near-identical entries follow for ranks 2 through 7)

Root Cause (first observed failure):
[0]:
time : 2024-03-23_15:44:50
host : fe0cdf2fa382
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 7343)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Why do you replace noise with SPE?

Compared with StyleGAN2, I notice that you replace noise with SPE at the same place. What are the differences between SPE and noise? Can SPE achieve the same effect as noise? It seems like SPE is a fixed vector?

Thanks.

Error using ckpt when resuming

Thanks for sharing, I am having this error:

Traceback (most recent call last):
File "train_styleswin.py", line 409, in <module>
generator.load_state_dict(ckpt["g"])
File "/mnt/anaconda3/envs/StyleSwin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1668, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
Unexpected key(s) in state_dict: "layers.4.blocks.0.attn_mask2", "layers.4.blocks.0.norm1.style.weight", "layers.4.blocks.0.norm1.style.bias", "layers.4.blocks.0.qkv.weight", "layers.4.blocks.0.qkv.bias", "layers.4.blocks.0.proj.weight", "layers.4 and so on

this is the command I am running:

python -m torch.distributed.launch --nproc_per_node=2 train_styleswin.py --batch 4 --path /mnt/DATASETS/FFHQ --checkpoint_path /mnt/PROCESSEDdata/StyleSwin/Train --sample_path /mnt/PROCESSEDdata/StyleSwin/Train --size 32 --G_channel_multiplier 2 --bcr --D_lr 0.0002 --D_sn --ttur --eval_gt_path /mnt/DATASETS/FFHQ --lr_decay --lr_decay_start_steps 775000 --iter 1000000 --ckpt /mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt --use_checkpoint

I tried with and without the use_checkpoint flag, and also with the 256 version, and it gives back the same error.

Best

abnormal image generation

Did this problem occur during your training as well? Is it due to the lack of a tRGB module, or something else?

How to finetune?

Sorry if this is a silly question, but I wanted to ask if you could provide an example of how to fine-tune one of your existing models on a new dataset?

An example from StyleGAN3:
# Fine-tune StyleGAN3-R for MetFaces-U using 1 GPU, starting from the pre-trained FFHQ-U pickle.
python train.py --outdir=~/training-runs --cfg=stylegan3-r --data=~/datasets/metfacesu-1024x1024.zip \
    --gpus=8 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=5 \
    --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-1024x1024.pkl

Model Training

Is it possible to train a new model on the FFHQ dataset with an image size of 512x512 from scratch?

Source of FFHQ-256

Which source did you get the FFHQ-256 dataset from? The downsampling method can be a critical factor in FID evaluation. Thanks!

colab

Please add a Google Colab notebook for inference.

FID Curve

Great work! However, I used 4 x 3090 GPUs to train StyleSwin on the FFHQ-256 dataset, evaluated FID on the same dataset, and got the following FID curve from 0 to 500k iterations after 4 days.


Is this normal? It seems unlikely that the FID will drop below 10 even after 1000k iterations. Could you please share your FID curve on this dataset?

Size mismatch

I am using the generator with image size 512, size 256, style_dim 512, channel_multiplier 2, and I am getting the error below. Can you let me know what the size config should be to build the network for image size 512 x 512?

generator.py", line 252, in forward
out = gamma * out + beta
RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 2

start_dim and style_dim

Hi, I am trying to train using --size 256 --start_dim 256 --style_dim 256, but:

File "/mnt/d/Projects/python/StyleSwin/models/basic_layers.py", line 91, in forward
    out = F.linear(input, self.weight * self.scale)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x512 and 256x256)

How can I fix this?

Can I get ONNX format for the pre-trained models?

Hello:
I have all kinds of issues using the models in Python to make predictions, but I can run ONNX-format models in C#.
Is it possible to convert the pretrained models to ONNX format and publish them here?
Thanks,

Train and Validation Split

What train and validation split was used? I'm using the provided checkpoint and testing with a validation set of the top 10k images, similar to Co-Mod-GAN's split (section 5.1). Using this split I am getting an FID of 4.26.

Continue training

Hello, after training for 400,000 iterations the power failed. How can I continue training?
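A presumable way to resume (an assumption based on the --ckpt flag used elsewhere in this README, not a confirmed answer from the maintainers) is to pass the latest saved checkpoint back to the original training command:

python -m torch.distributed.launch --nproc_per_node=8 train_styleswin.py --ckpt /path/to/latest_checkpoint.pt [the rest of your original training arguments]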

question strategy in resolution progression

Congratulations on this great work. I would like to ask whether your training strategy is similar to StyleGAN's resolution progression (e.g., 64x64, then 128x128). Thanks!

Training log of losses

Hi,
Thank you for your awesome research!
I am training the model with my own dataset, and I would like to see the training logs of the losses, if possible.
The discriminator loss seems to converge very fast (close to 0); is that right?

Best regards,
Hankyu Jang

About FLOPs

When I print the FLOPs of the generator, there is an error. I found that 'self.attn' on line 356 is a list; in fact, there are 2 attentions to be calculated. How can I fix it? When I change 'self.attn' to 'self.attn[0]', the FLOPs show 68109933824.0 (68B), larger than the 50.9B in the paper.


GPU memory question

Can this code run with 12 GB of GPU memory? Also, if I use this model as a generator like StyleGAN, can the pretrained model be used for face editing? Looking forward to your reply.
