Comments (10)
Same issue here.
2021-05-17 21:15:05,719 - mmgen - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0,1: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.7.0
OpenCV: 4.2.0
MMCV: 1.3.4
MMGen: 0.1.0+0ece0cd
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
------------------------------------------------------------
2021-05-17 21:15:05,984 - mmgen - INFO - Distributed training: False
2021-05-17 21:15:06,178 - mmgen - INFO - Config:
model = dict(
type='CycleGAN',
generator=dict(
type='ResnetGenerator',
in_channels=3,
out_channels=3,
base_channels=64,
norm_cfg=dict(type='IN'),
use_dropout=False,
num_blocks=9,
padding_mode='reflect',
init_cfg=dict(type='normal', gain=0.02)),
discriminator=dict(
type='PatchDiscriminator',
in_channels=3,
base_channels=64,
num_conv=3,
norm_cfg=dict(type='IN'),
init_cfg=dict(type='normal', gain=0.02)),
gan_loss=dict(
type='GANLoss',
gan_type='lsgan',
real_label_val=1.0,
fake_label_val=0.0,
loss_weight=1.0),
cycle_loss=dict(type='L1Loss', loss_weight=10.0, reduction='mean'),
id_loss=dict(type='L1Loss', loss_weight=0.5, reduction='mean'))
train_cfg = dict(direction='a2b', buffer_size=50)
test_cfg = dict(direction='a2b', show_input=False, test_direction='a2b')
train_dataset_type = 'UnpairedImageDataset'
val_dataset_type = 'UnpairedImageDataset'
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
dict(
type='LoadImageFromFile', io_backend='disk', key='img_a',
flag='color'),
dict(
type='LoadImageFromFile', io_backend='disk', key='img_b',
flag='color'),
dict(
type='Resize',
keys=['img_a', 'img_b'],
scale=(286, 286),
interpolation='bicubic'),
dict(
type='Crop',
keys=['img_a', 'img_b'],
crop_size=(256, 256),
random_crop=True),
dict(type='Flip', keys=['img_a'], direction='horizontal'),
dict(type='Flip', keys=['img_b'], direction='horizontal'),
dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
dict(
type='Normalize',
keys=['img_a', 'img_b'],
to_rgb=False,
mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5]),
dict(type='ImageToTensor', keys=['img_a', 'img_b']),
dict(
type='Collect',
keys=['img_a', 'img_b'],
meta_keys=['img_a_path', 'img_b_path'])
]
test_pipeline = [
dict(
type='LoadImageFromFile', io_backend='disk', key='img_a',
flag='color'),
dict(
type='LoadImageFromFile', io_backend='disk', key='img_b',
flag='color'),
dict(
type='Resize',
keys=['img_a', 'img_b'],
scale=(256, 256),
interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
dict(
type='Normalize',
keys=['img_a', 'img_b'],
to_rgb=False,
mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5]),
dict(type='ImageToTensor', keys=['img_a', 'img_b']),
dict(
type='Collect',
keys=['img_a', 'img_b'],
meta_keys=['img_a_path', 'img_b_path'])
]
data_root = None
data = dict(
samples_per_gpu=1,
workers_per_gpu=4,
drop_last=True,
val_samples_per_gpu=1,
val_workers_per_gpu=0,
train=dict(
type='UnpairedImageDataset',
dataroot='./data/horse2zebra',
pipeline=[
dict(
type='LoadImageFromFile',
io_backend='disk',
key='img_a',
flag='color'),
dict(
type='LoadImageFromFile',
io_backend='disk',
key='img_b',
flag='color'),
dict(
type='Resize',
keys=['img_a', 'img_b'],
scale=(286, 286),
interpolation='bicubic'),
dict(
type='Crop',
keys=['img_a', 'img_b'],
crop_size=(256, 256),
random_crop=True),
dict(type='Flip', keys=['img_a'], direction='horizontal'),
dict(type='Flip', keys=['img_b'], direction='horizontal'),
dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
dict(
type='Normalize',
keys=['img_a', 'img_b'],
to_rgb=False,
mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5]),
dict(type='ImageToTensor', keys=['img_a', 'img_b']),
dict(
type='Collect',
keys=['img_a', 'img_b'],
meta_keys=['img_a_path', 'img_b_path'])
],
test_mode=False),
val=dict(
type='UnpairedImageDataset',
dataroot='./data/horse2zebra',
pipeline=[
dict(
type='LoadImageFromFile',
io_backend='disk',
key='img_a',
flag='color'),
dict(
type='LoadImageFromFile',
io_backend='disk',
key='img_b',
flag='color'),
dict(
type='Resize',
keys=['img_a', 'img_b'],
scale=(256, 256),
interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
dict(
type='Normalize',
keys=['img_a', 'img_b'],
to_rgb=False,
mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5]),
dict(type='ImageToTensor', keys=['img_a', 'img_b']),
dict(
type='Collect',
keys=['img_a', 'img_b'],
meta_keys=['img_a_path', 'img_b_path'])
],
test_mode=True),
test=dict(
type='UnpairedImageDataset',
dataroot='./data/horse2zebra',
pipeline=[
dict(
type='LoadImageFromFile',
io_backend='disk',
key='img_a',
flag='color'),
dict(
type='LoadImageFromFile',
io_backend='disk',
key='img_b',
flag='color'),
dict(
type='Resize',
keys=['img_a', 'img_b'],
scale=(256, 256),
interpolation='bicubic'),
dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
dict(
type='Normalize',
keys=['img_a', 'img_b'],
to_rgb=False,
mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5]),
dict(type='ImageToTensor', keys=['img_a', 'img_b']),
dict(
type='Collect',
keys=['img_a', 'img_b'],
meta_keys=['img_a_path', 'img_b_path'])
],
test_mode=True))
checkpoint_config = dict(interval=100, by_epoch=False, save_optimizer=True)
log_config = dict(
interval=100, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
custom_hooks = [
dict(
type='VisualizeUnconditionalSamples',
output_dir='training_samples',
interval=1000)
]
runner = dict(
type='DynamicIterBasedRunner',
is_dynamic_ddp=True,
pass_training_status=True)
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
find_unused_parameters = True
cudnn_benchmark = True
dataroot = './data/horse2zebra'
optimizer = dict(
generators=dict(type='Adam', lr=0.0002, betas=(0.5, 0.999)),
discriminators=dict(type='Adam', lr=0.0002, betas=(0.5, 0.999)))
lr_config = None
total_iters = 80000
exp_name = 'cyclegan_facades_id0'
work_dir = './work_dirs/cyclegan_facades_id0'
metrics = dict(
FID=dict(type='FID', num_images=140, image_shape=(3, 256, 256)),
IS=dict(type='IS', num_images=140, image_shape=(3, 256, 256)))
gpu_ids = range(0, 1)
2021-05-17 21:15:06,178 - mmgen - INFO - Set random seed to 2021, deterministic: False
/home/jay/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/conv_module.py:107: UserWarning: ConvModule has norm and bias at the same time
warnings.warn('ConvModule has norm and bias at the same time')
2021-05-17 21:15:08,000 - mmgen - INFO - Start running, host: jay@Tachikoma, work_dir: /home/jay/git/jay-mmgeneration/work_dirs/cyclegan_facades_id0
2021-05-17 21:15:08,000 - mmgen - INFO - workflow: [('train', 1)], max: 80000 iters
Traceback (most recent call last):
File "tools/train.py", line 163, in <module>
main()
File "tools/train.py", line 159, in main
meta=meta)
File "/home/jay/git/jay-mmgeneration/mmgen/apis/train.py", line 196, in train_model
runner.run(data_loaders, cfg.workflow, cfg.total_iters)
File "/home/jay/git/jay-mmgeneration/mmgen/core/runners/dynamic_iterbased_runner.py", line 284, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/jay/git/jay-mmgeneration/mmgen/core/runners/dynamic_iterbased_runner.py", line 206, in train
kwargs.update(dict(ddp_reducer=self.model.reducer))
File "/home/jay/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 772, in __getattr__
type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'MMDataParallel' object has no attribute 'reducer'
from mmgeneration.
In the future, the image translation models will be removed from MMEditing and supported in MMGeneration. We hope that you can switch to MMGeneration, and sorry for the inconvenience.
I use python tools/train.py configs/cyclegan/cyclegan_lsgan_resnet_in_1x1_266800_horse2zebra.py
Hi @xinzhichao and @jayagami, MMGeneration does NOT support DataParallel training. If you start training by running python train.py xxx directly, the model is wrapped in DataParallel (the MMDataParallel object in the traceback above), which has no reducer attribute. We therefore recommend that all users launch distributed training with:
bash tools/dist_train.sh {CONFIG} {GPU_NUM} --work-dir {WORK_DIR}
In addition, after checking the code in detail, I found another bug in the config files for the cyclegan and pix2pix models. We will fix it ASAP, but in the meantime you can work around it by adding these lines at the end of the config file:
runner = None
use_ddp_wrapper = True
@plyfager will follow up on this issue and fix the bugs.
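To make the failure in the traceback concrete, here is a minimal sketch in plain Python. The two wrapper classes and both helper functions are hypothetical stand-ins, not MMGeneration or PyTorch code; the only part taken from the thread is the access pattern `kwargs.update(dict(ddp_reducer=self.model.reducer))`. The sketch also assumes that this access is gated on an `is_dynamic_ddp` flag, as the `runner` config suggests.

```python
class WrapperWithoutReducer:
    """Hypothetical stand-in for a DataParallel-style wrapper (no `reducer`)."""

    def __init__(self, module):
        self.module = module


class WrapperWithReducer(WrapperWithoutReducer):
    """Hypothetical stand-in for a DDP-style wrapper that exposes `reducer`."""

    def __init__(self, module):
        super().__init__(module)
        self.reducer = object()  # placeholder for a real DDP reducer


def build_train_kwargs(model, is_dynamic_ddp):
    """Mimic the failing line: read `model.reducer` when dynamic DDP is on."""
    kwargs = {}
    if is_dynamic_ddp:  # assumption: the real runner gates the access like this
        kwargs.update(dict(ddp_reducer=model.reducer))
    return kwargs


def try_build(model, is_dynamic_ddp=True):
    """Return 'ok' on success, or a short message if the attribute is missing."""
    try:
        build_train_kwargs(model, is_dynamic_ddp)
        return "ok"
    except AttributeError as err:
        return f"failed: {err}"


if __name__ == "__main__":
    print(try_build(WrapperWithReducer(module=None)))
    print(try_build(WrapperWithoutReducer(module=None)))
    print(try_build(WrapperWithoutReducer(module=None), is_dynamic_ddp=False))
```

Under these assumptions, only the wrapper without a `reducer` attribute fails, and only while the dynamic DDP path is enabled. That is consistent with both suggestions in this thread: launch through the distributed script so that the model is wrapped for DDP, or override the runner in the config.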
Thanks for replying. I found that CycleGAN from MMEditing works for me.
Thanks again, I will take your advice.
Sorry for the inconvenience. This bug has been fixed in #38.
@nbei hi, I only ran python tools/train.py config_file on a single-GPU machine, but I still hit the problem above. I haven't modified any files. The current commit is 3542102.
We suggest using dist_train.sh to start your training, even on a single GPU. You can start single-GPU training with the following command:
bash dist_train.sh CONFIG 1
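For anyone scripting their runs, the launch command above can be assembled programmatically. This is only a sketch: `build_dist_train_cmd` is a hypothetical helper, the default script path `tools/dist_train.sh` is taken from the earlier reply in this thread, and the work directory in the usage example is made up.

```python
import shlex


def build_dist_train_cmd(config, gpu_num=1, work_dir=None,
                         script="tools/dist_train.sh"):
    """Return the launcher command as an argument list (hypothetical helper)."""
    if gpu_num < 1:
        raise ValueError("gpu_num must be at least 1")
    cmd = ["bash", script, config, str(gpu_num)]
    if work_dir is not None:
        # Mirrors the optional `--work-dir {WORK_DIR}` part of the command.
        cmd += ["--work-dir", work_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_dist_train_cmd(
        "configs/cyclegan/cyclegan_lsgan_resnet_in_1x1_266800_horse2zebra.py",
        gpu_num=1,
        work_dir="./work_dirs/cyclegan_horse2zebra",  # made-up directory
    )
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; keeping it as a list avoids shell quoting problems with paths that contain spaces.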
Thank you. It's OK. Let me debug in DDP mode first.