
Comments

jayagami commented on June 29, 2024

Same issue here:

2021-05-17 21:15:05,719 - mmgen - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.7 (default, May  7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0,1: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.7.0
OpenCV: 4.2.0
MMCV: 1.3.4
MMGen: 0.1.0+0ece0cd
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
------------------------------------------------------------

2021-05-17 21:15:05,984 - mmgen - INFO - Distributed training: False
2021-05-17 21:15:06,178 - mmgen - INFO - Config:
model = dict(
    type='CycleGAN',
    generator=dict(
        type='ResnetGenerator',
        in_channels=3,
        out_channels=3,
        base_channels=64,
        norm_cfg=dict(type='IN'),
        use_dropout=False,
        num_blocks=9,
        padding_mode='reflect',
        init_cfg=dict(type='normal', gain=0.02)),
    discriminator=dict(
        type='PatchDiscriminator',
        in_channels=3,
        base_channels=64,
        num_conv=3,
        norm_cfg=dict(type='IN'),
        init_cfg=dict(type='normal', gain=0.02)),
    gan_loss=dict(
        type='GANLoss',
        gan_type='lsgan',
        real_label_val=1.0,
        fake_label_val=0.0,
        loss_weight=1.0),
    cycle_loss=dict(type='L1Loss', loss_weight=10.0, reduction='mean'),
    id_loss=dict(type='L1Loss', loss_weight=0.5, reduction='mean'))
train_cfg = dict(direction='a2b', buffer_size=50)
test_cfg = dict(direction='a2b', show_input=False, test_direction='a2b')
train_dataset_type = 'UnpairedImageDataset'
val_dataset_type = 'UnpairedImageDataset'
img_norm_cfg = dict(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
train_pipeline = [
    dict(
        type='LoadImageFromFile', io_backend='disk', key='img_a',
        flag='color'),
    dict(
        type='LoadImageFromFile', io_backend='disk', key='img_b',
        flag='color'),
    dict(
        type='Resize',
        keys=['img_a', 'img_b'],
        scale=(286, 286),
        interpolation='bicubic'),
    dict(
        type='Crop',
        keys=['img_a', 'img_b'],
        crop_size=(256, 256),
        random_crop=True),
    dict(type='Flip', keys=['img_a'], direction='horizontal'),
    dict(type='Flip', keys=['img_b'], direction='horizontal'),
    dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
    dict(
        type='Normalize',
        keys=['img_a', 'img_b'],
        to_rgb=False,
        mean=[0.5, 0.5, 0.5],
        std=[0.5, 0.5, 0.5]),
    dict(type='ImageToTensor', keys=['img_a', 'img_b']),
    dict(
        type='Collect',
        keys=['img_a', 'img_b'],
        meta_keys=['img_a_path', 'img_b_path'])
]
test_pipeline = [
    dict(
        type='LoadImageFromFile', io_backend='disk', key='img_a',
        flag='color'),
    dict(
        type='LoadImageFromFile', io_backend='disk', key='img_b',
        flag='color'),
    dict(
        type='Resize',
        keys=['img_a', 'img_b'],
        scale=(256, 256),
        interpolation='bicubic'),
    dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
    dict(
        type='Normalize',
        keys=['img_a', 'img_b'],
        to_rgb=False,
        mean=[0.5, 0.5, 0.5],
        std=[0.5, 0.5, 0.5]),
    dict(type='ImageToTensor', keys=['img_a', 'img_b']),
    dict(
        type='Collect',
        keys=['img_a', 'img_b'],
        meta_keys=['img_a_path', 'img_b_path'])
]
data_root = None
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    drop_last=True,
    val_samples_per_gpu=1,
    val_workers_per_gpu=0,
    train=dict(
        type='UnpairedImageDataset',
        dataroot='./data/horse2zebra',
        pipeline=[
            dict(
                type='LoadImageFromFile',
                io_backend='disk',
                key='img_a',
                flag='color'),
            dict(
                type='LoadImageFromFile',
                io_backend='disk',
                key='img_b',
                flag='color'),
            dict(
                type='Resize',
                keys=['img_a', 'img_b'],
                scale=(286, 286),
                interpolation='bicubic'),
            dict(
                type='Crop',
                keys=['img_a', 'img_b'],
                crop_size=(256, 256),
                random_crop=True),
            dict(type='Flip', keys=['img_a'], direction='horizontal'),
            dict(type='Flip', keys=['img_b'], direction='horizontal'),
            dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
            dict(
                type='Normalize',
                keys=['img_a', 'img_b'],
                to_rgb=False,
                mean=[0.5, 0.5, 0.5],
                std=[0.5, 0.5, 0.5]),
            dict(type='ImageToTensor', keys=['img_a', 'img_b']),
            dict(
                type='Collect',
                keys=['img_a', 'img_b'],
                meta_keys=['img_a_path', 'img_b_path'])
        ],
        test_mode=False),
    val=dict(
        type='UnpairedImageDataset',
        dataroot='./data/horse2zebra',
        pipeline=[
            dict(
                type='LoadImageFromFile',
                io_backend='disk',
                key='img_a',
                flag='color'),
            dict(
                type='LoadImageFromFile',
                io_backend='disk',
                key='img_b',
                flag='color'),
            dict(
                type='Resize',
                keys=['img_a', 'img_b'],
                scale=(256, 256),
                interpolation='bicubic'),
            dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
            dict(
                type='Normalize',
                keys=['img_a', 'img_b'],
                to_rgb=False,
                mean=[0.5, 0.5, 0.5],
                std=[0.5, 0.5, 0.5]),
            dict(type='ImageToTensor', keys=['img_a', 'img_b']),
            dict(
                type='Collect',
                keys=['img_a', 'img_b'],
                meta_keys=['img_a_path', 'img_b_path'])
        ],
        test_mode=True),
    test=dict(
        type='UnpairedImageDataset',
        dataroot='./data/horse2zebra',
        pipeline=[
            dict(
                type='LoadImageFromFile',
                io_backend='disk',
                key='img_a',
                flag='color'),
            dict(
                type='LoadImageFromFile',
                io_backend='disk',
                key='img_b',
                flag='color'),
            dict(
                type='Resize',
                keys=['img_a', 'img_b'],
                scale=(256, 256),
                interpolation='bicubic'),
            dict(type='RescaleToZeroOne', keys=['img_a', 'img_b']),
            dict(
                type='Normalize',
                keys=['img_a', 'img_b'],
                to_rgb=False,
                mean=[0.5, 0.5, 0.5],
                std=[0.5, 0.5, 0.5]),
            dict(type='ImageToTensor', keys=['img_a', 'img_b']),
            dict(
                type='Collect',
                keys=['img_a', 'img_b'],
                meta_keys=['img_a_path', 'img_b_path'])
        ],
        test_mode=True))
checkpoint_config = dict(interval=100, by_epoch=False, save_optimizer=True)
log_config = dict(
    interval=100, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
custom_hooks = [
    dict(
        type='VisualizeUnconditionalSamples',
        output_dir='training_samples',
        interval=1000)
]
runner = dict(
    type='DynamicIterBasedRunner',
    is_dynamic_ddp=True,
    pass_training_status=True)
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
find_unused_parameters = True
cudnn_benchmark = True
dataroot = './data/horse2zebra'
optimizer = dict(
    generators=dict(type='Adam', lr=0.0002, betas=(0.5, 0.999)),
    discriminators=dict(type='Adam', lr=0.0002, betas=(0.5, 0.999)))
lr_config = None
total_iters = 80000
exp_name = 'cyclegan_facades_id0'
work_dir = './work_dirs/cyclegan_facades_id0'
metrics = dict(
    FID=dict(type='FID', num_images=140, image_shape=(3, 256, 256)),
    IS=dict(type='IS', num_images=140, image_shape=(3, 256, 256)))
gpu_ids = range(0, 1)

2021-05-17 21:15:06,178 - mmgen - INFO - Set random seed to 2021, deterministic: False
/home/jay/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/conv_module.py:107: UserWarning: ConvModule has norm and bias at the same time
  warnings.warn('ConvModule has norm and bias at the same time')
2021-05-17 21:15:08,000 - mmgen - INFO - Start running, host: jay@Tachikoma, work_dir: /home/jay/git/jay-mmgeneration/work_dirs/cyclegan_facades_id0
2021-05-17 21:15:08,000 - mmgen - INFO - workflow: [('train', 1)], max: 80000 iters
Traceback (most recent call last):
  File "tools/train.py", line 163, in <module>
    main()
  File "tools/train.py", line 159, in main
    meta=meta)
  File "/home/jay/git/jay-mmgeneration/mmgen/apis/train.py", line 196, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_iters)
  File "/home/jay/git/jay-mmgeneration/mmgen/core/runners/dynamic_iterbased_runner.py", line 284, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/jay/git/jay-mmgeneration/mmgen/core/runners/dynamic_iterbased_runner.py", line 206, in train
    kwargs.update(dict(ddp_reducer=self.model.reducer))
  File "/home/jay/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 772, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'MMDataParallel' object has no attribute 'reducer'

nbei commented on June 29, 2024

Hi @xinzhichao and @jayagami, our MMGeneration does NOT support DataParallel training. If you start training directly with python train.py xxx, PyTorch will automatically run in DataParallel mode. We therefore recommend that all users launch distributed training with:

bash tools/dist_train.sh {CONFIG} {GPU_NUM} --work-dir {WORK_DIR}

In addition, after checking the code in detail, I found another bug in the config files for the CycleGAN and Pix2Pix models. We will fix it as soon as possible, but you can work around it by adding the following lines at the end of the config file:

runner = None
use_ddp_wrapper = True
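
For example (a minimal sketch of this workaround, assuming the CycleGAN config dumped in the log above), the tail of the config file would then read:

# ... everything else in the config file stays unchanged ...

# The config is executed as plain Python, so these two later assignments
# override the runner = dict(type='DynamicIterBasedRunner', ...) entry shown
# earlier in the config dump.
runner = None
use_ddp_wrapper = True

Training can then still be launched through tools/dist_train.sh as recommended above.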

@plyfager will follow up on this issue and fix the bugs.

> Thanks for replying. I found that the CycleGAN from MMEditing worked for me.

In the future, the image translation models will be removed from MMEditing and supported in MMGeneration. We hope you can switch to MMGeneration, and we are sorry for the inconvenience.

xinzhichao commented on June 29, 2024

I used python tools/train.py configs/cyclegan/cyclegan_lsgan_resnet_in_1x1_266800_horse2zebra.py.

jayagami commented on June 29, 2024

Thanks for replying. I found that the CycleGAN from MMEditing worked for me.

jayagami commented on June 29, 2024

Thanks again, I will take your advice.

plyfager commented on June 29, 2024

Sorry for the inconvenience. This bug has been fixed in #38.

zhjw0927 commented on June 29, 2024

Hi @nbei, I only use python tools/train.py config_file on a single-GPU machine, but I still encounter the problem described above. I haven't modified any files yet; the current commit is 3542102.

LeoXing1996 commented on June 29, 2024

> Hi @nbei, I only use python tools/train.py config_file on a single-GPU machine, but I still encounter the problem described above. I haven't modified any files yet; the current commit is 3542102.

We suggest using dist_train.sh to start your training. You can use the following command to start single-GPU training:

bash dist_train.sh CONFIG 1
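
For context, this works because dist_train.sh essentially starts tools/train.py through torch.distributed.launch with the PyTorch launcher, so even a single-GPU run gets a real process group and the model ends up wrapped in DistributedDataParallel, which is the wrapper that actually exposes the reducer attribute the DynamicIterBasedRunner reads; a plain python tools/train.py run wraps the model in MMDataParallel instead, which has no such attribute (hence the error in the traceback above). A rough illustration of the difference (an illustrative sketch only, not MMGeneration's actual launcher code):

import torch
import torch.distributed as dist

# Single-process "distributed" setup, similar in spirit to what a one-GPU
# dist_train.sh run ends up with.
dist.init_process_group(backend='nccl',
                        init_method='tcp://127.0.0.1:29500',
                        rank=0, world_size=1)
model = torch.nn.parallel.DistributedDataParallel(
    torch.nn.Linear(8, 8).cuda(), device_ids=[0])
print(hasattr(model, 'reducer'))  # True: DDP exposes the reducer the runner needs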

zhjw0927 commented on June 29, 2024

Thank you, it's OK. Let me debug in DDP mode first.
