Comments (5)

jerryli27 commented on August 23, 2024

I added two lines to the training script. It should work now.
--gradient_penalty_lambda=0.25 --use_unet=True

The whole script now looks like:

# Assume you have data like
# ./data/celeba/train-00000-of-00100.tfrecord,
# ./data/celeba/train-00001-of-00100.tfrecord ...
python pggan_runner.py \
--program_name=twingan \
--dataset_name="image_only" \
--dataset_dir="./data/celeba/" \
--unpaired_target_dataset_name="anime_faces" \
--unpaired_target_dataset_dir="./data/anime_faces/" \
--train_dir="./checkpoints/twingan_faces/" \
--dataset_split_name=train \
--preprocessing_name="danbooru" \
--resize_mode=RESHAPE \
--do_random_cropping=True \
--learning_rate=0.0001 \
--learning_rate_decay_type=fixed \
--is_training=True \
--generator_network="pggan" \
--use_unet=True \
--num_images_per_resolution=300000 \
--loss_architecture=dragan \
--gradient_penalty_lambda=0.25 \
--pggan_max_num_channels=256 \
--generator_norm_type=batch_renorm \
--hw_to_batch_size="{4: 8, 8: 8, 16: 8, 32: 8, 64: 8, 128: 4, 256: 3, 512: 2}"
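The dataset comment in the command assumes sharded TFRecords named like train-00000-of-00100.tfrecord. Before a multi-day run, it can be worth checking that no shard is missing. This is a minimal sketch, not part of TwinGAN: missing_shards and expected_shards are hypothetical helpers, and they assume every shard follows the "<split>-NNNNN-of-NNNNN.tfrecord" pattern shown in that comment.

```python
import os
import re

# Pattern taken from the example names in the command's comment (assumption:
# all shards use it, e.g. train-00000-of-00100.tfrecord).
SHARD_RE = re.compile(r"^(?P<split>\w+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.tfrecord$")


def expected_shards(split: str, num_shards: int) -> list[str]:
    """Names a complete set of num_shards shards for this split would have."""
    return [f"{split}-{i:05d}-of-{num_shards:05d}.tfrecord" for i in range(num_shards)]


def missing_shards(dataset_dir: str, split: str = "train") -> list[str]:
    """Return shard names absent from dataset_dir, inferring the total from existing files."""
    names = os.listdir(dataset_dir)
    totals = {
        int(m["total"])
        for n in names
        if (m := SHARD_RE.match(n)) and m["split"] == split
    }
    if not totals:
        raise FileNotFoundError(f"no '{split}-*-of-*.tfrecord' files in {dataset_dir}")
    if len(totals) > 1:
        # Mixed "-of-00100" and "-of-00050" files usually means two datasets got merged.
        raise ValueError(f"inconsistent shard totals: {sorted(totals)}")
    present = set(names)
    return [n for n in expected_shards(split, totals.pop()) if n not in present]
```

For example, missing_shards("./data/celeba/") should return an empty list for a complete dataset. The same call can be pointed at ./data/anime_faces/ if that directory uses the same naming, which is an assumption.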

I haven't tested the multi-GPU setting thoroughly yet due to hardware limits, so there may be bugs, but you can try adding the following flags:

--sync_replicas=False
--replicas_to_aggregate=1
--num_clones=2
--worker_replicas=1

I updated the training readme with the comments above.

jerryli27 commented on August 23, 2024

Hi @lionel3, I updated the training documentation. There was indeed a bug in my default parameters; after fixing it, I was able to reproduce my previous results.

Please sync to the latest version and see https://github.com/jerryli27/TwinGAN/blob/master/docs/training.md .

The parameters I added are:

--do_pixel_norm=True
--l_content_weight=0.1
--l_cycle_weight=1.0
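If you launch training from a Python script rather than the shell, passing the flags as an argument list avoids the quoting and line-continuation mistakes that are easy to make with a long command. This is a minimal sketch: FLAGS, build_argv and as_shell_command are hypothetical names, and the flag set is only what was collected in this thread (the three new flags last), so docs/training.md remains the authoritative list.

```python
import shlex
import sys

# Flags gathered from this thread; values are rendered with str(), so True -> "True".
FLAGS = {
    "program_name": "twingan",
    "dataset_name": "image_only",
    "dataset_dir": "./data/celeba/",
    "unpaired_target_dataset_name": "anime_faces",
    "unpaired_target_dataset_dir": "./data/anime_faces/",
    "train_dir": "./checkpoints/twingan_faces/",
    "dataset_split_name": "train",
    "preprocessing_name": "danbooru",
    "resize_mode": "RESHAPE",
    "do_random_cropping": True,
    "learning_rate": "0.0001",  # kept as a string to avoid float formatting surprises
    "learning_rate_decay_type": "fixed",
    "is_training": True,
    "generator_network": "pggan",
    "use_unet": True,
    "num_images_per_resolution": 300000,
    "loss_architecture": "dragan",
    "gradient_penalty_lambda": 0.25,
    "pggan_max_num_channels": 256,
    "generator_norm_type": "batch_renorm",
    # No shell quoting needed: the whole dict literal is a single argv element.
    "hw_to_batch_size": "{4: 8, 8: 8, 16: 8, 32: 8, 64: 8, 128: 4, 256: 3, 512: 2}",
    # The three parameters added in the updated documentation.
    "do_pixel_norm": True,
    "l_content_weight": 0.1,
    "l_cycle_weight": 1.0,
}


def build_argv(script: str, flags: dict) -> list[str]:
    """Turn {name: value} into [python, script, --name=value, ...] in insertion order."""
    return [sys.executable, script] + [f"--{name}={value}" for name, value in flags.items()]


def as_shell_command(argv: list[str]) -> str:
    """Quote argv into a single copy-pasteable shell command line."""
    return shlex.join(argv)
```

To launch, something like subprocess.run(build_argv("pggan_runner.py", FLAGS), check=True) should work from the repository root; printing as_shell_command(...) instead gives a correctly quoted one-line command.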

Please reopen this issue if you cannot reproduce. Thanks!

jerryli27 commented on August 23, 2024

Yes, you're right. Sorry for the incorrect documentation; I'll push a newer version shortly.

  1. The --num_images_per_resolution I used was 300000. Of course 600000 should also work, but it takes longer to train.
  2. Please change to --resize_mode=RESHAPE.

FYI, --do_random_cropping=True is there as a precaution. You can also try RANDOM_CROP instead of RESHAPE if the quality at inference time is poor because the face is not centered in the image.

I am rerunning the exact command from the training example. It will take a day or two to verify that it works.

lionel3 commented on August 23, 2024

Thanks for your answer.

Also, when training with
--hw_to_batch_size="{4: 16, 8: 16, 16: 16, 32: 16, 64: 12, 128: 12, 256: 12, 512: 6}"
I got "ResourceExhaustedError: OOM when allocating tensor with ..." during the fade-in phase from resolution 128 to 256. I got the same error when trying 2 GPUs.
I am not familiar with TensorFlow. I suspect there may be a bug in multi-GPU training.
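One quick sanity check is to compare how much image data each schedule pushes through the network per step. This is a rough sketch under a stated assumption: it uses batch_size × height × width as a proxy for activation memory and ignores channel counts, the fade-in branches, and optimizer state, so it cannot confirm or rule out a multi-GPU bug. parse_schedule, pixels_per_step and ratio_at are hypothetical helpers, not TwinGAN's own parsing of --hw_to_batch_size.

```python
import ast

# The two schedules quoted in this thread.
RECOMMENDED = "{4: 8, 8: 8, 16: 8, 32: 8, 64: 8, 128: 4, 256: 3, 512: 2}"
REPORTED_OOM = "{4: 16, 8: 16, 16: 16, 32: 16, 64: 12, 128: 12, 256: 12, 512: 6}"


def parse_schedule(text: str) -> dict[int, int]:
    """Parse a hw_to_batch_size string into {resolution: batch_size} without eval()."""
    schedule = ast.literal_eval(text)
    if not isinstance(schedule, dict):
        raise ValueError("expected a dict literal such as '{4: 8, 8: 8}'")
    return {int(hw): int(bs) for hw, bs in schedule.items()}


def pixels_per_step(schedule: dict[int, int]) -> dict[int, int]:
    """Images per batch times pixels per image, per resolution (a crude memory proxy)."""
    return {hw: bs * hw * hw for hw, bs in schedule.items()}


def ratio_at(hw: int, a: dict[int, int], b: dict[int, int]) -> float:
    """How many times more pixels per step schedule a uses than schedule b at resolution hw."""
    return pixels_per_step(a)[hw] / pixels_per_step(b)[hw]


if __name__ == "__main__":
    new, ref = parse_schedule(REPORTED_OOM), parse_schedule(RECOMMENDED)
    for hw in sorted(ref):
        print(f"{hw:>4}: {ratio_at(hw, new, ref):.1f}x")
```

At 256×256 the reported schedule uses 12 images per batch against 3 in the recommended one, i.e. 4× the pixels per step, which would be consistent with running out of memory as the 256 stage starts fading in. Trying the recommended batch sizes first would separate a plain memory limit from a multi-GPU problem.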

I will try to reproduce the error and share more training details once I have an idle GPU.

lionel3 commented on August 23, 2024

Thanks, I will try it out asap.
