
Comments (11)

yuval-alaluf commented on May 27, 2024

Right. Seems like the

res = np.array(input_image.resize((1024, 1024)))

was indeed needed so good catch!

And thanks for the feedback! It's awesome to hear what people think!

from pixel2style2pixel.

yuval-alaluf commented on May 27, 2024

For all our experiments on image resolutions of 256x256 we use a single P40 with 22GB with a batch size of 8.

If you wish to work with input image resolutions of 1024x1024, you would need to reduce the batch size substantially or use a GPU with a larger RAM. However, please note that although our inputs are of size 256x256, the FFHQ StyleGAN model we use generates outputs of size 1024x1024. Therefore, working with inputs of size 256x256 may work well for your needs.

If none of these options are feasible, you may need to use multiple GPUs. Note that although our code currently supports only a single GPU, it should be relatively simple to add multi-GPU support using the DataParallel from torch.

We hope to add this support soon (it is first on our TODO), but you are welcome to try adding this support and open a pull request.
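In the meantime, the multi-GPU route can be sketched with torch's built-in DataParallel wrapper. The model below is a hypothetical stand-in for the pSp encoder (the real one is loaded from the repo); the wrapping itself is the point:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pSp encoder; the real model comes from
# the pixel2style2pixel repo.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1))

# Wrap the model so each forward pass splits the batch across GPUs.
# On a single-GPU or CPU-only machine the wrapper is skipped.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

batch = torch.randn(8, 3, 256, 256)  # batch size 8 at 256x256, as in the paper
out = model(batch)
print(out.shape)  # prints torch.Size([8, 8, 1, 1])
```

DataParallel splits the batch along dimension 0, runs a replica of the module on each device, and gathers the results, so the training loop itself needs no changes.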

yuval-alaluf commented on May 27, 2024

Additionally, if the FPN model (GradualStyleEncoder) requires too many resources, we also provide the Naive W+ model that was mentioned in the paper. This can be done by specifying BackboneEncoderUsingLastLayerIntoWPlus as your encoder type.

This model is smaller and faster to train than the default FPN model and may suit your needs.

yuval-alaluf commented on May 27, 2024

Happy to help!
It should also be possible to do the same in style mixing.
First, make sure you call the forward function with resize=False in line 78, which will give you the 1024x1024 outputs.

As you mentioned, in line 88 we can change

res = np.array(input_image.resize((256, 256)))

to

res = np.array(input_image)

assuming your original input is of size 1024x1024.
Similarly, line 91 can be changed to:

res = np.concatenate([res, np.array(output)], axis=1)

Now, res should be the concatenated results of the input and multi-modal outputs, each of size 1024x1024.
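Put together, the concatenation step looks like the sketch below. The images here are blank dummies standing in for the real 1024x1024 input and two multi-modal style-mixing outputs:

```python
import numpy as np
from PIL import Image

# Dummy stand-ins: the input face and two multi-modal style-mixing
# outputs, all already at 1024x1024 (forward was called with resize=False).
input_image = Image.new("RGB", (1024, 1024))
outputs = [Image.new("RGB", (1024, 1024)) for _ in range(2)]

res = np.array(input_image)  # shape (1024, 1024, 3)
for output in outputs:
    # axis=1 stacks the images side by side into one wide strip
    res = np.concatenate([res, np.array(output)], axis=1)

print(res.shape)  # prints (1024, 3072, 3)
```

Each appended image widens the strip by 1024 pixels, so the final array is one row of input plus outputs at full resolution.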

pnbao commented on May 27, 2024

Thank you for clarifying that!! (y)

nathanshipley commented on May 27, 2024

However, please note that although our inputs are of size 256x256, the FFHQ StyleGAN model we use generates outputs of size 1024x1024. Therefore, working with inputs of size 256x256 may work well for your needs.

Thanks so much, @yuval-alaluf . How does one actually go about generating FFHQ inference output images at 1024 from psp_ffhq_encode.pt? I don't see anything obvious in the command line flags for inference.py.

I've tried modifying line 34 in transforms_config.py to no avail:

			'transform_inference': transforms.Compose([
				transforms.Resize((256, 256)),

Changing resize((256, 256)) on line 96 in inference.py to:
Image.fromarray(np.array(result.resize((1024, 1024)))).save(im_save_path)
just seems to be scaling up the 256 image.

Any guidance?

Really appreciate all the great work you guys have done!

yuval-alaluf commented on May 27, 2024

Hi @nathanshipley ,
I reopened the issue to make it easier to follow :)
Regarding your question:

First, note that the transform_inference represents the input resolution, which should be 256 if you are using our pretrained models.

What you're interested in, if I understand correctly, is saving the outputs of the model as images of size 1024x1024. First, note that we use the original FFHQ StyleGAN model that outputs images of 1024x1024. However, during training we resize our outputs to 256x256.

During inference, however, you can still save the 1024x1024 images. To do so, please refer to psp.py and on line 66 you will see an argument resize, which is true by default and will down-sample the generated image to 256. Therefore, you can simply call the forward function and set resize=False and you should obtain the original 1024x1024 images.

Then, in inference.py, you can change line 96 to:

Image.fromarray(np.array(result)).save(im_save_path)

and you should be able to save the images in full resolution.
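As a minimal sketch of that save step, with a blank 1024x1024 image standing in for the model output and a hypothetical temp-file path in place of the real im_save_path:

```python
import os
import tempfile

import numpy as np
from PIL import Image

# "result" stands in for one output of psp.py's forward pass called with
# resize=False, already converted to a 1024x1024 PIL image.
result = Image.new("RGB", (1024, 1024))

# Hypothetical path; inference.py builds im_save_path from the output dir.
im_save_path = os.path.join(tempfile.gettempdir(), "out_1024.png")
Image.fromarray(np.array(result)).save(im_save_path)

print(Image.open(im_save_path).size)  # prints (1024, 1024)
```

Since no resize is applied between the array and the save, the file on disk keeps the generator's full 1024x1024 resolution.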

Let me know if this helps!

nathanshipley commented on May 27, 2024

That works beautifully! Thank you for the quick reply. It's great to see the output at 1024x1024. Is there an equivalent to change the output of style_mixing.py to also be 1024x1024? Tried changing lines 88 and 91 there to no avail.

Also - appreciate the clarification on what transform_inference means.

nathanshipley commented on May 27, 2024

Got it. My original input images are indeed 1024x1024. However, just changing those two lines to remove .resize results in an error where res and np.array(output) don't concatenate:

  File "scripts/style_mixing.py", line 91, in run
    res = np.concatenate([res, np.array(output)], axis=1)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 256 and the array at index 1 has size 1024

This seems to be because DataLoader(dataset, ...) on line 51 is scaling the images down to 256 for inference as set in transforms_config.py.

However! It works if I change line 91 like you suggest and then change line 88 to:

res = np.array(input_image.resize((1024, 1024)))

I suppose this lets the images be 256 for inference to match the training resolution but then just scales the input_image back up to 1024 so dimensions match for concatenation with the style mixed output images. I think? Regardless, it works and they look great!
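That mismatch-then-fix can be reproduced in isolation with dummy images: a 256x256 input (as the DataLoader transform produces) and a 1024x1024 output refuse to concatenate until the input is upscaled:

```python
import numpy as np
from PIL import Image

# input_image comes out of the DataLoader at 256x256 (set by
# transforms_config.py); output is the 1024x1024 generator result.
input_image = Image.new("RGB", (256, 256))
output = Image.new("RGB", (1024, 1024))

# Direct concatenation fails: heights 256 vs 1024 disagree along axis 0.
mismatch = False
try:
    np.concatenate([np.array(input_image), np.array(output)], axis=1)
except ValueError:
    mismatch = True

# Upscaling the input first makes the heights agree.
res = np.array(input_image.resize((1024, 1024)))
res = np.concatenate([res, np.array(output)], axis=1)
print(mismatch, res.shape)  # prints True (1024, 2048, 3)
```

So the 256 resolution is only a training/inference-input convention; upscaling the input is purely cosmetic, to match the concatenation height.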

Again, I really appreciate you taking the time. It's awesome how fast pSp generates images! I've been used to waiting ages for the regular StyleGAN projector.

07hyx06 commented on May 27, 2024

For all our experiments on image resolutions of 256x256 we use a single P40 with 22GB with a batch size of 8.

If you wish to work with input image resolutions of 1024x1024, you would need to reduce the batch size substantially or use a GPU with a larger RAM. However, please note that although our inputs are of size 256x256, the FFHQ StyleGAN model we use generates outputs of size 1024x1024. Therefore, working with inputs of size 256x256 may work well for your needs.

If none of these options are feasible, you may need to use multiple GPUs. Note that although our code currently supports only a single GPU, it should be relatively simple to add multi-GPU support using the DataParallel from torch.

We hope to add this support soon (it is first on our TODO), but you are welcome to try adding this support and open a pull request.

@yuval-alaluf Hi! Thanks for your code; I plan to run some experiments with it. How long did it take to train the encoder for the StyleGAN trained on FFHQ?

yuval-alaluf commented on May 27, 2024

@yuval-alaluf Hi! Thanks for your code; I plan to run some experiments with it. How long did it take to train the encoder for the StyleGAN trained on FFHQ?

Hi @07hyx06, I don't remember how many days it took us to train the encoder, but we ended up training for 300,000 steps on a batch size of 8.
