Comments (15)

lucidrains commented on August 17, 2024

there's actually also an improved version of big sleep out there (called fusedream) https://arxiv.org/abs/2112.01573 in case people don't know

wolfgangmeyers commented on August 17, 2024

Let me clarify - Big Sleep selects a random starting image (mostly dog images?) and can then use a priming image to guide the image generation. That does seem to help, but I've had odd results. The priming image is similar to the image_prompts param in VQGAN+CLIP, which seems to accomplish the same thing:

https://github.com/nerdyrodent/VQGAN-CLIP/blob/main/generate.py#L60

In this case, I think the ask is to be able to take an image and convert it into the BigGAN latent space, similar to the init_image param in VQGAN+CLIP that uses VQGAN's encoder ability:

https://github.com/nerdyrodent/VQGAN-CLIP/blob/main/generate.py#L64

The expectation would be that you could reconstruct an input image - take an input image, convert it to latents, and then convert that back into an image that is similar enough to the original to be recognizable. Artbreeder does something similar with StyleGAN, I think.

The big benefit to this is that you can provide a concrete starting point for image generation, and it's possible to "resume" image generation from a completely different stack like VQGAN+CLIP, GLIDE, DALL-E, etc. I have a PR up that allows resuming from another Big Sleep-generated image - it was very handy for trying many variations and picking the best one (looks like I need to fix merge conflicts).
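
For context, here is a minimal sketch of why this is straightforward on the VQGAN+CLIP side: VQGAN ships with an encoder, so an init_image maps directly to latents and the optimization simply resumes from them. The `vqgan` object and the exact `encode()` call below follow the taming-transformers convention and are assumptions for illustration, not Big Sleep's API:

```python
# Hypothetical sketch: seeding generation from an init_image when the model
# has an encoder (as VQGAN does). BigGAN lacks such an encoder, which is why
# the equivalent trick there needs latent "projection" instead.
import torch

def latents_from_init_image(vqgan, image):
    # image: (1, 3, H, W) tensor in [0, 1]; VQGAN expects inputs in [-1, 1]
    with torch.no_grad():
        z, *_ = vqgan.encode(image * 2 - 1)    # encoder output becomes the starting latent
    return z.clone().requires_grad_(True)      # optimization resumes from the input image
```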

wolfgangmeyers commented on August 17, 2024

@lucidrains I think the VQGAN+CLIP implementation from @crowsonkb generally yields higher quality results, but the results from Big Sleep have more variety. This feature would allow the two to be combined (and I could add Big Sleep as an alternative engine for https://github.com/wolfgangmeyers/aibrush-2).

lucidrains commented on August 17, 2024

@wolfgangmeyers ohh got it, that is interesting (re: variety)

ok, maybe i'll hack this out when i have some spare time, shouldn't take too long

illtellyoulater commented on August 17, 2024

> In this case, I think the ask is to be able to take an image and convert it into the BigGAN latent space, similar to the init_image param in VQGAN+CLIP that uses VQGAN's encoder ability: https://github.com/nerdyrodent/VQGAN-CLIP/blob/main/generate.py#L64

@lucidrains: I confirm my request is exactly what @wolfgangmeyers describes in the passage quoted above.

> This feature would allow the two to be combined (and I could add Big Sleep as an alternative engine for https://github.com/wolfgangmeyers/aibrush-2).

@wolfgangmeyers this would be very cool.

> it'll require what they call projection in GAN literature, something like http://people.csail.mit.edu/minhuh/papers/pix2latent/arxiv_v2.pdf
>
> @wolfgangmeyers and yes you are right, an encoder would help here, but it needs to be trained concurrently with the GAN, something which BigGAN does not have

@lucidrains this is getting a bit too technical for me, so from now on I'll leave it to you guys. I just hope that one way or another it can be done :) I hope I can still help with feedback and the like when needed.

> @lucidrains I think the VQGAN+CLIP implementation from @crowsonkb generally yields higher quality results, but the results from Big Sleep have more variety.

Agreeing with @wolfgangmeyers here: VQGAN+CLIP results look better, but Big Sleep has more variety. Also, as pointed out by @lucidrains, there's a certain surreal quality to Big Sleep's output that keeps me fascinated.

> there's actually also an improved version of big sleep out there (called fusedream) https://arxiv.org/abs/2112.01573 in case people don't know

@lucidrains I didn't know about this, I will try it out!

wolfgangmeyers commented on August 17, 2024

I'd like to say here that this sounds incredibly useful. It would allow someone to pass an image back and forth between a VQGAN+CLIP setup and Big Sleep. For this I think it would need an encoder, and this one might work - https://github.com/disanda/MTV-TSA

lucidrains commented on August 17, 2024

@illtellyoulater yea sure, if you can give me a sample image and prompt you'd like me to test on, i can build it

lucidrains commented on August 17, 2024

@illtellyoulater don't you think Big Sleep is a little outdated these days? with the awesome work being done by @crowsonkb ?

lucidrains commented on August 17, 2024

@illtellyoulater or maybe there's some surreal quality to Big Sleep generations that is still useful for your work? do let me know since I am curious!

lucidrains commented on August 17, 2024

oops, closed by accident

wolfgangmeyers commented on August 17, 2024

FuseDream looks awesome. Do you think the MTV-TSA project would work, for this and/or for FuseDream, to encode an init image to latents?

lucidrains commented on August 17, 2024

@wolfgangmeyers haha, i'm not familiar with MTV-TSA at all

btw, i'm not quite sure i understand the need for an encoder in your previous comment

@illtellyoulater was your desired feature basically this? https://github.com/lucidrains/deep-daze#priming
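
For clarity, priming in that sense never recovers BigGAN latents; it only nudges generation toward the CLIP features of a reference image. A rough sketch of that kind of loss is below, assuming OpenAI's clip package conventions (224px input, the standard CLIP normalization constants); the exact wiring is an illustration, not the deep-daze implementation:

```python
# Rough sketch of "priming" / image_prompts style guidance: match the CLIP
# features of the generated image to those of a reference image. This never
# inverts the generator, so it is weaker than true latent projection.
import torch
import torch.nn.functional as F

def clip_image_prompt_loss(clip_model, generated, prompt_features):
    # generated: (1, 3, H, W) in [0, 1]; prompt_features: unit-normalized CLIP
    # embedding of the priming image, computed once up front
    x = F.interpolate(generated, size=224, mode='bilinear', align_corners=False)
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=x.device).view(1, 3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=x.device).view(1, 3, 1, 1)
    feats = clip_model.encode_image((x - mean) / std)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    # cosine distance to the priming image's CLIP embedding
    return 1.0 - (feats * prompt_features).sum(dim=-1).mean()
```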

wolfgangmeyers commented on August 17, 2024

Looks like MTV-TSA only supports 256x256 for BigGAN though.

lucidrains commented on August 17, 2024

ohh got it! sorry my bad, this won't be easy then

it'll require what they call projection in GAN literature, something like http://people.csail.mit.edu/minhuh/papers/pix2latent/arxiv_v2.pdf
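
A minimal sketch of what that projection step amounts to, assuming the pytorch-pretrained-biggan package and plain Adam (pix2latent itself layers hybrid CMA-ES search, class-vector optimization, and spatial transformations on top of something like this):

```python
# Hypothetical sketch of GAN "projection": recover BigGAN latents for a target
# image by gradient descent on a reconstruction loss.
import torch
import torch.nn.functional as F
from pytorch_pretrained_biggan import BigGAN

def project_to_biggan(target, steps=500, truncation=1.0, lr=0.05):
    # target: (1, 3, 256, 256) tensor scaled to [-1, 1], matching BigGAN's output range
    model = BigGAN.from_pretrained('biggan-deep-256').eval()
    model.requires_grad_(False)                               # only the latents are optimized
    z = torch.zeros(1, 128, requires_grad=True)               # latent noise vector
    class_logits = torch.zeros(1, 1000, requires_grad=True)   # soft class vector
    opt = torch.optim.Adam([z, class_logits], lr=lr)
    for _ in range(steps):
        image = model(z, torch.softmax(class_logits, dim=-1), truncation)
        loss = F.mse_loss(image, target)   # a perceptual loss (e.g. LPIPS) usually works better
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach(), torch.softmax(class_logits, dim=-1).detach()

# The recovered (z, class_vector) pair could then seed Big Sleep's usual
# CLIP-guided optimization in place of its random initialization.
```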

lucidrains commented on August 17, 2024

@wolfgangmeyers and yes you are right, an encoder would help here, but it needs to be trained concurrently with the GAN, something which BigGAN does not have
