
A few observations about big-sleep (open, 7 comments)

lucidrains commented on August 14, 2024
A few observations


Comments (7)

Mut1nyJD commented on August 14, 2024

So I made a little video to show some of my experimentation with different learning rates and cutout counts, and I am more and more convinced that the defaults for both are too high.

https://www.youtube.com/watch?v=0apLPHoUy3c

All runs use the same seed and the same phrase, "a sailing boat in the sea", running for 2000 iterations; each frame represents the result after 5 iterations. As you can see, most of the time it more or less stabilizes about halfway through and the changes become minimal at nearly all learning rates except the higher ones. This tells me that more than 1000 iterations seems to be a total waste.

Also, by far the worst results are with 256 cutouts. I think the sweet spot is somewhere between 32 and 64.
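For anyone who wants to reproduce this kind of sweep, here is a rough sketch of how it could be scripted with big-sleep. The Imagine call follows the project README; the keyword names num_cutouts, iterations, epochs, seed and save_every are assumptions about the constructor, so adjust them to whatever your installed version actually exposes.

# sketch of the sweep described above: same seed and phrase, 2000 iterations,
# one saved frame every 5 iterations, varying learning rate and cutout count
from big_sleep import Imagine

for lr in (0.1, 0.07, 0.05, 0.01):
    for cutouts in (32, 64, 128, 256):
        dream = Imagine(
            text = "a sailing boat in the sea",
            lr = lr,
            num_cutouts = cutouts,    # assumed keyword, per the thread's variable name
            iterations = 2000,
            epochs = 1,
            seed = 42,                # hypothetical: fixed seed so runs are comparable
            save_every = 5,           # one frame per 5 iterations for the video
            save_progress = True,
        )
        dream()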


enricoros commented on August 14, 2024

@Mut1nyJD great questions. I've experimented a bit with this project and found that for good artistic control, a human-in-the-loop approach is the best*. To answer some parts of your questions:

  1. The steps performed are iterations × epochs. The epochs variable seems not to be used; even trying 10 epochs × 10 iterations, or 1 epoch × 100 iterations, I get the same output. For greater quality (assuming convergence), you want to run the algorithm for longer; however, I've seen great results in as few as 500 steps. It's a parameter anyway, so it's very configurable, or you can just stop the command-line executable whenever you want (see the sketch after this list).

  2. Does it depend on the input text? Sometimes you can get good results with a low rate, sometimes with a high rate. I wonder how an outsider would frame the "learning rate", and how we would pick a default value for the community.

  3. I think it defaults to a photo (real-world) render, so I would omit "a photo of" (maybe the picture itself will contain "a photo of..."). I love the output of "An illustration of ...", which seems to work well. Also, try "x made of y". Sometimes I even use the DALL-E strategy of repeating the text multiple times (see the DALL-E web page to understand what I mean). If there are prompts that work well for you, please share.
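To make point 1 concrete, here is a minimal sketch; the keyword names are assumptions about the Imagine constructor rather than verified defaults. Only the product iterations × epochs should matter, so the two runs below should produce the same output.

from big_sleep import Imagine

# total steps = iterations * epochs; the split between the two seems not to matter
run_a = Imagine(text = "a sailing boat in the sea", epochs = 10, iterations = 10)   # 100 steps
run_b = Imagine(text = "a sailing boat in the sea", epochs = 1,  iterations = 100)  # 100 steps
run_a()
run_b()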


Mut1nyJD commented on August 14, 2024

@enricoros

On 1: yes, having epochs really makes no sense. I noticed that a high number of iterations usually does nothing for me. If I use the progress output, I see stable results between 50-400 iterations, and then it tends to become weird most of the time. It is like it suddenly flips and then goes down a new alley. Maybe I have to monitor the loss and see if there is a correlation.

  2. Could be; I haven't really investigated that deeply. What I do notice, though, is that it seems to be heavily biased towards dogs: it nearly always starts off with a dog in the image. I wonder if that's because dogs make up some of the biggest classes in ImageNet, and the highly unbalanced distribution of classes in ImageNet is to blame here.

  3. Hmm, I was assuming that CLIP would basically always classify something as "a photo of ...", but yes, maybe you are right that having "photo" in the prompt might cause confusion. I do notice it is very, very responsive to colors, though: once you have a color token in your input text it really focuses on that, and it starts to become the most dominant feature.


walmsley commented on August 14, 2024

@lucidrains
Seems like people agree with @Mut1nyJD ... can the num_cutouts default be permanently set to 32? Or do we not want to change too many hyperparameters, lest we break someone's existing workflow? I can make a PR if needed, if this is deemed an appropriate thing to do.


htoyryla commented on August 14, 2024
  2. Could be; I haven't really investigated that deeply. What I do notice, though, is that it seems to be heavily biased towards dogs: it nearly always starts off with a dog in the image. I wonder if that's because dogs make up some of the biggest classes in ImageNet, and the highly unbalanced distribution of classes in ImageNet is to blame here.

I think BigGAN was meant to be used to produce images mainly in one class at a time, perhaps mixing a few classes at most. Anyway, all training examples belong to a single class, if I am not mistaken. In other words, the training samples occupy rather small but isolated areas in the 1000-dimensional class space, each along a single axis away from the origin.

What happens, then, if we activate all classes randomly and normalize the class vector? Our class vectors will all be clustered very close to the origin. I guess that, given the absence of training examples in that area, it is simply a byproduct of training that there are dogs there.

This changes, by the way, when we limit the number of active classes, which we can do with the max_classes option. Then we get class vectors further away from the origin, and consequently more variation in the initial images we get. Just try it.
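Here is a toy illustration of the geometry I mean (plain PyTorch, not big-sleep code), assuming the class vector is normalised to sum to one:

import torch

def random_class_vector(active_classes, total = 1000):
    # random non-negative class vector with `active_classes` nonzero entries, summing to 1
    idx = torch.randperm(total)[:active_classes]
    v = torch.zeros(total)
    v[idx] = torch.rand(active_classes)
    return v / v.sum()

for k in (1000, 15, 1):
    v = random_class_vector(k)
    print(f"{k:4d} active classes -> distance from origin: {v.norm().item():.3f}")

# with all 1000 classes active, the vector hugs the origin (norm around 0.03-0.04),
# while a one-hot vector, like the training examples, sits at distance 1.0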

PS. Ryan Murdock did mention on Twitter the idea of dropping the one-hot class vector altogether and using the 128-element embedding instead. I am talking about skipping the first of these two lines here altogether:

embed = self.embeddings(class_label)        # map the (one-hot) class label to a 128-dim class embedding
cond_vector = torch.cat((z, embed), dim=1)  # concatenate the noise vector z with the class embedding

Instead of the 1000-element one-hot vector for the class, one would use a 128-element embedding directly. I did try it quickly, but using it well would require some work on how to initialise and bound it properly. So far I think using max_classes already does a good job.
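For what it's worth, here is a rough sketch of that idea, with the open questions about initialisation and bounding left as comments. This is not how big-sleep currently does it; it only mirrors the shapes implied by the quoted lines.

import torch

z     = torch.nn.Parameter(torch.randn(1, 128))   # latent noise, optimised as usual
embed = torch.nn.Parameter(torch.zeros(1, 128))   # free class embedding; how to initialise/bound it is the open question
opt   = torch.optim.Adam([z, embed], lr = 0.07)   # the loop would step this against the CLIP loss

# inside the optimisation loop, build the conditioning vector directly,
# replacing `embed = self.embeddings(class_label)` from the snippet above
cond_vector = torch.cat((z, embed), dim = 1)      # shape (1, 256), fed to the generator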


LtqxWYEG commented on August 14, 2024

I'm having a bit of a conundrum here. It seems the BigSleep class in /big_sleep/big_sleep.py is not the one being executed, so changing the value of num_cutouts has no effect. ... and neither does deleting the FILE!?
What am I doing wrong here? I'm using the notebook. Intentionally produced error messages point to the file /usr/local/lib/python3.7/dist-packages/big_sleep/big_sleep.py, which I deleted, and it still works ... what?!

Edit: I needed to restart the runtime... Oops


gregloryus commented on August 14, 2024

So I made a little video to show some of my experimentation with different learning rates and cutout counts, and I am more and more convinced that the defaults for both are too high.

https://www.youtube.com/watch?v=0apLPHoUy3c

All runs use the same seed and the same phrase, "a sailing boat in the sea", running for 2000 iterations; each frame represents the result after 5 iterations. As you can see, most of the time it more or less stabilizes about halfway through and the changes become minimal at nearly all learning rates except the higher ones. This tells me that more than 1000 iterations seems to be a total waste.

Also, by far the worst results are with 256 cutouts. I think the sweet spot is somewhere between 32 and 64.

These tips and takeaways are super useful! Have you played around with the number of classes? I know more classes = more "creativity", but I'm still kind of unclear on what's happening. With max_classes = 1000, it seems to keep evolving and rarely converges to a stable result... I've seen 15 recommended for accuracy, but I would love to hear other people's thoughts.
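One easy way to get a feel for it is to sweep max_classes with everything else fixed and compare how quickly each run settles. A sketch, assuming Imagine accepts a max_classes keyword (the option name used earlier in the thread) and a seed keyword:

from big_sleep import Imagine

for k in (1000, 64, 15, 4):
    dream = Imagine(
        text = "a sailing boat in the sea",
        max_classes = k,          # limit the number of active ImageNet classes
        seed = 42,                # hypothetical: same seed so the runs are comparable
        save_progress = True,
    )
    dream()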

