
A few observations about big-sleep (open, 7 comments)

lucidrains commented on August 14, 2024
A few observations


Comments (7)

Mut1nyJD commented on August 14, 2024

So I made a little video to show some of my experimentation with different learning rates and cutout counts, and I am more and more convinced that the defaults for both are too high.

https://www.youtube.com/watch?v=0apLPHoUy3c

All runs use the same seed and the same phrase, "a sailing boat in the sea", running for 2000 iterations; each frame represents the result after 5 iterations. As you can see, most of the time it more or less stabilizes about halfway through and the changes become minimal at nearly all learning rates except the higher ones. This tells me that more than 1000 iterations seems to be a total waste.

Also, by far the worst results are with 256 cutouts. I think the sweet spot is somewhere between 32 and 64.
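For anyone who wants to reproduce this kind of sweep, here is a rough sketch of how it could be scripted with big-sleep. The Imagine call follows the project README; the keyword names num_cutouts, iterations, epochs, seed and save_every are assumptions about the constructor, so adjust them to whatever your installed version actually exposes.

# sketch of the sweep described above: same seed and phrase, 2000 iterations,
# one saved frame every 5 iterations, varying learning rate and cutout count
from big_sleep import Imagine

for lr in (0.1, 0.07, 0.05, 0.01):
    for cutouts in (32, 64, 128, 256):
        dream = Imagine(
            text = "a sailing boat in the sea",
            lr = lr,
            num_cutouts = cutouts,    # assumed keyword, per the thread's variable name
            iterations = 2000,
            epochs = 1,
            seed = 42,                # hypothetical: fixed seed so runs are comparable
            save_every = 5,           # one frame per 5 iterations for the video
            save_progress = True,
        )
        dream()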


enricoros commented on August 14, 2024

@Mut1nyJD great questions. I've experimented a bit with this project and found that for good artistic control, a human-in-the-loop approach is the best*. To answer some parts of your questions:

  1. The steps performed are iterations × epochs. The epochs variable seems not to be used; even trying 10 epochs × 10 iterations, or 1 epoch × 100 iterations, I get the same output. For greater quality (assuming convergence), you want to run the algorithm for longer; however, I've seen great results in as few as 500 steps. It's a parameter anyway, so it's very configurable, or you can just stop the command-line executable whenever you want (see the sketch after this list).

  2. Does it depend on the input text? Sometimes you can get good results with a low rate, sometimes with a high rate. I wonder how an outsider would frame the "learning rate", and how we would pick a default value for the community.

  3. I think it defaults to a photo (real-world) render, so I would omit "a photo of" (maybe the picture itself will contain "a photo of..."). I love the output of "An illustration of ...", which seems to work well. Also, try "x made of y". Sometimes I even use the DALL-E strategy of repeating the text multiple times (see the DALL-E web page to understand what I mean). If there are prompts that work well for you, please share.
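To make point 1 concrete, here is a minimal sketch; the keyword names are assumptions about the Imagine constructor rather than verified defaults. Only the product iterations × epochs should matter, so the two runs below should produce the same output.

from big_sleep import Imagine

# total steps = iterations * epochs; the split between the two seems not to matter
run_a = Imagine(text = "a sailing boat in the sea", epochs = 10, iterations = 10)   # 100 steps
run_b = Imagine(text = "a sailing boat in the sea", epochs = 1,  iterations = 100)  # 100 steps
run_a()
run_b()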


Mut1nyJD commented on August 14, 2024

@enricoros

On 1: yes, having epochs really makes no sense. I noticed that a high number of iterations usually does nothing for me. If I use the progress output, I see stable results between 50-400 iterations, and then it tends to become weird most of the time. It is like it suddenly flips and then goes down a new alley. Maybe I have to monitor the loss and see if there is a correlation.

  2. Could be; I haven't really investigated that deeply. What I do notice, though, is that it seems to be heavily biased towards dogs: it nearly always starts off with a dog in the image. I wonder if that's because dogs make up some of the biggest classes in ImageNet, and the highly unbalanced distribution of classes in ImageNet is to blame here.

  3. Hmm, I was assuming that CLIP would basically always classify something as "a photo of ...", but yes, maybe you are right that having "photo" in the prompt might cause confusion. I do notice it is very, very responsive to colors, though: once you have a color token in your input text it really focuses on that, and it starts to become the most dominant feature.


walmsley commented on August 14, 2024

@lucidrains
Seems like people agree with @Mut1nyJD ... can the num_cutouts default be permanently set to 32? Or do we not want to change too many hyperparameters, lest we break someone's existing workflow? I can make a PR if needed, if this is deemed an appropriate thing to do.


htoyryla commented on August 14, 2024
  2. Could be; I haven't really investigated that deeply. What I do notice, though, is that it seems to be heavily biased towards dogs: it nearly always starts off with a dog in the image. I wonder if that's because dogs make up some of the biggest classes in ImageNet, and the highly unbalanced distribution of classes in ImageNet is to blame here.

I think BigGAN was meant to be used to produce images mainly in one class at a time, perhaps mixing a few classes at most. Anyway, all training examples belong to a single class, if I am not mistaken. In other words, the training samples occupy rather small but isolated areas in the 1000-dimensional class space, each along a single axis away from the origin.

What happens, then, if we activate all classes randomly and normalize the class vector? Our class vectors will all be clustered very close to the origin. I guess that, given the absence of training examples in that area, it is simply a byproduct of training that there are dogs there.

This changes, by the way, when we limit the number of active classes, which we can do with the max_classes option. Then we get class vectors further away from the origin, and consequently more variation in the initial images we get. Just try it.
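Here is a toy illustration of the geometry I mean (plain PyTorch, not big-sleep code), assuming the class vector is normalised to sum to one:

import torch

def random_class_vector(active_classes, total = 1000):
    # random non-negative class vector with `active_classes` nonzero entries, summing to 1
    idx = torch.randperm(total)[:active_classes]
    v = torch.zeros(total)
    v[idx] = torch.rand(active_classes)
    return v / v.sum()

for k in (1000, 15, 1):
    v = random_class_vector(k)
    print(f"{k:4d} active classes -> distance from origin: {v.norm().item():.3f}")

# with all 1000 classes active, the vector hugs the origin (norm around 0.03-0.04),
# while a one-hot vector, like the training examples, sits at distance 1.0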

PS. Ryan Murdock did mention on Twitter the idea of dropping the one-hot class vector altogether and using the 128-element embedding instead. I am talking about skipping the first of these two lines here altogether:

embed = self.embeddings(class_label)        # map the (one-hot) class label to a 128-dim class embedding
cond_vector = torch.cat((z, embed), dim=1)  # concatenate the noise vector z with the class embedding

Instead of the 1000-element one-hot vector for the class, one would use a 128-element embedding directly. I did try it quickly, but using it well would require some work on how to initialise and bound it properly. So far I think using max_classes already does a good job.
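For what it's worth, here is a rough sketch of that idea, with the open questions about initialisation and bounding left as comments. This is not how big-sleep currently does it; it only mirrors the shapes implied by the quoted lines.

import torch

z     = torch.nn.Parameter(torch.randn(1, 128))   # latent noise, optimised as usual
embed = torch.nn.Parameter(torch.zeros(1, 128))   # free class embedding; how to initialise/bound it is the open question
opt   = torch.optim.Adam([z, embed], lr = 0.07)   # the loop would step this against the CLIP loss

# inside the optimisation loop, build the conditioning vector directly,
# replacing `embed = self.embeddings(class_label)` from the snippet above
cond_vector = torch.cat((z, embed), dim = 1)      # shape (1, 256), fed to the generator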


LtqxWYEG commented on August 14, 2024

I'm having a bit of a conundrum here. It seems the BigSleep class in /big_sleep/big_sleep.py is not the one being executed, so changing the value of num_cutouts has no effect. ... and neither does deleting the FILE!?
What am I doing wrong here? I'm using the notebook. Intentionally produced error messages point to the file /usr/local/lib/python3.7/dist-packages/big_sleep/big_sleep.py, which I deleted, and it still works ... what?!

Edit: I needed to restart the runtime... Oops


gregloryus commented on August 14, 2024

So I made a little video to show some of my experimentation with different learning rates and cutout counts, and I am more and more convinced that the defaults for both are too high.

https://www.youtube.com/watch?v=0apLPHoUy3c

All runs use the same seed and the same phrase, "a sailing boat in the sea", running for 2000 iterations; each frame represents the result after 5 iterations. As you can see, most of the time it more or less stabilizes about halfway through and the changes become minimal at nearly all learning rates except the higher ones. This tells me that more than 1000 iterations seems to be a total waste.

Also, by far the worst results are with 256 cutouts. I think the sweet spot is somewhere between 32 and 64.

These tips and takeaways are super useful! Have you played around with the number of classes? I know more classes = more "creativity", but I'm still kind of unclear on what's happening. With max_classes = 1000, it seems to keep evolving and rarely converges to a stable result... I've seen 15 recommended for accuracy, but I would love to hear other people's thoughts.
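One easy way to get a feel for it is to sweep max_classes with everything else fixed and compare how quickly each run settles. A sketch, assuming Imagine accepts a max_classes keyword (the option name used earlier in the thread) and a seed keyword:

from big_sleep import Imagine

for k in (1000, 64, 15, 4):
    dream = Imagine(
        text = "a sailing boat in the sea",
        max_classes = k,          # limit the number of active ImageNet classes
        seed = 42,                # hypothetical: same seed so the runs are comparable
        save_progress = True,
    )
    dream()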

