Comments (7)
So I made a little video to show some of my experimentation with different learning rates and cutouts, and I am more and more convinced that the defaults for both are too high.
https://www.youtube.com/watch?v=0apLPHoUy3c
All runs use the same seed and the same phrase "a sailing boat in the sea" for 2000 iterations; each frame shows the result after 5 iterations. As you can see, around the halfway point the image mostly stabilizes and the changes become minimal at nearly all learning rates except the highest ones. This tells me that running for more than 1000 iterations seems to be a total waste.
Also, by far the worst results are with 256 cutouts. I think the sweet spot is somewhere between 32 and 64.
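For anyone curious what the cutouts parameter actually does, here is a rough, self-contained sketch of the idea (names and details are illustrative, not big-sleep's actual internals): each step, the image is sampled as a batch of random square crops, each resized to CLIP's input resolution, so 256 cutouts means roughly 8x the CLIP work per step compared to 32.

```python
# Illustrative sketch of the "cutouts" idea, NOT big-sleep's actual code.
import numpy as np

def make_cutouts(image, num_cutouts, out_size=224, rng=None):
    """Sample `num_cutouts` random square crops and nearest-neighbor
    resize each to (out_size, out_size)."""
    rng = rng or np.random.default_rng(0)
    h, w, _ = image.shape
    cutouts = []
    for _ in range(num_cutouts):
        size = rng.integers(out_size // 2, min(h, w))  # random crop size
        top = rng.integers(0, h - size + 1)
        left = rng.integers(0, w - size + 1)
        crop = image[top:top + size, left:left + size]
        # nearest-neighbor resize via index mapping
        idx = np.arange(out_size) * size // out_size
        cutouts.append(crop[idx][:, idx])
    return np.stack(cutouts)

image = np.random.rand(512, 512, 3)
batch = make_cutouts(image, num_cutouts=32)
print(batch.shape)  # (32, 224, 224, 3)
```

The whole batch goes through CLIP each iteration, which is why the cutout count has such a direct effect on both speed and results.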
from big-sleep.
@Mut1nyJD great questions. I've experimented a bit with this project and found that for good artistic control, a human-in-the-loop approach is the best*. To answer some parts of your questions:
-
the steps performed are iterations × epochs. The epoch variable seems not to be used; whether I try 10 epochs × 10 iterations or 1 epoch × 100 iterations, I get the same output. For greater quality (assuming convergence), you want to run the algorithm for longer; however, I've seen great results in just 500 steps. It's a parameter anyway, so very configurable, or you can stop the command-line executable whenever you want
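The epochs × iterations equivalence can be sketched in a couple of lines (a toy loop, not big-sleep's source):

```python
# Toy sketch: total optimizer steps = epochs * iterations, so
# 10 * 10 and 1 * 100 perform the same amount of work.
def total_steps(epochs, iterations):
    steps = 0
    for _ in range(epochs):
        for _ in range(iterations):
            steps += 1  # one forward/backward pass per step
    return steps

print(total_steps(10, 10), total_steps(1, 100))  # 100 100
```

Since the loop body is identical either way, only the product matters, which would explain the identical outputs.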
-
does it depend on the input text? Sometimes you get good results with a low rate, sometimes with a high rate. I wonder how an outsider would frame the "learning rate", and how we would pick a default value for the community?
-
I think it defaults to a photo (real-world) render, so I would omit "a photo of" (maybe the picture itself will already contain "a photo of..."). I love the output of "An illustration of...", which tends to be good. Also, try "x made of y". Sometimes I even use the DALL-E strategy of repeating the text multiple times (see the DALL-E web page to understand what I mean). If there are prompts that work well for you, please share.
On 1: yes, having epochs really makes no sense. I've noticed a high number of iterations usually does nothing for me. Using the progression output, I see stable results between 50-400 iterations, and then it tends to become weird most of the time. It is like it suddenly flips and then goes down a new alley. Maybe I should monitor the loss and see if there is a correlation.
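The loss-monitoring idea could look something like this (the loss curve is synthetic, and the window and tolerance are made-up values, just to show the shape of an early-stop check):

```python
# Sketch: track the loss each iteration and stop once the moving
# improvement stalls, instead of always running the full budget.
import numpy as np

def plateau_step(losses, window=50, tol=1e-3):
    """Return the first iteration at which the improvement over the
    last `window` steps drops below `tol`, or None if it never does."""
    for i in range(window, len(losses)):
        if losses[i - window] - losses[i] < tol:
            return i
    return None

# synthetic loss: fast decay, then flat
t = np.arange(2000)
losses = np.exp(-t / 150.0) + 0.01
stop = plateau_step(losses)
print(stop)
```

A sudden "flip" like the one described above would show up as the loss jumping back up after the plateau, so logging it every few iterations should make the correlation easy to spot.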
-
Could be; I haven't really investigated that deeply. What I do notice, though, is that it seems heavily biased towards dogs: it nearly always starts off with a dog in the image. I wonder if that's because dogs make up some of the biggest classes in ImageNet and the highly unbalanced class distribution of ImageNet is to blame here.
-
Hmm, I was assuming that CLIP would basically always classify something as "a photo of ...", but yes, maybe you are right and having "photo" in the prompt might cause confusion. I do notice it is very responsive to colors, though: once you have a color token in your input text it really focuses on that, and it starts to become the most dominant feature.
@lucidrains
Seems like people agree with @Mut1nyJD ... can the num_cutouts
default be permanently set to 32? Or do we not want to change too many hyperparameters lest we break someone's existing workflow? I can make a PR if needed, if this is deemed an appropriate thing to do.
- Could be; I haven't really investigated that deeply. What I do notice, though, is that it seems heavily biased towards dogs: it nearly always starts off with a dog in the image. I wonder if that's because dogs make up some of the biggest classes in ImageNet and the highly unbalanced class distribution of ImageNet is to blame here.
I think BigGAN was meant to produce images mainly in one class at a time, perhaps mixing a few classes at most. Anyway, all training examples belong to a single class, if I am not mistaken. In other words, the training samples occupy rather small but isolated areas in the 1000-dimensional class space, each along a single axis away from the origin.
What happens, then, if we activate all classes randomly and normalize the class vector? Our class vectors will all be clustered very close to the origin. I guess that, given the absence of training examples in that area, the dogs we see there are simply a byproduct of training.
This changes, by the way, when we limit the number of active classes, which we can do with the option max_classes. Then we get class vectors further away from the origin, and consequently more variation in the initial images. Just try it.
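A small numerical sketch of that geometry (illustrative only; big-sleep's actual class-vector handling differs in detail): a normalized vector spread over all 1000 classes sits much closer to the origin than one concentrated on a handful of classes.

```python
# Sketch: compare the norm of a class vector with 1000 active classes
# against one limited to 15 active classes (as with max_classes=15).
import numpy as np

rng = np.random.default_rng(0)

all_classes = rng.random(1000)
all_classes /= all_classes.sum()          # normalized, 1000 active classes

few = np.zeros(1000)
few[rng.choice(1000, size=15, replace=False)] = rng.random(15)
few /= few.sum()                          # normalized, 15 active classes

print(np.linalg.norm(all_classes), np.linalg.norm(few))
```

The second norm comes out several times larger, i.e. the restricted vector lives noticeably further from the origin, consistent with the more varied initial images.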
PS. Ryan Murdock did mention on Twitter the idea of dropping the one-hot class vector altogether and using the 128-element embedding instead. I am talking about skipping the first of these two lines here altogether:
Lines 575 to 576 in 6afb308
Instead of the 1000-element one-hot vector for the class, one would use a 128-element embedding. I did try it quickly, but using it well would require some work on how to initialise and bound it properly. So far I think using max_classes already does a good job.
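The shape of that change, assuming BigGAN maps its 1000-dim class vector through a learned 128-dim shared embedding (the matrix below is random, purely to show the shapes involved):

```python
# Sketch of one-hot-via-embedding vs. optimizing the embedding directly.
# The embedding matrix here is a random stand-in, not BigGAN's weights.
import numpy as np

rng = np.random.default_rng(0)
embed = rng.standard_normal((1000, 128))   # stand-in for the class embedding

one_hot = np.zeros(1000)
one_hot[207] = 1.0                         # e.g. a single ImageNet class
via_one_hot = one_hot @ embed              # (128,) -- the usual route

direct = rng.standard_normal(128)          # optimize this vector directly
print(via_one_hot.shape, direct.shape)
```

Optimizing `direct` skips the class-vector stage entirely, but as noted above, there is no obvious prior for how to initialise or bound a free 128-dim vector, which is probably where the extra work lies.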
I'm having a bit of a conundrum here. It seems class BigSleep in /big_sleep/big_sleep.py is not executed, making changing the value of num_cutouts irrelevant ... and deleting the file doesn't change anything either!?
What am I doing wrong here? I'm using the notebook. Intentionally produced error messages point to the file /usr/local/lib/python3.7/dist-packages/big_sleep/big_sleep.py, which I deleted, and it still works ... what?!
Edit: I needed to restart the runtime... Oops
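For reference, the restart is needed because Python caches imported modules in sys.modules: deleting the source file on disk has no effect on a module that is already loaded. A self-contained demonstration:

```python
# Demonstrate that a loaded module survives deletion of its source file.
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "demo_mod.py"), "w") as f:
    f.write("def answer():\n    return 42\n")

sys.path.insert(0, tmp)
import demo_mod                               # now cached in sys.modules

os.remove(os.path.join(tmp, "demo_mod.py"))   # delete the source file
print(demo_mod.answer())                      # still works: prints 42
```

Restarting the interpreter (or the Colab runtime) clears the cache, which is why edits to the installed big_sleep package only take effect after a restart.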
These tips and takeaways are super useful! Have you played around with the number of classes? I know more classes = more "creativity", but I'm still somewhat unclear on what's happening. With max_classes = 1000, it seems to keep evolving and rarely converges to a stable result... I've seen 15 recommended for accuracy, but I'd love to hear other people's thoughts.