fastai / diffusion-nbs Goto Github PK

Getting started with diffusion

License: Apache License 2.0

Jupyter Notebook 99.98% Python 0.02%

diffusion-nbs's Introduction

Welcome to fastai

Installing

You can use fastai without any installation by using Google Colab. In fact, every page of this documentation is also available as an interactive notebook - click “Open in colab” at the top of any page to open it (be sure to change the Colab runtime to “GPU” to have it run fast!) See the fast.ai documentation on Using Colab for more information.

You can install fastai on your own machines with conda (highly recommended), as long as you’re running Linux or Windows (NB: Mac is not supported). For Windows, please see the “Running on Windows” for important notes.

We recommend using miniconda (or miniforge). First install PyTorch using the conda line shown here, and then run:

conda install -c fastai fastai

To install with pip, use: pip install fastai.

If you plan to develop fastai yourself, or want to be on the cutting edge, you can use an editable install (if you do this, you should also use an editable install of fastcore to go with it.) First install PyTorch, and then:

git clone https://github.com/fastai/fastai
pip install -e "fastai[dev]"

Learning fastai

The best way to get started with fastai (and deep learning) is to read the book, and complete the free course.

To see what’s possible with fastai, take a look at the Quick Start, which shows how to use around 5 lines of code to build an image classifier, an image segmentation model, a text sentiment model, a recommendation system, and a tabular model. For each of the applications, the code is much the same.

Read through the Tutorials to learn how to train your own models on your own datasets. Use the navigation sidebar to look through the fastai documentation. Every class, function, and method is documented here.

To learn about the design and motivation of the library, read the peer reviewed paper.

About fastai

fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes:

A new type dispatch system for Python along with a semantic type hierarchy for tensors
A GPU-optimized computer vision library which can be extended in pure Python
An optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4–5 lines of code
A novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training
A new data block API
And much more…

fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable. It is built on top of a hierarchy of lower-level APIs which provide composable building blocks. This way, a user wanting to rewrite part of the high-level API or add particular behavior to suit their needs does not have to learn how to use the lowest level.

Migrating from other libraries

It’s very easy to migrate from plain PyTorch, Ignite, or any other PyTorch-based library, or even to use fastai in conjunction with other libraries. Generally, you’ll be able to use all your existing data processing code, but will be able to reduce the amount of code you require for training, and more easily take advantage of modern best practices. Here are migration guides from some popular libraries to help you on your way:

Windows Support

Due to python multiprocessing issues on Jupyter and Windows, num_workers of Dataloader is reset to 0 automatically to avoid Jupyter hanging. This makes tasks such as computer vision in Jupyter on Windows many times slower than on Linux. This limitation doesn’t exist if you use fastai from a script.

See this example to fully leverage the fastai API on Windows.

We recommend using Windows Subsystem for Linux (WSL) instead – if you do that, you can use the regular Linux installation approach, and you won’t have any issues with num_workers.

Tests

To run the tests in parallel, launch:

nbdev_test

For all the tests to pass, you’ll need to install the dependencies specified as part of dev_requirements in settings.ini

pip install -e .[dev]

Tests are written using nbdev, for example see the documentation for test_eq.

Contributing

After you clone this repository, make sure you have run nbdev_install_hooks in your terminal. This install Jupyter and git hooks to automatically clean, trust, and fix merge conflicts in notebooks.

After making changes in the repo, you should run nbdev_prepare and make additional and necessary changes in order to pass all the tests.

Docker Containers

For those interested in official docker containers for this project, they can be found here.

diffusion-nbs's People

Contributors

Stargazers

Watchers

Forkers

rekil156 kavindu404 vtecftwy kesfrance stpingi agisga jmelsbach maizabros nmoran niyazikemer jegomezg s13tc2 matt-fff mozeal arunoda jantic butchland ksferguson cabbage8897 qbiwan petergoldstein joshiokk vigneshbaskar n-e-w njgroene cbare problemsolversguild leemengtw nyandwi nyzu-dev anatolicvs markb2 raulkite meta-sean 921kiyo fahimf liuyixi520 franckalbinet breezedeus johannesstutz o-keenan mattlichti albertvaka nirantk dmitriyg228 benmfox sirkyven harpreetmann24 nasheqlbrm thliang01 harishanand95 rrustom truthdead devforfu strickvl thangdduong ihanif ahmetgunduz mistobaan akash5474 mohitjuneja techthiyanes sayan0506 sunbc0120 kaitang26 animesh csaroff lkarjun alexredplanet riochuong wesleyegberto miteas cly julicq ikj1992 fmquaglia xl0 jiruijin dacquaviva ferranespigares munisdev86 ab-10 ahagai jszymanskijs calvinfeng jatintyagi-dev deep-oasis-ai centerqi xichenn jgmjgm deeplearn-art yurazharkovskyab akshaypt7 mohsen796 garymihalik1 mistrymm7 aswahd sarahspak kevo1ution briansigafoos

diffusion-nbs's Issues

TypeError: init() got an unexpected keyword argument 'tensor_format'

i can t solve this error.i never found tensor_format in text.what it should be?

noise_scheduler = DDPMScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, tensor_format="pt")#, tensor_format="pt"

text
DDPMScheduler
num_train_timesteps – number of diffusion steps used to train the model.
beta_start – the starting beta value of inference.
beta_end – the final beta value.
beta_schedule – the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
trained_betas – option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
variance_type – options to clip the variance used when adding noise to the denoised sample. Choose from fixed_small, fixed_small_log, fixed_large, fixed_large_log, learned or learned_range.
clip_sample – option to clip predicted sample between -1 and 1 for numerical stability.
prediction_type – prediction type of the scheduler function, one of epsilon (predicting the noise of the diffusion process), sample (directly predicting the noisy sample`) or v_prediction

How to add clip loss in Guidence part for stable diffusion latent ?

Hi, thanks for your great work.

I'm trying to add clip loss for stable diffusion latent in the Guidence part, but when i decode image from latent and put it into clip_model, when cal clip_loss with latent, it occurs an error, RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior. How to fix this problem or is there any example to refer?

Stable Diffusion deep dive notebook vae encoding problem

while encode image to the latent space using
latent = vae.encode(tfms.ToTensor()(input_im).unsqueeze(0).to(torch.float16).to(torch_device)*2-1)
it gave error RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same

As my graphics card 8gb I converted vae to torch.float16. Is that the problem.

the whole error is---

RuntimeError Traceback (most recent call last)
Cell In[20], line 2
1 # Encode to the latent space
----> 2 encoded = pil_to_latent(input_image)
3 encoded.shape
4 # Let's visualize the four channels of this latent representation:

Cell In[18], line 4, in pil_to_latent(input_im)
1 def pil_to_latent(input_im):
2 # Single image -> single latent in a batch (so size 1, 4, 64, 64)
3 with torch.no_grad():
----> 4 latent = vae.encode(tfms.ToTensor()(input_im).type(torch.float16).unsqueeze(0).to(torch_device)*2-1) # Note scaling
5 return 0.18215 * latent.latent_dist.sample()

File F:\Python 3.10.8\lib\site-packages\diffusers\models\vae.py:566, in AutoencoderKL.encode(self, x, return_dict)
565 def encode(self, x: torch.FloatTensor, return_dict: bool = True) -> AutoencoderKLOutput:
--> 566 h = self.encoder(x)
567 moments = self.quant_conv(h)
568 posterior = DiagonalGaussianDistribution(moments)

File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []

File F:\Python 3.10.8\lib\site-packages\diffusers\models\vae.py:130, in Encoder.forward(self, x)
128 def forward(self, x):
129 sample = x
--> 130 sample = self.conv_in(sample)
132 # down
133 for down_block in self.down_blocks:

File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\conv.py:463, in Conv2d.forward(self, input)
462 def forward(self, input: Tensor) -> Tensor:
--> 463 return self._conv_forward(input, self.weight, self.bias)

File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
455 if self.padding_mode != 'zeros':
456 return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
457 weight, bias, self.stride,
458 _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
460 self.padding, self.dilation, self.groups)

RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same

Difference between `latents requires_grad=True` and `torch.no_grad()`

Thanks for sharing such an amazing work :)

In the last section of the notebook Stable Diffusion Deep Dive.ipynb, you mention:

NB: We should set latents requires_grad=True before we do the forward pass of the unet (removing with torch.no_grad()) if we want mode accurate gradients. BUT this requires a lot of extra memory. You'll see both approaches used depending on whose implementation you're looking at.

Can you please clarify what is the difference between the two approaches? For example, if I had to code this, I would have used torch.no_grad(), but apparently you preferred another approach. What does it change computationally and results-wise?.

I think adding this as extra info to the notebook would be useful to others, too :)

How to get pred_original_sample when using PNDMScheduler

@johnowhitaker in the Stable Diffusion Deep Dive.ipynb notebook, section The UNET and CFG.

You get latents_x0 because the scheduler exposes pred_original_sample

    latents_x0 = scheduler.step(noise_pred, t, latents).pred_original_sample # Using the scheduler (Diffusers 0.4 and above)

How to get this pred_original_sample when using PNDMScheduler? As this scheduler does not expose this value.

FileNotFoundError: [Errno 2] No such file or directory: 'learned_embeds.bin'

birb_embed = torch.load('learned_embeds.bin')
birb_embed.keys(), birb_embed[''].shape

Deep Dive NB: Quick Fix for AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'

In the Stable Diffusion Deep Dive notebook, in the code plot immediately following the Transformer diagram, there is the definition of get_output_embeds which includes a call to text_encoder.text_model._build_causal_attention_mask:

def get_output_embeds(input_embeddings):
    # CLIP's text model uses causal mask, so we prepare it here:
    bsz, seq_len = input_embeddings.shape[:2]
    causal_attention_mask = text_encoder.text_model._build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype)
    ...

That is currently generating an error for me when I run the notebook on Colab (from a fresh instance) or my home computer:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-33-dbb74b7ec9b4>](https://localhost:8080/#) in <cell line: 26>()
     24     return output
     25 
---> 26 out_embs_test = get_output_embeds(input_embeddings) # Feed through the model with our new function
     27 print(out_embs_test.shape) # Check the output shape
     28 out_embs_test # Inspect the output

1 frames
[/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in __getattr__(self, name)
   1612             if name in modules:
   1613                 return modules[name]
-> 1614         raise AttributeError("'{}' object has no attribute '{}'".format(
   1615             type(self).__name__, name))
   1616 

AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'

Everything in the notebook prior to that line runs fine.

Perhaps I'm doing something wrong, or perhaps something has changed with the HF libraries that being used, since the notebook's original conception?

UPDATE:

I see the same issue here: drboog/ProFusion#12. It seems that transformers has changed. Downgrading to version 4.25.1 fixed the problem.

Thus changing the the pip install line at the top of the notebook to

!pip install -q --upgrade transformers==4.25.1 diffusers ftfy

...will restore full functionality.

Feel free to close this issue at your convenience. Perhaps a PR is in order.

Presumably some way to keep up to date with transformers will be preferable, but for now this is a quick fix.

Perhaps bug in img2img example

Hey @johnowhitaker,

Might there be a bug in the img2img example here:

# Loop
for i, t in tqdm(enumerate(scheduler.timesteps)):
    if i > start_step: # << This is the only modification to the loop we do

Should this be i >= start_step?

i.e. image -> images

crash

Minor typo in Stable Diffusion Deep Dive notebook

Hi, @johnowhitaker

In the positional embedding section at the line

But now instead of dealing with ~50 tokens we just need one for each position (77 total):

I think the number of tokens should be 50K instead of 50. Could you please confirm this?

Thanks

Stable Diffusion Deep Dive fails at UNET and CFG with IndexError: index 51 is out of bounds for dimension 0 with size 51

Full error here:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In [63], line 19
     16 # Get the predicted x0:
     17 # latents_x0 = latents - sigma * noise_pred # Calculating ourselves
     18 print(noise_pred.shape, t, latents.shape)
---> 19 latents_x0 = scheduler.step(noise_pred, t, latents).pred_original_sample # Using the scheduler (Diffusers 0.4 and above)
     21 # compute the previous noisy sample x_t -> x_t-1
     22 latents = scheduler.step(noise_pred, t, latents).prev_sample

File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:405, in LMSDiscreteScheduler.step(self, model_output, timestep, sample, order, return_dict)
    403 # 3. Compute linear multistep coefficients
    404 order = min(self.step_index + 1, order)
--> 405 lms_coeffs = [self.get_lms_coefficient(order, self.step_index, curr_order) for curr_order in range(order)]
    407 # 4. Compute previous sample based on the derivatives path
    408 prev_sample = sample + sum(
    409     coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
    410 )

File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:405, in <listcomp>(.0)
    403 # 3. Compute linear multistep coefficients
    404 order = min(self.step_index + 1, order)
--> 405 lms_coeffs = [self.get_lms_coefficient(order, self.step_index, curr_order) for curr_order in range(order)]
    407 # 4. Compute previous sample based on the derivatives path
    408 prev_sample = sample + sum(
    409     coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
    410 )

File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:233, in LMSDiscreteScheduler.get_lms_coefficient(self, order, t, current_order)
    230         prod *= (tau - self.sigmas[t - k]) / (self.sigmas[t - current_order] - self.sigmas[t - k])
    231     return prod
--> 233 integrated_coeff = integrate.quad(lms_derivative, self.sigmas[t], self.sigmas[t + 1], epsrel=1e-4)[0]
    235 return integrated_coeff

IndexError: index 51 is out of bounds for dimension 0 with size 51

I tried playing around with the indices, but seems like it is another issue. Moving to an older checkout doesn't fix it either.