nateraw / stable-diffusion-videos

Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts

License: Apache License 2.0

Python 83.27% Jupyter Notebook 16.73%
ai-art huggingface huggingface-diffusers machine-learning stable-diffusion

stable-diffusion-videos's Introduction

stable-diffusion-videos

Try it yourself in Colab: Open In Colab

TPU version (~6x faster than standard Colab GPUs): Open In Colab

Example - morphing between "blueberry spaghetti" and "strawberry spaghetti"

berry_good_spaghetti.2.mp4

Installation

pip install stable_diffusion_videos

Usage

Check out the examples folder for example scripts 👀

Making Videos

Note: For Apple M1 architecture, use torch.float32 instead, as torch.float16 is not available on MPS.

from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=3,
    height=512,  # use multiples of 64 if > 512. Multiples of 8 if < 512.
    width=512,   # use multiples of 64 if > 512. Multiples of 8 if < 512.
    output_dir='dreams',        # Where images/videos will be saved
    name='animals_test',        # Subdirectory of output_dir where images/videos will be saved
    guidance_scale=8.5,         # Higher adheres to prompt more, lower lets model take the wheel
    num_inference_steps=50,     # Number of diffusion steps per image generated. 50 is good default
)

Making Music Videos

New! Music can be added to the video by providing a path to an audio file. The audio will inform the rate of interpolation so the videos move to the beat 🎶

from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Seconds in the song.
audio_offsets = [146, 148]  # [Start, end]
fps = 30  # Use lower values for testing (5 or 10), higher values for better quality (30 or 60)

# Convert seconds to frames
num_interpolation_steps = [(b-a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=num_interpolation_steps,
    audio_filepath='audio.mp3',
    audio_start_sec=audio_offsets[0],
    fps=fps,
    height=512,  # use multiples of 64 if > 512. Multiples of 8 if < 512.
    width=512,   # use multiples of 64 if > 512. Multiples of 8 if < 512.
    output_dir='dreams',        # Where images/videos will be saved
    guidance_scale=7.5,         # Higher adheres to prompt more, lower lets model take the wheel
    num_inference_steps=50,     # Number of diffusion steps per image generated. 50 is good default
)

Using the UI

from stable_diffusion_videos import StableDiffusionWalkPipeline, Interface
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

interface = Interface(pipeline)
interface.launch()

Credits

This work was built off of a script shared by @karpathy. That script was modified into this gist, which was then updated and expanded into this repo.

Contributing

You can file any issues/feature requests here

Enjoy 🤗

Extras

Upsample with Real-ESRGAN

You can also 4x upsample your images with Real-ESRGAN!

It's included when you pip install the latest version of stable-diffusion-videos!

You'll be able to use upsample=True in the walk function, like this:

pipeline.walk(['a cat', 'a dog'], [234, 345], upsample=True)

The above may cause you to run out of VRAM. No problem, you can do upsampling separately.

To upsample an individual image:

from stable_diffusion_videos import RealESRGANModel

model = RealESRGANModel.from_pretrained('nateraw/real-esrgan')
enhanced_image = model('your_file.jpg')

Or, to do a whole folder:

from stable_diffusion_videos import RealESRGANModel

model = RealESRGANModel.from_pretrained('nateraw/real-esrgan')
model.upsample_imagefolder('path/to/images/', 'path/to/output_dir')

stable-diffusion-videos's People

Contributors

0x1355, atomic-germ, borda, charlielito, codefaux, danielpatrickhug, eltociear, ggozad, kennethgoodman, nateraw, niravprajapati1, philgzl, seriousran, thehappydinoa


stable-diffusion-videos's Issues

save images/prompts when generating images

Let's add multi-image ability in both the app and in a new function, generate_images (or something along those lines). Each call to the underlying function should save images to a new subdirectory of the output dir along with the prompt config.
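
A minimal sketch of what such a helper might look like; the generate_images name, its arguments, and the pipeline call below are illustrative, not an existing API:

import json
from pathlib import Path

import torch

def generate_images(pipeline, prompt, num_images, output_dir='generated', name='run_1'):
    # Hypothetical helper: generate `num_images` images for one prompt and save
    # them to a new subdirectory along with the prompt/seed config.
    save_dir = Path(output_dir) / name
    save_dir.mkdir(parents=True, exist_ok=True)
    seeds = [torch.randint(0, 2**32 - 1, (1,)).item() for _ in range(num_images)]
    for i, seed in enumerate(seeds):
        generator = torch.Generator(device=pipeline.device).manual_seed(seed)
        image = pipeline(prompt, generator=generator).images[0]
        image.save(save_dir / f'image_{i:06d}.png')
    # Write the prompt config next to the images so the run can be reproduced.
    (save_dir / 'prompt_config.json').write_text(json.dumps({'prompt': prompt, 'seeds': seeds}, indent=2))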

Multiple prompts (more than 2) and/or override the file-saving naming scheme

Love this so far but I'm running into a problematic limitation.
I have large libraries of prompts designed to be sequentially connected.
I'd like to be able to interpolate in sequence through 40 or 50 prompts.
I can do this now by looping the pipe.walk() call and stepping through the list of prompts, always placing the last prompt in the first prompt position on each iteration, and I can even control the saving directory, but I end up with dozens of separate directories whose files then have to be gathered up and stitched together. Much bother.
I can keep track of my own file numbers if it helps, but on long runs (400 or 500 prompts) the current system is just unworkable for me.
Point me in the right direction in your code and I could try fixing it myself.
Thanks for everything you've done so far!
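
For reference, a rough sketch of the looping workaround described above, assuming a long prompts list, a matching seeds list, and the pipeline from the README usage section:

video_paths = []
for i in range(len(prompts) - 1):
    # Walk one prompt pair at a time, carrying the previous prompt/seed forward.
    video_paths.append(
        pipeline.walk(
            prompts=[prompts[i], prompts[i + 1]],
            seeds=[seeds[i], seeds[i + 1]],
            num_interpolation_steps=60,
            output_dir='dreams',
            name=f'segment_{i:03d}',  # one subdirectory per segment; stitch the videos afterwards
        )
    )

(Newer versions of the pipeline's walk may also accept more than two prompts in a single call, which would avoid the per-segment directories entirely.)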

Allow users to pass in a specified StableDiffusionPipeline

In stable_diffusion_walk.py, the StableDiffusionPipeline is hardcoded at v1.4 and fp16.

pipeline = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")

The walk() function should take in a parametric StableDiffusionPipeline because a) there may be other/better Stable Diffusion models in the future, and b) the text tokenizer of the pipeline may need to be augmented in order to generate from textual-inversion-trained embeddings.
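
For what it's worth, the StableDiffusionWalkPipeline shown in the README usage section already takes the checkpoint at load time; a minimal sketch (the alternate model id below is a placeholder, not a real checkpoint):

from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

# Any diffusers-compatible Stable Diffusion checkpoint id or local path can go here;
# "your-org/your-finetuned-sd" is a placeholder.
pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "your-org/your-finetuned-sd",
    torch_dtype=torch.float16,
).to("cuda")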

Torch not compiled with CUDA enabled

from stable_diffusion_videos import interface
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
Traceback (most recent call last):
  File "", line 1, in
  File "", line 1055, in _handle_fromlist
  File "D:\AI\stable-diffusion-videos\stable_diffusion_videos\__init__.py", line 75, in __getattr__
    submod = importlib.import_module(submod_path)
  File "D:\Conda_Envs\envs\SDV\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "D:\AI\stable-diffusion-videos\stable_diffusion_videos\app.py", line 6, in
    from .stable_diffusion_walk import SCHEDULERS, pipeline, walk
  File "D:\AI\stable-diffusion-videos\stable_diffusion_videos\stable_diffusion_walk.py", line 11, in
    pipeline = StableDiffusionPipeline.from_pretrained(
  File "D:\Conda_Envs\envs\SDV\lib\site-packages\diffusers\pipeline_utils.py", line 126, in to
    module.to(torch_device)
  File "D:\Conda_Envs\envs\SDV\lib\site-packages\torch\nn\modules\module.py", line 927, in to
    return self._apply(convert)
  File "D:\Conda_Envs\envs\SDV\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "D:\Conda_Envs\envs\SDV\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "D:\Conda_Envs\envs\SDV\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "D:\Conda_Envs\envs\SDV\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
    param_applied = fn(param)
  File "D:\Conda_Envs\envs\SDV\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "D:\Conda_Envs\envs\SDV\lib\site-packages\torch\cuda\__init__.py", line 211, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

RE: Token

Unable to copy/paste the token from Hugging Face when logging in. Tried both PowerShell and CMD.

upgrade to diffusers 0.5.1

Should update to latest diffusers release.

Notes (will add more if I come across them):

  • Can replace NoCheck with safety_checker=None in from_pretrained
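
For reference, a minimal sketch of that note, assuming diffusers lets you override the component in from_pretrained:

from diffusers import StableDiffusionPipeline
import torch

# Passing safety_checker=None replaces the custom NoCheck stub by simply not
# loading a safety checker at all.
pipeline = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")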

Not sure what caused this issue. I can make images fine, but when it gets to creating the video, I get this message.

This is from finishing the video interpolation in the local app interface

Traceback (most recent call last):
  File "D:\Python\lib\site-packages\gradio\routes.py", line 273, in run_predict
    output = await app.blocks.process_api(
  File "D:\Python\lib\site-packages\gradio\blocks.py", line 753, in process_api
    result = await self.call_function(fn_index, inputs, iterator)
  File "D:\Python\lib\site-packages\gradio\blocks.py", line 630, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\Python\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\Python\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\Python\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "D:\Python\lib\site-packages\stable_diffusion_videos\app.py", line 57, in fn_videos
    video_path = walk(
  File "D:\Python\lib\site-packages\stable_diffusion_videos\stable_diffusion_walk.py", line 288, in walk
    return make_video_ffmpeg(output_path, f"{name}.mp4", fps=fps, frame_filename=f"frame%06d{frame_filename_ext}")
  File "D:\Python\lib\site-packages\stable_diffusion_videos\stable_diffusion_walk.py", line 66, in make_video_ffmpeg
    subprocess.call(
  File "D:\Python\lib\subprocess.py", line 345, in call
    with Popen(*popenargs, **kwargs) as p:
  File "D:\Python\lib\subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "D:\Python\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

This is from running it in a command terminal after starting python:

from stable_diffusion_videos import walk

walk(
... prompts=['a cat', 'a dog'],
... seeds=[42, 1337],
... output_dir='dreams', # Where images/videos will be saved
... name='animals_test', # Subdirectory of output_dir where images/videos will be saved
... guidance_scale=8.5, # Higher adheres to prompt more, lower lets model take the wheel
... num_steps=5, # Change to 60-200 for better results...3-5 for testing
... num_inference_steps=50,
... scheduler='klms', # One of: "klms", "default", "ddim"
... disable_tqdm=False, # Set to True to disable tqdm progress bar
... make_video=True, # If false, just save images
... use_lerp_for_text=True, # Use lerp for text embeddings instead of slerp
... do_loop=False, # Change to True if you want last prompt to loop back to first prompt
... )
COUNT: 0/5
100%|████████████████████████████████████████| 50/50 [00:34<00:00, 1.45it/s]
100%|████████████████████████████████████████| 50/50 [00:36<00:00, 1.39it/s]
100%|████████████████████████████████████████| 50/50 [00:37<00:00, 1.35it/s]
100%|████████████████████████████████████████| 50/50 [00:37<00:00, 1.33it/s]
100%|████████████████████████████████████████| 50/50 [00:36<00:00, 1.36it/s]
Traceback (most recent call last):
  File "", line 1, in
  File "D:\Python\lib\site-packages\stable_diffusion_videos\stable_diffusion_walk.py", line 288, in walk
    return make_video_ffmpeg(output_path, f"{name}.mp4", fps=fps, frame_filename=f"frame%06d{frame_filename_ext}")
  File "D:\Python\lib\site-packages\stable_diffusion_videos\stable_diffusion_walk.py", line 66, in make_video_ffmpeg
    subprocess.call(
  File "D:\Python\lib\subprocess.py", line 345, in call
    with Popen(*popenargs, **kwargs) as p:
  File "D:\Python\lib\subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "D:\Python\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

NameError: name 'interface' is not defined

Hey there, I get this error when trying to run the interface.launch(debug=True) cell on the Colab Doc:

NameError                                 Traceback (most recent call last)
<ipython-input-4-75e4cf6bcc7d> in <module>
----> 1 interface.launch(debug=True)

NameError: name 'interface' is not defined

Any solutions for this? Thank you :)

RuntimeError: Numpy is not available

Once I did the basic installation, I made a file with this script:

from stable_diffusion_videos import interface
interface.launch()

It runs fine until halfway through, when it says:
RuntimeError: Numpy is not available
I have tried uninstalling and reinstalling numpy. I have also tried installing 1.21.0 and 1.23.3; neither of them works.
Is this a problem with my environment, or is this an actual bug?

from stable_diffusion_videos import interface throws error

Whenever I import interface in Python, it throws the error below:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/home/zaithe/.local/lib/python3.8/site-packages/stable_diffusion_videos/__init__.py", line 75, in __getattr__
    submod = importlib.import_module(submod_path)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/zaithe/.local/lib/python3.8/site-packages/stable_diffusion_videos/app.py", line 6, in <module>
    from .stable_diffusion_walk import SCHEDULERS, pipeline, walk
  File "/home/zaithe/.local/lib/python3.8/site-packages/stable_diffusion_videos/stable_diffusion_walk.py", line 11, in <module>
    pipeline = StableDiffusionPipeline.from_pretrained(
  File "/home/zaithe/.local/lib/python3.8/site-packages/diffusers/pipeline_utils.py", line 154, in from_pretrained
    cached_folder = snapshot_download(
  File "/home/zaithe/.local/lib/python3.8/site-packages/huggingface_hub/_snapshot_download.py", line 182, in snapshot_download
    repo_info = _api.repo_info(
  File "/home/zaithe/.local/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1289, in repo_info
    return self.model_info(
  File "/home/zaithe/.local/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1136, in model_info
    _raise_for_status(r)
  File "/home/zaithe/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 84, in _raise_for_status
    _raise_with_request_id(request)
  File "/home/zaithe/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 95, in _raise_with_request_id
    raise e
  File "/home/zaithe/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 90, in _raise_with_request_id
    request.raise_for_status()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/CompVis/stable-diffusion-v1-4/revision/main (Request ID: d88ARCfxzZgiWqYtjauWn)

Save prompt config

Save the prompt configuration args passed to walk within the frame directory so you don't lose your prompts/seeds.
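
A possible sketch of what walk could write out; the helper name, file name, and keys are illustrative:

import json
from pathlib import Path

def save_prompt_config(output_dir, name, prompts, seeds, **walk_kwargs):
    # Illustrative helper: dump the arguments passed to walk next to the frames
    # so a run can be inspected or reproduced later.
    config = {'prompts': prompts, 'seeds': seeds, **walk_kwargs}
    config_path = Path(output_dir) / name / 'prompt_config.json'
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2, sort_keys=True))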

FileNotFoundError: [WinError 2] The system cannot find the file specified

Images are generated in the folder specified but FFmpeg throws this error when creating the video.

from stable_diffusion_videos.stable_diffusion_walk import walk

walk(
    prompts=['a cat', 'a dog'],
    seeds=[10, 42],
    output_dir='dreams',     # Where images/videos will be saved
    name='animals_test',     # Subdirectory of output_dir where images/videos will be saved
    guidance_scale=8.5,      # Higher adheres to prompt more, lower lets model take the wheel
    num_steps=5,             # Change to 60-200 for better results...3-5 for testing
    num_inference_steps=50, 
    scheduler='klms',        # One of: "klms", "default", "ddim"
    disable_tqdm=False,      # Set to True to disable tqdm progress bar
    make_video=True,         # If false, just save images
    use_lerp_for_text=True,  # Use lerp for text embeddings instead of slerp
    do_loop=False,           # Change to True if you want last prompt to loop back to first prompt
)
Traceback (most recent call last):
  File "c:\python\Projects\stable-diffusion\stable-generator.py", line 3, in <module>
    walk(
  File "C:\python\Projects\Gesture Scrolling\keyvenv\lib\site-packages\stable_diffusion_videos\stable_diffusion_walk.py", line 288, in walk
    return make_video_ffmpeg(output_path, f"{name}.mp4", fps=fps, frame_filename=f"frame%06d{frame_filename_ext}")
  File "C:\python\Projects\Gesture Scrolling\keyvenv\lib\site-packages\stable_diffusion_videos\stable_diffusion_walk.py", line 66, in make_video_ffmpeg
    subprocess.call(
  File "C:\Users\meetg\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 349, in call
    with Popen(*popenargs, **kwargs) as p:
  File "C:\Users\meetg\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\meetg\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Missing modules and unable to launch locally

I am trying to run the app locally with the provided script:
https://github.com/nateraw/stable-diffusion-videos#run-the-app-locally

Upon running the code, I get errors about missing torch, torchvision, and cv2 (opencv-python); when I install them, I'm told to install realesrgan; finally, after installing that, I receive an assertion error saying Torch not compiled with CUDA enabled.

I used to be able to run the app when the script was

from stable_diffusion_videos import interface
interface.launch()

add stable diffusion inpainting for videos

We'd have to discuss an API for this. Perhaps it's 2 video inputs, one with the source, the other with masks, and we iterate over frames. Not sure if any wonkiness could come out of this approach though...

Probably better to take in 2 directories or 1 directory w/ file names denoting which are masks and which are src images.

Pass batches of prompts

We're doing 1 image at a time, but if you have the VRAM you'll probably get a nice speedup by doing inference batch-wise. In the case of walk, we might even want a dataloader, etc., since you'll have 100s of inputs and will have to break them up.

Will hack together some experiments around this and report the results back here. If it's worth the effort, I'll do it.
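
(The walk signature pasted in a later issue on this page already includes a batch_size argument, so batched inference appears to have landed since this was filed.) A minimal sketch of using it, assuming the pipeline from the README usage section:

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=60,
    batch_size=4,  # frames generated per forward pass; increase until you run out of VRAM
)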

add ability to continue from certain step

Would be nice to continue from a certain step if you stopped partway through. With the prompt config saved, this shouldn't be too bad.

What would be really nice is to continue from a certain step and steer it in another direction (like if you didn't like where it went the first time it ran)... something to think about.
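
(The walk signature pasted in a later issue also shows a resume argument, which looks related.) A hedged sketch of how resuming might be invoked, assuming resume picks up from the frames already written to disk:

# Re-run walk with the same output_dir and name as the interrupted run.
video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=60,
    output_dir='dreams',
    name='animals_test',
    resume=True,  # assumption: continue from the last frame on disk
)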

Multiple GPUs?

Hello Nate,

your implementation is fantastic! It really increased my productivity!

Indeed, running diffusion and upscaling at the same time might not be possible if RAM is too small. I happen to have two GPUs. Would it be possible to put the GAN for upscaling to the second GPU?

Best,
Tristan
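
A possible sketch, assuming RealESRGANModel behaves like a regular torch module and can be moved to a second GPU while the diffusion pipeline stays on the first:

from stable_diffusion_videos import RealESRGANModel, StableDiffusionWalkPipeline
import torch

# Diffusion on GPU 0, upscaling on GPU 1 (assumes .to() works on RealESRGANModel).
pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda:0")

upsampler = RealESRGANModel.from_pretrained('nateraw/real-esrgan').to("cuda:1")
enhanced_image = upsampler('your_file.jpg')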

ImportError: cannot import name 'PipelineRealESRGAN' from 'stable_diffusion_videos'

Hey there,

Awesome feature and quality code! Thanks!

Everything works for me except for upsampling. Whenever I added upsample=True in walk(), I got:

ImportError: cannot import name 'PipelineRealESRGAN' from 'stable_diffusion_videos' (/home/env/lib/python3.8/site-packages/stable_diffusion_videos/__init__.py)

Any thought on how to solve this? Thanks

add num steps for each prompt

I was thinking it would be useful to allow some prompts to run longer than others, so changing num_steps from an int to a list of ints.
Essentially changing

for prompt, seed in zip(prompts, seeds):

to

for prompt, seed, num_step in zip(prompts, seeds, num_steps):

What do you think?
Potentially this could be expanded to all parameters by using a parameter dict for each prompt.
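
For what it's worth, the music-video example in the README above already passes num_interpolation_steps as a list (one entry per prompt pair), so per-segment step counts appear to be supported in the current pipeline; a sketch:

# 60 interpolation frames for cat -> dog, 180 for dog -> bird.
video_path = pipeline.walk(
    prompts=['a cat', 'a dog', 'a bird'],
    seeds=[42, 1337, 2022],
    num_interpolation_steps=[60, 180],
)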

Infinitely tileable videos

Couldn't see anything when searching so apologies if already done, but I took the circular convolution patch from:
https://github.com/TomMoore515/material_stable_diffusion

and applied it before running walk, and it worked to produce tileable videos. Fork with a notebook here for anyone interested:
https://github.com/isaac-art/tile-stable-diffusion-videos/

Could this be added as an enhancement to the walk parameters, adding a tileable=True/False option that applies the patch?

Demo gif here: https://isaacclarke.com/tiletest/

Determine if best default is lerp or slerp for text embeddings

The use_lerp_for_text param lets you use lerp instead of slerp for text embedding interpolation. There was a comment on my original gist suggesting lerp was better, but I never got around to doing an exhaustive comparison.

Would be nice if we could do this comparison and set the default (or even get rid of the option to change it) based on what's best.
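
For reference, a minimal sketch of the two interpolation schemes being compared, written from the standard definitions rather than copied from the repo:

import numpy as np

def lerp(t, v0, v1):
    # Linear interpolation between two embedding vectors.
    return (1 - t) * v0 + t * v1

def slerp(t, v0, v1, dot_threshold=0.9995):
    # Spherical linear interpolation; fall back to lerp when the vectors are
    # nearly parallel and the angle is numerically unstable.
    dot = np.sum(v0 * v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    if np.abs(dot) > dot_threshold:
        return lerp(t, v0, v1)
    theta = np.arccos(dot)
    return (np.sin((1 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)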

Remove duplicated NoCheck class

There's 2 of them in the pipeline file! My b, lol. Just gotta remove one. Will do this myself if someone here doesn't get to it first ❤️

Move videos based on music

Would be nice to add an additional module that would "push" or "jitter" frames of the video based on the music. A closed-source version of this can be seen here. Would love to have an open source option for doing this with Stable Diffusion 😄

I don't really have the bandwidth to do this myself right now, so help (even if it's just a Colab with a minimal example) would be much appreciated on this!!

If anybody is creeping on this repo / working on their own project and ends up getting to this before me, please ping me here so I can play with it 😬

Add callback option.

Maybe I missed it and it's already implemented, but it would be nice to have a callback option that is executed every X number of steps. It would be useful for things like previewing the frame while it's being generated. On our WebUI we have a tab called "Text-To-Video" which is based on the script that predates this repo; we want to migrate that old code to use things from here, as it's better and more up to date than what we have. The problem we face is that we show the user a preview image every X steps, for example every 10 steps. While it slows generation down a bit, it lets the user see how the current frame looks before it has finished, so they have a better idea of how things will turn out instead of spending a lot of time generating without any idea of the result. It also helps with showing progress; it seems users prefer to see how things look rather than have generation go faster. So a callback returning the current image, or anything else we can use, would be enough for us; without it, we'd probably have to break our heads to make things work. I would appreciate it if this feature were added, or, if it's already part of the code, some pointers on how to use it. Thanks for your time.
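
For context, more recent diffusers releases expose callback and callback_steps arguments on the pipeline's __call__, which is roughly the shape of hook being asked for here. A hedged sketch; whether walk forwards these arguments is a separate question:

def preview_callback(step, timestep, latents):
    # Called every `callback_steps` diffusion steps; a UI could decode the
    # latents here and show a preview of the frame in progress.
    print(f"diffusion step {step}, timestep {timestep}, latents shape {tuple(latents.shape)}")

# Assuming a loaded diffusers StableDiffusionPipeline `pipe`:
# image = pipe("a cat", callback=preview_callback, callback_steps=10).images[0]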

ModuleNotFoundError: No module named 'websockets.datastructures'

When I clone the repo to my PC and try to run it, it throws the following error:

>>> from app import interface
>>> interface.launch()
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\site-packages\uvicorn\server.py", line 60, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\asyncio\runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\asyncio\base_events.py", line 568, in run_until_complete
    return future.result()
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\site-packages\uvicorn\server.py", line 67, in serve
    config.load()
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\site-packages\uvicorn\config.py", line 471, in load
    ws_protocol_class = import_from_string(WS_PROTOCOLS[self.ws])
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\site-packages\uvicorn\importer.py", line 24, in import_from_string
    raise exc from None
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\site-packages\uvicorn\importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\site-packages\uvicorn\protocols\websockets\auto.py", line 17, in <module>
    from uvicorn.protocols.websockets.websockets_impl import WebSocketProtocol
  File "C:\Users\Mr.Gamio\AppData\Local\Programs\Python\Python37\lib\site-packages\uvicorn\protocols\websockets\websockets_impl.py", line 9, in <module>
    from websockets.datastructures import Headers
ModuleNotFoundError: No module named 'websockets.datastructures'

Fix resize issue in get_timesteps_arr

When calculating T for music videos, we're currently doing an np.resize to get to the num_frames we want to generate. This actually screws up the scaling. We need to interpolate instead to accurately rescale the array and maintain its shape.

Did some tests locally with this and it works great!! Much better than the current implementation.
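
A small sketch of the difference, assuming T is a 1-D array being rescaled to num_frames samples:

import numpy as np

T = np.linspace(0, 1, 1000) ** 2  # example curve with 1000 samples
num_frames = 300

# np.resize just truncates (or tiles) the data to the new length, distorting the curve.
T_resized = np.resize(T, num_frames)

# Interpolation resamples the curve onto the new grid and preserves its shape.
T_interp = np.interp(np.linspace(0, 1, num_frames), np.linspace(0, 1, len(T)), T)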

Colab Music Video error: tuple index out of range

Hi,

I am trying to use Colab to generate a video using an mp3 file. Still, I keep getting this error:

IndexError                                Traceback (most recent call last)
<ipython-input-16-593e08c3b40c> in <module>
     10     batch_size=1,                          # increase until you go out of memory
     11     output_dir='./dreams',                 # Where images will be saved
---> 12     name=None,                             # Subdir of output dir. will be timestamp by default
     13 )
     14 visualize_video_colab(video_path)

2 frames
/usr/local/lib/python3.7/dist-packages/stable_diffusion_videos/stable_diffusion_pipeline.py in walk(self, prompts, seeds, num_interpolation_steps, output_dir, name, image_file_ext, fps, num_inference_steps, guidance_scale, eta, height, width, upsample, batch_size, resume, audio_filepath, audio_start_sec)
    803                 audio_offset=audio_offset,
    804                 audio_duration=audio_duration,
--> 805                 sr=44100,
    806             )
    807 

/usr/local/lib/python3.7/dist-packages/stable_diffusion_videos/stable_diffusion_pipeline.py in make_video_pyav(frames_or_frame_dir, audio_filepath, fps, audio_offset, audio_duration, sr, output_filepath, glob_pattern)
    145             audio_fps=sr,
    146             audio_codec="aac",
--> 147             options={"crf": "10", "pix_fmt": "yuv420p"},
    148         )
    149     else:

/usr/local/lib/python3.7/dist-packages/torchvision/io/video.py in write_video(filename, video_array, fps, video_codec, options, audio_array, audio_fps, audio_codec, audio_options)
    114             num_channels = audio_array.shape[0]
    115             audio_layout = "stereo" if num_channels > 1 else "mono"
--> 116             audio_sample_fmt = container.streams.audio[0].format.name
    117 
    118             format_dtype = np.dtype(audio_format_dtypes[audio_sample_fmt])

IndexError: tuple index out of range

I am not sure if there is something I should change...

Should `walk` instead be a pipeline?

When I first made this repo, it was basically just a script, which is why the stable_diffusion_walk.py file is the way it is. However, as the ideas and features here grow, it may be nice to structure things a bit better (especially if we add new workflows like #27 and #26).

We could have the logic of walk be the __call__ function of a new pipeline, StableDiffusionForVideosPipeline or whatever. This would be a self-contained pipeline object, essentially building on top of the existing pipeline file. Then, as we write up the logic for the additional workflows above, they can become their own self-contained pipelines as well.

Anybody have any thoughts here? This would be a pretty big restructure with breaking API changes, but it would be a bit more "professional" and less hacky. But maybe it's fine if this remains hacky, idk 😅

Add image generator/saver

I wrote a script that'll generate images (it has options num_batches, batch_size, push_to_hub). Basically you decide how many images of a certain prompt to generate, it generates them with completely random seeds, and then it pushes them to the Hugging Face Hub. This is nice because we can then have an open source gallery of images there.

ModuleNotFoundError: No module named 'app'

I'm trying to run the notebook. I've run the setup successfully, and I've also logged in successfully with my HF token. When I run the "Load the interface" cell I get the following ModuleNotFoundError:

/content/stable-diffusion-videos
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-fa6e4247984b> in <module>
      2 get_ipython().run_line_magic('cd', '/content/stable-diffusion-videos/')
      3 
----> 4 from app import interface

ModuleNotFoundError: No module named 'app'

Update diffusers version

0.3.0 just got released on PyPI.

Notes:

  • Progress bar feature now integrated, can remove my hack
  • vae decoder outputs in pipeline will have to use .sample
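
A one-line sketch of what the second note refers to, assuming the post-0.3.0 decoder output object:

# vae.decode now returns an output object, so the tensor lives under .sample:
image = self.vae.decode(latents).sample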

Colab - Use walk programatically : TypeError: walk() got an unexpected keyword argument 'num_interpolation_steps'

Hi, I'm experiencing an issue when using walk programmatically. I'm doing this as I'd like to use three prompts, rather than the strict two allowed with Gradio.

Here's my config:

from stable_diffusion_videos import walk

video_path = walk(['Prompt1', 'Prompt2', 'Prompt3'], 
    seeds=[4332, 1337, 692671],
    name='trippy',            
    guidance_scale=17.5,   
    num_interpolation_steps=200,     
    num_inference_steps=50,     
    scheduler='ddim',           
    disable_tqdm=False,        
    make_video=True,           
    use_lerp_for_text=True,   
    do_loop=True,             
    upsample=True, 
)
visualize_video_colab(video_path)

The below error is thrown:

TypeError                                 Traceback (most recent call last)
<ipython-input-24-548d897a7e61> in <module>
     12     use_lerp_for_text=True,     # Use lerp for text embeddings instead of slerp
     13     do_loop=True,              # Change to True if you want last prompt to loop back to first prompt
---> 14     upsample=True,
     15 )
     16 visualize_video_colab(video_path)

TypeError: walk() got an unexpected keyword argument 'num_interpolation_steps'

I've also tried replacing num_interpolation_steps with num_walk_steps, but that didn't work!

Screenshot 2022-10-06 at 19 12 43

Any help would be greatly appreciated as I've just got Colab access back & I'm itching to make some animations!
