drlx's People

Contributors

ncoop57, shahbuland, tmabraham

drlx's Issues

CUDA Error on second epoch

Seeing an unknown CUDA error on the second epoch. Will try to debug more tomorrow.

Traceback (most recent call last):
  File "/home/paperspace/git/DRLX/train_aesthetics.py", line 12, in <module>
    trainer.train(pipe, Aesthetics())
  File "/home/paperspace/git/DRLX/src/drlx/trainer/ddpo_trainer.py", line 313, in train
    if self.config.train.total_samples is not None:
  File "/home/paperspace/git/DRLX/src/drlx/trainer/ddpo_trainer.py", line 313, in <listcomp>
    if self.config.train.total_samples is not None:
  File "/home/paperspace/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/git/DRLX/src/drlx/denoisers/ldm_unet.py", line 125, in postprocess
    images = images.detach().cpu().permute(0,2,3,1).numpy()
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
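
The error message suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that the failing kernel is reported at its real call site rather than at a later API call. A minimal sketch of one way to do that from Python, assuming the script name from the traceback above. `blocking_env` and `rerun_blocking` are hypothetical helper names, not part of DRLX:

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # The variable has to be set before the CUDA context is created,
    # which is why the script is re-run in a fresh process below.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(script="train_aesthetics.py"):
    """Re-run the training script (assumed path) with blocking kernel launches."""
    result = subprocess.run([sys.executable, script], env=blocking_env())
    return result.returncode
```

Running with blocking launches is slower, so this is only meant for reproducing the failure, not for regular training.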

Local models don't work?

I was trying to use a local safetensors SD model and can't get it to work. Does the current setup always try to download from Hugging Face, even when an explicit file path is given and use_safetensors is set to true?

Models work locally if they were initially downloaded from the hub, but not if I give a file path to a local safetensors model.

Add ability to load from safetensors

It seems that you can't currently load from a safetensors file. Unless I'm mistaken, the library uses diffusers with the SD pipeline under the hood, which should support this.
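
For reference, diffusers exposes `StableDiffusionPipeline.from_single_file` for single-file checkpoints alongside `from_pretrained` for hub ids and diffusers-format directories. A minimal sketch of how a loader could pick between the two, assuming a recent diffusers release. `is_single_file_checkpoint` and `load_sd_pipeline` are hypothetical names and this is not how DRLX currently loads models:

```python
from pathlib import Path


def is_single_file_checkpoint(model_ref: str) -> bool:
    """True if model_ref points at an existing local .safetensors or .ckpt file."""
    path = Path(model_ref)
    return path.suffix in {".safetensors", ".ckpt"} and path.is_file()


def load_sd_pipeline(model_ref: str):
    """Load a Stable Diffusion pipeline from a local file or a hub repo id."""
    # Imported lazily so the path check above works without diffusers installed.
    from diffusers import StableDiffusionPipeline

    if is_single_file_checkpoint(model_ref):
        # Single-file checkpoints are read from disk; depending on the
        # diffusers version, config files may still be fetched from the hub.
        return StableDiffusionPipeline.from_single_file(model_ref)
    # Otherwise treat model_ref as a hub id or a diffusers-format directory.
    return StableDiffusionPipeline.from_pretrained(model_ref, use_safetensors=True)
```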

`save_samples` is shown in the example configs but it isn't supported by the main branch

save_samples: False

save_samples is not supported by the main branch. When it is present in the config, loading fails with the following error:

Traceback (most recent call last):
  File "/home/ogezi/miniconda3/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/ogezi/miniconda3/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/ogezi/projects/Freda/relations-encoding/spatial_rl.py", line 135, in <module>
    config = DRLXConfig.load_yaml("configs/ddpo_sd.yml")
  File "/home/ogezi/miniconda3/lib/python3.9/site-packages/drlx/configs.py", line 325, in load_yaml
    return cls.from_dict(config)
  File "/home/ogezi/miniconda3/lib/python3.9/site-packages/drlx/configs.py", line 354, in from_dict
    train=TrainConfig.from_dict(config["train"]),
  File "/home/ogezi/miniconda3/lib/python3.9/site-packages/drlx/configs.py", line 12, in from_dict
    return cls(**cfg)
TypeError: __init__() got an unexpected keyword argument 'save_samples'
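
The traceback shows `from_dict` passing every YAML key straight to the constructor via `cls(**cfg)`, so any key the dataclass does not declare raises a bare `TypeError`. A minimal sketch of a friendlier check, assuming the config classes are dataclasses. The `TrainConfig` below is a hypothetical stand-in with made-up fields, not the real DRLX class:

```python
from dataclasses import dataclass, fields


@dataclass
class TrainConfig:
    # Hypothetical stand-in; the real DRLX TrainConfig declares other fields.
    batch_size: int = 4
    num_epochs: int = 10

    @classmethod
    def from_dict(cls, cfg: dict):
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(cfg) - known)
        if unknown:
            # Name the offending keys instead of failing inside __init__.
            raise ValueError(
                f"Unsupported {cls.__name__} keys: {unknown}; "
                f"supported keys: {sorted(known)}"
            )
        return cls(**cfg)
```

With this check, a config containing `save_samples` would fail with a message that names the key and lists the supported ones, which makes a mismatch between example configs and the installed version easier to spot.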

Reward model inference

Need to add reward model (RM) inference for cases where the RM is a sizable model. The trainer currently attempts to place an RM on each GPU. This is problematic because in many cases the RM is too big to fit alongside the denoiser model. In the LLM case, the solution is often to use Triton Inference Server or to put the RM on one GPU while the main model uses the rest of the GPUs. This should be explored further.
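
The second option, a dedicated GPU for the RM, can be sketched as a simple device split. This is only an illustration under the assumption that devices are addressed as `cuda:<index>`. `split_devices` is a hypothetical helper, not an existing DRLX function:

```python
def split_devices(num_gpus: int):
    """Reserve the last GPU for the reward model; the rest go to the denoiser."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    if num_gpus == 1:
        # Nothing to split: both models have to share the only device.
        return ["cuda:0"], "cuda:0"
    denoiser_devices = [f"cuda:{i}" for i in range(num_gpus - 1)]
    reward_device = f"cuda:{num_gpus - 1}"
    return denoiser_devices, reward_device


denoiser_devices, reward_device = split_devices(4)
# denoiser_devices == ["cuda:0", "cuda:1", "cuda:2"], reward_device == "cuda:3"
```

The trade-off is that one GPU no longer contributes to denoiser training, and generated images have to be moved to the RM's device for scoring.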
