This project forked from prs-eth/marigold

Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Home Page: https://marigoldmonodepth.github.io

License: Apache License 2.0

Shell 1.89% Python 98.11%

This repository represents the official implementation of the paper titled "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation".

Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler

We present Marigold, a diffusion model and associated fine-tuning protocol for monocular depth estimation. Its core principle is to leverage the rich visual knowledge stored in modern generative image models. Our model, derived from Stable Diffusion and fine-tuned with synthetic data, can zero-shot transfer to unseen data, offering state-of-the-art monocular depth estimation results.

📢 News

2023-12-19: Updated license to Apache License, Version 2.0.
2023-12-08: Added the Hugging Face Space demo - try it out with your images for free!
2023-12-05: Added the Google Colab notebook - dive deeper into our inference pipeline!
2023-12-04: Added paper and inference code (this repository).

🚀 Usage

We offer several ways to interact with Marigold:

  1. A free online interactive demo is available as a Hugging Face Space (kudos to the HF team for the GPU grant).

  2. Run the demo locally (requires a GPU and nvidia-docker2, see the Installation Guide): docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all registry.hf.space/toshas-marigold:latest python app.py

  3. An extended demo is available on Google Colab.

  4. If you just want to see the examples, visit our gallery.

  5. Finally, local development instructions are given below.

๐Ÿ› ๏ธ Setup

This code was tested on:

  • Ubuntu 22.04 LTS, Python 3.10.12, CUDA 11.7, GeForce RTX 3090 (pip, Mamba)
  • CentOS Linux 7, Python 3.10.4, CUDA 11.7, GeForce RTX 4090 (pip)
  • Windows 11 22H2, Python 3.10.12, CUDA 12.3, GeForce RTX 3080 (Mamba)
  • MacOS 14.2, Python 3.10.12, M1 16G (pip)

🪟 A Note for Windows users

We recommend running the code in WSL2:

  1. Install WSL following the installation guide.
  2. Install CUDA support for WSL following the installation guide.
  3. Find your drives in /mnt/<drive letter>/; see the WSL FAQ for more details. Navigate to the working directory of your choice.
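The drive mapping in step 3 can be illustrated with a minimal sketch. The helper name windows_to_wsl_path is hypothetical and not part of this repository; it only assumes the /mnt/<drive letter>/ convention described above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_path(windows_path: str) -> str:
    """Map a Windows path with a drive letter to its location under /mnt/<drive letter>/.

    Hypothetical helper for illustration; it does not check that the path exists.
    """
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. "D:"
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a path with a drive letter, got {windows_path!r}")
    # parts[0] is the drive anchor (e.g. "D:\\"); the remaining parts are directories.
    return str(PurePosixPath("/mnt", drive[0].lower(), *path.parts[1:]))


print(windows_to_wsl_path(r"D:\data\marigold"))  # prints /mnt/d/data/marigold
```

Inside WSL, the printed path is where the Windows folder should appear, so it can be passed to cd before cloning the repository.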

📦 Repository

Clone the repository (requires git):

git clone https://github.com/prs-eth/Marigold.git
cd Marigold

💻 Dependencies

We provide several ways to install the dependencies.

  1. Using Mamba, which can be installed together with Miniforge3.

    Windows users: Install the Linux version into the WSL.

    After the installation, Miniforge needs to be activated first: source /home/$USER/miniforge3/bin/activate.

    Create the environment and install dependencies into it:

    mamba env create -n marigold --file environment.yaml
    conda activate marigold
  2. Using pip: Alternatively, create a Python native virtual environment and install dependencies into it:

    python -m venv venv/marigold
    source venv/marigold/bin/activate
    pip install -r requirements.txt

Keep the environment activated before running the inference script. Activate the environment again after restarting the terminal session.

🚀 Testing on your images

📷 Prepare images

If you have images at hand, skip this step. Otherwise, download a few select images from our paper:

bash script/download_sample_data.sh

🎮 Run inference

Place your images in a directory, for example, under input/in-the-wild_example, and run the following command:

python run.py \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example

You can find all results in output/in-the-wild_example. Enjoy!
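If you have several image folders, the same command can be generated per folder. The following is a minimal sketch, not part of the repository: the helper build_batch_commands and the input/<name> to output/<name> layout are assumptions that mirror the example above, and only the --input_rgb_dir and --output_dir options shown there are used.

```python
import shlex
from pathlib import PurePosixPath


def build_batch_commands(folder_names, input_root="input", output_root="output"):
    """Return one run.py command (as an argument list) per image folder.

    Hypothetical helper: it mirrors the input/<name> -> output/<name>
    layout of the example command and does not run anything itself.
    """
    commands = []
    for name in folder_names:
        commands.append([
            "python", "run.py",
            "--input_rgb_dir", str(PurePosixPath(input_root, name)),
            "--output_dir", str(PurePosixPath(output_root, name)),
        ])
    return commands


for command in build_batch_commands(["in-the-wild_example"]):
    # Print the command; to execute it, pass the list to subprocess.run(command, check=True).
    print(shlex.join(command))
```

Run it from the repository root with the environment activated, so that run.py and its dependencies are found.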

โš™๏ธ Inference settings

The default settings are optimized for the best result. However, the behavior of the code can be customized:

  • Trade-offs between accuracy and speed (for both options, larger values give better accuracy at the cost of slower inference):

    • --ensemble_size: Number of inference passes in the ensemble. Default: 10.
    • --denoise_steps: Number of denoising steps of each inference pass. Default: 10.
  • --half_precision: Run with half-precision (16-bit float) to reduce VRAM usage; this might lead to suboptimal results.

  • By default, the inference script resizes input images to the processing resolution, and then resizes the prediction back to the original resolution. This gives the best quality, as Stable Diffusion, from which Marigold is derived, performs best at 768x768 resolution.

    • --processing_res: the processing resolution; set 0 to process the input resolution directly. Default: 768.
    • --output_processing_res: produce output at the processing resolution instead of upsampling it to the input resolution. Default: False.
  • --seed: Random seed; set it to make results reproducible. Default: None (the current time is used as the random seed).

  • --batch_size: Batch size of repeated inference. Default: 0 (best value determined automatically).

  • --color_map: Colormap used to colorize the depth prediction. Default: Spectral.

  • --apple_silicon: Use Apple Silicon MPS acceleration.
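To make the accuracy/speed trade-off concrete, here is a minimal sketch that turns a few of the options above into command-line arguments and counts the denoising steps per image (ensemble_size passes times denoise_steps steps each). The InferenceSettings class is hypothetical and not part of the repository, and treating --half_precision as a value-less switch is an assumption.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InferenceSettings:
    """Hypothetical container for a subset of the run.py options listed above."""
    ensemble_size: int = 10       # documented default
    denoise_steps: int = 10       # documented default
    processing_res: int = 768     # documented default; 0 keeps the input resolution
    half_precision: bool = False
    seed: Optional[int] = None    # None leaves the seed unset

    def total_denoising_steps(self) -> int:
        # Each of the ensemble_size passes runs denoise_steps denoising steps.
        return self.ensemble_size * self.denoise_steps

    def to_args(self) -> List[str]:
        args = [
            "--ensemble_size", str(self.ensemble_size),
            "--denoise_steps", str(self.denoise_steps),
            "--processing_res", str(self.processing_res),
        ]
        if self.half_precision:
            args.append("--half_precision")  # assumed to be a switch without a value
        if self.seed is not None:
            args += ["--seed", str(self.seed)]
        return args


default = InferenceSettings()
faster = InferenceSettings(ensemble_size=5, denoise_steps=4, half_precision=True, seed=0)
print(default.total_denoising_steps())  # 100 with the documented defaults
print(faster.total_denoising_steps())   # 20
print(shlex.join(["python", "run.py"] + faster.to_args()))
```

The step count is only a rough proxy for runtime; it ignores image resolution, batching, and hardware.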

⬇ Checkpoint cache

By default, the checkpoint is stored in the Hugging Face cache. The HF_HOME environment variable defines its location and can be overridden:

export HF_HOME=new/path

Alternatively, use the following script to download the checkpoint weights locally:

bash script/download_weights.sh

At inference, specify the checkpoint path:

python run.py \
    --checkpoint checkpoint/Marigold_v1_merged_2 \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example
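Both options can also be set from a Python launcher. The sketch below is illustrative only: the helper names inference_env and inference_command are hypothetical, and it relies solely on the HF_HOME variable and the --checkpoint option described above.

```python
import os
import shlex
from typing import Dict, List, Optional


def inference_env(hf_home: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment, optionally overriding HF_HOME."""
    env = dict(os.environ)
    if hf_home is not None:
        env["HF_HOME"] = hf_home  # relocates the Hugging Face cache
    return env


def inference_command(input_dir: str, output_dir: str,
                      checkpoint: Optional[str] = None) -> List[str]:
    """Build the run.py command, adding --checkpoint only when a local path is given."""
    command = ["python", "run.py"]
    if checkpoint is not None:
        command += ["--checkpoint", checkpoint]
    command += ["--input_rgb_dir", input_dir, "--output_dir", output_dir]
    return command


env = inference_env(hf_home="new/path")
command = inference_command(
    "input/in-the-wild_example",
    "output/in-the-wild_example",
    checkpoint="checkpoint/Marigold_v1_merged_2",
)
# To execute: subprocess.run(command, env=env, check=True)
print("HF_HOME =", env["HF_HOME"])
print(shlex.join(command))
```

Passing env to the subprocess keeps the override local to that run instead of changing the shell session.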

โœ๏ธ Contributing

Please refer to the contribution instructions in the repository.

🤔 Troubleshooting

| Problem | Solution |
|---|---|
| (Windows) Invalid DOS bash script on WSL | Run dos2unix <script_name> to convert script format |
| (Windows) error on WSL: Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory | Run export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH |
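If dos2unix is not installed, the line-ending conversion for the first problem can be approximated in Python. This is a minimal sketch of a substitute, not the dos2unix tool itself: the helper convert_crlf_to_lf is hypothetical and only replaces CRLF line endings with LF.

```python
from pathlib import Path


def convert_crlf_to_lf(script_path: str) -> bool:
    """Rewrite a file in place with LF line endings.

    Returns True if the file was changed, False if it had no CRLF endings.
    Hypothetical stand-in for dos2unix; it handles CRLF only.
    """
    path = Path(script_path)
    data = path.read_bytes()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False
    path.write_bytes(converted)
    return True
```

For example, convert_crlf_to_lf("script/download_weights.sh") would fix that script if it was checked out with Windows line endings.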

🎓 Citation

Please cite our paper:

@misc{ke2023repurposing,
      title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation}, 
      author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
      year={2023},
      eprint={2312.02145},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🎫 License

This work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).

By downloading and using the code and model you agree to the terms in the LICENSE.
