
depthcrafter's Introduction

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

     

Wenbo Hu1* †, Xiangjun Gao2*, Xiaoyu Li1* †, Sijie Zhao1, Xiaodong Cun1,
Yong Zhang1, Long Quan2, Ying Shan3,1


1Tencent AI Lab 2The Hong Kong University of Science and Technology 3ARC Lab, Tencent PCG

arXiv preprint, 2024

🔆 Introduction

  • [24-9-28] Added full-dataset inference and evaluation scripts for easier comparison. :-)
  • [24-9-25] 🤗🤗🤗 Added a Hugging Face online demo: DepthCrafter.
  • [24-9-19] Added scripts for preparing benchmark datasets.
  • [24-9-18] Added point cloud sequence visualization.
  • [24-9-14] 🔥🔥🔥 DepthCrafter is now released, have fun!

🤗 DepthCrafter can generate temporally consistent long depth sequences with fine-grained details for open-world videos, without requiring additional information such as camera poses or optical flow.

🎥 Visualization

We provide some demos of unprojected point cloud sequences, with reference RGB and estimated depth videos. Please refer to our project page for more details.


🚀 Quick Start

🛠️ Installation

  1. Clone this repo:
     git clone https://github.com/Tencent/DepthCrafter.git
  2. Install dependencies (please refer to requirements.txt):
     pip install -r requirements.txt

🤖 Gradio Demo

gradio app.py

🤗 Model Zoo

DepthCrafter is available on the Hugging Face Model Hub.

🏃‍♂️ Inference

1. High-resolution inference requires a GPU with ~26GB of memory for 1024x576 resolution:

  • Full inference (~0.6 fps on A100, recommended for high-quality results):

    python run.py --video-path examples/example_01.mp4
  • Fast inference via 4-step denoising without classifier-free guidance (~2.3 fps on A100):

    python run.py --video-path examples/example_01.mp4 --num-inference-steps 4 --guidance-scale 1.0

2. Low-resolution inference requires a GPU with ~9GB of memory for 512x256 resolution:

  • Full inference (~2.3 fps on A100):

    python run.py --video-path examples/example_01.mp4 --max-res 512
  • Fast inference via 4-step denoising without classifier-free guidance (~9.4 fps on A100):

    python run.py --video-path examples/example_01.mp4 --max-res 512 --num-inference-steps 4 --guidance-scale 1.0

🚀 Dataset Evaluation

Please check the benchmark folder.

  • To prepare the datasets used in the paper, run dataset_extract/dataset_extract_${dataset_name}.py.
  • This produces CSV files recording the relative paths of the extracted RGB videos and depth npz files. We also provide these CSV files.
  • To run inference on all datasets:
    bash benchmark/infer/infer.sh
    (Remember to replace input_rgb_root and saved_root with your own paths.)
  • To run evaluation on all datasets:
    bash benchmark/eval/eval.sh
    (Remember to replace pred_disp_root and gt_disp_root with your own paths.)
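The exact protocol lives in the benchmark/eval scripts, but as a reference, affine-invariant depth evaluation typically aligns the prediction to the ground truth with a least-squares scale and shift before computing metrics such as AbsRel and δ1. A minimal sketch of that idea (not the repo's exact implementation; the metric definitions here are the standard ones):

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Least-squares scale s and shift t minimizing ||s*pred + t - gt||^2."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    s, t = np.linalg.lstsq(A, gt.ravel(), rcond=None)[0]
    return s * pred + t

def abs_rel(pred, gt):
    """Mean absolute relative error."""
    return float(np.mean(np.abs(pred - gt) / gt))

def delta1(pred, gt):
    """Fraction of pixels with max(pred/gt, gt/pred) < 1.25."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < 1.25))

# Toy check: an affine-distorted prediction aligns back exactly
gt = np.linspace(1.0, 10.0, 100)
pred = 0.5 * gt + 2.0
aligned = align_scale_shift(pred, gt)
```

Aligning in this way makes the metrics invariant to the global scale/shift ambiguity of relative depth predictors.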

🤝 Contributing

  • Issues and pull requests are welcome.
  • Contributions that optimize inference speed and memory usage are especially welcome, e.g., model quantization, distillation, or other acceleration techniques.

📜 Citation

If you find this work helpful, please consider citing:

@article{hu2024-DepthCrafter,
  author  = {Hu, Wenbo and Gao, Xiangjun and Li, Xiaoyu and Zhao, Sijie and Cun, Xiaodong and Zhang, Yong and Quan, Long and Shan, Ying},
  title   = {DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos},
  journal = {arXiv preprint arXiv:2409.02095},
  year    = {2024}
}

depthcrafter's People

Contributors: gaoxiangjun, wbhu


depthcrafter's Issues

Crashes with out-of-memory error on RTX 4090 Laptop (16GB)

Congrats on the great work!
I can't get it to run on my laptop with an RTX 4090 (16GB) using the command
python run.py --video-path examples/example_01.mp4 --max-res 512; it crashes with an out-of-memory error.

Are there other ways to optimize for smaller configurations?
Thanks!

Which DepthAnything models are being compared?

Hello, great work!

Can you please clarify which DepthAnything and DepthAnything-V2 models are used for comparison in Table 1 of the paper?
Also, there is no detail on the inference speed of the model. Can you specify how long it would take to infer 110 frames on a consumer-level GPU? To this end, how much extra cost does the devised inference strategy incur for videos longer than 110 frames?

Looking forward to the code release!

Model Editing or weights finetuning

Hello,

Thank you very much for this fantastic work! I am very interested in DepthCrafter. Is there any way I can modify the pipeline or fine-tune the weights on my own dataset? Do you plan to release the training code in the future?

Best Regards

the type of depth maps

Hi authors, great work!

I am also working on video depth prediction, and I am wondering: is your predicted depth in disparity space?

That is, 1.0/(depth + tiny value) is the true depth value, the same as the Depth Anything series.

Thanks!
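For reference, converting a relative disparity map to depth usually amounts to a clipped inversion, as in the following sketch (the eps value is an assumption, and without a known scale and shift the result is only depth up to an affine-in-disparity ambiguity; this is not DepthCrafter's exact post-processing):

```python
import numpy as np

def disparity_to_depth(disp: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Invert a (relative) disparity map into depth.

    eps guards against division by zero at far regions;
    a scale/shift alignment would be needed for metric depth.
    """
    return 1.0 / np.clip(disp, eps, None)

# Larger disparity -> closer (smaller) depth
disp = np.array([0.5, 1.0, 2.0])
depth = disparity_to_depth(disp)
```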

Can't run the code: token error

Hey guys, I keep getting an error saying I need my Hugging Face token. When I try to enter the token, the prompt won't let me paste it: it says I can right-click to paste, but right-clicking does nothing, typing does nothing, and Ctrl+V does nothing. I literally can't access the program. Any suggestions?

How to get the point cloud

Hey, congrats on the amazing work. Could you provide the point cloud generation and visualization code using the depth information for the demo on the GitHub page? Thanks very much for your kind help.
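While the authors' visualization code is not in this repo, unprojecting a depth map into a point cloud with a pinhole camera model is standard; a minimal sketch, assuming known (or guessed) intrinsics fx, fy, cx, cy:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject a depth map (H, W) into an (H*W, 3) point cloud
    using the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage: a fronto-parallel plane at depth 2
pts = depth_to_point_cloud(np.full((4, 4), 2.0), fx=100, fy=100, cx=2, cy=2)
```

Note that DepthCrafter's output is relative depth, so the resulting cloud is correct only up to the unknown scale/shift; the points can then be saved or viewed with any point cloud tool.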

ScanNet test data

Hello, I would like to ask how the ScanNet dataset is processed in the paper. I see the path in the code is raw/scans_test; how did you obtain the test set?

Depth Map Generation Limitation?

I have a 27-second video, but the output gives me a 7-second depth map. What am I doing wrong? Is there any limitation on the input length, in frames or seconds?

How to run inference on long videos with more than 110 frames?

The current app.py can only handle short clips of roughly 10 seconds.
How can I run inference on long videos, e.g., several minutes long?

I tried processing the video in segments and concatenating the results, but the "flickering" at the seams of the stitched depth video is very severe.
Are there plans to update app.py to support long videos? Thanks!
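Naive concatenation flickers because neighboring segments are denoised independently. One common mitigation (an illustration, not the repo's exact stitching strategy) is to infer segments with a few shared frames and linearly cross-fade the overlap; the overlap length here is an assumption, and in practice each segment may also need a scale/shift alignment on the shared frames first:

```python
import numpy as np

def blend_segments(seg_a, seg_b, overlap):
    """Stitch two depth segments (T, H, W) that share `overlap` frames,
    linearly cross-fading the shared frames to reduce seam flicker."""
    w = np.linspace(0.0, 1.0, overlap)[:, None, None]  # ramp 0 -> 1
    blended = (1.0 - w) * seg_a[-overlap:] + w * seg_b[:overlap]
    return np.concatenate([seg_a[:-overlap], blended, seg_b[overlap:]], axis=0)

# Toy usage: two 6-frame segments sharing 3 frames -> 9 output frames
a = np.zeros((6, 2, 2))
b = np.ones((6, 2, 2))
out = blend_segments(a, b, overlap=3)
```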

Image sequence

Hello, is it possible to export an image sequence instead of a compressed mp4?
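The README does not show a flag of run.py for this, but since the pipeline's output is a depth array, a workaround is to save each frame losslessly yourself; a minimal sketch (the directory and file names are illustrative, and a 16-bit PNG writer could be substituted for np.save):

```python
import os

import numpy as np

def save_depth_sequence(depth, out_dir):
    """Save a (T, H, W) depth array as one lossless .npy file per frame,
    instead of encoding it into a compressed mp4."""
    os.makedirs(out_dir, exist_ok=True)
    for i, frame in enumerate(depth):
        np.save(os.path.join(out_dir, f"frame_{i:05d}.npy"), frame)

# Toy usage with a dummy 3-frame sequence
save_depth_sequence(np.zeros((3, 8, 8), dtype=np.float32), "depth_frames")
```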

Entry Not Found for config.json

Hey guys, I'm having this error; yesterday everything was working perfectly:

huggingface_hub.errors.EntryNotFoundError: 404 Client Error.

Entry Not Found for url: https://huggingface.co/tencent/DepthCrafter/resolve/main/unet/config.json.

As the error says, when you open the link, the config.json file cannot be found. I believe the file has been moved? I've seen this config.json; is that the one it should find? If so, how can I update the link?

Any suggestions on how to make this work again? 🙌

3D model in the output?

Hi, can we get a 3D reconstruction of the input video in the output, e.g., ".glb" files (like the point clouds on your GitHub page)?

Inference speed

Hi, I appreciate your work so much, it would be so useful for my personal research.

I tested your code on my 60-frame videos, and it took about 3-5 minutes to complete.
I guess this makes sense, since DepthCrafter is based on a (possibly multi-step) denoising technique,
but I want to confirm whether this falls within the expected range of runtime.

Thanks!

Specific Cases for Evaluation on Bonn Dataset

Hi authors, great work!

I am curious about the specific sequences you selected from the Bonn dataset for evaluation. In the paper you said that 5 sequences were selected. Can you tell me which 5 were selected, and why?

Thanks,
Haodong

Evaluation code

Hello, I saw that you released the scripts for preparing the benchmark. Do you plan to release the evaluation code as well? If so, roughly when will it be ready?

Making models available on HF

Hi,

Niels here from the open-source team at Hugging Face. I discovered your work through the paper page: https://huggingface.co/papers/2409.02095 (feel free to claim the paper so that it appears under your HF account!). I work together with AK on improving the visibility of researchers' work on the hub.

Happy to assist making the models available on the hub once they get released.

Uploading models

See here for a guide: https://huggingface.co/docs/hub/models-uploading. In case the model is a custom PyTorch model, we could probably leverage the PyTorchModelHubMixin class, which adds from_pretrained and push_to_hub to the model. Alternatively, one can leverage the hf_hub_download one-liner to download a checkpoint from the hub.

We encourage researchers to push each model checkpoint to a separate model repository, so that things like download stats also work.

Demo as a Space

One could also create a Gradio demo. Happy to connect with the Gradio folks at HF on making this a breeze.

Let me know if you need any help regarding this!

Cheers,

Niels
ML Engineer @ HF 🤗
