google / dynamic-video-depth


Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

Home Page: https://dynamic-video-depth.github.io

License: Apache License 2.0

Languages: Python 98.20%, Shell 1.80%
Topics: deep-learning

dynamic-video-depth's People

Contributors: fcole, ztzhang

dynamic-video-depth's Issues

Question about COLMAP parameter settings, and whether resizing images requires converting the camera poses

This is very useful work, thanks. I used `colmap automatic_reconstructor --camera_model FULL_OPENCV` to process the DAVIS dog sequence and obtain the camera poses. I then replaced ./datafiles/DAVIS/triangulation/ with the result and left the rest of the training code unchanged. However, the depth result for every frame became much worse. How should the COLMAP preprocessing parameters be set? Also, images are resized to a smaller resolution during training. Does the camera pose information obtained from COLMAP need to be transformed to account for the resize?
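On the resize part of the question: in a pinhole model, resizing an image leaves the camera extrinsics (rotation and translation) unchanged. Only the intrinsics change, with fx and cx scaling by the width ratio and fy and cy by the height ratio. The sketch below rests on several assumptions. It covers a plain resize with no cropping or padding, uses pixel coordinates measured from the image corner, and handles only the 3x3 matrix, not FULL_OPENCV distortion. The function name `scale_intrinsics` and the example numbers are hypothetical and do not come from this repository.

```python
import numpy as np


def scale_intrinsics(K, orig_size, new_size):
    """Rescale a 3x3 pinhole intrinsics matrix for a resized image.

    orig_size and new_size are (width, height). Assumes a plain resize
    (no crop or pad) and pixel coordinates measured from the image corner.
    Extrinsics (R, t) do not change when the image is resized.
    """
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    S = np.diag([sx, sy, 1.0])  # scales the first row by sx, the second by sy
    return S @ np.asarray(K, dtype=float)


# Hypothetical example: a 1920x1080 frame downsampled to 384x216.
K = np.array([[1500.0, 0.0, 960.0],
              [0.0, 1500.0, 540.0],
              [0.0, 0.0, 1.0]])
K_small = scale_intrinsics(K, (1920, 1080), (384, 216))
print(K_small)
```

If a pipeline places pixel centers at integer coordinates, the principal point update becomes `(c + 0.5) * s - 0.5` instead of `c * s`. Which convention applies here depends on the code that consumes the file.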

DAVIS datafiles uncomplete?

"datafiles.tar" from the provided Google Drive download link contains only the triangulation data.
It has no "JPEGImages/1080p" or "Annotations/1080p" folders, which "python ./scripts/preprocess/davis/generate_frame_midas.py" refers to:

---
data_list_root = "./datafiles/DAVIS/JPEGImages/1080p"
camera_path = "./datafiles/DAVIS/triangulation"
mask_path = './datafiles/DAVIS/Annotations/1080p'
---
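Before running the preprocessing, a quick way to confirm the layout is to check that the three paths quoted above exist. The sketch below is a hypothetical helper and not part of the repository. It assumes it runs from the repository root, as the relative paths in the script imply.

```python
from pathlib import Path

# Paths as quoted from generate_frame_midas.py in the issue above.
REQUIRED_DIRS = [
    "./datafiles/DAVIS/JPEGImages/1080p",
    "./datafiles/DAVIS/triangulation",
    "./datafiles/DAVIS/Annotations/1080p",
]


def missing_dirs(root=".", required=REQUIRED_DIRS):
    """Return the required directories that do not exist under root."""
    return [d for d in required if not (Path(root) / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected DAVIS directories are present.")
```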

Setting `--batch_size` to 8 results in RuntimeError

Hi, when I run the (slightly altered) script from the README

time ./experiments/davis/train_sequence.sh 0 --track_id dog --batch_size 8

I get a RuntimeError: number of dims don't match in permute.

Output & Stacktrace:

...
[Verbose] # training points: 274
[Verbose] # training batches per epoch: 34
[Verbose] # test batches: 7
==> Training
Epoch 1/20
Traceback (most recent call last):
  File "/home/hydrofin/dynamic-video-depth/train.py", line 364, in <module>
    main()
  File "/home/hydrofin/dynamic-video-depth/train.py", line 114, in main
    main_worker(None, 1, opt=opt)
  File "/home/hydrofin/dynamic-video-depth/train.py", line 339, in main_worker
    model.train_epoch(
  File "/home/hydrofin/dynamic-video-depth/models/netinterface.py", line 354, in train_epoch
    _train(epoch)
  File "/home/hydrofin/dynamic-video-depth/models/netinterface.py", line 295, in _train
    batch_log = self._train_on_batch(epoch, i, data)
  File "/home/hydrofin/dynamic-video-depth/models/scene_flow_motion_field.py", line 187, in _train_on_batch
    pred = self._predict_on_batch()
  File "/home/hydrofin/dynamic-video-depth/models/scene_flow_motion_field.py", line 233, in _predict_on_batch
    depth_1 = self.net_depth(self._input.img_1)
  File "/home/hydrofin/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hydrofin/dynamic-video-depth/third_party/MiDaS.py", line 216, in forward
    x = x.permute([0, 2, 3, 1])
RuntimeError: number of dims don't match in permute

Specs:

  • WSL 2: 5.10.60.1-microsoft-standard-WSL2
  • using Anaconda: conda 4.11.0
  • on Windows 10
  • GPU (8 GiB): NVIDIA GeForce RTX 2060 SUPER

Note:
I can't run the script as-is, because the default batch size (16?) causes an out-of-memory error on my 8 GiB GPU.
Also, when running the script, I had to install some libraries (like pandas) manually. I don't know whether I did something wrong or whether the script has an error.
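The failing call is `x.permute([0, 2, 3, 1])` in third_party/MiDaS.py. A four-entry permutation only works on a 4-D tensor, so the image batch that reached the depth network had some other number of dimensions. The stack trace does not show why a larger batch size produces that shape. The sketch below mirrors the same rank rule with NumPy's `transpose` instead of PyTorch. It uses a hypothetical helper, `to_channels_last`, that checks the rank first so the error reports the offending shape.

```python
import numpy as np


def to_channels_last(x):
    """Convert an (N, C, H, W) array to (N, H, W, C).

    Hypothetical helper mirroring x.permute([0, 2, 3, 1]): a four-entry axis
    order only applies to a 4-D input, so validate the rank up front and
    report the actual shape instead of failing inside the transpose.
    """
    if x.ndim != 4:
        raise ValueError(f"expected a 4-D (N, C, H, W) input, got shape {x.shape}")
    return np.transpose(x, (0, 2, 3, 1))


batch = np.zeros((8, 3, 32, 48))       # a well-formed batch of 8 RGB frames
print(to_channels_last(batch).shape)   # (8, 32, 48, 3)

stacked = np.zeros((8, 2, 3, 32, 48))  # one extra axis, e.g. from stacking twice
try:
    to_channels_last(stacked)
except ValueError as err:
    print(err)
```

Printing `self._input.img_1.shape` just before the `net_depth` call would show directly which case applies.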

Question about the preprocessing

Can you provide the code for the preprocessing part? For a dynamic video, how do you get accurate camera poses and intrinsics K? I see you use DAVIS as an example, and I want to know how to handle the other videos in this dataset.

How to get the triangulation files for customized videos?

Thanks for sharing this great work!

I was wondering how to obtain the triangulation files when using my own videos, for example dog.intrinsics.txt, dog.matrices.txt, and dog.obj.

Are they calculated from colmap? Could you please provide some instructions to get them?

Question about triangulation results file

This is a great project; thanks for your work. I downloaded the triangulation results from your link, but I only found dog.intrinsics.txt and train.intrinsics.txt. The DAVIS-2017-trainval-Full-Resolution.zip file contains 90 files. Could you share all the triangulation files for the DAVIS and ShutterStock datasets? Thanks very much.

Parameter finetuning vs Output finetuning

It seems that running gradient descent for the depth prediction network makes up the majority of the runtime of this method. The current MiDaS implementation (v3?) contains 1.3 GB of parameters, most of which are for the DPT-Large (https://github.com/isl-org/DPT) backbone.

In your research, did you compare the performance of 'parameter finetuning' against simple 'output finetuning' of the depth predictions (as discussed in the GLNet paper (https://arxiv.org/pdf/1907.05820.pdf))?

I would also be curious whether, as a middle ground, finetuning only the 'head' of the MiDaS network would be sufficient, leaving the much larger set of backbone parameters frozen.

Thanks!
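To make the distinction in the question concrete: output finetuning treats the predicted depth values themselves as the optimization variables, while parameter finetuning updates the network weights. The toy NumPy sketch below performs output finetuning on a 2x2 depth map. Its objective has two quadratic terms. One pulls the map toward a hypothetical `target`, and the other keeps it near the initial prediction `d0` with weight `lam`. These terms are illustrative assumptions, not the losses used in this repository.

```python
import numpy as np


def output_finetune(d0, target, lam=1.0, lr=0.1, steps=200):
    """Toy output finetuning: gradient descent on the depth values directly.

    Minimizes 0.5*||d - target||^2 + 0.5*lam*||d - d0||^2, starting from the
    initial prediction d0. No network weights are involved.
    """
    d = d0.copy()
    for _ in range(steps):
        grad = (d - target) + lam * (d - d0)
        d -= lr * grad
    return d


d0 = np.array([[1.0, 2.0], [3.0, 4.0]])      # hypothetical initial prediction
target = np.array([[2.0, 2.0], [2.0, 2.0]])  # hypothetical supervision signal
d = output_finetune(d0, target, lam=1.0)
# Closed-form minimizer for comparison: (target + lam * d0) / (1 + lam)
print(np.allclose(d, (target + d0) / 2))  # True
```

The head-only middle ground in the question would instead keep the network in the loop and freeze only part of it. In PyTorch, that means setting `requires_grad = False` on the backbone parameters and passing only the remaining parameters to the optimizer.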

SyntaxError: invalid syntax

Upon running:
bash ./experiments/davis/train_sequence.sh 0 --track_id dog
I got an error:

File "train.py", line 106
str_warning, f'ignoring the gpu set up in opt: {opt.gpu}. Will use all gpus in each node.')
                                                                                         ^
SyntaxError: invalid syntax

Windows 10, Python 3.7.10
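f-strings were introduced in Python 3.6, so Python 3.7.10 itself parses line 106 without trouble. One plausible explanation, which the report does not confirm, is that the `python` that `bash` finds on its PATH is a different and older interpreter than the Anaconda one. The hypothetical check below deliberately avoids f-strings so that it still parses on old interpreters. Run it with the same `python` command the shell script uses. A guard placed inside train.py itself would not help, because the SyntaxError occurs when the file is parsed, before any code runs.

```python
import sys

MIN_VERSION = (3, 6)  # f-strings were introduced in Python 3.6


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # %-formatting on purpose: this file must parse even on Python 2.
    print("Interpreter: %s" % sys.executable)
    print("Version: %d.%d" % tuple(sys.version_info[:2]))
    if not version_ok():
        sys.exit("Python %d.%d or newer is required" % MIN_VERSION)
```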

Question about scene flow sign

Hi, I reviewed the code and have one question.

In models/scene_flow_motion_field.py L256 and L257,

            flow_data_input['sflow_1_2'] = sf_1_2.permute(0, 2, 3, 1)[..., None, :]  # .fill_(0)
            flow_data_input['sflow_2_1'] = sf_1_2.permute(0, 2, 3, 1)[..., None, :]

Is it supposed to have a minus sign, like this?

            flow_data_input['sflow_1_2'] = sf_1_2.permute(0, 2, 3, 1)[..., None, :]  # .fill_(0)
            flow_data_input['sflow_2_1'] = -sf_1_2.permute(0, 2, 3, 1)[..., None, :]

It is hard to understand without a minus sign, given how these values are used in losses/scene_flow_projection.py L244-L246:

        p1_camera_2 = torch.matmul(global_p1 + sflow_1_2 - t_2, R_2_T)
        p1_camera_2_static = torch.matmul(global_p1 - t_2, R_2_T)
        p2_camera_1 = torch.matmul(global_p2 + sflow_2_1 - t_1, R_1_T)

Maybe I missed something while reviewing your code.
What do you think?
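The geometric point behind the question can be checked numerically under two assumptions. The first is that `sflow_2_1` is meant to carry a point from time 2 back to time 1. The second is that `sf_1_2` is the displacement from time 1 to time 2. The sketch uses the same `(p - t) @ R_T` form as the quoted loss with a made-up camera and point, and shows that only the negated flow maps `global_p2` back onto `global_p1` in camera 1. It does not show whether the repository compensates for the sign elsewhere.

```python
import numpy as np


def to_camera(p_world, R_T, t):
    """Same form as the quoted loss: (p - t) @ R_T, points as row vectors."""
    return (p_world - t) @ R_T


# Hypothetical camera 1 (a rotation about the z axis plus a translation).
theta = 0.3
R_1_T = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
t_1 = np.array([0.1, -0.2, 0.0])

global_p1 = np.array([1.0, 2.0, 5.0])  # a point at time 1
sf_1_2 = np.array([0.3, 0.0, -0.1])    # its motion from time 1 to time 2
global_p2 = global_p1 + sf_1_2         # the same point at time 2

target = to_camera(global_p1, R_1_T, t_1)
with_minus = to_camera(global_p2 + (-sf_1_2), R_1_T, t_1)
without_minus = to_camera(global_p2 + sf_1_2, R_1_T, t_1)

print(np.allclose(with_minus, target))     # True
print(np.allclose(without_minus, target))  # False
```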

Script for rendering the teaser video (fix-view-video)

Hi, could you provide the script for rendering the fixed-view video shown in the teaser?
I think it will involve some forward warping and splatting. Did you use the estimated depth to handle the many-to-one mapping in the forward warping and splatting?
Thank you!
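A common way to use depth to resolve the many-to-one mapping in forward warping is a z-buffer: when several source points land on the same target pixel, the nearest one wins. The NumPy sketch below shows nearest-pixel forward warping with a z-buffer. It assumes the target pixel coordinates `u`, `v` and depths `z` have already been computed by projecting the points into the new view. It illustrates the general technique and is not the authors' teaser renderer, which the issue asks about but does not describe.

```python
import numpy as np


def forward_warp_zbuffer(u, v, z, colors, height, width):
    """Nearest-pixel forward warping with a z-buffer.

    u, v:   target pixel coordinates of N points (float arrays, shape (N,)).
    z:      depth of each point in the target camera, shape (N,).
    colors: (N, C) values to splat.
    The nearest point wins at each pixel; pixels hit by no point stay 0
    in the image and inf in the depth map (holes).
    """
    ui = np.round(u).astype(int)
    vi = np.round(v).astype(int)
    valid = (ui >= 0) & (ui < width) & (vi >= 0) & (vi < height) & (z > 0)
    flat = vi[valid] * width + ui[valid]  # linear pixel index per point
    zv, cv = z[valid], colors[valid]

    # z-buffer: minimum depth per pixel; ufunc.at handles repeated indices.
    depth = np.full(height * width, np.inf)
    np.minimum.at(depth, flat, zv)

    # Keep only the points that achieved the minimum at their pixel
    # (exact depth ties could still write the same pixel twice).
    winners = zv == depth[flat]
    image = np.zeros((height * width, colors.shape[1]))
    image[flat[winners]] = cv[winners]
    return image.reshape(height, width, -1), depth.reshape(height, width)


# Two points land on pixel (row 1, col 1); the nearer one (depth 2) wins.
u = np.array([1.2, 0.9, 2.0])
v = np.array([1.0, 1.1, 0.0])
z = np.array([5.0, 2.0, 3.0])
colors = np.array([[1.0], [0.5], [0.25]])
image, depth = forward_warp_zbuffer(u, v, z, colors, height=3, width=3)
print(image[..., 0])
print(depth)
```

Holes left by the splatting would still need filling, for example by splatting with a footprint larger than one pixel or by inpainting.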

Can not reproduce training result

As already mentioned in issue #9, "DAVIS datafiles uncomplete":
"'datafiles.tar' in provided 'Google Drive' download link consists only triangulation data.
There are no 'JPEGImages/1080p' and 'Annotation//1080p' folders that 'python ./scripts/preprocess/davis/generate_frame_midas.py' refers to."
So I manually downloaded the missing data from https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-Unsupervised-trainval-Full-Resolution.zip
After that, the structure is as follows:

├── datafiles
    ├── DAVIS
        ├── Annotations  --- missing in supplied download links, downloaded manually from DAVIS datasets 
            ├── 1080p
                ├── dog
                ├── train
        ├── JPEGImages  --- missing in supplied download links, downloaded manually from DAVIS datasets 
            ├── 1080p
                ├── dog
                ├── train
        ├── triangulation -- data from supplied link

Only after that could I successfully perform all the steps described in "Davis data preparation":

  1. Run python ./scripts/preprocess/davis/generate_frame_midas.py.
  2. Run python ./scripts/preprocess/davis/generate_flows.py
  3. Run python ./scripts/preprocess/davis/generate_sequence_midas.py
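The three steps above can be chained in a small driver so that they run in order and stop at the first failure. This is a hypothetical helper, not part of the repository. It launches every step with `sys.executable`, so all three use the interpreter that runs the driver. That choice may be relevant given the SyntaxError reported below. The driver assumes it is started from the repository root.

```python
import os
import subprocess
import sys

# The three DAVIS preprocessing scripts, in the order listed above.
STEPS = [
    "./scripts/preprocess/davis/generate_frame_midas.py",
    "./scripts/preprocess/davis/generate_flows.py",
    "./scripts/preprocess/davis/generate_sequence_midas.py",
]


def build_commands(python=sys.executable, steps=STEPS):
    """One command per step, all using the same interpreter."""
    return [[python, step] for step in steps]


def run_all(commands):
    """Run commands in order; check=True raises CalledProcessError on failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    missing = [s for s in STEPS if not os.path.isfile(s)]
    if missing:
        print("Run this from the repository root; not found: " + ", ".join(missing))
    else:
        run_all(build_commands())
```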

However, I still couldn't reproduce the presented result when running:
bash ./experiments/davis/train_sequence.sh 0 --track_id dog

Output & Stacktrace:


D:\dynamic-video-depth-main>bash ./experiments/davis/train_sequence.sh 0 --track_id dog
python train.py --net scene_flow_motion_field --dataset davis_sequence --track_id train --log_time --epoch_batches 2000 --epoch 20 --lr 1e-6 --html_logger --vali_batches 150 --batch_size 1 --optim adam --vis_batches_vali 4 --vis_every_vali 1 --vis_every_train 1 --vis_batches_train 5 --vis_at_start --tensorboard --gpu 0 --save_net 1 --workers 4 --one_way --loss_type l1 --l1_mul 0 --acc_mul 1 --disp_mul 1 --warm_sf 5 --scene_lr_mul 1000 --repeat 1 --flow_mul 1 --sf_mag_div 100 --time_dependent --gaps 1,2,4,6,8 --midas --use_disp --logdir './checkpoints/davis/sequence/' --suffix 'track_{track_id}_{loss_type}_wreg_{warm_reg}_acc_{acc_mul}_disp_{disp_mul}_flowmul_{flow_mul}_time_{time_dependent}_CNN_{use_cnn}_gap_{gaps}_Midas_{midas}_ud_{use_disp}' --test_template './experiments/davis/test_cmd.txt' --force_overwrite --track_id dog
  File "train.py", line 106
    str_warning, f'ignoring the gpu set up in opt: {opt.gpu}. Will use all gpus in each node.')
                                                                                             ^
SyntaxError: invalid syntax

I noticed that there is no folder named "./checkpoints".

A similar issue was reported in issue #8, "SyntaxError: invalid syntax".

Specs:

  • Windows 10
  • Anaconda: conda 4.11.0
  • Python 3.7.10
  • GPU (12 GB): Quadro M6000
  • All specified dependencies, including RAFT, are installed
