liuguoyou / 3dvideos2stereo

This project forked from lasinger/3dvideos2stereo

Code to extract stereo frame pairs from 3D videos, as used in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, arXiv:1907.01341"

License: MIT License

3DVideos2Stereo

The provided scripts help to extract stereo data as described in our paper:

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun

Frame Extraction

Stereo videos can be stored in several different formats.

Our frame extraction scripts expect videos stored as 1080p SBS (side-by-side) MKVs, i.e. with an image resolution of 3840x1080 px (2 x 1920x1080). Additionally, we extract chapter information using ffmpeg:

ffmpeg -i ${video}.mkv 2>&1 | grep Chapter | grep start | awk '{print $4 $6}' >> ${outputFolder}chapter.txt
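
The awk call prints fields 4 and 6 without a separator. Assuming ffmpeg reports chapters in its usual form ("Chapter #0:0: start 0.000000, end 312.500000"), field 4 keeps its trailing comma, so each line of chapter.txt should read "start,end" in seconds. A minimal sketch for reading such a file back follows; the function name and the example values are hypothetical and not part of the provided scripts.

```python
from typing import List, Tuple


def parse_chapters(text: str) -> List[Tuple[float, float]]:
    """Parse lines of the form 'start,end' (seconds) into float pairs."""
    chapters = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end = line.split(",")
        chapters.append((float(start), float(end)))
    return chapters


# Hypothetical file content with two chapters.
example = "0.000000,312.500000\n312.500000,640.000000\n"
print(parse_chapters(example))
```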

Script to extract left and right frames: run_extractFrames.sh

We extracted left and right frames at the full 24 fps, centrally cropped to 1880x800 px (aspect ratio 2.35:1). The original input has varying aspect ratios and thus black bars at the top and bottom, and sometimes at the left and right due to the floating window effect.
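
The crop geometry can be worked out by hand. The sketch below computes central crop boxes for both views of a 3840x1080 SBS frame, assuming the left view occupies the left half of the frame. It only illustrates the arithmetic; run_extractFrames.sh may implement the crop differently, and the function name is hypothetical.

```python
def central_crop_boxes(eye_w=1920, eye_h=1080, crop_w=1880, crop_h=800):
    """Return (x, y, w, h) crop boxes for the left and right view of an SBS frame."""
    x_off = (eye_w - crop_w) // 2  # horizontal margin inside one view
    y_off = (eye_h - crop_h) // 2  # vertical margin inside one view
    left = (x_off, y_off, crop_w, crop_h)
    # Assumption: the right view starts at x = eye_w in the SBS frame.
    right = (eye_w + x_off, y_off, crop_w, crop_h)
    return left, right


left, right = central_crop_boxes()
print(left)        # (20, 140, 1880, 800)
print(right)       # (1940, 140, 1880, 800)
print(1880 / 800)  # 2.35
```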

In case a video is stored in MVC format, the script convertToSbs.sh can be used to convert it to SBS format.

Requirements:

Additional requirements for MVC to SBS conversion:

Clip Extraction

To generate our 1-second clips sampled at 4 fps for all training data, as described in our supplementary material (using shot detection but no disparity filtering), we used:

python genTraining_recurr.py --videoListPath 3DVideos/data/ --numRecurrent 24 --fpsRecurrent 24 --fpsSingle 4 --name training_set --blacklist testVid1,testVid2,valVid1,valVid2

For our validation set we used the following:

python genTraining_recurr.py --videoListPath 3DVideos/data/ --numRecurrent 24 --fpsRecurrent 24 --fpsSingle 1 --name validation_set --whitelist valVid1,valVid2

Data path and video names (for whitelist and blacklist) have to be adapted accordingly.
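
As a rough illustration of the frame rates involved: a 1-second clip at 24 fps has 24 frames, and sampling it at 4 fps keeps every sixth frame. The sketch below assumes uniform striding from the first frame and a source rate that is divisible by the target rate. How genTraining_recurr.py actually selects frames is not documented here, so treat this as arithmetic only; the function name is hypothetical.

```python
def sampled_frame_indices(num_frames=24, fps_source=24, fps_target=4):
    """Indices kept when a clip is subsampled from fps_source to fps_target."""
    stride = fps_source // fps_target  # assumes fps_source % fps_target == 0
    return list(range(0, num_frames, stride))


print(sampled_frame_indices())              # [0, 6, 12, 18]
print(sampled_frame_indices(fps_target=1))  # [0]
```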

Sky Computation

Please use your favorite segmentation algorithm for sky segmentation. We used Mapillary's Inplace ABN (https://github.com/mapillary/inplace_abn) and adapted test_vistas_single_gpu.py. Sky should have ID 27; e.g. in get_pred_image you can do:

mask = (tensor==27)
img = Image.fromarray(mask.astype(np.uint8)*255, mode="L")
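
For reference, the same masking step can be tried in isolation on a small label map. This is a minimal sketch that assumes the predicted class IDs are already available as a NumPy array; the function name and the example label values other than 27 are hypothetical.

```python
import numpy as np

SKY_ID = 27  # sky class ID as stated above


def sky_mask(labels: np.ndarray) -> np.ndarray:
    """Return a uint8 mask that is 255 where the label is sky and 0 elsewhere."""
    return (labels == SKY_ID).astype(np.uint8) * 255


# Tiny hypothetical label map: 27 marks sky, other IDs are arbitrary classes.
labels = np.array([[27, 27, 3],
                   [27, 5, 3]])
print(sky_mask(labels))
```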

For faster processing we reduced the input image size from 2048 to 1024:

transformation = SegmentationTransform(
        1024,
        (0.41738699, 0.45732192, 0.46886091),
        (0.25685097, 0.26509955, 0.29067996),
    )

Flow Computation

Please compute the backward and forward flow fields with your favorite flow algorithm (at full resolution; i.e. 1880x800). We used PWC-Net-Plus (https://github.com/NVlabs/PWC-Net).

You can use the filelists "train.txt", "validation.txt", and "test.txt".

Please make sure that the resulting flow fields ("flow_backward" and "flow_forward") are in a similar folder structure as "image_left" and "image_right".
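
One way to mirror the folder structure is to swap only the directory component of each image path. The sketch below does that with pathlib. The example path and the helper name are hypothetical, and the file extension is left unchanged, so adapt it to whatever format your flow algorithm writes.

```python
from pathlib import PurePosixPath


def sibling_path(image_path: str, target_dir: str, source_dir: str = "image_left") -> str:
    """Replace the source_dir component of image_path with target_dir."""
    parts = PurePosixPath(image_path).parts
    if source_dir not in parts:
        raise ValueError(f"{source_dir!r} is not a component of {image_path!r}")
    swapped = [target_dir if part == source_dir else part for part in parts]
    return str(PurePosixPath(*swapped))


# Hypothetical layout: <video>/image_left/<frame>
print(sibling_path("movie/image_left/000001.jpg", "flow_forward"))
```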

Disparity and Uncertainty Computation

The filelists "train.txt", "validation.txt", and "test.txt" are constructed in a way that only "good" flow fields are to be expected. Hence, you can create the disparity and uncertainty maps without a filtering of the flow fields as follows:

python get_disp_and_uncertainty.py

This script generates disparity maps and the corresponding uncertainty maps and writes them to the folders "disparity" and "uncertainty". Please note that these maps are at half the resolution (940x400). This is also the resolution that we use for testing.

If you need explicit flow filtering, you can enable it with the option "--filter".

Data Reading

Read Disparity

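import imageio
import numpy as np

# The PNG metadata carries the offset and scale needed to decode the disparity.
disp = imageio.imread("disp.png")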

offset = float(disp.meta["offset"])
scale = float(disp.meta["scale"])

disp = (offset + scale * disp).astype(np.float32)

Read Uncertainty

uncertainty = imageio.imread("uncertainty.png")
uncertainty = 0.1 * uncertainty
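
The two decoding steps above are simple affine maps and can be checked on synthetic arrays without any image files. In this minimal sketch the function names, the offset and scale values, and the raw integer arrays are all hypothetical; real values come from the PNG files and their metadata.

```python
import numpy as np


def decode_disparity(raw: np.ndarray, offset: float, scale: float) -> np.ndarray:
    """Map stored integer values to float disparities: offset + scale * raw."""
    return (offset + scale * raw.astype(np.float64)).astype(np.float32)


def decode_uncertainty(raw: np.ndarray) -> np.ndarray:
    """Map stored integer values to uncertainties using the fixed factor 0.1."""
    return 0.1 * raw.astype(np.float32)


# Hypothetical stored values and metadata.
raw_disp = np.array([[0, 100], [200, 65535]], dtype=np.uint16)
print(decode_disparity(raw_disp, offset=-10.0, scale=0.5))

raw_unc = np.array([10, 25], dtype=np.uint8)
print(decode_uncertainty(raw_unc))
```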

Citation

Please cite our paper if you use this code in your research:

@article{Ranftl2019,
	author    = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
	title     = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
	journal   = {arXiv:1907.01341},
	year      = {2019},
}

License

MIT License
