
pk1125 / precnet


This project is forked from ctu-vras/precnet.


Predictive Coding Network (PreCNet) for Next Frame Video Prediction.

Home Page: https://arxiv.org/abs/2004.14878

License: MIT License

Languages: Python 94.66%, Shell 4.17%, MATLAB 1.17%


PreCNet

This repository contains complete code and trained models related to PreCNet: Next Frame Video Prediction Based on Predictive Coding by Zdenek Straka, Tomas Svoboda and Matej Hoffmann. The content is sufficient to generate all results and figures from the paper.

PreCNet is a deep hierarchical recurrent network for next frame video prediction that embodies the predictive coding schema proposed by Rao and Ballard (Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects).

How to proceed

  1. Install prerequisites
  2. Clone this repository
  3. Get the datasets
  4. Train or download a network
  5. Generate the desired evaluations or figures

Prerequisites

Versions used during training/testing are shown in parentheses.

  • Python 3 (3.6.6)
  • Keras (2.2.4)
  • Tensorflow (1.13.1)
  • Hickle (3.4.5)
  • Numpy (1.15.0)
  • Matplotlib (3.1.2)
  • Pillow (6.2.1)
  • Six (1.11.0)
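If you want to check your environment against these versions, a convenience snippet such as the following can help (not part of the repository; exact patch versions should not be required):

```python
# Convenience snippet (not part of the repository): print the installed
# versions so you can compare them against the list above.
import sys
import tensorflow
import keras
import numpy
import hickle
import matplotlib
import PIL
import six

print("Python    ", sys.version.split()[0])   # authors used 3.6.6
print("TensorFlow", tensorflow.__version__)   # authors used 1.13.1
print("Keras     ", keras.__version__)        # authors used 2.2.4
print("Hickle    ", hickle.__version__)       # authors used 3.4.5
print("NumPy     ", numpy.__version__)        # authors used 1.15.0
print("Matplotlib", matplotlib.__version__)   # authors used 3.1.2
print("Pillow    ", PIL.__version__)          # authors used 6.2.1
print("Six       ", six.__version__)          # authors used 1.11.0
```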

Datasets

The model was trained on (i) the KITTI dataset, (ii) a large subset of the Berkeley DeepDrive dataset (BDD100K) with 2M frames (bdd_large), and (iii) a small subset of BDD100K with 41K frames (bdd_small). Evaluation of the network was performed on the test part of the Caltech Pedestrian Dataset.

Dataset location is set in {kitti/bdd_large/bdd_small}_settings.py.
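Such a settings file typically just defines a few paths. A minimal sketch, assuming the PredNet-style convention that PreCNet builds on (the variable names are illustrative; check the actual files in the repository):

```python
# kitti_settings.py -- illustrative sketch only; the actual variable
# names and defaults are defined in the repository's settings files.
DATA_DIR = './kitti_data/'             # processed dataset location
WEIGHTS_DIR = './model_data_kitti/'    # trained model location
RESULTS_SAVE_DIR = './kitti_results/'  # evaluation output location
```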

Getting datasets

Please see the links below for information about the datasets and their terms of use.

KITTI

Run python3 process_kitti.py.

Caltech Pedestrian Dataset

Perform:

  1. Execute ./download_caltech_pedestrian_dataset.sh.
  2. Download and install Piotr's Computer Vision Matlab Toolbox.
  3. Run cal_ped_seq2imgs.m in MATLAB.
  4. Run python3 process_cal_ped_test.py.

BDD100K

As the dataset is very large, only (randomly) selected subsets were used for creating the training and validation datasets. Therefore, it is necessary to use the source files to get exactly the same datasets as were used during training.
Perform:

  1. Execute ./download_bdd100k_selected.sh.
  2. Run python3 process_selected_bdd100k_val.py to get the validation dataset.
  3. Run python3 process_selected_bdd100k_train0-4999.py (python3 process_selected_bdd100k_train_40K.py) to get the large (small) subset of BDD100K as a training set -- 2M (41K) frames.

Training of a network

The model can be trained, depending on the training dataset, by running python3 kitti_train.py, python3 bdd_large_train.py, or python3 bdd_small_train.py.

Already trained models, which were evaluated in the article, can be found in the folders model_data_{kitti/bdd_small/bdd_large}. These models will be overwritten by newly trained models if you run the training. You can prevent this, for instance, by renaming them.
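For example, the shipped weights can be copied aside before retraining (an illustrative snippet; adjust the folder name to the model you use):

```python
import shutil

# Keep a copy of the shipped KITTI model so training cannot overwrite it.
shutil.copytree('model_data_kitti', 'model_data_kitti_backup')
```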

Model location is set in {kitti/bdd_large/bdd_small}_settings.py.

Evaluation and figures

See comments in the code for choosing a model (trained on kitti/bdd_large/bdd_small). Results will be saved in the folder {kitti/bdd_large/bdd_small}_results (defined in {kitti/bdd_large/bdd_small}_settings.py).

Next frame video prediction evaluation

Run python3 caltech_pedestrian_evaluate.py to obtain SSIM, PSNR, and MSE values on the Caltech Pedestrian Dataset (Tables 3 and 4 in the article) and randomly selected predicted sequences.
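For reference, MSE and PSNR per frame can be computed as below. This is a minimal NumPy sketch of the standard definitions, not the repository's evaluation code (which also reports SSIM):

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two frames scaled to [0, 1]."""
    return np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for frames scaled to [0, max_val]."""
    return 10.0 * np.log10(max_val ** 2 / mse(pred, target))

# Example with random frames of shape (height, width, channels):
pred, target = np.random.rand(128, 160, 3), np.random.rand(128, 160, 3)
print(mse(pred, target), psnr(pred, target))
```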

Execute python3 caltech_pedest_plot_selected_seq.py to obtain a selected sequence prediction (Figs. 5 and 6 in the article).

Multiple frame video prediction evaluation

Run python3 caltech_pedest_evaluate_extrap.py to obtain SSIM, PSNR, and MSE values for multiple frame prediction on the Caltech Pedestrian Dataset (Table 5 in the article) and randomly selected predicted sequences.
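Conceptually, multiple frame prediction feeds the network's own output back in as the next input once the ground-truth frames run out. A schematic sketch, where predict_next_frame is a hypothetical stand-in for a forward pass of the trained model:

```python
import numpy as np

def predict_next_frame(context):
    # Hypothetical stand-in for one forward pass of the trained model;
    # it simply repeats the last frame so the sketch runs on its own.
    return context[-1]

context = [np.random.rand(128, 160, 3) for _ in range(10)]  # seed frames
predictions = []
for _ in range(5):  # extrapolate five frames beyond the seed sequence
    nxt = predict_next_frame(context)
    predictions.append(nxt)   # collect the multi-frame prediction
    context.append(nxt)       # the prediction becomes the next input
```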

Execute python3 caltech_pedest_plot_selected_seq_extrap_fig.py to obtain a selected sequence with multiple frame prediction.

Additional Notes

The size of the input images has to be divisible by 2^(number of layers - 1), because the pooling operation halves the size of its input in each layer and the sizes have to be integers in all layers.
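For instance, with 5 layers the required divisor is 2^4 = 16, so 128x160 images are valid while 100x160 images are not. A quick check (the layer count here is just an example):

```python
def valid_input_size(height, width, n_layers):
    """Both sides must be divisible by 2**(n_layers - 1) so that repeated
    halving by pooling yields an integer size at every layer."""
    factor = 2 ** (n_layers - 1)
    return height % factor == 0 and width % factor == 0

print(valid_input_size(128, 160, 5))  # True:  128 / 16 = 8, 160 / 16 = 10
print(valid_input_size(100, 160, 5))  # False: 100 is not divisible by 16
```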

Network states can be obtained by setting the output mode to the desired units and layer (e.g., output_mode = 'Etd1' to get the error-unit states in the second layer after the top-down pass).

Acknowledgements

We would like to thank the authors of PredNet for making their source code public, which significantly accelerated the development of PreCNet.

