urs-waldmann / improving-unsupervised-label-propagation

Code for 'Improving Unsupervised Label Propagation for Pose Tracking and Video Object Segmentation' (GCPR 2022)

Home Page: https://urs-waldmann.github.io/improving-unsupervised-label-propagation/

License: MIT License

Python 100.00%
label-propagation pose-tracking unsupervised vos


Improving Unsupervised Label Propagation for Pose Tracking and Video Object Segmentation

This repository provides code for Improving Unsupervised Label Propagation for Pose Tracking and Video Object Segmentation (GCPR 2022).

Abstract

Label propagation is a challenging task in computer vision with many applications. One approach is to learn representations of visual correspondence. In this paper, we study recent works on label propagation based on correspondence, carefully evaluate the effect of various aspects of their implementation, and improve upon various details. Our pipeline assembled from these best practices outperforms the previous state of the art in terms of PCK_0.1 on the JHMDB dataset by 6.5%. We also propose a novel joint framework for tracking and keypoint propagation, which in contrast to the base pipeline is applicable to tracking small objects and obtains results that substantially exceed the performance of the core pipeline. Finally, for VOS, we extend our pipeline to a fully unsupervised one by initializing the first frame with the self-attention layer from DINO. Our pipeline for VOS runs online and can handle static objects. It outperforms unsupervised frameworks with these characteristics.

If you find a bug, have a question or know how to improve the code, please open an issue.

Setup

The main sources are located in src/.

Python requirements

PyTorch builds with CUDA support are not available on PyPI and need to be installed manually first:

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Then, install the requirements listed in ./requirements.txt, e.g.

python -m pip install -r requirements.txt

If you don't have a CUDA-capable device, use ./requirements-cpu.txt instead. The code should then fall back to a supported implementation of the necessary algorithms.

Setting the python path

All further instructions assume that the PYTHONPATH variable points to the ./src directory (e.g. in bash: export PYTHONPATH="./src", or in PowerShell: $env:PYTHONPATH='./src').

Datasets

The code is organized such that paths are configured in ./config.py. This file dynamically configures the default path prefix, which makes it easy to switch between different machines or operating systems. The easiest option is to set the WORKSPACE_PREFIX variable to a directory where all processing should happen. Alternatively, you can set individual paths manually to fit your needs.
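To illustrate the idea of a dynamically chosen prefix, here is a minimal sketch of how such a config module could pick one. Everything in it is hypothetical: the environment-variable override, the per-OS default paths and the name resolve_workspace_prefix are assumptions, not the contents of the actual ./config.py.

```python
import os
import platform
from pathlib import Path

# Hypothetical per-OS defaults; the real ./config.py may use other locations.
_DEFAULT_PREFIXES = {
    "Windows": Path("D:/workspace"),
    "Linux": Path("/data/workspace"),
    "Darwin": Path.home() / "workspace",
}


def resolve_workspace_prefix(env=None, system=None):
    """Return the workspace prefix.

    An explicit WORKSPACE_PREFIX entry in the environment (assumed override
    mechanism) wins; otherwise a per-OS default is used.
    """
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    override = env.get("WORKSPACE_PREFIX")
    if override:
        return Path(override)
    return _DEFAULT_PREFIXES.get(system, Path.cwd() / "workspace")


# Derived paths follow the layout described below.
WORKSPACE_PREFIX = resolve_workspace_prefix()
DATASETS_DIR = WORKSPACE_PREFIX / "datasets"
RESULTS_DIR = WORKSPACE_PREFIX / "results"
```

With this design, switching machines only requires changing one value, while every other path is derived from it.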

The most important paths are:

<prefix>/                           # Workspace directory
         datasets/                  # Base for datasets
                  DAVIS/DAVIS2017/  # DAVIS2017 data
                  JHMDB/            # JHMDB data
                  SegTrackV2/       # SegTrackV2 data
                  ...               # More datasets
         results/                   # Location for results
         checkpoints/               # Rarely used; models are cached in the default PyTorch Hub cache location.
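A quick check that a workspace matches this layout before running anything might look like the sketch below. The helper names are hypothetical, and the directory list only covers the entries shown above. Adapt it if you configured paths manually.

```python
from pathlib import Path

# Sub-directories taken from the layout above ("..." entries omitted).
EXPECTED_DIRS = (
    "datasets/DAVIS/DAVIS2017",
    "datasets/JHMDB",
    "datasets/SegTrackV2",
    "results",
)


def missing_workspace_dirs(prefix):
    """Return the expected sub-directories that do not exist under prefix."""
    prefix = Path(prefix)
    return [rel for rel in EXPECTED_DIRS if not (prefix / rel).is_dir()]


def create_workspace(prefix):
    """Create the empty directory skeleton; datasets still need to be extracted."""
    for rel in EXPECTED_DIRS:
        (Path(prefix) / rel).mkdir(parents=True, exist_ok=True)
```

For example, print(missing_workspace_dirs("<prefix>")) lists the directories that are still missing.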

Preparing the datasets

DAVIS

Download https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip and extract its contents to the configured dataset root (<prefix>/datasets/DAVIS/DAVIS2017/ by default). This archive contains everything needed for both DAVIS2016 and DAVIS2017.

SegTrackV2

Download https://web.engr.oregonstate.edu/~lif/SegTrack2/SegTrackv2.zip and extract it into the configured SegTrackV2 directory (<prefix>/datasets/SegTrackV2/ in the layout above).

JHMDB

Run the following tool to download and extract the necessary archives automatically.

python -m dataset.preparation.prepare_JHMDB --output <output-path> --extract

Add the --download-all flag to download not only the necessary files but also the archives for body part masks etc.

Usage

The code is built such that most tasks can be done by creating a configuration and then running:

python -m tools.evaluate_config --config "<path-to-config>" --output-dir "<output-dir>"

The config path can be prefixed with $config/ to load configurations from the ./share/config/ directory.

The most important configurations are located in the $config/label_propagation/_reported_configs/ directory.
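The tool expands the $config/ prefix itself, so you normally pass the path through unchanged. The sketch below mirrors that expansion and also builds the documented command so it can be launched from Python with PYTHONPATH set. resolve_config_path and build_evaluate_command are hypothetical helpers, and the exact resolution rules of tools.evaluate_config may differ.

```python
import os
import sys
from pathlib import Path

CONFIG_ALIAS = "$config/"


def resolve_config_path(config, repo_root="."):
    """Expand "$config/..." to <repo_root>/share/config/...; other paths pass through."""
    if config.startswith(CONFIG_ALIAS):
        return Path(repo_root) / "share" / "config" / config[len(CONFIG_ALIAS):]
    return Path(config)


def build_evaluate_command(config, output_dir, repo_root="."):
    """Return (argv, env) for running tools.evaluate_config with ./src on PYTHONPATH."""
    env = dict(os.environ, PYTHONPATH=str(Path(repo_root) / "src"))
    argv = [
        sys.executable, "-m", "tools.evaluate_config",
        "--config", config,               # the tool expands $config/ itself
        "--output-dir", str(output_dir),
    ]
    return argv, env
```

From the repository root, you could run argv, env = build_evaluate_command("$config/label_propagation/_reported_configs/<config-file>", "<output-dir>") and then subprocess.run(argv, env=env, check=True).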

Creating configurations

The configuration schema is defined in ./share/config/schema/config-schema.json, and configurations are checked against this schema before instantiation. A configuration consists of four blocks: one for the feature extractor, one for the dataset, one for the label propagation and one for the evaluation mode. An example configuration is shown below.

{
  "feat_ext": {
    "name": "vit", "variant": "base", "patch_size": 8,
    "scale_size": 480
  },
  "dataset": {
    "name": "segmentation_dataset",
    "data": {
      "name": "davis", "year": "2017", "split": "val",
      "mode": "multi-object"
    },
    "codec": {
      "channel_normalization": "minmax",
      "interpolation_mode": "bicubic"
    }
  },
  "label_propagation": {
    "name": "affinity",
    "implementation": "local",
    "feature_normalization": true,
    "affinity_topk": 5,
    "topk_implementation": "full",
    "affinity_norm": "dino",
    "neighborhood_size": 12,
    "label_normalization": "none"
  },
  "evaluator": {
    "num_context": 7,
    "recreate_labels": false,
    "label_initializer": "ground_truth"
  }
}

This configuration defines multi-object O-VOS inference on DAVIS2017.
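Configurations are plain JSON, so they can also be generated programmatically, for instance for parameter sweeps. The following sketch rebuilds the example above as a Python dict and runs a cheap check for the four top-level blocks. This check is a stand-in for the schema validation and does not replace it. The sketch then derives variants with a different affinity_topk. The helper names are hypothetical, and the sketch does not check whether a given parameter value is sensible.

```python
import json

# The four top-level blocks described above.
REQUIRED_BLOCKS = ("feat_ext", "dataset", "label_propagation", "evaluator")

config = {
    "feat_ext": {"name": "vit", "variant": "base", "patch_size": 8, "scale_size": 480},
    "dataset": {
        "name": "segmentation_dataset",
        "data": {"name": "davis", "year": "2017", "split": "val", "mode": "multi-object"},
        "codec": {"channel_normalization": "minmax", "interpolation_mode": "bicubic"},
    },
    "label_propagation": {
        "name": "affinity", "implementation": "local", "feature_normalization": True,
        "affinity_topk": 5, "topk_implementation": "full", "affinity_norm": "dino",
        "neighborhood_size": 12, "label_normalization": "none",
    },
    "evaluator": {"num_context": 7, "recreate_labels": False, "label_initializer": "ground_truth"},
}


def check_blocks(cfg):
    """Cheap pre-check only; the real validation uses config-schema.json."""
    missing = [block for block in REQUIRED_BLOCKS if block not in cfg]
    if missing:
        raise ValueError(f"config is missing blocks: {missing}")


def variant(cfg, **overrides):
    """Return a deep copy of cfg with entries of the label_propagation block overridden."""
    new = json.loads(json.dumps(cfg))  # deep copy via a JSON round trip
    new["label_propagation"].update(overrides)
    return new
```

Writing each variant with json.dumps(variant(config, affinity_topk=k), indent=2) produces files that can be passed to --config.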

{
  "feat_ext": {"name": "vit", "variant": "base", "patch_size": 8, "scale_size": 480},
  ...
}

This block defines the feature extractor as a vision transformer (DeiT) of size base with patch size 8, i.e. DeiT-B/8. The scale size indicates how the input images are resized: 480 means that the shorter side is rescaled to 480 pixels. If no weight source is configured, as is the case here, DINO-trained weights are loaded if they are available.
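The resizing rule and the resulting ViT token grid can be worked out directly. This sketch assumes aspect-preserving resizing with rounding and assumes that partial patches at the border are dropped. The text above does not say how the repository handles sizes that are not divisible by the patch size. Under these assumptions, an 854x480 DAVIS frame with scale_size 480 and patch size 8 yields a 106x60 patch grid.

```python
def rescale_shorter_side(width, height, scale_size):
    """Resize so that the shorter side equals scale_size, keeping the aspect ratio."""
    if width <= height:
        return scale_size, round(height * scale_size / width)
    return round(width * scale_size / height), scale_size


def patch_grid(width, height, patch_size):
    """Number of ViT patches per axis, assuming partial border patches are dropped."""
    return width // patch_size, height // patch_size


w, h = rescale_shorter_side(1920, 1080, 480)  # a 1080p frame becomes 853x480
print(patch_grid(w, h, 8))                     # 106 x 60 patches of size 8x8
```

A smaller patch size or larger scale size increases the token count quadratically, which is the main cost driver of the affinity computation.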

{
  ...
  "dataset": {
    "name": "segmentation_dataset",
    "data": {
      "name": "davis", "year": "2017", "split": "val",
      "mode": "multi-object"
    },
    "codec": {
      "channel_normalization": "minmax",
      "interpolation_mode": "bicubic"
    }
  },
  ...
}

This block selects the validation split of the DAVIS2017 dataset. multi-object indicates that the masks are loaded as multi-object masks, i.e. indexed instead of binary. The codec object defines properties of the mask-to-label translation, for example the interpolation mode used to scale between label and mask size.
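The following minimal NumPy sketch shows one plausible codec. An indexed multi-object mask becomes one label channel per object plus background, and soft labels are decoded again by a per-pixel argmax. The sketch reads "minmax" as rescaling each channel to [0, 1]. The function names and this interpretation of channel_normalization are assumptions, and the sketch leaves out the bicubic resizing between mask and label resolution.

```python
import numpy as np


def encode_indexed_mask(mask, num_objects):
    """Indexed mask (H, W), 0 = background, 1..N = objects -> one-hot labels (N+1, H, W)."""
    labels = np.zeros((num_objects + 1,) + mask.shape, dtype=np.float32)
    for idx in range(num_objects + 1):
        labels[idx] = mask == idx
    return labels


def minmax_normalize(labels, eps=1e-8):
    """Rescale each channel to [0, 1] independently (one reading of "minmax")."""
    lo = labels.min(axis=(1, 2), keepdims=True)
    hi = labels.max(axis=(1, 2), keepdims=True)
    return (labels - lo) / np.maximum(hi - lo, eps)


def decode_labels(labels):
    """Soft labels (N+1, H, W) -> indexed mask via per-pixel argmax."""
    return labels.argmax(axis=0).astype(np.uint8)
```

Bicubic interpolation can produce values slightly outside [0, 1]. That may be one reason for normalizing channels after resizing, though this is our interpretation.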

The "label_propagation" object that follows defines the key parameters of the label propagation implementation. It is advisable to look at the available configuration options in the schema and at the corresponding implementation.
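The sketch below shows affinity-based propagation in the style of DINO's video segmentation evaluation. It uses L2-normalized features and affinities exp(similarity / 0.1), keeps only the top-k context patches per target patch, and normalizes the weights to sum to one. Two points are assumptions: that "affinity_norm": "dino" maps to exactly this normalization, and the temperature of 0.1. The sketch also omits the local neighborhood restriction (neighborhood_size) for brevity.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def propagate_labels(ctx_feats, ctx_labels, tgt_feats, topk=5, temperature=0.1):
    """
    ctx_feats:  (M, D) features of all context-frame patches
    ctx_labels: (M, C) labels of those patches
    tgt_feats:  (Q, D) features of the target-frame patches
    Returns the (Q, C) propagated labels.
    """
    ctx = l2_normalize(ctx_feats)             # feature_normalization = true
    tgt = l2_normalize(tgt_feats)
    aff = np.exp(tgt @ ctx.T / temperature)   # (Q, M) affinities
    k = min(topk, aff.shape[1])
    # k-th largest affinity per target patch; everything below it is discarded.
    kth = np.partition(aff, -k, axis=1)[:, -k][:, None]
    aff = np.where(aff >= kth, aff, 0.0)
    aff /= aff.sum(axis=1, keepdims=True)     # weights per target patch sum to 1
    return aff @ ctx_labels                   # weighted average of context labels
```

With topk=1, every target patch copies the label of its single most similar context patch. Larger values average over more neighbors and give smoother but less sharp labels.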

Finally, "evaluator": {"num_context": 7, "recreate_labels": false, "label_initializer": "ground_truth"} configures inference. num_context is the number of context frames used during propagation. recreate_labels defines whether the labels are recreated after every frame, i.e. whether a decoding and encoding step is performed. The label_initializer option selects between O-VOS and Z-VOS inference by choosing the implementation that provides the initial mask.
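Two small helpers illustrate the evaluator settings. context_indices assumes that the first frame is kept in addition to the num_context most recent frames, as in DINO's evaluation. Check the evaluator code to confirm whether this repository does the same. recreate shows what a decode/encode step amounts to: it hardens soft labels by argmax and turns them back into one-hot labels. Both names are hypothetical.

```python
import numpy as np


def context_indices(t, num_context, keep_first=True):
    """Context frames for target frame t (t >= 1): the num_context most recent
    preceding frames, plus frame 0 if keep_first (assumed behavior)."""
    recent = list(range(max(0, t - num_context), t))
    if keep_first and 0 not in recent:
        recent.insert(0, 0)
    return recent


def recreate(labels):
    """Decode soft labels (C, H, W) to a hard mask, then re-encode as one-hot."""
    hard = labels.argmax(axis=0)
    channels = np.arange(labels.shape[0])[:, None, None]
    return (channels == hard).astype(labels.dtype)
```

For example, with num_context = 7, frame 10 is predicted from frames 0 and 3 to 9. Recreating labels discards soft confidence values, which can stop uncertainty from spreading over many frames but also removes information.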

Evaluating the results

Evaluating JHMDB

# JHMDB-test1: Evaluate
python -m dataset.evaluation.evaluate_JHMDB <result-dir> --compute-coverage

You can evaluate multiple runs at once by setting the --eval-all flag and passing a directory that contains a subdir for each of the runs.

To evaluate a JHMDB variant other than test split 1, add the --dataset-split-num <num> and --dataset-split-name <split-name> arguments.

For more options, run python -m dataset.evaluation.evaluate_JHMDB --help.
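The abstract reports PCK_0.1 on JHMDB. The following minimal sketch shows what that metric measures: a keypoint counts as correct if its distance to the ground truth is below 0.1 times a reference size. Papers differ on the reference size, so the choice here (the longer side of the ground-truth keypoint bounding box) is an assumption. evaluate_JHMDB defines the reference that is actually used.

```python
import numpy as np


def pck(pred, gt, alpha=0.1, visible=None):
    """
    pred, gt: (N, J, 2) keypoint coordinates for N frames and J joints.
    A joint is correct if its error is below alpha * max(box_w, box_h), where
    the box encloses that frame's ground-truth joints (assumed reference size).
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    extent = gt.max(axis=1) - gt.min(axis=1)            # (N, 2): box width, height
    thresh = alpha * extent.max(axis=1, keepdims=True)  # (N, 1)
    err = np.linalg.norm(pred - gt, axis=2)             # (N, J) Euclidean errors
    correct = err < thresh
    if visible is not None:                             # optionally ignore hidden joints
        return correct[np.asarray(visible, bool)].mean()
    return correct.mean()
```

In this example, the ground-truth box is 10x10, so the threshold is 1 pixel and one of three joints misses it: pck([[[0.5, 0], [12, 0], [0, 10]]], [[[0, 0], [10, 0], [0, 10]]]) is 2/3.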

Evaluating segmentation datasets

# DAVIS2017val: Evaluate multi-object
python -m dataset.evaluation.evaluate_segmentation_dataset \
    --dataset DAVIS2017val \
    --mode multi-object \
    --input-dir <result-dir>

Set --dataset and --mode accordingly. Again, adding --eval-all enables multi-run evaluation. The result table can also be sorted by one of the metrics by adding --sort-key <metric>, e.g. --sort-key J_and_F.
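J_and_F is the mean of region similarity J (mask IoU) and boundary accuracy F. The sketch below computes J for a single frame, combines precomputed J and F means, and sorts runs by a metric, which is conceptually what --sort-key does. The helper names are hypothetical, and the sketch does not reimplement the boundary F-measure itself.

```python
import numpy as np


def region_similarity(pred, gt):
    """J: intersection over union of two binary masks; empty vs. empty counts as 1."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union


def j_and_f(j_mean, f_mean):
    """J&F: arithmetic mean of mean region similarity J and mean boundary F-measure F."""
    return (j_mean + f_mean) / 2


def sort_runs(results, key="J_and_F"):
    """Order {run_name: {metric: value}} by one metric, best first."""
    return sorted(results.items(), key=lambda item: item[1][key], reverse=True)
```

For multi-object datasets, J is computed per object and then averaged, so a single poorly tracked small object lowers the score as much as a large one.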

Open source libraries

This code makes use of many open-source tools. The license files are placed in ./share/licenses.

Name                License      Repository
UDT                 MIT          https://github.com/594422814/UDT_pytorch
KCF, DSST           MIT          https://github.com/fengyang95/pyCFTrackers
STC                 MIT          https://github.com/ajabri/videowalk
SWIN                MIT          https://github.com/microsoft/Swin-Transformer
fhog + ColorTable   BSD          https://github.com/pdollar/toolbox
DAVIS2016 eval      BSD          https://github.com/fperazzi/davis
DINO                Apache 2.0   https://github.com/facebookresearch/dino
TIMM                Apache 2.0   https://github.com/rwightman/pytorch-image-models
TimeCycle           None         https://github.com/xiaolonw/TimeCycle
UVC                 None         https://github.com/Liusifei/UVC

Cite us

@inproceedings{waldmann2022improving,
  title={Improving Unsupervised Label Propagation for Pose Tracking and Video Object Segmentation},
  author={Waldmann, Urs and Bamberger, Jannik and Johannsen, Ole and Deussen, Oliver and Goldl\"{u}cke, Bastian},
  booktitle={DAGM German Conference on Pattern Recognition},
  year={2022},
  pages={230--245}
  }
