
cycda's Introduction

CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video (ECCV 2022)

Official implementation of CycDA [arXiv]
Author HomePage

Abstract

Although action recognition has achieved impressive results over recent years, both the collection and annotation of video training data remain time-consuming and cost-intensive. Image-to-video adaptation has therefore been proposed to exploit labeling-free web image sources for adapting to unlabeled target videos. This poses two major challenges: (1) spatial domain shift between web images and video frames; (2) modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation. On the one hand, we leverage the joint spatial information in images and videos; on the other hand, we train an independent spatio-temporal model to bridge the modality gap. We alternate between spatial and spatio-temporal learning, with knowledge transfer between the two in each cycle. We evaluate our approach on benchmark datasets for image-to-video as well as for mixed-source domain adaptation, achieving state-of-the-art results and demonstrating the benefits of our cyclic adaptation.

Requirements

  • Our experiments were run with Python 3.6 and PyTorch 1.7. Other versions should work but have not been tested.
  • All dependencies can be installed using pip:
    python -m pip install -r requirements.txt

Data Preparation

  • Image Datasets

  • Video Datasets

  • Other required data for BU101→UCF101 can be downloaded here
    UCF_mapping.txt: class mapping, one "class_id action_name" per line
    list_BU2UCF_img_new.txt: list of (resized) image files, one "img_path label" per line
    list_ucf_all_train_split1.txt: list of training videos, one "video_path label" per line
    list_ucf_all_val_split1.txt: list of test videos, one "video_path label" per line
    After extracting frames from the videos (see the sketch after the data structure below):
    list_frames_train_split1.txt: list of frames of the training videos, one "frame_path label" per line
    list_frames_val_split1.txt: list of frames of the test videos, one "frame_path label" per line

    ucf_all_vid_info.npy: dictionary of {videoname: (n_frames, label)}
    InceptionI3d_pretrained/rgb_imagenet.pt: I3D Inception v1 pretrained on the Kinetics dataset
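
    A minimal sketch of how a dictionary of this form could be assembled from the video lists and the extracted frame directories, assuming the layout shown under Data structure below and list lines of the form "video_path label"; build_vid_info is our illustrative helper, not part of the release, and the key format of the released .npy file may differ:

    import os
    import numpy as np

    # Build {videoname: (n_frames, label)}, counting frames under
    # FRAME_ROOT/CLASS/VIDEO_NAME/img_XXXXX.jpg.
    def build_vid_info(list_files, frame_root):
        vid_info = {}
        for list_file in list_files:
            with open(list_file) as f:
                for line in f:
                    video_path, label = line.strip().rsplit(" ", 1)
                    # e.g. "CLASS_01/VIDEO_0001.avi" -> "CLASS_01", "VIDEO_0001"
                    class_dir = os.path.dirname(video_path)
                    video_name = os.path.splitext(os.path.basename(video_path))[0]
                    frame_dir = os.path.join(frame_root, class_dir, video_name)
                    n_frames = len([fn for fn in os.listdir(frame_dir)
                                    if fn.startswith("img_")])
                    vid_info[video_name] = (n_frames, int(label))
        return vid_info

    info = build_vid_info(["list_ucf_all_train_split1.txt", "list_ucf_all_val_split1.txt"],
                          "DATA_PATH/UCF-HMDB_all/ucf_all_imgs/all")
    np.save("ucf_all_vid_info.npy", info)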

  • Data structure

    DATA_PATH/
      UCF-HMDB_all/
        UCF/
          CLASS_01/
            VIDEO_0001.avi
            VIDEO_0002.avi
            ...
          CLASS_02/
          ...
        ucf_all_imgs/
          all/
            CLASS_01/
              VIDEO_0001/
                img_00001.jpg
                img_00002.jpg
                ...
              VIDEO_0002/
              ...
      BU101/
        img_00001.jpg
        img_00002.jpg
        ...
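
    The frame directories above can be produced with any extractor. Below is a minimal OpenCV sketch of our own (not part of the release), assuming .avi inputs and the img_%05d.jpg naming shown above:

    import os
    import cv2

    # Extract all frames of one video into OUT_ROOT/CLASS/VIDEO/img_XXXXX.jpg,
    # matching the directory structure above.
    def extract_frames(video_path, out_root):
        class_name = os.path.basename(os.path.dirname(video_path))
        video_name = os.path.splitext(os.path.basename(video_path))[0]
        out_dir = os.path.join(out_root, class_name, video_name)
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = 1
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video
                break
            cv2.imwrite(os.path.join(out_dir, "img_%05d.jpg" % idx), frame)
            idx += 1
        cap.release()

    extract_frames("DATA_PATH/UCF-HMDB_all/UCF/CLASS_01/VIDEO_0001.avi",
                   "DATA_PATH/UCF-HMDB_all/ucf_all_imgs/all")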
    

Usage

Here we release the code for training the separate stages on BU101→UCF101. Scripts for a complete cycle are still under construction.

  • stage 1 - Class-agnostic spatial alignment: train the image model and output frame-level pseudo labels
    python bu2ucf_train_stage1.py
    
  • stage 3 - Class-aware spatial alignment: train the image model with video-level pseudo labels from stage 2 and output frame-level pseudo labels
    python bu2ucf_train_stage3.py --target_train_vid_ps_label ps_from_stage2.npy
    
  • stage 2 & 4 - Spatio-temporal learning
    • video model training: train the video model with frame-level pseudo labels from the image model
      • specify in the config file:
        • data_dir: main data directory
        • pseudo_gt_dict: frame-level pseudo labels from the image model training
        • pretrained_model_path: path of the pretrained model
        • work_main_dir: directory for training results and logs
        • ps_thresh: confidence threshold p for frame-level thresholding (see the sketch after this list)
      • specify in mmaction/utils/update_config.py: the path of target_train_vid_info
      • specify the --gpu-ids used for training
      ./tools/dist_train_da.sh configs/recognition/i3d/ucf101/i3d_incep_da_kinetics400_rgb_video_1x64_strid1_test3clip_128d_w_ps_img0.8_targetonly_split1.py 4 --gpu-ids 0 1 2 3 --validate
      
    • pseudo label computation: compute video-level pseudo labels
      model_path=your_model_path
      model_dir=$(dirname "$model_path")  # directory where the pseudo-label file is written
      ps_name=epoch_20_test1clip
      python tools/test_da.py configs/recognition/i3d/ucf101/i3d_incep_da_kinetics400_rgb_video_1x64_strid1_test1clip_128d_split2_compute_ps.py $model_path --out $model_dir/$ps_name.json --eval top_k_accuracy
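
To illustrate the two pseudo-labeling steps, here is a minimal NumPy sketch of our own (not the released implementation): frame-level pseudo labels from the image model are kept only where the softmax confidence exceeds the threshold p (ps_thresh), and a video-level pseudo label can be obtained by averaging a video's frame scores:

    import numpy as np

    # scores: (n_frames, n_classes) softmax output of the image model for one
    # video; p is the confidence threshold (ps_thresh).
    def frame_level_pseudo_labels(scores, p=0.8):
        conf = scores.max(axis=1)
        labels = scores.argmax(axis=1)
        # frames below the confidence threshold are marked -1 and ignored
        return np.where(conf >= p, labels, -1)

    def video_level_pseudo_label(scores):
        # average the frame scores over the video, then take the top class
        return int(scores.mean(axis=0).argmax())

    scores = np.random.dirichlet(np.ones(101), size=64)  # dummy 64-frame video, 101 classes
    print(frame_level_pseudo_labels(scores, p=0.8))
    print(video_level_pseudo_label(scores))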
      

Citation

If you find our work useful, please cite our paper:

@inproceedings{lin2022cycda,
  title={CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video},
  author={Lin, Wei and Kukleva, Anna and Sun, Kunyang and Possegger, Horst and Kuehne, Hilde and Bischof, Horst},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part III},
  pages={698--715},
  year={2022},
  organization={Springer}
}

Acknowledgements

Parts of the code are adapted from mmaction2 and pytorch-i3d.


cycda's Issues

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 1.0 and 2.0 repo branches:

Repo              OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine          -                      0.x
MMCV              1.x                    2.x
MMDetection       0.x, 1.x, 2.x          3.x
MMAction2         0.x                    1.x
MMClassification  0.x                    1.x
MMSegmentation    0.x                    1.x
MMDetection3D     0.x                    1.x
MMEditing         0.x                    1.x
MMPose            0.x                    1.x
MMDeploy          0.x                    1.x
MMTracking        0.x                    1.x
MMOCR             0.x                    1.x
MMRazor           0.x                    1.x
MMSelfSup         0.x                    1.x
MMRotate          1.x                    1.x
MMYOLO            -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.
