
amefu-net's Introduction

1 AMeFu-Net

Repository for the paper:

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

[paper]

[News! Presentation video uploaded to Bilibili on 2021/10/06.]

[News! Presentation video uploaded to YouTube on 2021/10/10.]

If you have any questions, feel free to contact me. My email is [email protected].

2 setup and datasets

A new virtual environment is recommended.

Before running our code, set up the virtual environment of Monodepth2; we use the Monodepth2 model to extract the depth frames used in our paper.

2.1 setup monodepth2

Follow the monodepth2 repo to complete its Setup and Prediction for a single image steps.
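
For reference, single-image depth prediction in monodepth2 looks like the following (the command is from the monodepth2 README; mono_640x192 is one of their released models, and the script saves an image_disp.jpeg next to the input, which is where the _disp.jpeg names below come from):

python test_simple.py --image_path assets/test_image.jpg --model_name mono_640x192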

2.2 preparing the datasets

Three datasets are used in our experiments: Kinetics, UCF101, and HMDB51.

We assume that you have downloaded the datasets yourself; sorry, we cannot provide the original video datasets.

There are three main steps to processing the datasets, and we provide a script for each (example invocations follow the list):

  1. Extract the video frames: sources/video_jpg.py
  2. Count the number of frames per video: sources/n_frames.py
  3. Extract the depth frames for each video: sources/generate_depth.py
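
For example, a processing run might look like the following; the argument conventions below are assumptions for illustration, so check each script for its actual interface:

python sources/video_jpg.py <video_directory> <frame_directory>
python sources/n_frames.py <frame_directory>
python sources/generate_depth.py <frame_directory>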

Taking the Kinetics dataset as an example, the final dataset is organized as follows:

miniKinetics_frames
├── class1
│   ├── video1
│   │   ├── image_00001.jpg
│   │   ├── image_00002.jpg
│   │   ├── ...
│   │   ├── monodepth
│   │   │   ├── image_00001_disp.jpeg
│   │   │   ├── image_00002_disp.jpeg
│   │   │   └── ...
│   │   └── n_frames
│   └── video2
│       ├── image_00001.jpg
│       ├── image_00002.jpg
│       ├── ...
│       ├── monodepth
│       │   ├── image_00001_disp.jpeg
│       │   ├── image_00002_disp.jpeg
│       │   └── ...
│       └── n_frames
└── class2
└── ...
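
As a sketch of step 2 above, the n_frames file in each video folder simply stores the number of extracted RGB frames. A minimal re-implementation for illustration (not necessarily identical to sources/n_frames.py):

import os

root = 'miniKinetics_frames'
for class_name in sorted(os.listdir(root)):
    class_dir = os.path.join(root, class_name)
    if not os.path.isdir(class_dir):
        continue
    for video_name in sorted(os.listdir(class_dir)):
        video_dir = os.path.join(class_dir, video_name)
        if not os.path.isdir(video_dir):
            continue
        # count the extracted RGB frames (image_00001.jpg, image_00002.jpg, ...)
        n = len([f for f in os.listdir(video_dir)
                 if f.startswith('image_') and f.endswith('.jpg')])
        with open(os.path.join(video_dir, 'n_frames'), 'w') as f:
            f.write(str(n))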

3 network testing

We provide the pretrained checkpoints for submodels and full models.

full model:

submodels:

usage for the Kinetics dataset:

CUDA_VISIBLE_DEVICES=0 python network_test.py --dataset kinetics --ckp ./result/model_full_kinetics.pkl --k_shot 1

usage for the UCF101 dataset:

CUDA_VISIBLE_DEVICES=0 python network_test.py --dataset ucf --ckp ./result/model_full_ucf.pkl --k_shot 1

usage for the HMDB51 dataset:

CUDA_VISIBLE_DEVICES=0 python network_test.py --dataset hmdb --ckp ./result/model_full_hmdb.pkl --k_shot 1

Due to the randomness of few-shot episode sampling, the results may vary slightly from run to run.
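
Because each test episode samples its support and query videos at random, few-shot accuracy is usually reported as a mean with a 95% confidence interval over many episodes. A minimal sketch of that computation (illustration only; episode_acc is whatever per-episode accuracies your test run produces):

import numpy as np

def mean_confidence_interval(episode_acc):
    # mean accuracy and 95% confidence interval over test episodes,
    # using a normal approximation (1.96 * standard error of the mean)
    acc = np.asarray(episode_acc, dtype=np.float64)
    mean = acc.mean()
    half_width = 1.96 * acc.std(ddof=1) / np.sqrt(len(acc))
    return mean, half_width

# example: mean, ci = mean_confidence_interval(acc_list_from_test_run)
# print('%.2f +- %.2f' % (mean * 100, ci * 100))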

4 network training

Take Kinetics as an example:

If an "out of memory" error occurs, two GPUs are required:

CUDA_VISIBLE_DEVICES=0,1 python network_train_meta_learning.py --dataset kinetics --exp_name test --pre_model_rgb ./result/kinetics_rgb_submodel.pkl --pre_model_depth ./result/kinetics_depth_submodel.pkl
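
Here CUDA_VISIBLE_DEVICES=0,1 makes two GPUs visible. Whether the training script wraps the model in nn.DataParallel is an assumption on our part, but the standard PyTorch pattern for spreading each batch over the visible GPUs is a sketch like:

import torch
import torch.nn as nn

model = nn.Linear(2048, 64)  # stand-in module; the real network comes from the training script
if torch.cuda.device_count() > 1:
    # splits each batch across the visible GPUs, roughly halving per-card memory
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()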

You can use the submodels provided by us.

If you want to train the submodels by yourself, we refer you to our previous work on embodied few-shot learning.

5 Citing

Please cite our paper if you find this code useful for your research.

@inproceedings{fu2020depth,
  title={Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition},
  author={Fu, Yuqian and Zhang, Li and Wang, Junke and Fu, Yanwei and Jiang, Yu-Gang},
  booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
  pages={1142--1151},
  year={2020}
}

amefu-net's People

Contributors

lovelyqian


amefu-net's Issues

Some questions about the implementations of DGAdaIN

Hello! I recently read the paper and the code, and I am a bit confused about the implementation of DGAdaIN (in model_DGAdaIn.py).

According to the original paper, instance normalization is done along the temporal dimension (termed L) of the RGB features. However, in your code, instance normalization is done along the last dimension of the RGB features. I checked the documentation, which says nn.InstanceNorm1d normalizes along the last dimension of the input tensor.

I am not sure which one (normalization along the temporal dimension or along the activations) is better and how they affect the final results. Hoping for your help~
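
For concreteness, here is a minimal sketch of the two alternatives this question contrasts (an illustration of nn.InstanceNorm1d semantics, not the repository's exact code):

import torch
import torch.nn as nn

B, L, C = 4, 8, 2048          # batch, temporal length, channels (example sizes)
x = torch.randn(B, L, C)      # RGB feature laid out as (batch, temporal, channel)

# Option A: feed (B, L, C) directly; InstanceNorm1d treats dim 1 as channels,
# so statistics are computed over the last dimension (the channel activations)
norm_over_channels = nn.InstanceNorm1d(L)
y_a = norm_over_channels(x)

# Option B: normalize along the temporal dimension, as the paper describes:
# move time to the last dimension, normalize, and move it back
norm_over_time = nn.InstanceNorm1d(C)
y_b = norm_over_time(x.transpose(1, 2)).transpose(1, 2)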

Question about the arguments passed in episode_novel_dataloader

Regarding this line in episode_novel_dataloader:
video, video_depth = get_video_fusion_from_video_info_rgb_depth_object_multi_depth(video_info..)
The dataloader passes in class_name, but according to the definition of get_video_fusion_from_video_info_rgb_depth_object_multi_depth in utils.py, video_info is expected to be "class_name/video_name". How are the frame images actually read here?
Also, how does the dataloader sample the video examples?
I attach the directory structure of my HMDB dataset; could you check whether the data loading code matches it? [image attachment]
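
For readers with the same question, here is a minimal sketch of how a video_info string of the form 'class_name/video_name' maps onto the frame layout from section 2.2 (a hypothetical helper for illustration, not the repository's actual function):

import os

def frame_paths(root, video_info, n_frames):
    # video_info is expected to look like 'class_name/video_name'
    video_dir = os.path.join(root, video_info)
    rgb = [os.path.join(video_dir, 'image_%05d.jpg' % i)
           for i in range(1, n_frames + 1)]
    depth = [os.path.join(video_dir, 'monodepth', 'image_%05d_disp.jpeg' % i)
             for i in range(1, n_frames + 1)]
    return rgb, depth

# example: frame_paths('miniKinetics_frames', 'class1/video1', 2)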


Parameter not defined

Excuse me! I have been reading your paper recently, and I am very interested in your experiments, but I have a question.
The function get_video_from_video_info in the utils.py file is not defined. What should it be?
Hope to get your answer! Thanks!

Dataset format

Hello, may I ask what format the data fed into the network takes in this paper?

The training of submodels

I want to know when you will release the full model and the RGB and depth submodels.
Also, you mentioned we can train the submodels ourselves following the previous work "embodied few-shot learning", but how can we get both pre_model_rgb and pre_model_depth?

HMDB51 results

Hello, I used the pretrained weights you provided for the HMDB51 dataset and, following your guidance, finished constructing the dataset. But when I test on HMDB51, I cannot obtain the results described in the paper.
For 1-shot the accuracy is 48%, and 74.7% for 5-shot.

Thanks a lot.~

test.txt file

Hello, I would like to ask about the test.txt used in network_test.py: is it a test split that I need to create myself from the labels in splits?
