laura-wang / video-pace

98.0 14.0 12.0 829 KB

code for our ECCV-2020 paper: Self-supervised Video Representation Learning by Pace Prediction

Python 100.00%

video-pace eccv pace-prediction self-supervised-learning

video-pace's Introduction

Hello, this is my website, please visit https://laura-wang.github.io/ for more information.

video-pace's Issues

Contrast learning

Clean code! Looking forward to the contrastive learning code.

The Evaluation Code

Hello Laura

Could you please post the evaluation code? Some details are missing from the paper. For example:

Let's say the network takes a 16-frame clip as input; then the ideal testing video length is 160 frames.

1- What is the exact sampling method when evaluating on the testing videos for action recognition?
2- If the testing video is too short, e.g. 24 frames (so there is no way to sample 10 unique clips from it), how do you handle this case? For example, do you discard short videos or pad them?
3- If the testing video is longer than the required length, e.g. 300 frames, what are the sampling indices of the 10 clips?

Thanks
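
A paper excerpt quoted in a later issue on this page says that 10 clips are uniformly sampled per test video and their predictions averaged. The sketch below is one way such sampling could be indexed; it is not the authors' evaluation code. The function names, the evenly spaced start positions, and the wrap-around for videos shorter than one clip are all assumptions made for illustration.

```python
def uniform_clip_starts(num_frames, clip_len=16, num_clips=10):
    """Return evenly spaced, 0-based start indices for num_clips clips."""
    # Last position at which a full clip still fits (0 if the video is too short).
    max_start = max(num_frames - clip_len, 0)
    if num_clips == 1:
        return [0]
    # Integer arithmetic keeps the spacing deterministic.
    return [(i * max_start) // (num_clips - 1) for i in range(num_clips)]


def clip_frame_indices(start, num_frames, clip_len=16):
    """Return 0-based frame indices for one clip, looping if the video is short."""
    return [(start + k) % num_frames for k in range(clip_len)]


for n in (160, 24, 300):
    print(n, uniform_clip_starts(n))
```

Under these assumptions, a 160-frame video yields 10 non-overlapping clips, a 24-frame video yields heavily overlapping clips, and a 300-frame video yields clips spaced roughly 31 frames apart.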

The contrastive learning module is missing in the code

Hi, thank you very much for your excellent work.
Is the contrastive learning module mentioned in the paper left out of the code? I can't find any part of the code related to contrastive learning.

What does self.rgb_lines = list(lines) * 10 mean?

Hi,

Thank you for your great work.

While trying to apply this method to my custom data, I found this line in ucf101.py:

self.rgb_lines = list(lines) * 10

Why do you multiply by 10 here?

Thank you
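
For readers unfamiliar with the idiom: in Python, multiplying a list by an integer repeats its elements, so the line makes each video appear 10 times in the sample list and the dataset reports 10 times as many items per epoch. The toy class below only illustrates that effect; ToyDataset and its contents are hypothetical and not taken from ucf101.py.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that indexes a list of video entries."""

    def __init__(self, lines, repeat=1):
        # list * int concatenates `repeat` copies of the list.
        self.rgb_lines = list(lines) * repeat

    def __len__(self):
        return len(self.rgb_lines)

    def __getitem__(self, index):
        return self.rgb_lines[index]


videos = ["v_001", "v_002", "v_003"]  # placeholder names
plain = ToyDataset(videos)
repeated = ToyDataset(videos, repeat=10)
print(len(plain), len(repeated))  # 3 30
```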

Looking forward to your contrast code

Hi, thanks for your REALLY nice work and clean code! Looking forward to your contrastive learning code every day. 👍

How to organize the training data?

I have downloaded these data:

What should I do to run train.py?
Please help me.

Image preprocessing

https://github.com/laura-wang/video-pace/blob/master/datasets/ucf101.py#L82

Why use ClipResize((128,171)) instead of (128, 128) in the preprocessing stage?
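
One plausible reason, offered as an assumption rather than the authors' answer: UCF101 frames are 320x240, so reading the tuple as (height, width), a 128x171 target keeps the original 4:3 aspect ratio, while 128x128 would squash the frame horizontally. The helper below (a hypothetical name) shows the arithmetic.

```python
def aspect_preserving_size(src_height, src_width, target_height):
    """Scale (height, width) to target_height while keeping the aspect ratio."""
    target_width = round(src_width * target_height / src_height)
    return target_height, target_width


# 320x240 source frames, resized so that the height becomes 128.
print(aspect_preserving_size(240, 320, 128))
```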

error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'

I got the following error:
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-6sxsq0tp\opencv\modules\imgproc\src\resize.cpp:3929: error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'
I don't know how to fix it. Could you help me?
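
This assertion usually means cv2.resize received an empty image, and the most common cause is that cv2.imread returned None because the file path does not exist or the file cannot be decoded. A stdlib-only sketch for checking a frame folder before training follows. The function name is hypothetical, and the frame{:06}.jpg pattern and 1-based numbering are assumptions taken from the loop_load_rgb snippet quoted in another issue on this page.

```python
import os


def find_missing_frames(video_dir, num_frames, pattern="frame{:06}.jpg"):
    """Return the frame file names expected in video_dir that are not on disk."""
    missing = []
    for frame_no in range(1, num_frames + 1):  # assumes 1-based frame numbering
        name = pattern.format(frame_no)
        if not os.path.isfile(os.path.join(video_dir, name)):
            missing.append(name)
    return missing
```

An empty result means every expected frame file exists; any names returned correspond to paths that cv2.imread would not be able to read.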

Can you release your code about contrast learning?

When reading your code, I found that the contrastive learning code seems to be missing!

Waiting for the contrastive learning code

About the epoch number

Hi, thank you for your work. I have a question about the epoch number in your paper.

While when pretraining on UCF101 dataset, as it only contains around 9k videos in the training
split, we set epoch size to be around 90k for temporal jittering following [1].

In [1], I found a description that you might be referring to:

For inference on the downstream tasks, we uniformly sample 10 clips per testing example and average their predictions to
make a video-level prediction.

It seems strange to use self.rgb_lines = list(lines) * 10 in ucf101.py and report a total of 18 epochs. If video clips are sampled randomly along the temporal axis, training for 180 epochs would have the same effect.

Therefore, my question is: why not train for 180 epochs and apply temporal jittering each time a clip is sampled? Using self.rgb_lines = list(lines) with the epoch number set to 180 would make the code clearer. There might be some reasons or tricks that I have not noticed. Thank you in advance.
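
The arithmetic behind this question can be checked directly. The sketch assumes roughly 9,000 training videos (the "around 9k" quoted above) and that every dataset item draws a fresh random clip. Under those assumptions both configurations draw the same number of clips in total; what could differ is anything tied to epoch boundaries, such as learning-rate schedules or checkpointing.

```python
NUM_VIDEOS = 9000  # assumption: "around 9k" training videos


def total_clips(num_videos, repeat, epochs):
    """Total number of clips drawn over the whole training run."""
    return num_videos * repeat * epochs


repeated_list = total_clips(NUM_VIDEOS, repeat=10, epochs=18)
plain_list = total_clips(NUM_VIDEOS, repeat=1, epochs=180)
print(repeated_list, plain_list)
```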

Normalize

Hello, thanks for your work!
There is no normalization in the data preprocessing. Is normalization important for this task?

Request for pretrained model

Hello Laura,
Thanks for sharing your wonderful work!
I wonder if you could provide your pretrained model (ideally one pretrained with only the prediction task)?
Please pardon my question if it is not proper from your perspective.

Best regards,
Sean

How to preprocess data?

I want to try this with a custom video dataset. How can I do it? I have the following questions:

- What should be the maximum length of a single training video clip? Is it similar to the UCF101 dataset, or can we use longer videos?
- Should each video be saved as image frames?
- What is the frame rate?

Code verify

import os
import cv2

def loop_load_rgb(self, video_dir, start_frame, sample_rate, clip_len,
                  num_frames):
    video_clip = []
    idx = 0

    for i in range(clip_len):
        cur_img_path = os.path.join(
            video_dir,
            "frame" + "{:06}.jpg".format(start_frame + idx * sample_rate))

        img = cv2.imread(cur_img_path)
        video_clip.append(img)

        # If the next frame would fall past the end of the video, loop back.
        if (start_frame + (idx + 1) * sample_rate) > num_frames:
            start_frame = 1  # <-- the line in question
            idx = 0
        else:
            idx += 1

Why do the missing frames start again from the beginning of the video? Should the start_frame = 1 line be commented out?
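
To see what the snippet does without touching any image files, the index logic can be isolated. The sketch below mirrors the branch structure of loop_load_rgb and returns the frame numbers it would read; loop_frame_numbers is a hypothetical helper, not part of the repository.

```python
def loop_frame_numbers(start_frame, sample_rate, clip_len, num_frames):
    """Frame numbers visited by the looping logic of loop_load_rgb."""
    frames = []
    idx = 0
    for _ in range(clip_len):
        frames.append(start_frame + idx * sample_rate)
        # If the next frame would run past the end, restart from frame 1.
        if start_frame + (idx + 1) * sample_rate > num_frames:
            start_frame = 1
            idx = 0
        else:
            idx += 1
    return frames


print(loop_frame_numbers(start_frame=10, sample_rate=4, clip_len=5, num_frames=20))
# [10, 14, 18, 1, 5]
```

With the reset in place, the clip wraps around to the start of the video. Removing only `start_frame = 1` while keeping `idx = 0` would instead jump back to the original start frame.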

Sampling rate

Hello. Thanks for the great work.

I noticed that the maximum sampling rate in your implementation is 4, which seems to allow for the faster sampling rates.

Could you explain how you designed the possible pace candidates (the pace lists)?
If possible, could you also share the updated implementation?

Thank you.
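
As a rough illustration of how a pace candidate list could drive sampling (the actual candidate design is what this issue asks about, so everything here is assumed): a pace is drawn from a candidate list, used as the frame stride, and its position in the list serves as the classification label. The names sample_pace, clip_frames, and PACE_CANDIDATES are hypothetical.

```python
import random

PACE_CANDIDATES = [1, 2, 3, 4]  # assumption: strides up to the maximum rate of 4


def sample_pace(candidates=PACE_CANDIDATES, rng=random):
    """Pick a pace (frame stride) and return it with its class label."""
    label = rng.randrange(len(candidates))
    return candidates[label], label


def clip_frames(start_frame, pace, clip_len=16):
    """Frame numbers of a clip sampled at the given pace."""
    return [start_frame + k * pace for k in range(clip_len)]
```

In a design like this, changing the candidate list also changes the number of classes the pace-prediction head has to distinguish.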

Cannot reproduce the supervised performance on UCF101

Thank you very much for your inspiring work. However, I encountered a problem when reproducing the performance. I followed your code for self-supervised learning and got about 60-70% accuracy in pace prediction. However, when I froze the conv weights and trained only the final FC layer for supervised learning, I got an average training accuracy of only 0.10. When training the final FC layer, I used the same data augmentation as in self-supervised learning, as your paper describes. Could you please tell me more about the fine-tuning details?

Request for explanation

Hi Wang!
Thanks for your wonderful work.
I read your code, but I can't find the contrastive loss part.
Could you explain your code? I can only see the classification cross-entropy loss.
Thank you.