video-pace's People

Contributors

jianbojiao, laura-wang

video-pace's Issues

The Evaluation Code

Hello Laura

Could you please post the evaluation code? Some details are missing from the paper. For example:

Let's say the network takes a 16-frame clip as input, so the ideal testing video length for 10 clips is 160 frames.

1- What is the exact sampling method when evaluating on the testing videos for action recognition?
2- If the testing video is too short, say 24 frames, there is no way to sample 10 unique clips from it. How do you handle this case? For example, do you discard short videos or pad them?
3- If the testing video is longer than the required length, say 300 frames, what are the sampling indices of the 10 clips?

Thanks
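For readers with the same question, here is a minimal sketch of one common way to pick 10 evenly spaced clips, covering both the short-video and long-video cases. It is not the authors' evaluation code: the function names, the 1-based frame numbering, and the wrap-around strategy for short videos are all assumptions made for illustration.

```python
def uniform_clip_starts(num_frames, clip_len=16, num_clips=10):
    """Return 1-based start frames for clips spread evenly over a video.

    Hypothetical helper, not taken from the video-pace repository.
    Clips overlap (or repeat) when the video is too short for unique clips.
    """
    # Last start position that still leaves room for a full clip.
    last_start = max(num_frames - clip_len + 1, 1)
    if num_clips == 1:
        return [1]
    step = (last_start - 1) / (num_clips - 1)
    return [1 + round(i * step) for i in range(num_clips)]


def clip_frame_indices(start, num_frames, clip_len=16):
    """Return 1-based frame indices for one clip, wrapping past the end.

    Wrapping is one possible way to handle videos shorter than clip_len;
    padding or discarding the video are alternatives.
    """
    return [(start - 1 + k) % num_frames + 1 for k in range(clip_len)]


if __name__ == "__main__":
    print(uniform_clip_starts(300))       # long video: spread-out starts
    print(uniform_clip_starts(24))        # short video: overlapping starts
    print(clip_frame_indices(1, 10))      # very short video: wraps around
```

With this scheme a 24-frame video still yields 10 clips, but they overlap heavily and some start frames repeat, which is the trade-off the question points at.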

error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'

I got the following error:
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-6sxsq0tp\opencv\modules\imgproc\src\resize.cpp:3929: error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'
I don't know how to fix it. Could you help me?
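This assertion generally means cv::resize received an empty image. cv2.imread does not raise when a file is missing or unreadable; it returns None, and the failure only shows up later at the resize call. A minimal sketch for checking the frame files before loading follows. The helper name is hypothetical, and the default file pattern is assumed from the "frame{:06}.jpg" naming used in the loading code quoted further down this page.

```python
import os


def missing_frames(video_dir, num_frames, pattern="frame{:06}.jpg"):
    """Return the expected frame paths that do not exist on disk.

    Hypothetical helper: frames are assumed to be numbered from 1 to
    num_frames and named according to `pattern`.
    """
    missing = []
    for i in range(1, num_frames + 1):
        path = os.path.join(video_dir, pattern.format(i))
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

If this returns a non-empty list for a video, the frame extraction step or the dataset path is the likely cause rather than the resize call itself.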

About the epoch number

Hi, thank you for your work. I have a question about the epoch number in your paper.

While when pretraining on UCF101 dataset, as it only contains around 9k videos in the training
split, we set epoch size to be around 90k for temporal jittering following [1].

In [1], I found a description that you may be referring to:

For inference on the downstream tasks, we uniformly sample 10 clips per testing example and average their predictions to
make a video-level prediction.

It seems strange to use self.rgb_lines = list(lines) * 10 in ucf101.py and report a total of 18 epochs. If video clips are sampled randomly along the temporal axis, training for 180 epochs would have the same effect.

So my question is: why not simply train for 180 epochs and apply temporal jittering each time a clip is sampled? Using self.rgb_lines = list(lines) with the epoch number set to 180 would make the code clearer. There may be some reason or trick that I have not noticed. Thank you in advance.
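The arithmetic behind the question can be checked with a short sketch. The video count of 9000 is only the "around 9k" approximation quoted above, and the function name is made up for illustration.

```python
def clips_seen(num_videos, list_repeat, epochs):
    """Total number of clips drawn over the whole training run."""
    return num_videos * list_repeat * epochs


NUM_VIDEOS = 9000  # "around 9k" in the quoted text; an approximation

# List repeated 10 times, 18 epochs (the setup described in the issue).
repeated_list = clips_seen(NUM_VIDEOS, list_repeat=10, epochs=18)

# Plain list, 180 epochs (the alternative the issue proposes).
plain_list = clips_seen(NUM_VIDEOS, list_repeat=1, epochs=180)

print(repeated_list, plain_list)
```

The two totals are equal, so any practical difference would have to come from things tied to the epoch counter, such as a learning-rate schedule or checkpoint interval defined per epoch.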

Normalize

Hello, thanks for your work!
There is no normalization in the data preprocessing. Is it important for this task?

Request for pretrained model

Hello Laura,
Thanks for sharing your wonderful work!
Could you provide your pretrained model (ideally one pretrained with just the prediction task)?
Please pardon my question if it is not appropriate.

Best regards,
Sean

How to preprocess data?

I want to try this with a custom video dataset. How can I do it? I have the following questions.

  1. What should be the maximum length of a single training video clip?
    - Is it similar to the UCF101 dataset, or can we use longer videos?
  2. Should each video be saved as image frames?
  3. What frame rate should be used?

Code verify

import os

import cv2


def loop_load_rgb(self, video_dir, start_frame, sample_rate, clip_len,
                  num_frames):
    video_clip = []
    idx = 0

    for i in range(clip_len):
        cur_img_path = os.path.join(
            video_dir,
            "frame" + "{:06}.jpg".format(start_frame + idx * sample_rate))

        img = cv2.imread(cur_img_path)
        video_clip.append(img)

        # If the next index would run past the last frame,
        # restart from the first frame of the video.
        if (start_frame + (idx + 1) * sample_rate) > num_frames:
            start_frame = 1  # <-- the line in question
            idx = 0
        else:
            idx += 1

Why are the missing frames filled in by starting again from the beginning of the video? Should the start_frame = 1 line be commented out?
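To see what the loop above does, the index logic can be isolated from the image loading. The sketch below copies only the index arithmetic from the quoted function; the standalone function name is made up for illustration.

```python
def looped_frame_indices(start_frame, sample_rate, clip_len, num_frames):
    """Return the frame numbers the quoted loop would try to read."""
    indices = []
    idx = 0
    for _ in range(clip_len):
        indices.append(start_frame + idx * sample_rate)
        if start_frame + (idx + 1) * sample_rate > num_frames:
            # Same reset as in the quoted code: jump back to frame 1.
            start_frame = 1
            idx = 0
        else:
            idx += 1
    return indices


if __name__ == "__main__":
    # A 10-frame video, clip of 6 frames, sampled every 2nd frame from frame 5.
    print(looped_frame_indices(5, 2, 6, 10))
```

In this example the sequence runs 5, 7, 9 and then continues from frame 1, so every index stays within the video. Without some form of reset, the index would pass num_frames and cv2.imread would return None for the nonexistent files.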

Sampling rate

Hello. Thanks for the great work.

I noticed that the maximum sampling rate in your implementation is 4, which seems to enable the faster sampling rates.

Could you explain how you designed the possible pace candidates (the pace lists)?
And, if possible, could you share the updated implementation?

Thank you.

Cannot reproduce the supervised performance on UCF101

Thank you very much for your inspiring work. However, I ran into a problem when reproducing the performance. I followed your code for the self-supervised learning stage and got about 60-70% accuracy on pace prediction. However, when I froze the Conv weights and trained only the final FC layer for supervised learning, I got only 0.10 average accuracy on the training set. When training the final FC layer, I used the same data augmentation as in the self-supervised stage, as described in your paper. Could you please tell me more about the fine-tuning details?

Request for explanation

Hi Wang!
Thanks for your wonderful work.
I looked through your code, but I can't find the contrastive loss part.
Could you explain your code? I can only see the classification cross-entropy loss.
Thank you.
