pilhyeon / wtal-uncertainty-modeling

Official PyTorch Implementation of 'Weakly-supervised Temporal Action Localization by Uncertainty Modeling' (AAAI-21)

License: MIT License

Python 99.06% Shell 0.94%
background-modeling deep-learning pytorch temporal-action-localization uncertainty weakly-supervised-learning

wtal-uncertainty-modeling's Introduction

👋 Hi, I'm Pilhyeon Lee (이필현).

I am currently working as a postdoctoral researcher at Yonsei University. In the past, I collaborated as a visiting researcher with the video understanding team at CLOVA AI research, Naver Corporation. Also, I was fortunate to work as a research intern at Microsoft Research Asia. I received a Ph.D. degree from Yonsei University and a B.S. degree from Chung-Ang University. My research interests include computer vision, video understanding, multimodal learning, and weakly-supervised learning.

📓 Publications (selected)

The full list can be found here.

💻 Languages

  • C/C++
  • Python
  • OpenCV
  • TensorFlow
  • PyTorch


wtal-uncertainty-modeling's People

Contributors

dependabot[bot], pilhyeon


wtal-uncertainty-modeling's Issues

Dataloader of ActivityNet 1.3

Thank you for your excellent work.
Could you provide the PyTorch dataset file (like thumos_features.py)?
I am running experiments on a reorganized ActivityNet 1.3 dataset, so I would like more details for a fair comparison.
Thank you!

If possible, you can also send it to my email: [email protected]
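
Since the dataset file is not in the repo, here is a minimal sketch of what an ActivityNet counterpart to thumos_features.py might look like. The class name, file layout (one pre-extracted .npy feature array per video), and the labels dict are assumptions for illustration, not the authors' actual loader.

```python
import os
import numpy as np
import torch
from torch.utils.data import Dataset

class ActivityNetFeatures(Dataset):
    """Hypothetical loader: one (T, F) .npy feature file per video."""
    def __init__(self, feature_dir, labels, num_segments=50):
        self.feature_dir = feature_dir        # directory of <video_id>.npy files
        self.labels = labels                  # dict: video_id -> multi-hot label vector
        self.video_ids = sorted(labels.keys())
        self.num_segments = num_segments

    def __len__(self):
        return len(self.video_ids)

    def __getitem__(self, idx):
        vid = self.video_ids[idx]
        feat = np.load(os.path.join(self.feature_dir, vid + '.npy'))  # (T, F)
        # uniformly sample a fixed number of segments (one common choice)
        sample_idx = np.linspace(0, len(feat) - 1, self.num_segments).astype(int)
        feat = feat[sample_idx]
        label = np.asarray(self.labels[vid], dtype=np.float32)
        return torch.from_numpy(feat).float(), torch.from_numpy(label)
```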

the newest result

Hello, and thank you for your excellent work. I read the updated paper and found that the results are better than before. Could you share the code that matches the best result?

I am waiting for your reply.

Reproducing the results on ActivityNet

Hi @Pilhyeon, thanks for your great work! I'm following your BMUE and am having some trouble reproducing the results on the ActivityNet dataset.

I have tried some experiments on ActivityNet v1.2. I downloaded the I3D features provided by this link and adapted them to the BMUE format. Below are some of my results. All experiments run for 6k epochs, and the results are shown in the form "(average_mAP, Test_acc)":

  • [no params changed] (0.0390, 0.43)

According to Sec. 4.1 of your arXiv paper, T = 50, so I set num_segments to 50 and ran the following experiments:

  • [num_segments: 50] (0.0275, 0.31)
    • [class_th: 0.1] (0.0332, 0.36)
    • [class_th: 0.1, act_thresh_cas: np.arange(0.0, 0.15, 0.015)] (0.0400, 0.36)

Besides, I also tried changing "act_thresh_magnitudes", "NMS thresh", "alpha", "_lambda & gamma in get_proposal_oic()", etc. (the overrides I tried are written out as a small config sketch below). The results do not seem to get better: the test accuracy stays around 0.4 and the average_mAP is very low. It is hard for me to find the best settings. Could you please share your parameter settings for the ActivityNet 1.2 & 1.3 datasets, or give me some advice on which parameters to change?
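
For concreteness, the overrides described above can be written as a small list of trial configurations; the option names simply mirror the ones in this issue and are assumed to correspond to entries in options.py.

```python
import numpy as np

# Each dict is one trial from the list above; an empty dict means defaults.
config_trials = [
    {},                                      # no params changed -> (0.0390, 0.43)
    {"num_segments": 50},                    # -> (0.0275, 0.31)
    {"num_segments": 50, "class_th": 0.1},   # -> (0.0332, 0.36)
    {"num_segments": 50, "class_th": 0.1,
     "act_thresh_cas": np.arange(0.0, 0.15, 0.015)},  # -> (0.0400, 0.36)
]
```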

Looking forward to your reply and I'd be glad to cite your excellent work. Thanks!

GPU utilization is low

Hi @Pilhyeon
Thank you for your excellent work! I cloned your code and ran it, but I found that the GPU utilization of the program is very low (only 6%). Is this normal? My GPU is an NVIDIA TITAN Xp (12 GB).

ActivityNet 1.3 Features and model

Hi, excellent work!

You mentioned that you would make the ActivityNet features and model public. Could you do so now? It is very difficult to reproduce the reported results without the right settings, as someone has already posted in this repository (I tried your suggested settings for ActivityNet, but did not get the reported results).

My email is [email protected]; you can also send me the link here if you wish.

Cannot get the provided features.

Hi @Pilhyeon:

Thanks for your contribution. I cannot get the features you provided: when I open the Google Drive link, these are all the files I see:

[screenshot of the Google Drive folder contents]

I am not sure these are the features used in your repo, since I do not know how to use the files in the Google Drive link. Could you please check this or explain how to use them?

Results of your provided pre-trained model

Thanks for your great work! However, the pre-trained model you provide does not achieve the results in the paper. I saw the same reproduced results in a closed issue; could you please check whether the latest pre-trained model was uploaded, or whether there is some other mistake? Thanks~
Step: 0
Test_acc: 0.8905
average_mAP: 0.4038
mAP@0.1: 0.6551
mAP@0.2: 0.5836
mAP@0.3: 0.5052
mAP@0.4: 0.4158
mAP@0.5: 0.3245
mAP@0.6: 0.2266
mAP@0.7: 0.1161
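
For reference, the printed average_mAP is consistent with the plain arithmetic mean of the seven per-IoU values above (thresholds 0.1 to 0.7):

```python
maps = [0.6551, 0.5836, 0.5052, 0.4158, 0.3245, 0.2266, 0.1161]  # mAP@0.1 ... mAP@0.7
print(round(sum(maps) / len(maps), 4))  # 0.4038
```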

Can anyone reproduce the results in the paper?

I have run the code many times (about 100 times), with the same environment as requirements.txt, even on 3 different machines, but the best result I can get is about 40, much worse than 41.8.

I tried changing the random seed while keeping all the hyper-parameters, but it did not help. I am sure that I used the same environment and the latest code.
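
For anyone else comparing runs, a minimal sketch of pinning the usual randomness sources is below; whether and where the repo itself fixes seeds is not something this sketch assumes.

```python
import random
import numpy as np
import torch

def set_seed(seed=0):
    # pin Python, NumPy, and PyTorch (CPU + CUDA) random number generators
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # trade speed for run-to-run determinism in cuDNN
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```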

I hope the author can reply to this question, since there are also other people who cannot reproduce the result (#10, #5, #11).

Their issues were closed by the author without responses to their questions. I think the author should check the code carefully again, especially the hyperparameters.

If anyone else can achieve 41.8 mAP, please tell me. I hope the author does not close this issue, since the problem is not solved. I also hope the author @Pilhyeon can run this code again on a different machine and then publish the result and code, or provide the training log as proof.

About baseline performance

Why is the baseline mAP so different between BaS-Net (Table 2: IoU 0.5: 12.0) and this paper (Figure 3: IoU 0.5: 22.6)?
What is the main difference in implementation?

question about feat_magnitudes

Good work! While reading the paper, a question came up: how is the feature magnitude on the x-axis of Figure 2 defined? Is it computed from the normalized video features (shape: [B, T, F]) or something else? I am confused about the x-axis of Figure 2 and how the histogram is plotted. Thanks!
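
A common reading, consistent with the paper's use of feature magnitude for uncertainty modeling, is that the x-axis is the per-segment L2 norm of the embedded features; a minimal sketch of that computation, with an example (B, T, F) shape, is below. This is an interpretation, not a confirmed description of the authors' plotting code.

```python
import torch

features = torch.randn(2, 50, 2048)                  # (B, T, F) embedded segment features (example shape)
feat_magnitudes = torch.norm(features, p=2, dim=2)   # (B, T): one L2-norm magnitude per segment
```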

Some questions in your paper

Hi @Pilhyeon

Thanks for your contribution. I tried again and could reproduce your result! It is really amazing work!

I read your paper carefully, but there are still some details I cannot understand; could you please answer them if you have time?

  1. Could you please explain Figure 2 in your paper? What does the y-axis "Density" mean?
  2. Is it correct to understand that the original features are the embedded features obtained using only the main pipeline as the whole model, while the separated features come from the final model that uses both the main pipeline and the uncertainty modeling?
  3. Does the softmax score used in Table 3 of the ablation study mean that only the main pipeline in Figure 3 is used to obtain the result?
  4. If I understand correctly, the softmax score is obtained from the original features, which means they are not separated and have unconstrained magnitudes. So is the description in the screenshot below wrong? It should read "For the **first**, as the original ..."
    [screenshot of the referenced passage in the paper]
  5. Could you please provide your extracted features and pretrained models for ActivityNet 1.2 and ActivityNet 1.3?

Thanks again for your contribution and patience; I hope you can reply!

Question about Figure 2 in paper.

I have a doubt: how do you determine which frames are background frames and which are action frames when drawing the histogram?
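
One plausible way such a histogram could be produced on an annotated set is to split the segment magnitudes by the ground-truth temporal labels and plot normalized (density) histograms; the sketch below is only illustrative, not the authors' plotting code, and the input arrays are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# magnitudes: per-segment feature magnitudes; is_action: boolean ground-truth
# mask derived from temporal annotations (random placeholders here).
magnitudes = np.random.rand(1000) * 100
is_action = np.random.rand(1000) > 0.5

plt.hist(magnitudes[is_action], bins=50, density=True, alpha=0.5, label='action')
plt.hist(magnitudes[~is_action], bins=50, density=True, alpha=0.5, label='background')
plt.xlabel('feature magnitude')
plt.ylabel('density')
plt.legend()
plt.show()
```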

> Can't reproduce the result?

Hello,
In fact, the performance is improved by hyper-parameter tuning without any model change.
Specifically, alpha: 0.0002 -> 0.0005, r_act: 8 -> 9, r_bkg: 6 -> 4
You can also find them in options.py.
In addition, I updated the best model file, with which you can see the improved result.
Thanks!

Thanks for your reply. I tested the best model that you updated and changed the parameters as you said, but I still cannot reproduce the result. The result I get is as follows:
Step: 0
Test_acc: 0.8905
average_mAP: 0.4038
mAP@0.1: 0.6551
mAP@0.2: 0.5836
mAP@0.3: 0.5052
mAP@0.4: 0.4158
mAP@0.5: 0.3245
mAP@0.6: 0.2266
mAP@0.7: 0.1161
Do you know the reason?

Originally posted by @xumh-9 in #9 (comment)
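
For readers, the tuned values quoted in the reply above would look roughly like this if written as argparse defaults; the exact layout of options.py is assumed here, and only the names and values come from the reply.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--alpha', type=float, default=0.0005)  # was 0.0002
parser.add_argument('--r_act', type=int, default=9)         # was 8
parser.add_argument('--r_bkg', type=int, default=4)         # was 6
```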

Real-time inference?

Hi, I read your paper and congratulations on your work.
However, the inference part is unclear to me: at inference time, is it possible to use this framework for the online action detection task? (i.e., suppose I have an input video stream; can the model predict frame-level labels as the frames arrive, at real-time speed?)

Thank you!

Some questions about features and codes

Hi, I am reproducing your work, following the hyperparameters in the paper.
I use the feature extractor from the repo you recommend, take 16 frames as a segment, and choose the output of the logits layer with the average pooling layer, so I get a 1024-d vector as the feature. But I cannot reproduce the results in your paper.
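
As a sanity check on the extraction described above, here is a rough sketch of that procedure; the i3d backbone is a placeholder for whatever extractor is recommended, and it is assumed here that its logits-layer feature map has 1024 channels before classification.

```python
import torch

def extract_segment_features(i3d, frames, seg_len=16):
    """frames: (T, 3, H, W) RGB tensor of one video; returns (num_segments, 1024)."""
    feats = []
    for start in range(0, frames.shape[0] - seg_len + 1, seg_len):
        clip = frames[start:start + seg_len]          # (16, 3, H, W)
        clip = clip.permute(1, 0, 2, 3).unsqueeze(0)  # (1, 3, 16, H, W)
        with torch.no_grad():
            fmap = i3d(clip)                          # assumed (1, 1024, t, h, w) logits-layer map
        feats.append(fmap.mean(dim=(2, 3, 4)))        # global average pooling -> (1, 1024)
    return torch.cat(feats, dim=0)
```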

Also, I found a difference between the code and the paper: when calculating loss_act in the BMUE loss, you use the abs function instead of taking the max with 0.
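
To make the difference concrete, the two formulations of loss_act contrasted here would look roughly like the sketch below; m (the magnitude margin) and mag_act (magnitudes of pseudo-action segments) are illustrative names, and the rest of the BMUE loss is omitted.

```python
import torch

def loss_act_hinge(mag_act, m=100.0):
    # paper-style term: max(0, m - ||f_act||)
    return torch.clamp(m - mag_act, min=0).mean()

def loss_act_abs(mag_act, m=100.0):
    # the form this issue reports seeing in the code: |m - ||f_act|||
    return torch.abs(m - mag_act).mean()
```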
