pilhyeon / wtal-uncertainty-modeling

Official PyTorch Implementation of 'Weakly-supervised Temporal Action Localization by Uncertainty Modeling' (AAAI-21)

License: MIT License

Python 99.06% Shell 0.94%
background-modeling deep-learning pytorch temporal-action-localization uncertainty weakly-supervised-learning

wtal-uncertainty-modeling's Introduction

👋 Hi, I'm Pilhyeon Lee (이필현).

I am currently working as a postdoctoral researcher at Yonsei University. In the past, I collaborated as a visiting researcher with the video understanding team at CLOVA AI research, Naver Corporation. Also, I was fortunate to work as a research intern at Microsoft Research Asia. I received a Ph.D. degree from Yonsei University and a B.S. degree from Chung-Ang University. My research interests include computer vision, video understanding, multimodal learning, and weakly-supervised learning.

📓 Publications (selected)

The full list can be found here.

💻 Languages

  • C/C++
  • Python
  • OpenCV
  • TensorFlow
  • PyTorch


wtal-uncertainty-modeling's People

Contributors

dependabot[bot], pilhyeon


wtal-uncertainty-modeling's Issues

Dataloader of ActivityNet 1.3

Thank you for your excellent work.
Could you provide the PyTorch dataset file (like thumos_features.py)?
I am running experiments on a reorganized ActivityNet 1.3 dataset, so I would like more details for a fair comparison.
Thank you!

If possible, you can also send it to my email: [email protected]
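
Since the dataset file is not in the repo, here is a minimal sketch of what an ActivityNet counterpart to thumos_features.py might look like. The class name, file layout (one pre-extracted .npy feature array per video), and the labels dict are assumptions for illustration, not the authors' actual loader.

```python
import os
import numpy as np
import torch
from torch.utils.data import Dataset

class ActivityNetFeatures(Dataset):
    """Hypothetical loader: one (T, F) .npy feature file per video."""
    def __init__(self, feature_dir, labels, num_segments=50):
        self.feature_dir = feature_dir        # directory of <video_id>.npy files
        self.labels = labels                  # dict: video_id -> multi-hot label vector
        self.video_ids = sorted(labels.keys())
        self.num_segments = num_segments

    def __len__(self):
        return len(self.video_ids)

    def __getitem__(self, idx):
        vid = self.video_ids[idx]
        feat = np.load(os.path.join(self.feature_dir, vid + '.npy'))  # (T, F)
        # uniformly sample a fixed number of segments (one common choice)
        sample_idx = np.linspace(0, len(feat) - 1, self.num_segments).astype(int)
        feat = feat[sample_idx]
        label = np.asarray(self.labels[vid], dtype=np.float32)
        return torch.from_numpy(feat).float(), torch.from_numpy(label)
```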

the newest result

Hello, and thank you for your excellent work. I read the updated paper and found that the results are better than before. Could you share the code that matches the best result?

I am waiting for your reply.

Reproducing the results on ActivityNet

Hi @Pilhyeon, thanks for your great work! I'm following your BMUE and am having some trouble reproducing the results on the ActivityNet dataset.

I have tried some experiments on ActivityNet v1.2. I downloaded the I3D features provided by this link and adapted them to the BMUE format. Below are some of my results. All experiments run for 6k epochs, and the results are shown in the form "(average_mAP, Test_acc)":

  • [no params changed] (0.0390, 0.43)

According to Sec. 4.1 of your arXiv paper, T = 50, so I set num_segments to 50 and ran the following experiments:

  • [num_segments: 50] (0.0275, 0.31)
    • [class_th: 0.1] (0.0332, 0.36)
    • [class_th: 0.1, act_thresh_cas: np.arange(0.0, 0.15, 0.015)] (0.0400, 0.36)

Besides, I also tried changing "act_thresh_magnitudes", "NMS thresh", "alpha", "_lambda & gamma in get_proposal_oic()", etc. (the overrides I tried are written out as a small config sketch below). The results do not seem to get better: the test accuracy stays around 0.4 and the average_mAP is very low. It is hard for me to find the best settings. Could you please share your parameter settings for the ActivityNet 1.2 & 1.3 datasets, or give me some advice on which parameters to change?
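
For concreteness, the overrides described above can be written as a small list of trial configurations; the option names simply mirror the ones in this issue and are assumed to correspond to entries in options.py.

```python
import numpy as np

# Each dict is one trial from the list above; an empty dict means defaults.
config_trials = [
    {},                                      # no params changed -> (0.0390, 0.43)
    {"num_segments": 50},                    # -> (0.0275, 0.31)
    {"num_segments": 50, "class_th": 0.1},   # -> (0.0332, 0.36)
    {"num_segments": 50, "class_th": 0.1,
     "act_thresh_cas": np.arange(0.0, 0.15, 0.015)},  # -> (0.0400, 0.36)
]
```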

Looking forward to your reply and I'd be glad to cite your excellent work. Thanks!

GPU utilization is low

Hi @Pilhyeon
Thank you for your excellent work! I cloned your code and ran it, but I found that the GPU utilization of the program is very low (only 6%). Is this normal? My GPU is an NVIDIA TITAN Xp (12 GB).

ActivityNet 1.3 Features and model

Hi, excellent work!

You mentioned that you would make the ActivityNet features and model public. Could you do so now? It is very difficult to reproduce the reported results without the right settings, as someone has already posted in this repository (I tried your suggested settings for ActivityNet, but did not get the reported results).

My email is [email protected]; you can also send me the link here if you wish.

Cannot get the provided features.

Hi @Pilhyeon:

Thanks for your contribution. I cannot get the features you provided: when I open the Google Drive link, these are all the files I see:

[screenshot of the Google Drive folder contents]

I am not sure these are the features used in your repo, since I do not know how to use the files in the Google Drive link. Could you please check this or explain how to use them?

Results of your provided pre-trained model

Thanks for your great work! However, the pre-trained model you provide does not achieve the results in the paper. I saw the same reproduced results in a closed issue; could you please check whether the latest pre-trained model was uploaded, or whether there is some other mistake? Thanks~
Step: 0
Test_acc: 0.8905
average_mAP: 0.4038
mAP@0.1: 0.6551
mAP@0.2: 0.5836
mAP@0.3: 0.5052
mAP@0.4: 0.4158
mAP@0.5: 0.3245
mAP@0.6: 0.2266
mAP@0.7: 0.1161
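
For reference, the printed average_mAP is consistent with the plain arithmetic mean of the seven per-IoU values above (thresholds 0.1 to 0.7):

```python
maps = [0.6551, 0.5836, 0.5052, 0.4158, 0.3245, 0.2266, 0.1161]  # mAP@0.1 ... mAP@0.7
print(round(sum(maps) / len(maps), 4))  # 0.4038
```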

Can anyone reproduce the results in the paper?

I have run the code many times (about 100 times), with the same environment as requirements.txt, even on 3 different machines, but the best result I can get is about 40, much worse than 41.8.

I tried changing the random seed while keeping all the hyper-parameters, but it did not help. I am sure that I used the same environment and the latest code.
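
For anyone else comparing runs, a minimal sketch of pinning the usual randomness sources is below; whether and where the repo itself fixes seeds is not something this sketch assumes.

```python
import random
import numpy as np
import torch

def set_seed(seed=0):
    # pin Python, NumPy, and PyTorch (CPU + CUDA) random number generators
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # trade speed for run-to-run determinism in cuDNN
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```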

I hope the author can reply to this question, since there are also other people who cannot reproduce the result (#10, #5, #11).

Their issues were closed by the author without responses to their questions. I think the author should check the code carefully again, especially the hyperparameters.

If anyone else can achieve 41.8 mAP, please tell me. I hope the author does not close this issue, since the problem is not solved. I also hope the author @Pilhyeon can run this code again on a different machine and then publish the result and code, or provide the training log as proof.

About baseline performance

Why is the baseline mAP so different between BaS-Net (Table 2: IoU 0.5: 12.0) and this paper (Figure 3: IoU 0.5: 22.6)?
What is the main difference in implementation?

question about feat_magnitudes

Good work! While reading the paper, a question came up: how is the feature magnitude on the x-axis of Figure 2 defined? Is it computed from the normalized video features (shape: [B, T, F]) or something else? I am confused about the x-axis of Figure 2 and how the histogram is plotted. Thanks!
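
A common reading, consistent with the paper's use of feature magnitude for uncertainty modeling, is that the x-axis is the per-segment L2 norm of the embedded features; a minimal sketch of that computation, with an example (B, T, F) shape, is below. This is an interpretation, not a confirmed description of the authors' plotting code.

```python
import torch

features = torch.randn(2, 50, 2048)                  # (B, T, F) embedded segment features (example shape)
feat_magnitudes = torch.norm(features, p=2, dim=2)   # (B, T): one L2-norm magnitude per segment
```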

Some questions in your paper

Hi @Pilhyeon

Thanks for your contribution. I tried again and could reproduce your result! It is really amazing work!

I read your paper carefully, but there are still some details I cannot understand; could you please answer them if you have time?

  1. Could you please explain Figure 2 in your paper? What does the y-axis "Density" mean?
  2. Is it correct to understand that the original features are the embedded features obtained using only the main pipeline as the whole model, while the separated features come from the final model that uses both the main pipeline and the uncertainty modeling?
  3. Does the softmax score used in Table 3 of the ablation study mean that only the main pipeline in Figure 3 is used to obtain the result?
  4. If I understand correctly, the softmax score is obtained from the original features, which means they are not separated and have unconstrained magnitudes. So is the description in the screenshot below wrong? It should read "For the **first**, as the original ..."
    [screenshot of the referenced passage in the paper]
  5. Could you please provide your extracted features and pretrained models for ActivityNet 1.2 and ActivityNet 1.3?

Thanks again for your contribution and patience; I hope you can reply!

Question about Figure 2 in paper.

I have a doubt: how do you determine which frames are background frames and which are action frames when drawing the histogram?
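
One plausible way such a histogram could be produced on an annotated set is to split the segment magnitudes by the ground-truth temporal labels and plot normalized (density) histograms; the sketch below is only illustrative, not the authors' plotting code, and the input arrays are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# magnitudes: per-segment feature magnitudes; is_action: boolean ground-truth
# mask derived from temporal annotations (random placeholders here).
magnitudes = np.random.rand(1000) * 100
is_action = np.random.rand(1000) > 0.5

plt.hist(magnitudes[is_action], bins=50, density=True, alpha=0.5, label='action')
plt.hist(magnitudes[~is_action], bins=50, density=True, alpha=0.5, label='background')
plt.xlabel('feature magnitude')
plt.ylabel('density')
plt.legend()
plt.show()
```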

> Can't reproduce the result?

Hello,
In fact, the performance is improved by hyper-parameter tuning without any model change.
Specifically, alpha: 0.0002 -> 0.0005, r_act: 8 -> 9, r_bkg: 6 -> 4
You can also find them in options.py.
In addition, I updated the best model file, with which you can see the improved result.
Thanks!

Thanks for your reply. I tested the best model that you updated and changed the parameters as you said, but I still cannot reproduce the result. The result I get is as follows:
Step: 0
Test_acc: 0.8905
average_mAP: 0.4038
mAP@0.1: 0.6551
mAP@0.2: 0.5836
mAP@0.3: 0.5052
mAP@0.4: 0.4158
mAP@0.5: 0.3245
mAP@0.6: 0.2266
mAP@0.7: 0.1161
Do you know the reason?

Originally posted by @xumh-9 in #9 (comment)
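
For readers, the tuned values quoted in the reply above would look roughly like this if written as argparse defaults; the exact layout of options.py is assumed here, and only the names and values come from the reply.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--alpha', type=float, default=0.0005)  # was 0.0002
parser.add_argument('--r_act', type=int, default=9)         # was 8
parser.add_argument('--r_bkg', type=int, default=4)         # was 6
```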

Real-time inference?

Hi, I read your paper and congratulations on your work.
However, the inference part is unclear to me: at inference time, is it possible to use this framework for the online action detection task? (i.e., suppose I have an input video stream; can the model predict frame-level labels as the frames arrive, at real-time speed?)

Thank you!

Some questions about features and codes

Hi, I am reproducing your work, following the hyperparameters in the paper.
I use the feature extractor from the repo you recommend, take 16 frames as a segment, and choose the output of the logits layer with the average pooling layer, so I get a 1024-d vector as the feature. But I cannot reproduce the results in your paper.
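
As a sanity check on the extraction described above, here is a rough sketch of that procedure; the i3d backbone is a placeholder for whatever extractor is recommended, and it is assumed here that its logits-layer feature map has 1024 channels before classification.

```python
import torch

def extract_segment_features(i3d, frames, seg_len=16):
    """frames: (T, 3, H, W) RGB tensor of one video; returns (num_segments, 1024)."""
    feats = []
    for start in range(0, frames.shape[0] - seg_len + 1, seg_len):
        clip = frames[start:start + seg_len]          # (16, 3, H, W)
        clip = clip.permute(1, 0, 2, 3).unsqueeze(0)  # (1, 3, 16, H, W)
        with torch.no_grad():
            fmap = i3d(clip)                          # assumed (1, 1024, t, h, w) logits-layer map
        feats.append(fmap.mean(dim=(2, 3, 4)))        # global average pooling -> (1, 1024)
    return torch.cat(feats, dim=0)
```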

Also, I found a difference between the code and the paper: when calculating loss_act in the BMUE loss, you use the abs function instead of taking the max with 0.
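
To make the difference concrete, the two formulations of loss_act contrasted here would look roughly like the sketch below; m (the magnitude margin) and mag_act (magnitudes of pseudo-action segments) are illustrative names, and the rest of the BMUE loss is omitted.

```python
import torch

def loss_act_hinge(mag_act, m=100.0):
    # paper-style term: max(0, m - ||f_act||)
    return torch.clamp(m - mag_act, min=0).mean()

def loss_act_abs(mag_act, m=100.0):
    # the form this issue reports seeing in the code: |m - ||f_act|||
    return torch.abs(m - mag_act).mean()
```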
