Thanks for sharing the code for this excellent work. I have some questions about it, and I would much appreciate it if you could help clear up my confusion.
Below is the result of your processing of the UCF-Crime dataset, which generates video_list.txt in "/BERT_Anomaly_Video_Classification/tree/main/MIL-BERT/fix_ucf_crime_test_list.ipynb":
['Abuse/Abuse028_x264.mp4|1412|[165, 240, -1, -1]', 'Abuse/Abuse030_x264.mp4|1544|[1275, 1360, -1, -1]', 'Arrest/Arrest001_x264.mp4|2374|[1185, 1485, -1, -1]', 'Arrest/Arrest007_x264.mp4|3144|[1530, 2160, -1, -1]', 'Arrest/Arrest024_x264.mp4|3629|[1005, 3105, -1, -1]']
However, this is the result I get from my own processing:
Abuse/Abuse028_x264.mp4|1413|[165, 240, -1, -1]
Abuse/Abuse030_x264.mp4|1545|[1275, 1360, -1, -1]
Arrest/Arrest001_x264.mp4|2375|[1185, 1485, -1, -1]
Arrest/Arrest007_x264.mp4|3145|[1530, 2160, -1, -1]
The num_frames values differ: each of my counts is exactly 1 greater than yours, while the segment annotations are identical. I use the "frames.pkl" released by https://github.com/junha-kim/Learning-to-Adapt-to-Unseen-Abnormal-Activities
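For anyone who wants to check the whole list rather than a few lines, here is a minimal sketch for comparing two such lists. It assumes every entry follows the `path|num_frames|[start1, end1, start2, end2]` layout shown above; the helper names `parse_entry` and `frame_count_diffs` are hypothetical and not part of the repository.

```python
import ast


def parse_entry(line: str):
    """Split a 'path|num_frames|[s1, e1, s2, e2]' entry into its three fields."""
    path, num_frames, segments = line.strip().split("|")
    return path, int(num_frames), ast.literal_eval(segments)


def frame_count_diffs(reference, mine):
    """For videos present in both lists, return
    {path: (mine_frames - reference_frames, segments_match)}."""
    ref = {path: (n, segs) for path, n, segs in map(parse_entry, reference)}
    diffs = {}
    for path, n, segs in map(parse_entry, mine):
        if path in ref:
            ref_n, ref_segs = ref[path]
            diffs[path] = (n - ref_n, segs == ref_segs)
    return diffs


# Entries copied from the two outputs above.
reference = [
    "Abuse/Abuse028_x264.mp4|1412|[165, 240, -1, -1]",
    "Abuse/Abuse030_x264.mp4|1544|[1275, 1360, -1, -1]",
    "Arrest/Arrest001_x264.mp4|2374|[1185, 1485, -1, -1]",
    "Arrest/Arrest007_x264.mp4|3144|[1530, 2160, -1, -1]",
    "Arrest/Arrest024_x264.mp4|3629|[1005, 3105, -1, -1]",
]
mine = [
    "Abuse/Abuse028_x264.mp4|1413|[165, 240, -1, -1]",
    "Abuse/Abuse030_x264.mp4|1545|[1275, 1360, -1, -1]",
    "Arrest/Arrest001_x264.mp4|2375|[1185, 1485, -1, -1]",
    "Arrest/Arrest007_x264.mp4|3145|[1530, 2160, -1, -1]",
]

print(frame_count_diffs(reference, mine))
```

On the entries above it reports `(1, True)` for each of the four shared videos, i.e. a +1 frame difference with matching segment annotations. Arrest024 is skipped because it does not appear in my list.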
Thanks very much!