cvdfoundation / kinetics-dataset

kinetics-dataset's Issues

"moov atom not found" raised when decoding some of the Kinetics-400 videos

Hi, Thanks for your dataset! It's very helpful!

I have successfully downloaded the Kinetics-400 dataset and annotations, and I have two questions.

1. Some of the videos in the Kinetics-400 train set are corrupted. When I try to decode them with cv2.VideoCapture("/path/to/video"), a "moov atom not found" error is raised. The list of corrupted video files is in the attached test.txt.
2. There are 240k videos in the Kinetics-400 train set. However, it seems that not all of them (only about 238k-239k) are in the Kinetics-400 train annotations.
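A minimal sketch of one way to screen for this kind of corruption without OpenCV, assuming the files are ordinary MP4 containers: it walks the top-level boxes of the file and reports whether a `moov` box is present. The function names are hypothetical, and a file that passes this check can still be damaged in other ways.

```python
import struct
from pathlib import Path


def top_level_box_types(path):
    """Return the four-character types of the top-level MP4 boxes in a file."""
    types = []
    file_size = Path(path).stat().st_size
    with open(path, "rb") as f:
        offset = 0
        while offset + 8 <= file_size:
            f.seek(offset)
            # Each box starts with a 32-bit big-endian size and a 4-byte type.
            size, box_type = struct.unpack(">I4s", f.read(8))
            header_len = 8
            if size == 1:
                # A size of 1 means a 64-bit size follows the type field.
                extended = f.read(8)
                if len(extended) < 8:
                    break
                size = struct.unpack(">Q", extended)[0]
                header_len = 16
            elif size == 0:
                # A size of 0 means the box runs to the end of the file.
                size = file_size - offset
            if size < header_len:
                break  # malformed header, stop scanning
            types.append(box_type.decode("latin-1"))
            offset += size
    return types


def has_moov_atom(path):
    """True if the file contains a top-level 'moov' box."""
    return "moov" in top_level_box_types(path)
```

Running `has_moov_atom` over the train folder and collecting the paths that return False would give a candidate list to compare against the one reported here.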

Missing videos in downloaded validation and train folder.

Hello,

It looks like a lot of samples found in https://s3.amazonaws.com/kinetics/600/annotations/train.csv and val.csv are missing. In my downloaded val folder there are about 29k video samples but only 15k of them can be found in the https://s3.amazonaws.com/kinetics/600/annotations/val.csv file.

Burak

Does Kinetics-700 include Kinetics-600 and Kinetics-400?

fix the space in link

Change wget $one to wget "$one" in download.sh. Otherwise, a space in the link causes the download to fail.

Followup on the held out set for Kinetics-600

Hi @kinetics-cvdf ,

Is there now a path file for the held-out test set for Kinetics-600, like the https://s3.amazonaws.com/kinetics/600/test/k600_test_path.txt you provide for the test set?

This is related to #10. I am facing the same issue, and I was hoping to avoid downloading all the videos from YouTube.

Some videos recorded in the annotation could not be found

While using the dataset, we found that some videos listed in train.csv, test.csv, and val.csv under the annotations folder are missing. After putting the 1200 replacement videos into the train folder, there are still 8000 videos that cannot be found.

extracting into subdirectories rather than everything in one dir

extract.sh untars everything into one folder. Wouldn't it be better to extract each archive into its own destination folder? I.e.

$ more ../extract2.sh
for file in *.tar.gz;
do
mkdir -p "${file%.tar.gz}"
tar -zxf "$file" -C "${file%.tar.gz}"
done

and extract with

DeepMind/Kinetics-600/train$ bash ../extract2.sh k600_train_path.txt

gives

drwxr-sr-x 2 torel users 993 Apr 29 2018 abseiling/
-rw-r--r-- 1 torel users 2000227200 Apr 17 12:49 abseiling.tar.gz
drwxr-sr-x 2 torel users 708 Apr 29 2018 acting in play/
-rw-r--r-- 1 torel users 930025532 Apr 17 12:49 acting in play.tar.gz
drwxr-sr-x 2 torel users 649 Apr 29 2018 adjusting glasses/

Brgds,
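For readers who prefer Python, the same idea can be sketched with the standard `tarfile` module. This is only an illustration of the per-archive layout proposed in extract2.sh, not a replacement for the repository's scripts; the function name `extract_into_subdirs` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_into_subdirs(archive_dir):
    """Extract every *.tar.gz in archive_dir into a folder named after it."""
    archive_dir = Path(archive_dir)
    extracted = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        # "acting in play.tar.gz" -> "acting in play"; spaces stay intact
        # because the name is never split by a shell.
        dest = archive_dir / archive.name[: -len(".tar.gz")]
        dest.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest)
        extracted.append(dest)
    return extracted
```

Calling `extract_into_subdirs(".")` from inside the train folder should produce one directory per class, matching the listing shown in this issue.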

Missing videos

It is known that some YouTube videos are no longer available. I wonder whether the Kinetics-400 release you provide contains these missing videos.

Size of Some Videos is 0KB

When I used ffmpeg to change the resolution of these videos, I ran into some errors, so I checked the size of the videos and found that some of them are 0KB in size.

1. E2kUsRIj4tM_000317_000327.mp4
2. E2NeSaQieHk_000087_000097.mp4
3. MVWayhNpHr0_000065_000075.mp4
4. N74EWF0fs5c_000182_000192.mp4
5. QhF1i23vwps_000379_000389.mp4
6. YCQlaH_Vy8I_000245_000255.mp4
7. _cbZlhduYJY_000503_000513.mp4
8. aCcAcCE7Ixo_000034_000044.mp4
9. gKBhQ-oe_9Q_000177_000187.mp4
10. lm6qgrfJGmw_000027_000037.mp4
11. 28bTQiuymgs_000031_000041.mp4
12. 8iED0lhyrN8_000038_000048.mp4
13. B6GxQKcL7IY_000213_000223.mp4
14. Df6CGDjUkAA_000151_000161.mp4
15. GkGS69GCx4Q_000319_000329.mp4
16. J5xNIJlfBAw_000156_000166.mp4
17. QzmhrYx15_E_000059_000069.mp4
18. ZtCk_0cMZ9U_000347_000357.mp4
19. d_vQWquKtBg_000015_000025.mp4
20. rba-NkJjSNg_000167_000177.mp4
21. wL1Bit-Gv40_000305_000315.mp4
22. GN37yfNvQwM_000132_000142.mp4
23. bOU2oGVBM_o_000030_000040.mp4
24. du6bfkBEfVs_000155_000165.mp4
25. fXRNY6-s-7U_000112_000122.mp4

I don't know if there was an error in the process of downloading or decompressing, or the videos themselves are corrupted.
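A small sketch of how such empty files could be listed automatically, assuming the clips are stored as .mp4 files somewhere under a single root directory. The function name `find_empty_videos` is hypothetical.

```python
from pathlib import Path


def find_empty_videos(root, pattern="*.mp4"):
    """Return all files under root matching pattern whose size is 0 bytes."""
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )
```

If the list is identical after downloading and extracting a second time, the empty files are more likely to come from the archives themselves than from a local error.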

label missing in k400 test set annotation

The original test set downloaded from https://deepmind.com/research/open-source/kinetics contains labels for each video, which are missing in this release.

from deepmind:

label,youtube_id,time_start,time_end,split
drinking beer,--6bJUbfpnQ,17,27,test
climbing tree,--8YXc8iCt8,2,12,test
surfing water,--coBvtS-eQ,57,67,test
stomping grapes,--q6ElFyVq0,148,158,test
...

download from AWS (the label is missing):

youtube_id,time_start,time_end,split
--6bJUbfpnQ,17,27,test
--8YXc8iCt8,2,12,test
--coBvtS-eQ,57,67,test
--q6ElFyVq0,148,158,test
...
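If a copy of the labeled file is still at hand, the two formats can be joined on the clip key. The sketch below assumes exactly the column names shown in the two excerpts; the function name `attach_labels` is hypothetical, and rows without a match get a label of None.

```python
import csv
import io

KEY_FIELDS = ("youtube_id", "time_start", "time_end")


def attach_labels(labeled_csv_text, unlabeled_csv_text):
    """Add the 'label' column from the labeled CSV to the unlabeled rows."""
    labels = {}
    for row in csv.DictReader(io.StringIO(labeled_csv_text)):
        labels[tuple(row[k] for k in KEY_FIELDS)] = row["label"]
    merged = []
    for row in csv.DictReader(io.StringIO(unlabeled_csv_text)):
        key = tuple(row[k] for k in KEY_FIELDS)
        # Rows that are absent from the labeled file keep label=None.
        merged.append({"label": labels.get(key), **row})
    return merged
```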

Missing files in the data folder

Hi,
While checking the test folder, I found that test.csv has 39805 records. When I looked up the files based on the records in test.csv, 1120 out of the 39805 did not exist. Is this normal?
Thank you.
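A sketch of the kind of check described here, assuming the annotation CSV has the columns youtube_id, time_start and time_end, and that clips are named like E2kUsRIj4tM_000317_000327.mp4 (ID, then zero-padded start and end seconds). The function names are hypothetical.

```python
import csv
from pathlib import Path


def expected_filename(row):
    """Build the clip filename implied by one annotation row."""
    start = int(row["time_start"])
    end = int(row["time_end"])
    return f"{row['youtube_id']}_{start:06d}_{end:06d}.mp4"


def find_missing(csv_path, video_dir):
    """Return filenames listed in the CSV that are absent under video_dir."""
    present = {p.name for p in Path(video_dir).rglob("*.mp4")}
    with open(csv_path, newline="") as f:
        return [
            name
            for name in map(expected_filename, csv.DictReader(f))
            if name not in present
        ]
```

The comparison is case-sensitive, so a clip whose ID differs only in letter case counts as missing.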

Videos inside train and test folder may have different names. (K400)

The following is the annotations file for training: k400_train.csv
The following is the list of videos present inside the train folder:
origtrain.txt

Many videos that are listed in the annotations file are either missing or have a different name. Some examples:
absent: ['abseiling' 'lqciwm6gDrk' 659 669 'train' 0]
absent: ['abseiling' 'Lwti_IVm-Bc' 39 49 'train' 0]
absent: ['abseiling' 'LwyKxe85UWI' 88 98 'train' 0]
absent: ['abseiling' 'lXnebafO2cI' 2145 2155 'train' 0]
absent: ['abseiling' 'LY02AE6XK5I' 381 391 'train' 0]
absent: ['abseiling' 'M-hBdj62g9Y' 48 58 'train' 0]
absent: ['abseiling' 'm-iKFbNcLYM' 30 40 'train' 0]
absent: ['abseiling' 'M1QFHoC4o3A' 78 88 'train' 0]
absent: ['abseiling' 'm25BcZ3B0Hs' 219 229 'train' 0]
absent: ['abseiling' 'M6yv0dy8lYE' 297 307 'train' 0]
absent: ['abseiling' 'm8Pm5kmCuqI' 64 74 'train' 0]
absent: ['abseiling' 'MIIbU2xZcUY' 32 42 'train' 0]
absent: ['abseiling' 'mjsrWa2olhk' 35 45 'train' 0]
absent: ['abseiling' 'MP-Op52e84g' 176 186 'train' 0]
absent: ['abseiling' 'MqBaIW3qmuM' 98 108 'train' 0]
absent: ['abseiling' 'mRdyYMPlJ_8' 73 83 'train' 0]

Can someone please confirm if they have the same issue or am I missing something?
Thank you

No videos found after running the k700_2020_extractor.sh !

I am trying to download the Kinetics-700 dataset.
I followed the instructions provided, but no videos were downloaded.
Are there any suggestions?

k700_2020_downloader.sh

This file is all kinds of wrong. First off, it uses sudo for everything, and it also downloads all the files twice. I will fix it and open a pull request. This is really, really bad.

Many videos in the Kinetics700-2020 are shorter than 10 seconds

Hi, many videos in the Kinetics700-2020 are shorter than 10 seconds, but they are supposed to be 10 seconds long. In the test split, the percentage is over 25%. Here are some examples that are shorter than 8 seconds.

Kinetics700-2020-test/v55ikd_-Rc4_000141_000151.mp4
Kinetics700-2020-test/52mb2tRzayU_000106_000116.mp4
Kinetics700-2020-test/9k3bdcoMTVY_000013_000023.mp4
Kinetics700-2020-test/f9FftpAwmws_000074_000084.mp4
Kinetics700-2020-test/714LsaiTVVk_000002_000012.mp4
Kinetics700-2020-test/7WtqdnyTXjY_000004_000014.mp4
Kinetics700-2020-test/bbaRarfa-X0_000073_000083.mp4
Kinetics700-2020-test/xKnk1UYdgac_000000_000010.mp4
Kinetics700-2020-test/Pf5jowvNpiE_000013_000023.mp4
Kinetics700-2020-test/A1CQslN-Xbw_000010_000020.mp4
Kinetics700-2020-test/aJw7fScmOGo_000007_000017.mp4
Kinetics700-2020-test/2bI8oYlrWjs_000000_000010.mp4
Kinetics700-2020-test/KV8RVTRTAL0_000007_000017.mp4
Kinetics700-2020-test/rAgdt5mqCwA_000048_000058.mp4
Kinetics700-2020-test/LhW0hADHePo_000000_000010.mp4
Kinetics700-2020-test/Fo7EYCBwDaw_000135_000145.mp4
Kinetics700-2020-test/72PEZjijk8o_000002_000012.mp4
Kinetics700-2020-test/3-3e71B5yBo_000000_000010.mp4
Kinetics700-2020-test/d61S7amsWsM_000003_000013.mp4
Kinetics700-2020-test/191VnlH8z68_000002_000012.mp4
Kinetics700-2020-test/QV6D9MoUlH4_000042_000052.mp4
Kinetics700-2020-test/flXQJFDjw1E_000001_000011.mp4
Kinetics700-2020-test/iCgHfcLhnDU_000318_000328.mp4
Kinetics700-2020-test/6vemGexYgHI_000003_000013.mp4
Kinetics700-2020-test/2AxfjxBvh10_000000_000010.mp4
Kinetics700-2020-test/4LFQuxKfFIQ_000261_000271.mp4
Kinetics700-2020-test/4QYmCBN1nHQ_000046_000056.mp4
Kinetics700-2020-test/cPd1GhGV4Fg_000011_000021.mp4
Kinetics700-2020-test/4V7JPYZBnCM_000014_000024.mp4
Kinetics700-2020-test/3xcQj9HZP5Y_000000_000010.mp4
Kinetics700-2020-test/1LaRLvgZTjI_000114_000124.mp4
Kinetics700-2020-test/8uGAZkuoXVg_000078_000088.mp4
Kinetics700-2020-test/42vZ8I-jRPg_000034_000044.mp4
Kinetics700-2020-test/1f-5jxwtibg_000262_000272.mp4
Kinetics700-2020-test/6_T1NJTMNuc_000000_000010.mp4
Kinetics700-2020-test/1F4REb4pqo0_000001_000011.mp4
Kinetics700-2020-test/3OPqFdZlaNY_000075_000085.mp4
Kinetics700-2020-test/JE8h-yGd25w_000000_000010.mp4
Kinetics700-2020-test/9PVi6qiS7zM_000006_000016.mp4
Kinetics700-2020-test/0hMk37By7t4_000021_000031.mp4
Kinetics700-2020-test/Pd_gOf0TY7M_000050_000060.mp4
Kinetics700-2020-test/KdD5HVxwaQE_000018_000028.mp4
Kinetics700-2020-test/caBITzNkOis_000014_000024.mp4
Kinetics700-2020-test/3lGPnnsf9Y8_000004_000014.mp4
Kinetics700-2020-test/1OvQ9_ZgnIA_000000_000010.mp4
Kinetics700-2020-test/AkIhOrNcbUA_000020_000030.mp4
Kinetics700-2020-test/M45S-HkcwTM_000049_000059.mp4
Kinetics700-2020-test/FOa1tk1Isi0_000038_000048.mp4
Kinetics700-2020-test/OgXl2BKdUoU_000012_000022.mp4
Kinetics700-2020-test/uaKPPePpSY0_000006_000016.mp4
Kinetics700-2020-test/-_D7UCii3FU_000021_000031.mp4
Kinetics700-2020-test/3Hr-2TpgVEE_000057_000067.mp4
Kinetics700-2020-test/1Je9mL8Uudo_000000_000010.mp4
Kinetics700-2020-test/N1IGDSJoia0_000000_000010.mp4
Kinetics700-2020-test/9EiQCNi4bOA_000023_000033.mp4
Kinetics700-2020-test/0C9EO_A2PIY_000004_000014.mp4
Kinetics700-2020-test/B0n-nS4Y6xs_000000_000010.mp4
Kinetics700-2020-test/45E3EdNaoHg_000013_000023.mp4
Kinetics700-2020-test/6hpPVBBGZ74_000009_000019.mp4
Kinetics700-2020-test/a1jyH4CJJR4_000000_000010.mp4
Kinetics700-2020-test/AzQ6mn_6ZKc_000000_000010.mp4
Kinetics700-2020-test/0zr5-JyS0Xc_000047_000057.mp4
Kinetics700-2020-test/43D0gnE5Z7o_000083_000093.mp4
Kinetics700-2020-test/IVW_Yk2lyDg_000000_000010.mp4
Kinetics700-2020-test/2R45XkkgbAQ_000045_000055.mp4
Kinetics700-2020-test/8N6-DeT6mXs_000048_000058.mp4
Kinetics700-2020-test/6ATIhv4DFjo_000034_000044.mp4
Kinetics700-2020-test/3E9AdPkiz9o_000000_000010.mp4
Kinetics700-2020-test/5XgnD4P9B-M_000005_000015.mp4
Kinetics700-2020-test/AB305H8Np48_000040_000050.mp4
Kinetics700-2020-test/3OezYSbd_n4_000064_000074.mp4
Kinetics700-2020-test/Z8e-EfVlIx0_000000_000010.mp4
Kinetics700-2020-test/6m_8FNc2scg_000137_000147.mp4
Kinetics700-2020-test/K43n8RqxbFQ_000101_000111.mp4
Kinetics700-2020-test/kgIEx-OjPG0_000000_000010.mp4
Kinetics700-2020-test/0nLH52UNKhw_000000_000010.mp4
Kinetics700-2020-test/5V7GTuihlQQ_000002_000012.mp4
Kinetics700-2020-test/1hZV-H5yl6s_000000_000010.mp4
Kinetics700-2020-test/COZqe2f1Axg_000031_000041.mp4
Kinetics700-2020-test/29GNPtZaqS4_000001_000011.mp4
Kinetics700-2020-test/83J0uf8cJlI_000025_000035.mp4
Kinetics700-2020-test/6Zl5jX9fjKE_000139_000149.mp4
Kinetics700-2020-test/1AGYst8AKCc_000000_000010.mp4
Kinetics700-2020-test/cEmdLm8cBNE_000037_000047.mp4
Kinetics700-2020-test/1FUiMeIu7sE_000011_000021.mp4
Kinetics700-2020-test/5JBC5X0O73k_000005_000015.mp4
Kinetics700-2020-test/Cpn-XAerL5I_000011_000021.mp4
Kinetics700-2020-test/aFqlkvgQKho_000000_000010.mp4
Kinetics700-2020-test/aUOo5M67Itc_000010_000020.mp4
Kinetics700-2020-test/BJaHpp_K148_000190_000200.mp4
Kinetics700-2020-test/AivUke09tz8_000019_000029.mp4
Kinetics700-2020-test/f5WiwscpVlE_000000_000010.mp4
Kinetics700-2020-test/44esMhYjLRs_000019_000029.mp4
Kinetics700-2020-test/p-koaErOtiI_000075_000085.mp4
Kinetics700-2020-test/aC6__nAesz8_000103_000113.mp4
Kinetics700-2020-test/CrnGGdO3C4M_000030_000040.mp4
Kinetics700-2020-test/8YbNZ3lm7Ts_000000_000010.mp4
Kinetics700-2020-test/CAUQyTTat2M_000011_000021.mp4
Kinetics700-2020-test/CZbXx9UW2FE_000146_000156.mp4
Kinetics700-2020-test/7JYYa4C5u4A_000003_000013.mp4
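One way to reproduce such a list is sketched here, under two assumptions: the nominal clip length can be read from the start and end seconds in the filename, and the ffprobe command-line tool from FFmpeg is installed. The function names and the 2-second tolerance (mirroring the 8-second cutoff mentioned in this issue) are hypothetical choices.

```python
import re
import subprocess
from pathlib import Path

CLIP_NAME = re.compile(r"^(?P<youtube_id>.+)_(?P<start>\d{6})_(?P<end>\d{6})\.mp4$")


def nominal_duration(path):
    """Seconds between the start and end offsets encoded in the filename."""
    match = CLIP_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected clip name: {path}")
    return int(match["end"]) - int(match["start"])


def probed_duration(path):
    """Container duration in seconds, as reported by ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def is_short(path, tolerance=2.0):
    """True if the clip is more than `tolerance` seconds below nominal."""
    return probed_duration(path) < nominal_duration(path) - tolerance
```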

Kinetics 600 validation set mountain climber empty tar file

Hi,

Thanks for making the kinetics dataset publicly available. I found the following link to be empty under Kinetics 600 validation set. Could you please look into it?

https://s3.amazonaws.com/kinetics/600/val/mountain climber (exercise).tar.gz

Thanks

Question about dir `replacement`

I have downloaded the data and extracted the tar.gz files using the .sh scripts. How should I use the replacement data? Should I move all the files in replacement to some other directory?

K600 has videos not in the original release?

Hi,

Thanks for your effort archiving the videos.

I found that some videos in the provided K600 do not exist in the original release.

In particular, the val set provided here is quite different: around 30k videos downloaded but only ~17k are in the original release.

I wonder whether this is an official updated version, or there is something wrong, like you just happened to include videos from other sources by accident?

Thanks,

k600_extract script

find $curr_dl -type f | while read file; do mv "$file" echo $file | tr ' ' ''done
should be
find $curr_dl -type f | while read file; do mv "$file" echo $file | tr ' ' ''; done
Missing the semicolon leads to an error.
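The renaming step can also be sketched in Python, which sidesteps shell quoting altogether. This is an illustration only: the replacement character is not legible in the command as quoted here, so the underscore default is an assumption, and the function name `replace_spaces` is hypothetical.

```python
from pathlib import Path


def replace_spaces(root, replacement="_"):
    """Rename files under root so their names contain no spaces."""
    renamed = []
    # Materialise the listing first so renames do not disturb the walk.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and " " in path.name:
            target = path.with_name(path.name.replace(" ", replacement))
            if target.exists():
                continue  # never overwrite an existing file
            path.rename(target)
            renamed.append((path, target))
    return renamed
```

Unlike the shell one-liner, this only touches file names and leaves directory names unchanged.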

k600_extractor.sh throws errors during extraction

Running bash k600_extractor.sh gives the following output (first 11 lines)

Extracting k600_targz/train/abseiling.tar.gz to k600/train                                                                                                   
Extracting k600_targz/train/play.tar.gz to k600/train                                                                                                        
tar (child): k600_targz/train/play.tar.gz: Cannot open: No such file or directory                                                                            
tar (child): Error is not recoverable: exiting now                                                                                                           
tar: Child returned status 2                                                                                                                                 
tar: Error is not recoverable: exiting now                                                                                                                   
Extracting k600_targz/train/glasses.tar.gz to k600/train                                                                                                     
tar (child): k600_targz/train/glasses.tar.gz: Cannot open: No such file or directory                                                                         
tar (child): Error is not recoverable: exiting now                                                                                                           
tar: Child returned status 2                                                                                                                                 
tar: Error is not recoverable: exiting now
...

Expected behaviour: No errors must be thrown during extraction

Missing annotations directory in arrange_by_classes.py

kinetics-dataset/arrange_by_classes.py

Line 22 in 7fddcaa

split_csv = load_label(path / f'{split}.csv')

Hi, I just tested this script with the K400 version of the dataset and found that the 'annotations' folder, where the CSV files reside, is missing from the path in the script. I'm not sure whether this fix would be correct for all the Kinetics versions.

fix:
split_csv = load_label(path / 'annotations' / f'{split}.csv')
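Since it is unclear whether every Kinetics version keeps the CSV files in an annotations folder, a more tolerant lookup could try both locations. This is a sketch under that assumption, not the script's actual behavior; the helper name `resolve_split_csv` is hypothetical.

```python
from pathlib import Path


def resolve_split_csv(root, split):
    """Find <split>.csv under root/annotations, falling back to root."""
    root = Path(root)
    candidates = [
        root / "annotations" / f"{split}.csv",
        root / f"{split}.csv",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"no annotation file for '{split}'; tried: {tried}")
```

The call in arrange_by_classes.py would then become `load_label(resolve_split_csv(path, split))`.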

GPU settings

Hello, Thanks for the great contribution !

Could you please describe the required GPU memory and the number of GPUs for the experiments in the table?

label missing for k600?

Hi, I am wondering where we can get the ground-truth labels for the K600 test set. They are not at the link you provided in the README. I also checked DeepMind's CSV and JSON files, but it seems that not every video in the K600 test set is listed there; for example, the ID 0y-r_p-0TwM (in part_2.tar.gz) is not in the CSV.

Thank you!

Sizes of The Video Datasets

Hi, could you (or anyone who has downloaded the datasets) provide an estimate of the dataset sizes so that we can plan disk space for them? Thanks! I'm planning to download K400 and K600.

Download part of kinetics dataset

No held-out test in Kinetics-600

I have downloaded all the videos in https://s3.amazonaws.com/kinetics/600/test/k600_test_path.txt, but found the "held-out test set" missing. There are 72,924 videos in https://s3.amazonaws.com/kinetics/600/annotations/test.csv, but only 59,608 downloaded.

Could you tell us how to get the held-out test set for Kinetics-600? Thanks! @kinetics-cvdf

part_120.tar.gz is not a tar.gz file but a tar file
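A quick way to confirm a mismatch like this is to look at the file's leading bytes: gzip streams start with the bytes 1f 8b, while an uncompressed POSIX or GNU tar archive carries the string "ustar" at offset 257. The sketch below relies only on those two signatures; the function name `detect_archive_kind` is hypothetical.

```python
def detect_archive_kind(path):
    """Classify a file as 'gzip', 'tar', or 'unknown' from its header bytes."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head[:2] == b"\x1f\x8b":
        return "gzip"
    if head[257:262] == b"ustar":
        return "tar"
    return "unknown"
```

In Python, `tarfile.open(path, "r:*")` detects the compression on its own and can read either kind of archive, whereas `tar -z` requires the input to really be gzip-compressed.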

label missing in k600 test set annotation

The original test set downloaded from https://deepmind.com/research/open-source/kinetics contains labels for each video, which are missing in this release.

from deepmind:

label,youtube_id,time_start,time_end,split
drinking beer,--6bJUbfpnQ,17,27,test
climbing tree,--8YXc8iCt8,2,12,test
surfing water,--coBvtS-eQ,57,67,test
stomping grapes,--q6ElFyVq0,148,158,test
...

download from AWS (the label is missing):

youtube_id,time_start,time_end,split
--6bJUbfpnQ,17,27,test
--8YXc8iCt8,2,12,test
--coBvtS-eQ,57,67,test
--q6ElFyVq0,148,158,test
...

Please provide the required csv file as soon as possible.

no videos are downloaded

I followed the instructions, but no videos were downloaded; the folders are empty.

Question Regarding kinetics-400 Dataset: What are test videos?

Hello, I'm new to the field of Action recognition and have a question regarding the dataset split. Specifically for the kinetics-400 dataset, in the paper "Unmasked Teacher: Towards Training-Efficient Video Foundation Models," they provide the following summary for the number of training and validation data:

In the Video Swin Transformer paper, they also describe the kinetics-400 dataset as follows:

Both papers commonly state that kinetics-400 consists of approximately 240k training videos and 20k validation videos. However, the CSV file provided in this GitHub repository contains around 40k test videos that are not mentioned in the papers. Could you please clarify what are these test videos?

Additionally, the link to https://deepmind.com/research/open-source/kinetics is not working correctly. Has the official project page been removed?

I would appreciate insights from those who have continued their research in the field of action recognition and are familiar with the Kinetics dataset.

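Error when using mmaction2's RGB frame extraction tool

I wonder whether anyone has used mmaction2's RGB frame extraction tool. After downloading the K400 dataset, I extracted it successfully and renamed the folders following mmaction2's naming conventions, but when extracting RGB frames I was told that the corresponding video files could not be found.

Could anyone offer some help or point out what might be wrong? (I have already checked the video folder and confirmed that the corresponding .mp4 files exist under the target path.)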

very serious problem with k600 test set

After downloading, there are a lot of videos in the training set, but only a little over 3,000 of them are actually correct.

k400

An error occurred during decompression and forced an exit when I ran bash ./k400_extractor.sh

issue

Hello, I didn't find kinetics-dataset on the official website

md5sum

Thanks for this great repo.

It would be much better if md5 checksums could be provided for verifying the files.
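Until official checksums exist, users can at least compare their own downloads with one another. A minimal sketch using the standard hashlib module, reading in chunks so that multi-gigabyte archives do not have to fit in memory; the function name `md5_of_file` is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result has the same format as the first column printed by the md5sum command, so the two can be compared directly.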

High resolution of videos, is it necessary ?

I would like to ask about the resolution of the videos in the dataset. I downloaded some tar.gz files to inspect the resolutions of the videos, and I noticed that some videos are in 720p. I think this is unnecessary: in most models I know of, the first step applied to a dataset is to resize it, for example to 227x227, so a resolution of 720p is a waste of space. So:

Is there a reason for having such high resolutions in the dataset?
Have you tried running a model on videos with high initial resolution and with low initial resolution, and did you notice any difference?

In a similar situation, for the Sports1M dataset from YouTube, the authors suggest storing the low-resolution videos.

HTTP request sent, awaiting response... 404 Not Found

Running bash download.sh k400_..._path.txt always returns a 404 Not Found error.
It looks like wget "$one" doesn't work.
https://stackoverflow.com/questions/7623698/wget-cant-download-404-error
I have tried the methods suggested there, and none of them worked.
Could you give me some advice?
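Two things that can turn a valid link into a 404 are stray whitespace at the end of a line in the path file (for example a carriage return from Windows line endings) and unencoded spaces or parentheses in the URL. Whether either applies here is an assumption; the sketch below simply normalizes a line before it is handed to a downloader. The function name `encode_url` is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(line):
    """Strip surrounding whitespace and percent-encode the URL path."""
    parts = urlsplit(line.strip())
    # Keep "/" as the separator and "%" so already-encoded URLs stay unchanged.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=encoded_path))
```

For example, the link https://s3.amazonaws.com/kinetics/600/val/mountain climber (exercise).tar.gz becomes https://s3.amazonaws.com/kinetics/600/val/mountain%20climber%20%28exercise%29.tar.gz.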

No K600 train.csv and val.csv provided

@kinetics-cvdf Could you provide train.csv and val.csv instead of train.txt and val.txt please? Thanks in advance!