
WACV 2020 "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison"

Home Page: https://dxli94.github.io/WLASL/


wlasl's Introduction

WLASL: A large-scale dataset for Word-Level American Sign Language (WACV '20 Best Paper Honourable Mention)

This repository contains the WLASL dataset described in "Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison".

Please visit the project homepage for news updates.

If you find the repository useful, please star it to help with its visibility.

yt-dlp vs youtube-dl

youtube-dl has had low maintenance for a while now and does not work for some YouTube videos (see this issue). yt-dlp is a more up-to-date fork, which seems to work for all YouTube videos. Therefore ./start_kit/video_downloader.py uses yt-dlp by default, but it can be switched back to youtube-dl in the future by adjusting the youtube_downloader variable. If you have trouble with yt-dlp, make sure to update to the latest version, as YouTube changes constantly.
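A minimal sketch of the switch, assuming the youtube_downloader variable simply names the backend command (the exact string values are an assumption):

youtube_downloader = "yt-dlp"        # default backend
# youtube_downloader = "youtube-dl"  # fall back here if yt-dlp misbehaves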

Download Original Videos

  1. Download the repo:

git clone https://github.com/dxli94/WLASL.git

  2. Install yt-dlp (or youtube-dl) for downloading YouTube videos.

  3. Download the raw videos:

cd start_kit
python video_downloader.py

  4. Extract video samples from the raw videos:

python preprocess.py

  5. You should now see video samples under the videos/ directory.

Requesting Missing / Pre-processed Videos

Videos can disappear over time due to expired URLs, so you may find the downloaded set incomplete. To address this, we provide the following way to access the missing videos.

We also provide pre-processed videos for the full WLASL dataset on request, which saves you the trouble of processing the videos yourself.

(a) Run

python find_missing.py

to generate a text file missing.txt containing the missing video IDs.

(b) Submit a video request by agreeing to the terms of use at: https://docs.google.com/forms/d/e/1FAIpQLSc3yHyAranhpkC9ur_Z-Gu5gS5M0WnKtHV07Vo6eL6nZHzruw/viewform?usp=sf_link. You will receive links to the missing videos within 7 days. (I have recently been more occupied, so some delay is possible, but I will try to share within a week. If you are in urgent need, drop me an email.)

File Description

The repository contains the following files:

  • WLASL_vx.x.json: JSON file including all the data samples.

  • data_reader.py: Sample code for loading the dataset.

  • video_downloader.py: Sample code demonstrating how to download data samples.

  • C-UDA-1.0.pdf: the Computational Use of Data Agreement (C-UDA). You must read and agree to the terms before using the dataset.

  • README.md: this file.

Data Description

  • gloss: str, the sign gloss, i.e. the label; the data file is structured/categorised by gloss.

  • bbox: [int], person bounding box detected using YOLOv3, in (xmin, ymin, xmax, ymax) convention. Following the OpenCV convention, (0, 0) is the top-left corner.

  • fps: int, frame rate (=25) used to decode the video as in the paper.

  • frame_start: int, the starting frame of the gloss in the video (decoding with FPS=25), indexed from 1.

  • frame_end: int, the ending frame of the gloss in the video (decoding with FPS=25). -1 indicates the gloss ends at the last frame of the video.

  • instance_id: int, id of the instance in the same class/gloss.

  • signer_id: int, id of the signer.

  • source: str, a string identifier for the source site.

  • split: str, indicates which subset (train/val/test) the sample belongs to.

  • url: str, used for video downloading.

  • variation_id: int, id for dialect (indexed from 0).

  • video_id: str, a unique video identifier.

Please note that if you decode at a different FPS, you may need to rescale frame_start and frame_end to obtain the correct video segments.
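As a minimal sketch of reading the annotations and rescaling the frame indices, assuming WLASL_vx.x.json is a list of {gloss, instances} records whose instances carry the fields listed above (the v0.3 file name is just an example):

import json

# Load the annotation file.
with open('WLASL_v0.3.json') as f:
    data = json.load(f)

target_fps = 30  # the FPS you actually decode at

for entry in data:
    gloss = entry['gloss']
    for inst in entry['instances']:
        start, end = inst['frame_start'], inst['frame_end']
        # Annotations assume decoding at 25 fps; rescale otherwise.
        start = round(start * target_fps / 25)
        if end != -1:  # -1 means "until the last frame"
            end = round(end * target_fps / 25)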

Constituting subsets

As described in the paper, four subsets WLASL100, WLASL300, WLASL1000 and WLASL2000 are constructed by taking the top-K (K = 100, 300, 1000 and 2000) glosses from the WLASL_vx.x.json file.
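Assuming the glosses in the JSON file are ordered so that the first K entries form the WLASL-K subset (an assumption; the selection criterion is described in the paper), constituting a subset is a one-liner:

import json

K = 100  # or 300, 1000, 2000
with open('WLASL_v0.3.json') as f:
    data = json.load(f)

wlasl_k = data[:K]  # one record per gloss for the WLASL-K subset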

Training and Testing

I3D

cd WLASL
mkdir data

Put all the videos under data/:

cp -r WLASL2000 data/

To train models, first download the I3D weights pre-trained on Kinetics and unzip them. You should see a folder I3D/weights/.

python train_i3d.py

To test pre-trained models, first download the WLASL pre-trained weights and unzip them. You should see a folder I3D/archived/.

python test_i3d.py

By default the script tests on WLASL2000. To test other subsets, change lines 264 and 270 in test_i3d.py accordingly.
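As a purely hypothetical illustration of the kind of change involved (the actual variable names at lines 264 and 270 of test_i3d.py may differ; the weights filename below is the one that ships with the WLASL100 pre-trained release):

# Hypothetical: switch the evaluated subset from WLASL2000 to WLASL100.
num_classes = 100
train_split = 'preprocess/nslt_100.json'
weights = 'archived/asl100/FINAL_nslt_100_iters=896_top1=65.89_top5=84.11_top10=89.92.pt'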

A previous release can be found here.

Pose-TGCN

Download the splits file and body keypoints. Unzip them into WLASL/data. You should see the WLASL/data/splits and WLASL/data/pose_per_individual_videos folders.

To train the model, modify the paths in main() of train_tgcn.py to point to the WLASL root.

python train_tgcn.py

To test the model, first download the pre-trained models and unzip them to code/TGCN/archived. Then run

python test_tgcn.py

License

Licensed under the Computational Use of Data Agreement (C-UDA). Please refer to C-UDA-1.0.pdf for more information.

Disclaimer

All the WLASL data is intended for academic and computational use only. No commercial usage is allowed. We highly respect copyright and privacy. If you find WLASL violates your rights, please contact us.

Citation

Please cite the WLASL paper if it helps your research:

 @inproceedings{li2020word,
    title={Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison},
    author={Li, Dongxu and Rodriguez, Cristian and Yu, Xin and Li, Hongdong},
    booktitle={The IEEE Winter Conference on Applications of Computer Vision},
    pages={1459--1469},
    year={2020}
 }

Please also consider citing our follow-up work on WLASL:

@inproceedings{li2020transferring,
 title={Transferring cross-domain knowledge for video sign language recognition},
 author={Li, Dongxu and Yu, Xin and Xu, Chenchen and Petersson, Lars and Li, Hongdong},
 booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
 pages={6205--6214},
 year={2020}
}

Other works you might be interested in:

@article{li2020tspnet,
  title={Tspnet: Hierarchical feature learning via temporal semantic pyramid for sign language translation},
  author={Li, Dongxu and Xu, Chenchen and Yu, Xin and Zhang, Kaihao and Swift, Benjamin and Suominen, Hanna and Li, Hongdong},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  pages={12034--12045},
  year={2020}
}

@inproceedings{li2022transcribing,
  title={Transcribing natural languages for the deaf via neural editing programs},
  author={Li, Dongxu and Xu, Chenchen and Liu, Liu and Zhong, Yiran and Wang, Rong and Petersson, Lars and Li, Hongdong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={11},
  pages={11991--11999},
  year={2022}
}


wlasl's People

Contributors

dxli94, jmintb, sakoshade, tfederico


wlasl's Issues

Some videos in asl2000.json not present in WLASL_v03.json

Describe the bug
The following video IDs in asl2000.json are not present in WLASL_v03.json.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'WLASL_v03.json'
  2. Search for any of the following video IDs: 20065, 20138, 48251, 13422, 16096, 39347, 57839, 47639, 12209, 51153, 60721, 09500.
  3. See video_id not found

Expected behavior
Video IDs in asl2000.json should also be described in WLASL_v03.json.

Comments
I first thought those videos might no longer be available, which would explain why their IDs are not present in WLASL_v03.json. However, as an example, video_id = 09500 with url = https://elementalaslconcepts.weebly.com/uploads/2/4/4/5/24454483/larva.mov is available for download.

Why does this happen?

Can't download all videos!

I cloned this project to my local environment and ran the script. A large part of the videos downloaded correctly, BUT almost 10% of the videos could not be downloaded. In the end, I could only download 18,915 videos, which include 1,736 invalid .swf files.
e.g.
ERROR:root: Unsuccessful downloading - youtube video url https://www.youtube.com/watch?v=YEFd7zsUNXU

Plus, I tried many times.

About pose_per_individual_videos file

Hi,
Regarding the pose_per_individual_videos file: I downloaded it from the public link and found it contains only 4174 folders. Then, when training with train_tgcn.py, I got a file error: FileNotFoundError: [Errno 2] No such file or directory: '\\32337\\image_00018_keypoints.json'. I looked into pose_per_individual_videos and found no 32337 directory there. How can I resolve this problem? I hope someone can help me. Thank you a lot.

Test model not working due to config file

I tried testing the pre-trained models downloaded from Drive after installing all the Python tools, and I keep getting this error:

  File "test_tgcn.py", line 101, in <module>
    configs = Config(config_file)
  File "/Users/mac/pythonstuff/WLASL-master/code/TGCN/configs.py", line 10, in __init__
    train_config = config['TRAIN']
  File "/usr/local/anaconda3/lib/python3.8/configparser.py", line 960, in __getitem__
    raise KeyError(key)
KeyError: 'TRAIN'

I don't think this is a major issue, but I am very new to this field and I can't seem to understand what's wrong. Please help.

Dataset Size

This is not strictly an issue. We would like to know the size of the dataset, so that when we download it we can allocate enough space in advance.

Thanks in advance :)

Edge device support

I don't think this can be classified as an issue, since it is more of a question. But would it be possible to run this model on an edge device?
Maybe by converting the .pt file to a tflite file after exporting to ONNX, or to PyTorch Mobile .ptl?

About WLASL benchmarks

Great repo, and thanks for your effort in collecting such an amazing dataset!

I am trying to run SLR experiments on WLASL. Where can I find the newest benchmarks for WLASL2000? I have checked Papers with Code and cannot find any benchmarks uploaded. Can I consider the results reported in the dataset paper (Top-1 32.48, Top-5 57.31) to be the state-of-the-art results?

About validation and test

I have 3 questions about validation and testing in I3D (I use the subset WLASL100). I hope someone can help me.

  1. In ./I3D/train.py line 57, it seems that the 'test' set rather than the 'validation' set is used for the validation process. I wonder whether this is a mistake or a deliberate setting of this work.
  2. I used the given weights './archived/asl100/FINAL_nslt_100_iters=896_top1=65.89_top5=84.11_top10=89.92.pt' to test the I3D model and got: top1=67.07, top5=84.58, top10=90.25, which differs from the accuracy in the name of the weights file. Why would this happen, and which one should be chosen as the final result?
  3. I once trained the I3D model using the 'test' set for both validation and testing, but the validation results and the testing results were different, which confuses me.

Training Query

I wanted to know how you are training the model.
Are you only training the top classification layer, or are you training the whole model?
If the whole model is being trained, isn't the dataset too small to prevent overfitting?
Also, this model only replaces the I3D classification layer (600 classes) with the number of classes we specify, right? Or is it doing something else?

Target size is not the same as input size

loc_loss = F.binary_cross_entropy_with_logits(per_frame_logits, labels)

per_frame_logits.shape = (1, 100, 75)
labels.shape = (1)
per_frame_logits.shape != labels.shape

How can I solve this problem?
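A hedged sketch of the usual fix, assuming the target should be a per-frame one-hot tensor matching the logits (shapes follow the ones quoted above; this is not necessarily how the repository itself builds its labels):

import torch
import torch.nn.functional as F

num_classes, num_frames = 100, 75
per_frame_logits = torch.randn(1, num_classes, num_frames)
label_idx = torch.tensor([42])  # class index, shape (1,)

# Expand the class index to a (1, num_classes, num_frames) one-hot target.
labels = F.one_hot(label_idx, num_classes=num_classes).float()  # (1, 100)
labels = labels.unsqueeze(2).repeat(1, 1, num_frames)           # (1, 100, 75)

loc_loss = F.binary_cross_entropy_with_logits(per_frame_logits, labels)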

missing videos

Hi, I got a Google Drive link after submitting the missing video list.
However, it seems all videos in the link have been pre-processed to a 256x256 shape.
Could you let me know how the bboxes should be adjusted for the 256x256 pre-processed videos?
Also, could you release the missing videos at their original resolutions?
Thanks!

Testing the model

Hi there, I was trying to test the trained model, but I am getting the error below while testing:

('11772',) 0.6267942583732058 0.8133971291866029 0.8516746411483254
('17024',) 0.631578947368421 0.8181818181818182 0.8564593301435407
('17020',) 0.631578947368421 0.8181818181818182 0.861244019138756
('17730',) 0.6363636363636364 0.8229665071770335 0.8660287081339713
('62987',) 0.6411483253588517 0.8277511961722488 0.8708133971291866
('05727',) 0.6411483253588517 0.8277511961722488 0.8708133971291866
('38531',) 0.6411483253588517 0.8325358851674641 0.8755980861244019
('63219',) 0.6411483253588517 0.8373205741626795 0.8803827751196173
('06845',) 0.645933014354067 0.8421052631578947 0.8851674641148325
('57273',) 0.645933014354067 0.8421052631578947 0.8851674641148325
('27194',) 0.6507177033492823 0.84688995215311 0.8899521531100478
('28201',) 0.6555023923444976 0.8516746411483254 0.8947368421052632
top-k average per class acc: nan, nan, nan
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:98: RuntimeWarning: invalid value encountered in true_divide
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:99: RuntimeWarning: invalid value encountered in true_divide
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:100: RuntimeWarning: invalid value encountered in true_divide

Videos ignoring the start frame at preprocessing

The preprocess.py file does not take the start frame into account when the video ends at its last frame:

if end_frame <= 0:
    shutil.copyfile(src_video_path, dst_video_path)
    continue

I have found around 6 videos that end at the last frame of the video but start after its first frame (start_frame != 1), and the code copies the video to the destination folder as-is without slicing it accordingly. A possible fix is sketched below.
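A hedged sketch of one possible fix, keeping the variable names from the snippet above (this turns the reporter's described behaviour into code; it is not the repository's actual patch):

if end_frame <= 0:
    if start_frame <= 1:
        # Gloss spans the whole video: copying the file is safe.
        shutil.copyfile(src_video_path, dst_video_path)
        continue
    # Otherwise fall through and slice from start_frame to the
    # video's last frame instead of copying the file verbatim.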

Information about the nslt_100.json

I opened and read the nslt_100.json file. It shows 2 rows × 2038 columns. The first row shows 'subset' (which I understand), but the second row shows 'action'. Can you help me understand what information this row gives?
(Examples below)
(Examples below)

subset: train | val | train | val | train | train | train | train | train
action: [77, 1, 55] | [27, 1, 51] | [82, 1, 48] | [82, 1, 39] | [82, 1, 50] | [82, 1, 203] | [82, 1, 46] | [82, 1, 108] | [82, 1, 39]

I3D pretrained model on WLASL

Is there a link to download the pre-trained I3D model for WLASL? I am struggling to train it myself since I don't have a GPU.

Prepare pose result for TGCN model

Hi, I want to prepare a custom dataset for training the TGCN model, but the preprocessing step is not described clearly enough.

We resize the resolution of all original video frames such that the diagonal size of the person bounding-box is 256 pixels

I checked the original file WLASL_v0.3.json, got the bbox value, calculated the diagonal size of this box, and then calculated the resize ratio. After resizing the image, I drew the pose result, but the keypoints are not correct. I also found that the bbox value in the file asl100.json is the same regardless of the video.
Can you explain in more detail which steps you perform in preprocessing?

Invalid link

Install youtube-dl for downloading YouTube videos

The above link gives a 404 error.

default folder for "find_missing" script

The default folder configured in find_missing.py is videos, which only contains some samples of the downloaded videos. I realized that after I ran the script and got almost 21k lines in missing.txt.

Wouldn't it make more sense to change this to raw_videos or raw_videos_mp4?

ASLSignBank Videos

Hey,

ASLSignBank seems to have changed their links.
To download, you now need to add an "ASL/" after the "glossvideo/" in the URLs.

Anyone downloading it can just replace it via:
url = url.replace('glossvideo', 'glossvideo/ASL')

AttributeError: 'NoneType' object has no attribute 'shape'

Hi,

I am opening an issue here because I am trying to run train_i3d.py and I keep getting the same error. After trying a few things, it still does not work. Here is the error; could anyone help me out, please?

[Running] python -u "c:\Users\siste\OneDrive\Desktop\New folder\WLASL-master\WLASL-master\code\I3D\train_i3dVersion2.py"
root {'word': 'C:\\Users\\siste\\OneDrive\\Desktop\\New folder\\archive\\videos'} train_split C:/Users/siste/OneDrive/Desktop/New folder/WLASL-master/WLASL-master/code/I3D/preprocess\nslt_2000.json
types of data run --- config <class 'configs.Config'> mode <class 'str'> root <class 'dict'> save_model <class 'str'> train_split <class 'str'> weights <class 'NoneType'>
bs=6_ups=1_lr=0.0001_eps=0.001_wd=1e-08
Skipped videos:  0
10566
Skipped videos:  0
1414
Step 0/64000
----------
c:\Users\siste\OneDrive\Desktop\New folder\WLASL-master\WLASL-master\code\I3D\train_i3dVersion2.py:118: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  (num_classes, num_classes), dtype=np.int)
dataloaders <torch.utils.data.dataloader.DataLoader object at 0x000001F76FEB5790>
Traceback (most recent call last):
  File "c:\Users\siste\OneDrive\Desktop\New folder\WLASL-master\WLASL-master\code\I3D\train_i3dVersion2.py", line 220, in <module>
    run(configs=configs, mode=mode, root=root, save_model=save_model,
  File "c:\Users\siste\OneDrive\Desktop\New folder\WLASL-master\WLASL-master\code\I3D\train_i3dVersion2.py", line 121, in run
    for data in dataloaders[phase]:
  File "C:\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "C:\Python39\lib\site-packages\torch\utils\data\dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Python39\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Python39\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "c:\Users\siste\OneDrive\Desktop\New folder\WLASL-master\WLASL-master\code\I3D\datasets\nslt_dataset.py", line 191, in __getitem__
    imgs = load_rgb_frames_from_video(
  File "c:\Users\siste\OneDrive\Desktop\New folder\WLASL-master\WLASL-master\code\I3D\datasets\nslt_dataset.py", line 59, in load_rgb_frames_from_video
    w, h, c = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'

Videos not saved in the videos/ folder, showing an error

When I run the preprocess script, only the .mp4 videos are saved to the raw_videos_mp4 folder and the .swf files are discarded; after that it shows this error, and only 3 videos are saved in the videos folder.

Finish converting formats..
2 videos\69241.mp4
Traceback (most recent call last):
  File "preprocess.py", line 128, in <module>
    main()
  File "preprocess.py", line 124, in main
    extract_all_yt_instances(content)
  File "preprocess.py", line 93, in extract_all_yt_instances
    selected_frames = extract_frame_as_video(src_video_path, start_frame, end_frame)
  File "preprocess.py", line 52, in extract_frame_as_video
    frames = video_to_frames(src_video_path)
  File "preprocess.py", line 28, in video_to_frames
    ret, frame = cap.read()
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-cff9bdsm\opencv\modules\core\src\alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 676080 bytes in function 'cv::OutOfMemoryError'

Takes long time to download dataset

Hi

I am trying to download the dataset, but it is taking a very long time.
It downloaded 3k videos out of the 20k in 5 hours, and most of the URLs return "missing".
Is there any direct download link I can use?

Thank you

Accuracy

When I tested the I3D model for 100 classes, I achieved the following accuracy:
top-k average per class acc: 0.670542635658915, 0.8923076923076927, 0.9346153846153845
0.67 for top 1 class
0.89 for top 5
0.93 for top 10

This is higher than the numbers mentioned in the name of the pretrained weights.
Did I do something wrong? How can I be getting more than the reported numbers?

Video frame preprocess

Before doing data augmentation (random crop + random flip), is each frame a cropped image based on the person bbox, or the entire frame from the video?

Which glosses are represented in WLASL100, 300 and 1000

I can't seem to find which glosses are used in the subsets mentioned in your paper (WLASL100, WLASL300 and WLASL1000); the only split information in the annotation file is the train/val/test assignment. In the paper you mention "we select top-K glosses and organize them into four subsets". Did you use the first K glosses, or what exactly does this mean?

Inference for an unseen video

Is it possible to run inference on an unseen gloss video using the pretrained model, assuming I pre-process the video to the expected JSON format as in WLASL?

About I3D test accuracy and confusion matrix

Hello,
I find that in test_i3d.py the top-k accuracy is calculated as TP/(TP+FP), while the general recognition accuracy might be (TP+TN)/(TP+FP+TN+FN).
In train_i3d.py the accuracy is calculated as acc = float(np.trace(confusion_matrix)) / np.sum(confusion_matrix), so I built the confusion matrix in test_i3d.py too, and got acc = 0.0 (using WLASL100 and the given weights FINAL_nslt_100_iters=896_top1=65.89_top5=84.11_top10=89.92.pt).
I wonder why this happened, or is there something wrong with my implementation?

Starting from line 104 in test_i3d.py:

for data in dataloaders["test"]:
    inputs, labels, video_id = data  # inputs: b, c, t, h, w

    per_frame_logits = i3d(inputs)

    predictions = torch.max(per_frame_logits, dim=2)[0]
    # np.argsort is ascending: out_labels[-1] is the most likely class,
    # out_labels[0] the least likely one.
    out_labels = np.argsort(predictions.cpu().detach().numpy()[0])
    out_probs = np.sort(predictions.cpu().detach().numpy()[0])

    confusion_matrix[labels.item(), out_labels[0]] += 1

    if labels[0].item() in out_labels[-5:]:
        correct_5 += 1
        top5_tp[labels[0].item()] += 1
    else:
        top5_fp[labels[0].item()] += 1
    if labels[0].item() in out_labels[-10:]:
        correct_10 += 1
        top10_tp[labels[0].item()] += 1
    else:
        top10_fp[labels[0].item()] += 1
    if torch.argmax(predictions[0]).item() == labels[0].item():
        correct += 1
        top1_tp[labels[0].item()] += 1
    else:
        top1_fp[labels[0].item()] += 1
    print(video_id, float(correct) / len(dataloaders["test"]), float(correct_5) / len(dataloaders["test"]),
          float(correct_10) / len(dataloaders["test"]))

# per-class accuracy
top1_per_class = np.mean(top1_tp / (top1_tp + top1_fp))
top5_per_class = np.mean(top5_tp / (top5_tp + top5_fp))
top10_per_class = np.mean(top10_tp / (top10_tp + top10_fp))
print('top-k average per class acc: {}, {}, {}'.format(top1_per_class, top5_per_class, top10_per_class))

np.save('confusion_matrix.npy', confusion_matrix)
acc = float(np.trace(confusion_matrix)) / np.sum(confusion_matrix)
print('acc= ', acc)

None type on nslt_dataset.py", line 59

I have downloaded all the data and obtained the missing data. When I start running nslt_100, I get the following error:

w, h, c = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'

I added an if/else statement to skip these lines when success is false and to print out the file names. Here is the list from the first part of the first epoch:
/content/drive/MyDrive/WLASL2000/WLASL2000/40841.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/40841.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/50046.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/50046.mp4
Epoch 1 train Loc Loss: 0.3859 Cls Loss: 0.4550 Tot Loss: 0.4204 Accu :0.0000
/content/drive/MyDrive/WLASL2000/WLASL2000/13642.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/22120.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/22120.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/49183.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/57641.mp4
Epoch 1 train Loc Loss: 0.0812 Cls Loss: 0.1002 Tot Loss: 0.0907 Accu :0.0083
/content/drive/MyDrive/WLASL2000/WLASL2000/64292.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/06839.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/69531.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/22121.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/22121.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/36937.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/63673.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/10158.mp4
Epoch 1 train Loc Loss: 0.0574 Cls Loss: 0.0588 Tot Loss: 0.0581 Accu :0.0111
/content/drive/MyDrive/WLASL2000/WLASL2000/57285.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/57285.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/49181.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/49181.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/50044.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14894.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14894.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14894.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/56848.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/63208.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/62169.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/32955.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/05638.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/20986.mp4
Epoch 1 train Loc Loss: 0.0592 Cls Loss: 0.0572 Tot Loss: 0.0582 Accu :0.0083
/content/drive/MyDrive/WLASL2000/WLASL2000/64218.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/64218.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14893.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14893.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/57639.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/57639.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/57639.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/38533.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/53275.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/23776.mp4
Epoch 1 train Loc Loss: 0.0592 Cls Loss: 0.0563 Tot Loss: 0.0578 Accu :0.0067
/content/drive/MyDrive/WLASL2000/WLASL2000/64091.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/64219.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14680.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14680.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/69422.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/69238.mp4
Epoch 1 train Loc Loss: 0.0589 Cls Loss: 0.0570 Tot Loss: 0.0580 Accu :0.0083
/content/drive/MyDrive/WLASL2000/WLASL2000/34742.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/34742.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/34742.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/13702.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/36936.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14681.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/14681.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/27216.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/27216.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/13213.mp4
/content/drive/MyDrive/WLASL2000/WLASL2000/69413.mp4

Missing Videos

Can someone please provide the links for the missing videos? I submitted missing.txt via the form provided but have received no response.

train_i3d.py

'No module named torch'. Whenever I try to run the train_i3d.py file, it shows this error. I have tried pip-installing the modules; although they do get installed and I restart the kernel, this error still comes up. Can anyone help me with it?
Waiting for a response.

Running out of RAM while running preprocess.py

Thank you for collecting such a large and comprehensive dataset.
When I run the preprocess.py file after downloading the videos (19,342 of them), it gives an out-of-memory error (screenshot omitted).

It only preprocesses 42 videos and then runs out of RAM. I have 8 GB of RAM on my computer, so I'm not sure why this is happening.

Training loss does not go down while using TGCN

Congrats on the excellent repo. I was training using the TGCN module. I placed all the pose keypoints and correctly changed the config parameters in train_tgcn.py. However, I see that the training loss is not decreasing at all. May I know why that might be happening?

Processed Videos Start and End Frames

I noticed that for the YouTube videos where specific frames were being cut out, the processed video did not match the word. Looking into this issue, I noticed that the start and end frames in the JSON file did not match the corresponding frames in the video (due to a different frame rate). For example, in this link: https://www.youtube.com/watch?v=F5Wef1_PtLk, the word "drink" is supposed to start at frame 5710 if the downloaded YouTube video is 25 fps. However, the downloaded YouTube videos are 30 fps, so the start and end frames need to be scaled accordingly (it should start at frame 6852). A sketch of the scaling follows.
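A minimal sketch of the rescaling the report describes (the annotation FPS of 25 comes from the Data Description section above):

json_fps = 25        # frame rate the annotations assume
video_fps = 30       # actual frame rate of the downloaded video
frame_start = 5710
scaled_start = round(frame_start * video_fps / json_fps)  # -> 6852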

About train_i3d.py

When I am training the I3D model, I get an error at line 177, torch.save(i3d.module.state_dict(), model_name), in train_i3d.py.
The error shows:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.3.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.3.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/WLASLtest/code/I3D/train_i3d.py", line 204, in <module>
    run(configs=configs, mode=mode, root=root, save_model=save_model, train_split=train_split, weights=weights)
  File "D:/WLASLtest/code/I3D/train_i3d.py", line 177, in run
    torch.save(i3d.module.state_dict(), model_name)
  File "C:\Users\vipuser\.conda\envs\pytorch\lib\site-packages\torch\serialization.py", line 376, in save
    with _open_file_like(f, 'wb') as opened_file:
  File "C:\Users\vipuser\.conda\envs\pytorch\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\Users\vipuser\.conda\envs\pytorch\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'D:/WLASLtest/code/I3D/checkpoints/checkpoints/nslt_300_000139_0.011976.pt'

The absolute path appears because the last time I ran this code, the error was No such file or directory: 'checkpoints/nslt_300_000139_0.011976.pt', so I added the absolute path to my training folder after line 174.

I don't know what causes this file problem. Maybe I don't have the whole repo? I downloaded the entire repo from GitHub. Or can I get this .pt file some other way? Was my trained model saved successfully or not?
Also, I am using a Windows system. Can anybody help me?

Problems with preprocess.py and ffmpeg

When running preprocess.py, the .mp4 videos get copied from raw_videos to raw_videos_mp4 as expected, but the .swf files are not converted. If I generate a report, I get the following error:

ffmpeg version 4.3.2-2021-02-27-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-l libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Splitting the commandline.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'panic'.
Reading option '-i' ... matched as input url with argument '00340.swf'.
Reading option '00340.mp4' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Applying option loglevel (set logging level) with argument panic.
Successfully parsed a group of options.
Parsing a group of options: input url 00340.swf.
Successfully parsed a group of options.
Opening an input file: 00340.swf.
[NULL @ 000002575261ec80] Opening '00340.swf' for reading
[file @ 000002575261fd80] Setting default whitelist 'file,crypto,data'
[AVIOContext @ 0000025752627fc0] Statistics: 1436 bytes read, 0 seeks
00340.swf: Invalid data found when processing input

AttributeError: 'NoneType' object has no attribute 'shape'

Hello,
I'm getting this error:
{'word': '/content/drive/MyDrive/WLASL/code/I3D/data/videos'} preprocess/nslt_100.json
bs=6_ups=1_lr=0.0001_eps=0.001_wd=1e-08
Skipped videos: 0
275
Skipped videos: 0
92
Step 0/64000

12313
/content/drive/MyDrive/WLASL/code/I3D/data/videos/12313.mp4
70107
/content/drive/MyDrive/WLASL/code/I3D/data/videos/70107.mp4
23779
/content/drive/MyDrive/WLASL/code/I3D/data/videos/23779.mp4
70245
/content/drive/MyDrive/WLASL/code/I3D/data/videos/70245.mp4
06368
/content/drive/MyDrive/WLASL/code/I3D/data/videos/06368.mp4
05750
/content/drive/MyDrive/WLASL/code/I3D/data/videos/05750.mp4
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3509: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3635: UserWarning: Default upsampling behavior when mode=linear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode)
00618
/content/drive/MyDrive/WLASL/code/I3D/data/videos/00618.mp4
40843
/content/drive/MyDrive/WLASL/code/I3D/data/videos/40843.mp4
42974
/content/drive/MyDrive/WLASL/code/I3D/data/videos/42974.mp4
30841
/content/drive/MyDrive/WLASL/code/I3D/data/videos/30841.mp4
07069
/content/drive/MyDrive/WLASL/code/I3D/data/videos/07069.mp4
65445
/content/drive/MyDrive/WLASL/code/I3D/data/videos/65445.mp4
70211
/content/drive/MyDrive/WLASL/code/I3D/data/videos/70211.mp4
42838
/content/drive/MyDrive/WLASL/code/I3D/data/videos/42838.mp4
12319
/content/drive/MyDrive/WLASL/code/I3D/data/videos/12319.mp4
17722
/content/drive/MyDrive/WLASL/code/I3D/data/videos/17722.mp4
Traceback (most recent call last):
  File "train_i3d.py", line 205, in <module>
    run(configs=configs, mode=mode, root=root, save_model=save_model, train_split=train_split, weights=weights)
  File "train_i3d.py", line 116, in run
    for data in dataloaders[phase]:
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/My Drive/WLASL/code/I3D/datasets/nslt_dataset.py", line 182, in __getitem__
    imgs = load_rgb_frames_from_video(self.root['word'], vid, start_f, total_frames)
  File "/content/drive/My Drive/WLASL/code/I3D/datasets/nslt_dataset.py", line 57, in load_rgb_frames_from_video
    w, h, c = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'

I printed the paths of the videos to make sure I'm in the right directory. Can you tell me what the problem is here?

Pose-based GRU model

Hi there, I am interested in the pose-based GRU model, so could you please add its implementation to the repository if possible? It would be quite helpful.

Dataset Download

Hi sir,
I really appreciate your work. But there is an issue: how can I download the dataset? I checked your download code; each video has a different URL and type. Could you please share your code for downloading the dataset?

Code licensing

Hi @dxli94,
First, thanks a lot for this beautiful repo.

The license is clear about the use of the dataset. However, concerning the code (PyTorch implementation), is it open to projects with commercial purposes?

Thanks in advance for your answer.
Best

creating my own dataset

Hello,

I'm trying to use my own dataset; can you please tell me how?

I already have the videos in mp4 format, I placed them in the correct folder, and I changed the content of nslt_100.json to:
{"0000":{"subset": "train", "action": [ 0 ,1, 93 ]},
"0001":{"subset": "train", "action": [ 0 ,1, 113 ]},
"0002":{"subset": "train", "action": [ 0 ,1, 108 ]},
"0003":{"subset": "val", "action": [ 0 ,1, 102 ]},
"0004":{"subset": "val", "action": [ 0 ,1, 124 ]},
"0005":{"subset": "test", "action": [ 0 ,1, 52 ]}}

Assuming it follows this format:
vid_id: {"subset": "train/val/test", "action": [word, starting frame, ending frame]}

Is that the correct way? It prints len(dataset) = 0 and I'm not sure why, since the directory path is correct. The dataset is small, only 6 files so far; is that why?

ERROR: Unexpected bus error encountered in worker.

Hello, when I try to run train_i3d.py, the following errors appear during the first-epoch training:
...
Epoch 1 train Loc Loss: 0.0019 Cls Loss: 0.0009 Tot Loss: 0.0014 Accu :0.8119
Epoch 1 train Loc Loss: 0.0014 Cls Loss: 0.0006 Tot Loss: 0.0010 Accu :0.8121
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 511, in _try_get_batch
    data = self.data_queue.get(timeout=timeout)
  File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 911, in wait
    ready = selector.select(timeout)
  File "/usr/local/lib/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 63, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 30920) is killed by signal: Bus error.

This has been troubling me for a long time and I need some help. Thanks!
