A PyTorch implementation of the 'Watch, Listen, Attend and Spell' (WLAS) network, which learns to transcribe videos of mouth motion into characters.
- dlib
- opencv3
- pytorch
```shell
name@name:~$ git clone https://github.com/artem179/WLAS.git
name@name:~$ ./scripts/download_model
name@name:~$ python utils/detection.py path/to/your/video/file.mp4
name@name:~$ cd faces
name@name:~$ ls
```
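Once face crops have been extracted, the model's output side is a character-level transcription. A minimal sketch of such a character encoding, assuming a hypothetical lowercase vocabulary with `<sos>`/`<eos>`/`<pad>` special tokens (not the repo's actual code):

```python
# Hypothetical character vocabulary for a character-level decoder like WLAS's.
# Special tokens mark sequence start, sequence end, and padding.
CHARS = list("abcdefghijklmnopqrstuvwxyz '")
SPECIALS = ["<sos>", "<eos>", "<pad>"]
VOCAB = SPECIALS + CHARS
CHAR_TO_ID = {c: i for i, c in enumerate(VOCAB)}
ID_TO_CHAR = {i: c for c, i in CHAR_TO_ID.items()}

def encode(text):
    """Lower-case the transcript, drop unknown symbols, wrap in <sos>/<eos>."""
    ids = [CHAR_TO_ID["<sos>"]]
    ids += [CHAR_TO_ID[c] for c in text.lower() if c in CHAR_TO_ID]
    ids.append(CHAR_TO_ID["<eos>"])
    return ids

def decode(ids):
    """Strip special tokens and join the remaining characters."""
    return "".join(ID_TO_CHAR[i] for i in ids if ID_TO_CHAR[i] not in SPECIALS)

print(decode(encode("Hello world")))  # hello world
```

A real training pipeline would pad these ID sequences to a common length with `<pad>` before batching them into tensors.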