Coder Social home page Coder Social logo

thetobysiu / deepstory Goto Github PK

View Code? Open in Web Editor NEW
93.0 3.0 22.0 8 MB

Deepstory turns a text/generated text into a video where the character is animated to speak your story using his/her voice.

Home Page: https://blog.thetobysiu.com

Python 79.28% HTML 12.23% CSS 0.71% JavaScript 7.78%
pytorch dctts dc-tts deep-learning jquery flask gpt-2 gpt2 witcher-3 witcher3

deepstory's Introduction

Deepstory

Deepstory is an artwork that incorporates Natural Language Generation(NLG) w/GPT-2, Text-to-Speech(TTS) w/Deep Convolutional TTS, speech to animation w/Speech driven animation and image animation w/First Order Motion Model into a media application.

To put it simply, it turns a text/generated text into a video where the character is animated to speak your story using his/her voice.

You can convert image into a video like this:

result

It provides a comfortable web interface and backend written with flask to create your own story.

It supports transformers model, and pytorch-dctts models

Live Demo

Colab (flask-ngrok): https://colab.research.google.com/drive/1HYCPUmFw5rN8kvZdwzFpfBlaUMWPNHas?usp=sharing

Video (In case you need instructions): https://blog.thetobysiu.com/video/

Updates

  1. Redesign interface, especially the whole GPT2 interface
  2. GPT2 now support text loading from original data, so that it can continue to generate a story based on the book
  3. Figure out the token limits in GPT2 and only infer to the nearest 1024 - predict length tokens
  4. GPT2 support interactive mode that generates several batches of sentences and provides an interface to add those sentence
  5. Sentence speaker mapping system, not replacing all speaker by default anymore
  6. text normalization is now in the synthesizing stage so that punctuations are preserved and can be referenced to have a variable duration in synthesized audio
  7. Audio synthesizing are now all in temp folder, synthesized audios are trimmed so that it's animated video is more accurate(sda mode trained data are short also)
  8. Combined audio now have variable silences according to punctuation
  9. Basically, rewrite the web interface and lots of codes...

Colab version will be available soon!

Interface

Folder structure

Deepstory
├── animator.py
├── app.py
├── data
│   ├── dctts
│   │   ├── Geralt
│   │   │   ├── ssrn.pth
│   │   │   └── t2m.pth
│   │   ├── LJ
│   │   │   ├── ssrn.pth
│   │   │   └── t2m.pth
│   │   └── Yennefer
│   │       ├── ssrn.pth
│   │       └── t2m.pth
│   ├── fom
│   │   ├── vox-256.yaml
│   │   ├── vox-adv-256.yaml
│   │   ├── vox-adv-cpk.pth.tar
│   │   └── vox-cpk.pth.tar
│   ├── gpt2
│   │   ├── Waiting for Godot
│   │   │   ├── config.json
│   │   │   ├── default.txt
│   │   │   ├── merges.txt
│   │   │   ├── pytorch_model.bin
│   │   │   ├── special_tokens_map.json
│   │   │   ├── text.txt
│   │   │   ├── tokenizer_config.json
│   │   │   └── vocab.json
│   │   └── Witcher Books
│   │       ├── config.json
│   │       ├── default.txt
│   │       ├── merges.txt
│   │       ├── pytorch_model.bin
│   │       ├── special_tokens_map.json
│   │       ├── text.txt
│   │       ├── tokenizer_config.json
│   │       └── vocab.json
│   ├── images
│   │   ├── Geralt
│   │   │   ├── 0.jpg
│   │   │   └── fx.jpg
│   │   └── Yennefer
│   │       ├── 0.jpg
│   │       ├── 1.jpg
│   │       ├── 2.jpg
│   │       ├── 3.jpg
│   │       ├── 4.jpg
│   │       └── 5.jpg
│   └── sda
│       ├── grid.dat
│       └── image.bmp
├── deepstory.py
├── generate.py
├── modules
│   ├── dctts
│   │   ├── audio.py
│   │   ├── hparams.py
│   │   ├── __init__.py
│   │   ├── layers.py
│   │   ├── ssrn.py
│   │   └── text2mel.py
│   ├── fom
│   │   ├── animate.py
│   │   ├── dense_motion.py
│   │   ├── generator.py
│   │   ├── __init__.py
│   │   ├── keypoint_detector.py
│   │   ├── sync_batchnorm
│   │   │   ├── batchnorm.py
│   │   │   ├── comm.py
│   │   │   ├── __init__.py
│   │   │   └── replicate.py
│   │   └── util.py
│   └── sda
│       ├── encoder_audio.py
│       ├── encoder_image.py
│       ├── img_generator.py
│       ├── __init__.py
│       ├── rnn_audio.py
│       ├── sda.py
│       └── utils.py
├── README.md
├── requirements.txt
├── static
│   ├── bootstrap
│   │   ├── css
│   │   │   └── bootstrap.min.css
│   │   └── js
│   │       └── bootstrap.min.js
│   ├── css
│   │   └── styles.css
│   └── js
│       └── jquery.min.js
├── templates
│   ├── animate.html
│   ├── deepstory.js
│   ├── gen_sentences.html
│   ├── gpt2.html
│   ├── index.html
│   ├── map.html
│   ├── models.html
│   ├── sentences.html
│   ├── status.html
│   └── video.html
├── test.py
├── text.txt
├── util.py
└── voice.py

Complete project download

They are available at the google drive version of this project. All the models(including Geralt, Yennefer) are included.

You have to download the spacy english model first.

make sure you have ffmpeg installed in your computer, and ffmpeg-python installed.

https://drive.google.com/drive/folders/1AxORLF-QFd2wSORzMOKlvCQSFhdZSODJ?usp=sharing

To simplify things, a google colab version will be released soon...

Requirements

It is required to have an nvidia GPU with at least 4GB of VRAM to run this project

Credits

https://github.com/tugstugi/pytorch-dc-tts

https://github.com/DinoMan/speech-driven-animation

https://github.com/AliaksandrSiarohin/first-order-model

https://github.com/huggingface/transformers

Notes

The whole project uses PyTorch, while tensorflow is listed in requirements.txt, it was used for transformers to convert a model trained from gpt-2-simple to a Pytorch model.

Only the files inside modules folder are slightly modified from the original. The remaining files are all written by me, except some parts that are referenced.

Bugs

There's still some memory issues if you synthesize sentences within a session over and over, but it takes at least 10 times to cause memory overflow.

Training models

There are other repos of tools that I created to preprocess the files. They can be found in my profile.

deepstory's People

Contributors

thetobysiu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

deepstory's Issues

Share link broke, instructions needed!

This project is really coooool! However, seems the share link , both Colab and google drive, broke. Could you please add some more instructions about how to run this project from source code? Wanna try it very much. Thx!

Use it as live

I want to connect it with an ai called "vicuna" is there a way that it receive text without the ui, and remain afk waiting till the second text

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.