supasorn / synthesizing_obama_network_training

synthesizing_obama_network_training's Introduction

This is research code for Synthesizing Obama: Learning Lip Sync from Audio.
Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman
SIGGRAPH 2017

Code tested using TensorFlow 0.11.0. Please see Supasorn's website for an overview.

To generate MFCCs, first normalize the input audio using https://github.com/slhck/ffmpeg-normalize. Then use the Sphinx III MFCC snippet by David Huggins-Daines, with the following modified routine that also saves log energy and timestamps:

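import math

import numpy as np


def sig2s2mfc_energy(self, sig, dn):
  # Meant to be added to the MFCC class from the Sphinx snippet.
  # dn is not used by this routine. One frame every self.fshift samples.
  nfr = int(len(sig) / self.fshift + 1)

  # Columns: self.ncep cepstral coefficients, log energy, timestamp (s).
  mfcc = np.zeros((nfr, self.ncep + 2), 'd')
  fr = 0
  while fr < nfr:
    start = int(round(fr * self.fshift))
    end = min(len(sig), start + self.wlen)
    frame = sig[start:end]
    if len(frame) < self.wlen:
      # Zero-pad the last, partial frame up to the window length.
      # (numpy.resize would repeat the samples instead of padding with zeros.)
      frame = np.pad(frame, (0, self.wlen - len(frame)), 'constant')
    mfcc[fr, :-2] = self.frame2s2mfc(frame)
    # Log energy; the 1 + ... keeps the log finite for silent frames.
    mfcc[fr, -2] = math.log(1 + np.mean(np.power(frame.astype(float), 2)))
    # Timestamp of the frame centre, in seconds.
    mid = 0.5 * (start + end - 1)
    mfcc[fr, -1] = mid / self.samprate

    fr = fr + 1
  return mfcc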
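To sanity-check the two extra columns (log energy and timestamp) without a full Sphinx MFCC object, the framing logic of the routine above can be isolated in a standalone sketch. The function name `energy_and_timestamps` is hypothetical, and the defaults (16 kHz, 100 frames per second, 25.6 ms window) are assumed sphinxbase-style values, not settings confirmed by this repository.

```python
import math

import numpy as np


def energy_and_timestamps(sig, samprate=16000, frate=100, wlen_sec=0.0256):
    """Per-frame log energy and frame-centre timestamp (seconds).

    Hypothetical helper mirroring the framing in sig2s2mfc_energy; the
    defaults are assumed values, not settings taken from this repository.
    """
    fshift = float(samprate) / frate      # hop size in samples
    wlen = int(wlen_sec * samprate)       # window length in samples
    nfr = int(len(sig) / fshift + 1)      # same frame count as the routine
    out = np.zeros((nfr, 2))
    for fr in range(nfr):
        start = int(round(fr * fshift))
        end = min(len(sig), start + wlen)
        frame = np.asarray(sig[start:end], dtype=float)
        if len(frame) < wlen:
            # Zero-pad the final partial (or empty) frame to the window length.
            frame = np.pad(frame, (0, wlen - len(frame)), 'constant')
        out[fr, 0] = math.log(1 + np.mean(frame ** 2))   # log energy
        out[fr, 1] = 0.5 * (start + end - 1) / samprate  # centre time (s)
    return out
```

With these defaults, one second of audio yields 101 rows, and the first frame's centre lies at sample 204, i.e. 0.01275 s.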

synthesizing_obama_network_training's Issues

Version of tensorflow used?

Hi. I am experiencing this crash when training a model. I believe this is due to me using the latest version of tensorflow. Any thoughts? Thanks.

RDfpd8GV9dI}}02
load preprocessed 2 2
Traceback (most recent call last):
  File "run.py", line 225, in <module>
    main()
  File "run.py", line 222, in main
    s = Speech()
  File "run.py", line 52, in __init__
    self.train()
  File "/Users/august/synthesizing_obama_network_training/util.py", line 218, in train
    with tf.Session() as sess:
AttributeError: 'module' object has no attribute 'Session'
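`tf.Session` was removed from the top-level namespace in TensorFlow 2.x; it survives as `tf.compat.v1.Session`, which also needs graph mode (eager execution disabled). A minimal compatibility sketch follows, assuming the rest of the code only uses APIs that `tf.compat.v1` still provides; `get_tf1_api` is a hypothetical helper, not part of this repository.

```python
def get_tf1_api(tf_module):
    """Return an object exposing the TensorFlow 1.x graph API (Session, ...).

    Hypothetical helper: on TensorFlow 1.x the module is returned unchanged;
    on 2.x it falls back to tf.compat.v1 with eager execution disabled.
    """
    if hasattr(tf_module, "Session"):
        return tf_module  # TensorFlow 1.x: tf.Session exists
    v1 = tf_module.compat.v1
    v1.disable_eager_execution()  # graph mode, as the 0.11-era code expects
    return v1
```

Assuming util.py imports TensorFlow as `tf`, usage would be `import tensorflow` followed by `tf = get_tf1_api(tensorflow)`. Note that `tf.compat.v1` does not restore `tf.contrib`, which TensorFlow 2.x removed, and the code was tested on 0.11.0, so installing an older TensorFlow release may be less work than patching.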

How to do video retiming?

How to extract training data from original video

I have original videos, but I don't know how to extract the training data (".bin" files) from them. I want to create our own virtual MC from these videos. Please help!

Google Colab

I am trying to train this in Google Colab but am confused about the process.

Evaluation method

Is there any method to evaluate the model that we train? Is there a test set? How can we measure the performance of the model? Also, it is still unclear how to create mouth landmark points from the 20-D PCA coefficients. Has anyone solved this? Thanks.
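Recovering landmarks from the 20-D coefficients is the inverse of a PCA projection: add the mean shape back to the coefficient-weighted sum of principal axes, then reshape the 36 values into 18 points. A minimal sketch, assuming you still have the mean vector and principal axes from the PCA fit (not something this repository is known to ship) and that each 36-D vector is laid out as x0, y0, x1, y1, ...; both function names are hypothetical. The error helper is one possible evaluation measure, not the paper's protocol.

```python
import numpy as np


def coeffs_to_landmarks(coeffs, mean, components):
    """Map PCA coefficients back to 18 (x, y) mouth landmarks per frame.

    coeffs:     (T, K) coefficients, e.g. K = 20
    mean:       (36,) mean shape vector used when fitting PCA
    components: (K, 36) principal axes, one per row
    Assumes an interleaved x0, y0, x1, y1, ... layout (an assumption).
    """
    flat = mean + np.atleast_2d(coeffs) @ components  # (T, 36)
    return flat.reshape(len(flat), 18, 2)              # (T, 18, 2)


def mean_landmark_error(pred, target):
    """Mean Euclidean distance between corresponding landmarks."""
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))
```

With K = 20 the reconstruction is only approximate, since the discarded components are set to zero.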

How to generate 20-dimensional landmark features from Obama video frames?

Hello, I have recently been studying your Synthesizing Obama network code. Each 29.97 fps video sequence yields 2, 3 or 4 dump files. I have two questions about this:
(1) For a single Obama video, after the beginning and end segments that do not contain him are removed, why is the remaining footage divided into multiple dumps? What is the reason for doing this?
(2) For the mouth features, you detect mouth landmarks, giving 18 points along the outer and inner contours of the lips: "We reshaped each 18-point shape into a 36-D vector, applied PCA on all frames, and represented each mouth shape by the coefficients of the first 20 principal components." However, this part is not in the code you released. Could you add it? Without it, this step is a black box to us.

I look forward to your answers. Thank you very much!
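The PCA step quoted in (2) can be approximated in a few lines of NumPy. This is a sketch of the procedure as described in the question, not the authors' released code: the landmark detector that produces the 18 points is out of scope, the (T, 18, 2) input layout is an assumption, and `fit_mouth_pca` is a hypothetical name.

```python
import numpy as np


def fit_mouth_pca(landmarks, n_components=20):
    """Fit PCA to mouth shapes and return (mean, components, coeffs).

    landmarks: (T, 18, 2) array, 18 lip points per frame (assumed layout).
    Each shape is flattened to a 36-D vector (x0, y0, x1, y1, ...), centred
    on the mean over all frames, and projected onto the top n_components
    principal axes obtained by SVD. Needs T >= n_components frames.
    """
    X = np.asarray(landmarks, dtype=float).reshape(len(landmarks), -1)  # (T, 36)
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]       # (n_components, 36)
    coeffs = (X - mean) @ components.T   # (T, n_components)
    return mean, components, coeffs
```

Keeping `mean` and `components` alongside the coefficients is what makes it possible to map predicted coefficients back to landmark positions later.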

Is there a pre-trained model available?

cPickle not installed

Hi, I have tried many times to install cPickle on my system, but without success. Please help me install cPickle.
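`cPickle` is a Python 2 standard-library module, not a separately installable package; in Python 3 it was folded into `pickle`, which uses its C implementation automatically. If the failure is an `ImportError` for `cPickle` under Python 3, a common compatibility pattern is to change the import as sketched below (where exactly the failing import lives in this repository is an assumption).

```python
try:
    import cPickle as pickle  # Python 2 only
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically
```

If the data files were pickled under Python 2 and contain NumPy arrays, loading them in Python 3 may additionally need `pickle.load(f, encoding='latin1')`.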

Error with numpy concatenate

When I run the program, I am able to preprocess the data file but then I run into the error below. Any idea what the problem is? I printed the lst from np.concatenate(lst) to the console and it was quite long...

Command I am running: python run.py --input2 /home/ec2-user/synthesizing_obama_network_training/obama_data/audio/normalized-cep13/zYGok_gHfY0.wav.npy --save_dir ./saveme

...
vIPUrZuLlCQ}}01
vIPUrZuLlCQ}}02
vIPUrZuLlCQ}}03
vIPUrZuLlCQ}}04
WjX0iJU3vtY (34257, 15) training
WjX0iJU3vtY}}00
WjX0iJU3vtY}}01
WjX0iJU3vtY}}02
WjX0iJU3vtY}}03
WjX0iJU3vtY}}04
WjX0iJU3vtY}}05
WjX0iJU3vtY}}06
WjX0iJU3vtY}}07
WjX0iJU3vtY}}08
kV_D6avFtdo (31768, 15) training
kV_D6avFtdo}}00
kV_D6avFtdo}}01
kV_D6avFtdo}}02
kV_D6avFtdo}}03
kV_D6avFtdo}}04
kV_D6avFtdo}}05
Traceback (most recent call last):
  File "run.py", line 225, in <module>
    main()
  File "run.py", line 222, in main
    s = Speech()
  File "run.py", line 42, in __init__
    self.loadData()
  File "/home/weclash247/synthesizing_obama_network_training/util.py", line 129, in loadData
    meani, stdi, meano, stdo = self.normalize(inps, outps)
  File "/home/weclash247/synthesizing_obama_network_training/util.py", line 103, in normalize
    meani, stdi = normalizeData(inps["training"], "save/" + self.args.save_dir, "statinput", ["fea%02d" % x for x in range(inps["training"][0].shape[1])], normalize=self.args.normalizeinput)
  File "/home/weclash247/synthesizing_obama_network_training/util.py", line 46, in normalizeData
    allstrokes = np.concatenate(lst)
MemoryError

I am running on a machine with little RAM. I will try upgrading the RAM and see if that fixes it.
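`np.concatenate(lst)` allocates one new array large enough to hold every training frame while the original pieces are still in memory, so peak usage is roughly twice the size of the data. If `normalizeData` only needs the per-feature mean and standard deviation (an assumption, since util.py is not shown here), these can be accumulated chunk by chunk without concatenating. The sketch below uses the pairwise (Chan et al.) update for mean and variance; `streaming_mean_std` is a hypothetical name.

```python
import numpy as np


def streaming_mean_std(arrays):
    """Per-column mean and population std over a list of (N_i, D) arrays.

    Matches np.concatenate(arrays).mean(0) and .std(0), but only one chunk
    is converted and processed at a time.
    """
    n, mean, m2 = 0, None, None
    for a in arrays:
        a = np.asarray(a, dtype=np.float64)
        nb = a.shape[0]
        if nb == 0:
            continue
        mean_b = a.mean(axis=0)
        m2_b = ((a - mean_b) ** 2).sum(axis=0)  # sum of squared deviations
        if mean is None:
            n, mean, m2 = nb, mean_b, m2_b
            continue
        # Merge the running statistics with this chunk's statistics.
        delta = mean_b - mean
        total = n + nb
        mean = mean + delta * (nb / total)
        m2 = m2 + m2_b + delta ** 2 * (n * nb / total)
        n = total
    if mean is None:
        raise ValueError("no data")
    return mean, np.sqrt(m2 / n)
```

Normalizing each array in the list separately with the resulting statistics, rather than normalizing one concatenated copy, likewise avoids a second full-size allocation.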

How to use the trained model

Hi, thanks for sharing this awesome piece of work. I trained the model with the data you provided using the following command:
python run.py --save_dir ./saveme
Now I want to test it using my own voice. How should I run the code?

Tutorial

Hello, could you make a tutorial covering how to run this, from training the model to inputting your own audio file? Thank you

something wrong with multi-layer networks

Hi,
When I tried to change the network to a multi-layer network, I ran into the error "Trying to share variable rnnlm/multi_rnn_cell/cell_0/lstm_cell/kernel, but specified shape (120, 240) and found shape (88, 240)" during training. I tried many things but could not solve it. Since you used a multi-layer network in the related paper, could you please tell me how to fix this? Thanks ^_^
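The two shapes in the error can be decoded. In TensorFlow 1.x, an `LSTMCell` without projection stores one kernel of shape `(input_size + num_units, 4 * num_units)`, because the input and the previous hidden state are concatenated and mapped to the four gates. So 240 = 4 * 60 gives 60 units, (88, 240) corresponds to a 28-dimensional input, and (120, 240) to a 60-dimensional input, i.e. the output of a previous 60-unit layer. A likely cause, assuming the multi-layer network was built by repeating one cell object (for example `MultiRNNCell([cell] * num_layers)`), is that the second layer tries to reuse the first layer's variables even though its input size differs. The helper names below are hypothetical.

```python
def lstm_kernel_shape(input_size, num_units):
    """Kernel shape of a TF 1.x LSTMCell without projection.

    The cell concatenates [input, previous hidden state] and maps it to the
    four gate pre-activations, hence 4 * num_units output columns.
    """
    return (input_size + num_units, 4 * num_units)


def infer_lstm_sizes(kernel_shape):
    """Invert lstm_kernel_shape: return (input_size, num_units)."""
    rows, cols = kernel_shape
    num_units = cols // 4
    return (rows - num_units, num_units)


# Decoding the error message:
#   found     (88, 240)  -> input 28, 60 units (created by the first layer)
#   specified (120, 240) -> input 60, 60 units (a layer fed 60-D outputs)
#
# Likely fix with the TF 1.x API: build a separate cell object per layer,
#   cells = [tf.nn.rnn_cell.LSTMCell(num_units) for _ in range(num_layers)]
#   cell = tf.nn.rnn_cell.MultiRNNCell(cells)
# instead of MultiRNNCell([cell] * num_layers), which makes every layer
# try to reuse the first layer's variables.
```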

missing sample_videoinput method

I've checked https://github.com/supasorn/synthesizing_obama_network_training/blob/master/run.py: sample_audioinput is there, but sample_videoinput is missing.

About the pipeline to video

Hi, I trained the model and generated the result (a .txt file with lots of parameters). What is the next step? How can I generate a video from the result? Thanks.