synthesizing_obama_network_training's Introduction

This is research code for Synthesizing Obama: Learning Lip Sync from Audio.
Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman
SIGGRAPH 2017

Code tested using tensorflow 0.11.0. Please see Supasorn's website for the overview.

To generate MFCC, first normalize the input audio using https://github.com/slhck/ffmpeg-normalize. Then use Sphinx III's snippet by David Huggins-Daines with a modified routine that saves log energy and timestamps:

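import math
import numpy

def sig2s2mfc_energy(self, sig, dn):
  # Method to add to the MFCC class in the Sphinx snippet; `dn` is unused here.
  # Returns one row per frame: ncep cepstra, log energy, frame-centre time (s).
  nfr = int(len(sig) / self.fshift + 1)

  mfcc = numpy.zeros((nfr, self.ncep + 2), 'd')
  fr = 0
  while fr < nfr:
    start = int(round(fr * self.fshift))
    end = min(len(sig), start + self.wlen)
    frame = sig[start:end]
    if len(frame) < self.wlen:
      # Short final frames: numpy.resize pads by repeating the samples, and
      # the slice assignment below is empty, so no zeros are written.
      frame = numpy.resize(frame, self.wlen)
      frame[self.wlen:] = 0
    mfcc[fr, :-2] = self.frame2s2mfc(frame)
    # Log energy of the raw frame.
    mfcc[fr, -2] = math.log(1 + numpy.mean(numpy.power(frame.astype(float), 2)))
    # Timestamp of the frame centre, in seconds.
    mid = 0.5 * (start + end - 1)
    mfcc[fr, -1] = mid / self.samprate

    fr = fr + 1
  return mfcc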
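
To see what the two extra columns contain without installing the Sphinx snippet, here is a minimal standalone sketch that computes only the log energy and the frame-centre timestamp. The function name and the default values for samprate, frate and wlen_sec are assumptions chosen for illustration, not values taken from this repository. Short final frames are zero-padded here, which differs from the routine above, where numpy.resize repeats samples.

```python
import math
import numpy as np

def frame_energy_and_time(sig, samprate=16000, frate=100, wlen_sec=0.0256):
    """Return an (nfr, 2) array: [log energy, frame-centre time in seconds]."""
    fshift = float(samprate) / frate   # samples between consecutive frame starts
    wlen = int(wlen_sec * samprate)    # samples per analysis window
    nfr = int(len(sig) / fshift + 1)
    out = np.zeros((nfr, 2))
    for fr in range(nfr):
        start = int(round(fr * fshift))
        end = min(len(sig), start + wlen)
        frame = np.zeros(wlen)
        frame[:end - start] = sig[start:end]   # zero-pad short frames (assumption)
        out[fr, 0] = math.log(1 + np.mean(frame ** 2))
        out[fr, 1] = 0.5 * (start + end - 1) / samprate
    return out

# Constant test signal: every full frame has mean power 9.
features = frame_energy_and_time(np.full(1600, 3.0))
```

The timestamp column is what lets audio frames be aligned with video frames later, since the two streams run at different rates.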

synthesizing_obama_network_training's Issues

cPickle not installed

Hi, I have tried many times to install cPickle on my system without success. Please help me install cPickle.
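
A note that may help here: cPickle is part of the Python 2 standard library and is not a separately installable package. In Python 3 it was merged into the built-in pickle module. Assuming the repository imports cPickle directly (this is inferred from the issue, not confirmed from the code), a compatibility import along these lines is one possible workaround. The sample dictionary is made up for the demonstration.

```python
# Use the C-accelerated module on Python 2, the merged module on Python 3.
try:
    import cPickle as pickle   # Python 2
except ImportError:
    import pickle              # Python 3

# Round-trip a small made-up object to check that the import works.
data = {"mean": [0.0, 1.0], "std": [1.0, 2.0]}
blob = pickle.dumps(data, protocol=2)   # protocol 2 is readable by both versions
restored = pickle.loads(blob)
```

Since the code was tested against tensorflow 0.11.0, other Python 3 incompatibilities may remain even after this change.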

Version of tensorflow used?

Hi. I am experiencing this crash when training a model. I believe this is due to me using the latest version of tensorflow. Any thoughts? Thanks.

RDfpd8GV9dI}}02
load preprocessed 2 2
Traceback (most recent call last):
  File "run.py", line 225, in <module>
    main()
  File "run.py", line 222, in main
    s = Speech()
  File "run.py", line 52, in __init__
    self.train()
  File "/Users/august/synthesizing_obama_network_training/util.py", line 218, in train
    with tf.Session() as sess:
AttributeError: 'module' object has no attribute 'Session'
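
One plausible cause: tf.Session was removed from the top-level namespace in TensorFlow 2.x and is available there as tf.compat.v1.Session. The sketch below picks whichever is present. The helper name get_session_class is hypothetical, and the stand-in objects exist only so the example runs without TensorFlow installed.

```python
from types import SimpleNamespace

def get_session_class(tf_module):
    """Return a Session class from a TensorFlow-like module."""
    if hasattr(tf_module, "Session"):
        return tf_module.Session                  # TensorFlow 0.x / 1.x
    compat = getattr(tf_module, "compat", None)
    v1 = getattr(compat, "v1", None)
    if v1 is not None and hasattr(v1, "Session"):
        return v1.Session                         # TensorFlow 2.x compatibility API
    raise AttributeError("no Session class found in this module")

# Stand-ins that mimic the two layouts (not real TensorFlow).
class FakeSession(object):
    pass

fake_tf1 = SimpleNamespace(Session=FakeSession)
fake_tf2 = SimpleNamespace(compat=SimpleNamespace(v1=SimpleNamespace(Session=FakeSession)))

session_from_tf1 = get_session_class(fake_tf1)
session_from_tf2 = get_session_class(fake_tf2)
```

With a real installation the call would be get_session_class(tf) after import tensorflow as tf. Graph-mode code on TensorFlow 2.x usually also needs tf.compat.v1.disable_eager_execution(), and code written for 0.11.0 is likely to hit further API differences, so installing an older TensorFlow release may be the simpler route.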

Evaluation method

Is there any method to evaluate the model that we train? Is there a test set? How can we measure the performance of the model? Also, it is still unclear how to create mouth landmark points from the 20-D PCA coefficients. Has anyone solved this? Thanks.
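
On the second question, standard PCA reconstruction maps coefficients back to landmarks as mean shape plus coefficients times the basis. The sketch below assumes you have the PCA mean (36 values) and basis (20 x 36), which this repository does not appear to provide, and that each 36-D vector stores the 18 points as interleaved (x, y) pairs. Both the function name and that ordering are assumptions.

```python
import numpy as np

def coeffs_to_landmarks(coeffs, mean_shape, components):
    """Map PCA coefficients (n, 20) back to landmarks (n, 18, 2).

    mean_shape: (36,) PCA mean; components: (20, 36) PCA basis rows.
    """
    coeffs = np.atleast_2d(np.asarray(coeffs, dtype=np.float64))
    flat = mean_shape + np.dot(coeffs, components)   # (n, 36)
    return flat.reshape(len(flat), 18, 2)            # assumes (x, y) interleaving

# Toy basis: the first 20 coordinate axes, with a mean shape of all ones.
toy_mean = np.ones(36)
toy_components = np.eye(36)[:20]
landmarks = coeffs_to_landmarks(np.arange(20), toy_mean, toy_components)
```

Because only 20 of 36 components are kept, the reconstruction is an approximation of the original landmarks, not an exact inverse.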

How to use the trained model

Hi, thanks for sharing this awesome piece of work. I trained the model with the data you provided using the following command:
python run.py --save_dir ./saveme
Now I want to test it using my own voice. How should I run the code?

Tutorial

Hello, could you make a tutorial covering how to run this, from training the model to inputting your own audio file? Thank you.

something wrong with multi-layer networks

Hi,
When I tried to change the network to a multi-layer network, I came across the error "Trying to share variable rnnlm/multi_rnn_cell/cell_0/lstm_cell/kernel, but specified shape (120, 240) and found shape (88, 240)" during training. I have tried many things but could not solve it. Could you please tell me how to fix it, since I notice that you constructed multi-layer networks in the related paper? Thanks ^_^
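
The two shapes in that message are consistent with one LSTM cell object being reused for both layers. An LSTM kernel has shape (input_dim + num_units, 4 * num_units), so the first layer and the second layer need different kernels. The sketch below works through the arithmetic. The values 28 and 60 are inferred from the shapes in the error message, not read from the code.

```python
def lstm_kernel_shape(input_dim, num_units):
    """Kernel shape of a standard LSTM cell: [input, hidden] -> 4 gates."""
    return (input_dim + num_units, 4 * num_units)

def stacked_kernel_shapes(input_dim, num_units, num_layers):
    """Kernel shape each layer of a stacked LSTM needs."""
    shapes = []
    layer_input = input_dim
    for _ in range(num_layers):
        shapes.append(lstm_kernel_shape(layer_input, num_units))
        layer_input = num_units   # deeper layers consume the previous layer's output
    return shapes

# 28 input features and 60 hidden units reproduce both shapes from the error.
shapes = stacked_kernel_shapes(input_dim=28, num_units=60, num_layers=2)
```

If this is the cause, a common fix in TensorFlow 1.x is to build a fresh cell per layer, for example passing a list comprehension that constructs a new cell on each iteration to tf.contrib.rnn.MultiRNNCell, instead of repeating one cell object with [cell] * num_layers.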

About the pipeline to video

Hi, I trained the model and generated the result (a .txt file with lots of parameters). What is the next step to generate a video from this result? Thanks.

Google Colab

I am trying to train this in Google Colab but am confused about the process.

Error with numpy concatenate

When I run the program, I am able to preprocess the data file but then I run into the error below. Any idea what the problem is? I printed the lst from np.concatenate(lst) to the console and it was quite long...

Command I am running: python run.py --input2 /home/ec2-user/synthesizing_obama_network_training/obama_data/audio/normalized-cep13/zYGok_gHfY0.wav.npy --save_dir ./saveme

...
vIPUrZuLlCQ}}01
vIPUrZuLlCQ}}02
vIPUrZuLlCQ}}03
vIPUrZuLlCQ}}04
WjX0iJU3vtY (34257, 15) training
WjX0iJU3vtY}}00
WjX0iJU3vtY}}01
WjX0iJU3vtY}}02
WjX0iJU3vtY}}03
WjX0iJU3vtY}}04
WjX0iJU3vtY}}05
WjX0iJU3vtY}}06
WjX0iJU3vtY}}07
WjX0iJU3vtY}}08
kV_D6avFtdo (31768, 15) training
kV_D6avFtdo}}00
kV_D6avFtdo}}01
kV_D6avFtdo}}02
kV_D6avFtdo}}03
kV_D6avFtdo}}04
kV_D6avFtdo}}05
Traceback (most recent call last):
  File "run.py", line 225, in <module>
    main()
  File "run.py", line 222, in main
    s = Speech()
  File "run.py", line 42, in __init__
    self.loadData()
  File "/home/weclash247/synthesizing_obama_network_training/util.py", line 129, in loadData
    meani, stdi, meano, stdo = self.normalize(inps, outps)
  File "/home/weclash247/synthesizing_obama_network_training/util.py", line 103, in normalize
    meani, stdi = normalizeData(inps["training"], "save/" + self.args.save_dir, "statinput", ["fea%02d" % x for x in range(inps["training"][0].shape[1])], normalize=self.args.normalizeinput)
  File "/home/weclash247/synthesizing_obama_network_training/util.py", line 46, in normalizeData
    allstrokes = np.concatenate(lst)
MemoryError

I am running on a machine with little RAM. I will try upgrading the RAM to see whether that fixes it.
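
If adding RAM is not an option, per-feature statistics can be accumulated one array at a time, without building the concatenated copy. The sketch below assumes normalizeData only needs the per-feature mean and standard deviation; its full body is not shown in the traceback, so this is a guess. The function name streaming_mean_std is hypothetical.

```python
import numpy as np

def streaming_mean_std(arrays):
    """Per-feature mean and std over a list of (n_i, d) arrays, no concatenation."""
    count = 0
    total = None
    total_sq = None
    for a in arrays:
        a = np.asarray(a, dtype=np.float64)
        if total is None:
            total = np.zeros(a.shape[1])
            total_sq = np.zeros(a.shape[1])
        count += a.shape[0]
        total += a.sum(axis=0)
        total_sq += (a ** 2).sum(axis=0)
    mean = total / count
    # Clamp tiny negative values caused by floating-point rounding.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Compare against the concatenate-based result on small random data.
rng = np.random.RandomState(0)
chunks = [rng.randn(n, 15) for n in (50, 80, 30)]
mean, std = streaming_mean_std(chunks)
```

The sum-of-squares formula is less numerically stable than a two-pass computation, which should be acceptable for features of moderate scale but is worth checking on the real data.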

How to generate 20-dimensional landmark through Obama video frames?

Hello, I have recently been studying the code of the Synthesizing Obama network you wrote. After conversion to a 29.97 fps sequence, each video yields 2, 3, or 4 dump files. I have two questions about this:
(1) For a single Obama video, I understand removing the beginning and end where he does not appear, but why is the middle divided into multiple dumps? What is the reason for doing this?
(2) For the mouth features, you detect mouth landmarks, giving 18 points along the outer and inner contours of the lip. Each 18-point shape is reshaped into a 36-D vector, PCA is applied over all frames, and each mouth shape is represented by the first 20 PCA coefficients. There is no such part in the code you released. Would it be possible to share it? Without it, this step is a black box to us.

I look forward to your answers. Thank you very much!
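
For readers who want to reproduce the step described in question (2), here is a minimal sketch of PCA over flattened landmark shapes using NumPy's SVD. It is not the authors' code: the function name is hypothetical, the landmark detector is out of scope, and random data stands in for real detections.

```python
import numpy as np

def fit_mouth_pca(shapes, n_components=20):
    """PCA over mouth shapes.

    shapes: (n_frames, 18, 2) landmark array.
    Returns (mean, components, coeffs) with shapes (36,), (k, 36), (n_frames, k).
    """
    X = np.asarray(shapes, dtype=np.float64).reshape(len(shapes), -1)  # (n, 36)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]
    coeffs = np.dot(X - mean, components.T)
    return mean, components, coeffs

# Random stand-in data: 200 frames of 18 (x, y) points.
rng = np.random.RandomState(0)
fake_shapes = rng.randn(200, 18, 2)
mean, components, coeffs = fit_mouth_pca(fake_shapes)
```

The coefficient rows are what the network would be trained to predict. The mean and components must be saved alongside them to turn predictions back into landmark positions.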
