align_friends

Module to help align subtitles to audio using the Penn Phonetics Lab Forced Aligner (p2fa).

REQUIREMENTS (see the appendix for hints and more technical information). Try to do these in order.

This was developed and tested on:

Distributor ID: Ubuntu
Description:    Ubuntu 18.04.2 LTS
Release:        18.04
Codename:       bionic

How to use this?

1. Get the subtitles for one or more episodes. I include examples from Season 2 in subtitles/friends-season2/. They should be in one of the following formats:
a) .srt, which has the form

    1
    00:00:02,877 --> 00:00:04,294
    words words etc
    
    2
    00:00:04,407 --> 00:00:05,891
    words words etc
    words words etc

b) .sub, which has the form (the numbers in braces are start and end frames)

    {1}{45}words words etc
    {48}{55}words words etc
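
To make the two formats concrete, here is a minimal sketch of how each could be read into (start, end, text) tuples. It is not the code in process_subtitles.py; the names parse_srt, parse_sub and srt_time_to_seconds are hypothetical, and it assumes well-formed files like the examples above.

```python
import re

SRT_TIME = re.compile(r"(\d+):(\d+):(\d+),(\d+)")
SUB_LINE = re.compile(r"\{(\d+)\}\{(\d+)\}(.*)")


def srt_time_to_seconds(stamp):
    """Convert an .srt timestamp such as 00:00:02,877 to seconds."""
    hours, minutes, seconds, millis = map(int, SRT_TIME.fullmatch(stamp.strip()).groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    entries = []
    # Cues are separated by blank (or whitespace-only) lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip cues that lack an index, a time line, or any text
        start, end = lines[1].split("-->")
        entries.append((srt_time_to_seconds(start), srt_time_to_seconds(end), " ".join(lines[2:])))
    return entries


def parse_sub(text):
    """Return a list of (start_frame, end_frame, text) tuples."""
    entries = []
    for line in text.splitlines():
        match = SUB_LINE.match(line.strip())
        if match:
            start, end, words = match.groups()
            entries.append((int(start), int(end), words))
    return entries
```

Note that .srt entries come out in seconds while .sub entries come out in frames, so anything downstream has to know which unit it is working with.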

2. Get the mp3/mkv/wav file for the episode(s). The aligner (p2fa) expects a specific input: 16-bit mono WAV audio sampled at 16000 Hz. You can convert a file with 'ffmpeg':

import os
import subprocess

def convert_video_to_audio_ffmpeg(video_file, output_ext="wav"):
    """Convert a video or audio file to the audio format p2fa expects,
    calling the `ffmpeg` command through the subprocess module."""
    filename, _ext = os.path.splitext(video_file)
    # 16-bit PCM, mono, sampled at 16000 Hz; -y overwrites an existing output file
    subprocess.call(["ffmpeg", "-y", "-i", video_file, "-acodec", "pcm_s16le", "-ac", "1", "-ar", "16000", f"{filename}.{output_ext}"],
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.STDOUT)

or from the command line / terminal:

ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
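
Because ffmpeg output is easy to get subtly wrong, it can help to confirm the converted file before handing it to p2fa. The sketch below uses only Python's standard wave module; check_wav_for_p2fa is a hypothetical helper name, and the expected values are simply the ones from the ffmpeg command above (mono, 16-bit, 16000 Hz).

```python
import wave


def check_wav_for_p2fa(path):
    """Return a list of problems; an empty list means the file matches
    the ffmpeg settings used above (mono, 16-bit, 16000 Hz)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"expected 16-bit samples, found {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, found {wav.getframerate()} Hz")
    return problems
```

Calling check_wav_for_p2fa("output.wav") on a correctly converted file should return an empty list.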

3. Transform the subtitles into a json file that can be given to p2fa.
For ease of use, run:

python process_subtitles.py

All you have to modify in it is:
a) the season: season = {x} (assuming subtitles are stored in subtitles/friends-season-{x})
b) the half: run it twice per episode, once with ending = "a" (first half) and once with ending = "b" (second half)
c) the midpoint of each episode, as a timestamp or a frame number: times = {1:"00:12:23", 2:"00:11:43", 3: 17068, 4: 16834}

Alternatively, produce a file in the format below yourself. NOTE, VERY IMPORTANT: the last entry must not be followed by a comma. If the file ends with },\n] instead of }\n] it will not work.

    [
      {
        "speaker": "Narrator",
        "line": "What you guys don't understand is...  "
      },
      {
        "speaker": "Narrator",
        "line": "...for us, kissing is as important as any part of it. "
      }
    ]
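
If you build the json yourself, the simplest way to avoid the trailing-comma problem is to let Python's json module write the file. The following is a minimal sketch, not what process_subtitles.py does: hms_to_seconds, split_at_midpoint, write_p2fa_json and check_p2fa_json are hypothetical names, the entries are assumed to be (start, end, text) tuples, and "Narrator" is used as the speaker only because the example above does.

```python
import json


def hms_to_seconds(stamp):
    """Convert a midpoint such as "00:12:23" to seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def split_at_midpoint(entries, midpoint):
    """Split (start, end, text) entries into the "a" and "b" halves.

    `midpoint` must be in the same unit as the start values
    (seconds for .srt input, frames for .sub input).
    """
    first = [text for start, _end, text in entries if start < midpoint]
    second = [text for start, _end, text in entries if start >= midpoint]
    return first, second


def write_p2fa_json(lines, out_path, speaker="Narrator"):
    """Write one {"speaker", "line"} object per subtitle line."""
    records = [{"speaker": speaker, "line": line} for line in lines]
    with open(out_path, "w", encoding="utf-8") as handle:
        # json.dump never emits a trailing comma after the last object.
        json.dump(records, handle, indent=2, ensure_ascii=False)
        handle.write("\n")


def check_p2fa_json(path):
    """Load a hand-written file; json.load raises json.JSONDecodeError
    if the last object is followed by a comma."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    for record in records:
        if set(record) != {"speaker", "line"}:
            raise ValueError(f"unexpected keys: {sorted(record)}")
    return records
```

For example, hms_to_seconds("00:12:23") gives 743, which can be passed as the midpoint for .srt entries; frame midpoints such as 17068 can be passed directly for .sub entries.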

4. Now align the text to the audio! In my fork of p2fa-vislab, the scripts align.py and align_subtitles.py do this. Either align a single file:

cd p2fa-vislab  
python align.py {wav} {json} {outputfile}

or align a whole batch:

cd p2fa-vislab  
python align_subtitles.py

where align_subtitles.py expects that
a) the wav files are in audios/s{s}/friends_s{s}e{stri}a.wav
b) the json files are in processed_subtitles/s{s}/s0{s}e{stri}a.json
c) an aligned_output folder exists
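
To show how that layout maps onto the align.py call, here is a rough sketch that builds the command for one half of one episode. It is not the contents of align_subtitles.py: build_align_command and align_half are hypothetical names, the output file name is my assumption, and the episode is passed as a string because the exact format of {stri} (for example zero padding) is not spelled out above.

```python
import subprocess
from pathlib import Path


def build_align_command(season, episode, half="a", out_dir="aligned_output"):
    """Build the `python align.py {wav} {json} {outputfile}` command.

    `episode` is the episode string used in the file names ({stri} above).
    The output file name is an assumption made for this sketch.
    """
    wav = Path("audios") / f"s{season}" / f"friends_s{season}e{episode}{half}.wav"
    transcript = Path("processed_subtitles") / f"s{season}" / f"s0{season}e{episode}{half}.json"
    output = Path(out_dir) / f"s0{season}e{episode}{half}.json"
    return ["python", "align.py", wav.as_posix(), transcript.as_posix(), output.as_posix()]


def align_half(season, episode, half="a"):
    """Run align.py for one half; call this from inside p2fa-vislab."""
    command = build_align_command(season, episode, half)
    # Make sure the output folder exists before the aligner writes to it.
    Path(command[-1]).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(command, check=True)
```

Looping over the episodes and over the halves "a" and "b" with align_half would then cover a whole season, assuming the wav and json files are already in place.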

Appendix

Some Helpful Tips for Downloading HTK

  • If make fails with the error /usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: No such file or directory #include <bits/libc-header-start.h>
    Do this: sudo apt-get install gcc-multilib or sudo apt-get install g++-multilib to install the missing 32-bit libraries.
  • If make fails with the error "/usr/bin/ld: cannot find -lX11"
    Do this: sudo apt-get install libx11-dev
  • If make fails with the error "gnu/stubs-32.h: No such file or directory"
    Install the package for your system:
    • Ubuntu: sudo apt-get install libc6-dev-i386
    • CentOS 5.8: the package is glibc-devel.i386
    • CentOS 6 / 7: the package is glibc-devel.i686

Why do we need those other python modules?

See my blog post here
