Coder Social home page Coder Social logo

psola's Introduction

Time-domain pitch-synchronous overlap-add (TD-PSOLA)

PyPI License: GPL v3 Downloads

This module is for constant- and variable-rate pitch-shifting and time-stretching of speech. It is a wrapper around the parselmouth [1] wrapper around the Praat [2] implementation of TD-PSOLA [3]. Pitch-shifting is performed by providing a numpy array of target pitch values equally spaced over time. Variable-rate time stretching uses forced phoneme alignment via pypar.

If you need to extract pitch features or phoneme alignments, see penn for pitch estimation and pyfoal for forced alignment. If you only want to perform pitch-shifting, you do not need to extract forced alignments. If you want to do variable-rate time stretching, you do not need to perform pitch estimation.

Installation

pip install psola

Usage

If you want to perform pitch-shifting or time-stretching on audio already loaded into memory, use psola.vocode. If you want to do this with audio saved in a file, use psola.from_file. You can use psola.to_file or psola.from_file_to_file to save the results to a file. To process many files at once with multiprocessing, use psola.from_files_to_files. Each of these functions is documented below. The command-line interface wraps the arguments of psola.from_files_to_files and is described in the next section.

psola.vocode

"""Performs pitch vocoding using Praat

Arguments
    audio : np.array(shape=(samples,))
        The speech signal to process
    sample_rate : int
        The audio sampling rate.
    source_alignment : pypar.Alignment
        The current alignment if performing time-stretching
    target_alignment : pypar.Alignment
        The target alignment if performing time-stretching
    target_pitch : np.array(shape=(frames,))
        The target pitch contour
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.

Returns
    audio : np.array(shape=(samples,))
        The vocoded audio
"""

psola.from_file

"""Performs vocoding using Praat

Arguments
    audio_file : string
        The file containing the speech signal to process
    source_alignment_file : string or None
        The file containing the original alignment
    target_alignment_file : string or None
        The file containing the target alignment
    target_pitch_file : string or None
        The file containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.

Returns
    audio : np.array(shape=(samples,))
        The vocoded audio
    sample_rate : int
        The audio sampling rate
"""

psola.to_file

"""Performs pitch vocoding and saves audio to disk

Arguments
    audio : np.array(shape=(samples,))
        The speech signal to process
    sample_rate : int
        The audio sampling rate
    output_file : string
        The file to save the vocoded speech
    source_alignment : pypar.Alignment
        The current alignment if performing time-stretching
    target_alignment : pypar.Alignment
        The target alignment if performing time-stretching
    target_pitch : np.array(shape=(frames,))
        The target pitch contour
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""

psola.from_file_to_file

"""Performs vocoding using Praat and save to disk

Arguments
    audio_file : string
        The file containing the speech signal to process
    output_file : string
        The file to save the vocoded speech
    source_alignment_file : string or None
        The file containing the original alignment
    target_alignment_file : string or None
        The file containing the target alignment
    target_pitch_file : string or None
        The file containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""

psola.from_files_to_files

"""Performs vocoding using Praat and save to disk

Arguments
    audio_files : list
        The files containing the speech signals to process
    output_files : list
        The files to save the vocoded speech
    source_alignment_files : string or None
        The files containing the original alignments
    target_alignment_files : list or None
        The files containing the target alignments
    target_pitch_files : list or None
        The files containing the target pitch
    constant_stretch : float or None
        A constant value for time-stretching
    fmin : int
        The minimum allowable frequency in Hz.
    fmax : int
        The maximum allowable frequency in Hz.
"""

Command-line interface

usage: python -m psola
    [-h]
    [--audio_files AUDIO_FILES [AUDIO_FILES ...]]
    [--source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]]
    [--target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]]
    [--constant_stretch CONSTANT_STRETCH]
    [--target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]]
    [--fmin FMIN]
    [--fmax FMAX]
    [--output_files OUTPUT_FILES [OUTPUT_FILES ...]]

optional arguments:
  -h, --help            show this help message and exit
  --audio_files AUDIO_FILES [AUDIO_FILES ...]
                        The speech signal to process
  --source_alignment_files SOURCE_ALIGNMENT_FILES [SOURCE_ALIGNMENT_FILES ...]
                        The files containing the original alignments
  --target_alignment_files TARGET_ALIGNMENT_FILES [TARGET_ALIGNMENT_FILES ...]
                        The files containing the target alignments
  --constant_stretch CONSTANT_STRETCH
                        A constant value for time-stretching
  --target_pitch_files TARGET_PITCH_FILES [TARGET_PITCH_FILES ...]
                        The target pitch contour
  --fmin FMIN           The minimum allowable frequency in Hz
  --fmax FMAX           The maximum allowable frequency in Hz
  --output_files OUTPUT_FILES [OUTPUT_FILES ...]
                        Where to save the vocoded audio

References

[1] Y. Jadoul, B. Thompson, and B. De Boer, "Introducing parselmouth: A python interface to praat," Journal of Phonetics, vol. 71, pp. 1โ€“15, 2018.

[2] P. Boersma, "Praat: doing phonetics by computer", http://www.praat.org/, 2006.

[3] E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech communication, 1990.

psola's People

Contributors

maxrmorrison avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.