
pyfoal's Introduction

Python forced alignment


Forced alignment suite. Includes English grapheme-to-phoneme (G2P) conversion and phoneme alignment using the following forced alignment tools.

  • RAD-TTS [1]
  • Montreal Forced Aligner (MFA) [3]
  • Penn Phonetic Forced Aligner (P2FA) [2]

RAD-TTS is used by default. Alignments can be saved to disk or accessed via the pypar.Alignment phoneme alignment representation. See pypar for more details.

pyfoal also includes the following:

  • Converting alignments to and from a categorical representation suitable for training machine learning models (pyfoal.convert)
  • Natural interpolation of forced alignments for time-stretching speech (pyfoal.interpolate)
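To illustrate what a categorical representation means here, a phoneme sequence is mapped to integer indices that a model can consume. The inventory and helper below are purely illustrative sketches, not pyfoal's actual API:

```python
# Illustrative only: a tiny hypothetical phoneme inventory, not
# pyfoal's real phoneme set or conversion API.
PHONEMES = ['sp', 'AA', 'AE', 'AH']
TO_INDEX = {p: i for i, p in enumerate(PHONEMES)}

def encode(phoneme_seq):
    """Map a phoneme sequence to integer category indices."""
    return [TO_INDEX[p] for p in phoneme_seq]

print(encode(['sp', 'AH', 'AA']))  # [0, 3, 1]
```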


Installation

pip install pyfoal

MFA and P2FA both require additional installation steps found below.

Montreal Forced Aligner (MFA)

conda install -c conda-forge montreal-forced-aligner

Penn Phonetic Forced Aligner (P2FA)

P2FA depends on the Hidden Markov Model Toolkit (HTK), which has been tested on macOS and Linux using HTK version 3.4.0. There are known issues with version 3.4.1 on Linux. HTK is released under a license that prohibits redistribution, so you must install HTK yourself and verify that the commands HCopy and HVite are available as system-wide binaries. After downloading HTK, I use the following for installation on Linux.

sudo apt-get install -y gcc-multilib libx11-dev
sudo chmod +x configure
./configure --disable-hslab
make all
sudo make install

For more help with HTK installation, see notes by Jaekoo Kang and Steve Rubin.

Inference

Force-align text and audio

import pyfoal

# Load text
text = pyfoal.load.text(text_file)

# Load and resample audio
audio = pyfoal.load.audio(audio_file)

# Select an aligner. One of ['mfa', 'p2fa', 'radtts' (default)].
aligner = 'radtts'

# For RAD-TTS, select a model checkpoint
checkpoint = pyfoal.DEFAULT_CHECKPOINT

# Select a GPU to run inference on
gpu = 0

alignment = pyfoal.from_text_and_audio(
    text,
    audio,
    pyfoal.SAMPLE_RATE,
    aligner=aligner,
    checkpoint=checkpoint,
    gpu=gpu)

Application programming interface

pyfoal.from_text_and_audio

"""Phoneme-level forced-alignment

Arguments
    text : string
        The speech transcript
    audio : torch.tensor(shape=(1, samples))
        The speech signal to process
    sample_rate : int
        The audio sampling rate

Returns
    alignment : pypar.Alignment
        The forced alignment
"""

pyfoal.from_file

"""Phoneme alignment from audio and text files

Arguments
    text_file : Path
        The corresponding transcript file
    audio_file : Path
        The audio file to process
    aligner : str
        The alignment method to use
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods

Returns
    alignment : pypar.Alignment
        The forced alignment
"""

pyfoal.from_file_to_file

"""Perform phoneme alignment from files and save to disk

Arguments
    text_file : Path
        The corresponding transcript file
    audio_file : Path
        The audio file to process
    output_file : Path
        The file to save the alignment
    aligner : str
        The alignment method to use
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods
"""

pyfoal.from_files_to_files

"""Perform parallel phoneme alignment from many files and save to disk

Arguments
    text_files : list
        The transcript files
    audio_files : list
        The corresponding speech audio files
    output_files : list
        The files to save the alignments
    aligner : str
        The alignment method to use
    num_workers : int
        Number of CPU cores to utilize. Defaults to all cores.
    checkpoint : Path
        The checkpoint to use for neural methods
    gpu : int
        The index of the gpu to perform alignment on for neural methods
"""

Command-line interface

python -m pyfoal
    [-h]
    --text_files TEXT_FILES [TEXT_FILES ...]
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
    [--aligner ALIGNER]
    [--num_workers NUM_WORKERS]
    [--checkpoint CHECKPOINT]
    [--gpu GPU]

Arguments:
    -h, --help
        show this help message and exit
    --text_files TEXT_FILES [TEXT_FILES ...]
        The speech transcript files
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
        The speech audio files
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
        The files to save the alignments
    --aligner ALIGNER
        The alignment method to use
    --num_workers NUM_WORKERS
        Number of CPU cores to utilize. Defaults to all cores.
    --checkpoint CHECKPOINT
        The checkpoint to use for neural methods
    --gpu GPU
        The index of the GPU to use for inference. Defaults to CPU.

Training

Download

python -m pyfoal.data.download

Downloads and uncompresses the arctic and libritts datasets used for training.

Preprocess

python -m pyfoal.data.preprocess

Converts each dataset to a common format on disk ready for training.

Partition

python -m pyfoal.partition

Generates train, valid, and test partitions for arctic and libritts. Partitioning is deterministic given the same random seed. You do not need to run this step, as the original partitions are saved in pyfoal/assets/partitions.

Train

python -m pyfoal.train --config <config> --gpus <gpus>

Trains a model according to a given configuration on the libritts dataset. Uses a list of GPU indices as an argument, and uses distributed data parallelism (DDP) if more than one index is given. For example, --gpus 0 3 will train using DDP on GPUs 0 and 3.

Monitor

Run tensorboard --logdir runs/. If you are running training remotely, you must create an SSH connection with port forwarding to view TensorBoard. This can be done with ssh -L 6006:localhost:6006 <user>@<server-ip-address>. Then, open localhost:6006 in your browser.

Evaluate

python -m pyfoal.evaluate \
    --config <config> \
    --checkpoint <checkpoint> \
    --gpu <gpu>

Evaluate a model. <checkpoint> is the checkpoint file to evaluate and <gpu> is the GPU index.

References

[1] R. Badlani, A. Łańcucki, K. J. Shih, R. Valle, W. Ping, and B. Catanzaro, "One TTS Alignment to Rule Them All," International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

[2] J. Yuan and M. Liberman, "Speaker Identification on the SCOTUS Corpus," Journal of the Acoustical Society of America, vol. 123, p. 3878, 2008.

[3] M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, and M. Sonderegger, "Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi," Interspeech, 2017, pp. 498-502.

pyfoal's People

Contributors

maxrmorrison, nathanpruyne


pyfoal's Issues

Multiprocessing issue

I tried running the pyfoal.from_files_to_files method to get my alignments, but a multiprocessing issue prevents the method from executing. My current structure is the following:

json_path = sorted([os.path.abspath(files) for files in glob.glob(os.path.join(directory, f'**/*{ext1}'), recursive= True)])
wav_path = sorted([os.path.abspath(files) for files in glob.glob(os.path.join(directory, f'**/*{ext2}'), recursive= True)])
directory_path = sorted([os.path.join(os.path.dirname(files), "alignment.json") for files in json_path])

pyfoal.from_files_to_files(json_path, wav_path, directory_path)

Here, json_path is the array of all JSON paths, wav_path is the array of all WAV paths, and directory_path is the array of locations where I want the alignments to be placed. I get the following error when trying to run the code:

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

The full stacktrace is below:
File "", line 1, in
rozen to produce an executable.
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
exitcode = _main(fd, parent_sentinel)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
prepare(preparation_data)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
main_content = runpy.run_path(main_path,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
return _run_module_code(code, init_globals, run_name,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/pranavmalik/Documents/Clipper Dataset/scripts/audio_preprocess.py", line 457, in
_run_code(code, mod_globals, init_globals,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
create_alignment_json(colab_files, ext1 = '.wav', ext2 = '.json')
File "/Users/pranavmalik/Documents/Clipper Dataset/scripts/audio_preprocess.py", line 388, in create_alignment_json
exec(code, run_globals)
File "/Users/pranavmalik/Documents/Clipper Dataset/scripts/audio_preprocess.py", line 457, in
pyfoal.from_files_to_files(json_path, wav_path, directory_path)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/site-packages/pyfoal/core.py", line 109, in from_files_to_files
with mp.get_context('spawn').Pool() as pool:
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 212, in init
self._repopulate_pool()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/process.py", line 121, in start
create_alignment_json(colab_files, ext1 = '.wav', ext2 = '.json')
File "/Users/pranavmalik/Documents/Clipper Dataset/scripts/audio_preprocess.py", line 388, in create_alignment_json
pyfoal.from_files_to_files(json_path, wav_path, directory_path)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/site-packages/pyfoal/core.py", line 109, in from_files_to_files
with mp.get_context('spawn').Pool() as pool:
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 212, in init
self._repopulate_pool()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
self._popen = self._Popen(self)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return self._repopulate_pool_static(self._ctx, self.Process,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
return Popen(process_obj)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
w.start()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
self._launch(process_obj)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
return Popen(process_obj)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
_check_not_importing_main()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
self._launch(process_obj)

File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "", line 1, in
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/pranavmalik/Documents/Clipper Dataset/scripts/audio_preprocess.py", line 457, in
create_alignment_json(colab_files, ext1 = '.wav', ext2 = '.json')
File "/Users/pranavmalik/Documents/Clipper Dataset/scripts/audio_preprocess.py", line 388, in create_alignment_json
pyfoal.from_files_to_files(json_path, wav_path, directory_path)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/site-packages/pyfoal/core.py", line 109, in from_files_to_files
with mp.get_context('spawn').Pool() as pool:
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 212, in init
self._repopulate_pool()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/Users/pranavmalik/opt/anaconda3/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

And it just repeats over and over
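The error message itself points at the fix: pyfoal.from_files_to_files creates a process pool with the 'spawn' start method (visible in the trace above), so on macOS and Windows the call must be made from under an `if __name__ == '__main__'` guard. A minimal self-contained sketch of the idiom, using a toy worker function rather than pyfoal:

```python
import multiprocessing as mp

def square(x):
    return x * x

def run_pool(values):
    # With the 'spawn' start method, child processes re-import the main
    # module, so pool creation must only ever happen under the guard
    # below, never at module top level.
    with mp.get_context('spawn').Pool(2) as pool:
        return pool.map(square, values)

if __name__ == '__main__':
    print(run_pool([1, 2, 3]))  # [1, 4, 9]
```

Applying the same pattern here would mean moving the pyfoal.from_files_to_files(...) call in audio_preprocess.py under that script's own `__main__` guard.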

Use 11025 Hz model

This model achieves the highest accuracy according to the original paper, "Speaker Identification on the SCOTUS Corpus" [2].

Apply constant duration shift

The original open-source implementation applies a constant shift when loading the MLF file, as follows:

        if (SR == 11025):
            st = (float(lines[j].split()[0])/10000000.0 + 0.0125)*(11000.0/11025.0)
            en = (float(lines[j].split()[1])/10000000.0 + 0.0125)*(11000.0/11025.0)
        else:
            st = float(lines[j].split()[0])/10000000.0 + 0.0125
            en = float(lines[j].split()[1])/10000000.0 + 0.0125 
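Equivalently, the shift in the snippet above can be factored into a small function (the function name is hypothetical; HTK label times are stored in 100 ns units, hence the division by 1e7):

```python
def shifted_times(start_ticks, end_ticks, sample_rate):
    """Convert HTK label times (100 ns ticks) to seconds, applying the
    constant 0.0125 s shift from the original P2FA implementation and
    the extra rate correction used for 11025 Hz models."""
    start = start_ticks / 1e7 + 0.0125
    end = end_ticks / 1e7 + 0.0125
    if sample_rate == 11025:
        start *= 11000.0 / 11025.0
        end *= 11000.0 / 11025.0
    return start, end

# One second of HTK ticks is 10,000,000
start, end = shifted_times(10_000_000, 20_000_000, 16000)
```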

IndexError: list index out of range

File "/home/vic/anaconda3/envs/text2audio/lib/python3.7/site-packages/pyfoal/core.py", line 174, in correct_alignment
durations[0] += .0125
IndexError: list index out of range

Just using the provided test.txt and test.wav crashes as shown above.

AttributeError: module 'pyfoal' has no attribute 'align'

video_file_path = video_generator.generate_video(audio_object, video.script, video.title, video.filename or csv_name, audio_object)

File "d:\tool_tao_video\AI\src\video_generator.py", line 240, in generate_video
final_clip = self.overlay_subtitles(final_clip, get_alignement(script_without_emojis, audio_object), script,
File "d:\tool_tao_video\AI\src\video_generator.py", line 56, in get_alignement
return pyfoal.align(text, audio_object.get_audio_np_array(), audio_object.get_sample_rate()).json()
AttributeError: module 'pyfoal' has no attribute 'align'

Hi @maxrmorrison, I get the error module 'pyfoal' has no attribute 'align'. How can I fix this? Thank you.

def get_alignement(text, audio_object):
    return pyfoal.align(text, audio_object.get_audio_np_array(), audio_object.get_sample_rate()).json()
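For reference, the README above documents pyfoal.from_text_and_audio rather than pyfoal.align, so this error likely means the installed release is newer than the code was written against. A hedged compatibility sketch (the helper name is hypothetical) that picks whichever entry point exists:

```python
def resolve_align(module):
    """Return whichever alignment entry point the installed pyfoal
    release exposes: `align` on older versions, or `from_text_and_audio`
    on versions matching the current README."""
    fn = getattr(module, 'align', None)
    return fn if fn is not None else module.from_text_and_audio
```

Usage would then look like `alignment = resolve_align(pyfoal)(text, audio, sample_rate)`.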

pyfoal has no attribute load / a little more explanation would be great :)

Hello, I have had many problems using the pypar, pyfoal, and emphases libraries, but here my question is about pyfoal.
I couldn't get pyfoal to align with MFA (I didn't try the other two aligners; they seem too complicated to install). Instead, I ran MFA directly by hand, created my own aligned TextGrids, loaded an aligned TextGrid as a pypar.Alignment object, and started trying the pyfoal.interpolate functions to obtain features and fill my tables. But all I get so far is output like <pypar.alignment.Alignment at 0x1db01f9ffa0>.
I also have an interesting error; I will share the code, the error, and a screenshot of my TextGrid files to explain myself better. Please know that I already appreciate all this work :)
I also need more explanation of the parameters. For instance, when a function asks for text, does it need TextGrids, the output of MFA, or just the speech transcript? I also couldn't find a way to control the language (acoustic model) and dictionary options with pyfoal.

Also, I couldn't understand what the ratios stand for. I played with them a little, but the only thing I figured out is that making the ratio very large changed nothing in the output: pyfoal.interpolate.phonemes(align, ratio=1) returns <pypar.alignment.Alignment at 0x1db01f9ffa0>.
Code:

aligns_lst = glob('alignment/aligned/*.TextGrid')
align = pypar.Alignment(aligns_lst[0])
pyfoal.interpolate.voiced(align, ratio = 1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[109], line 2
      1 align = pypar.Alignment(aligns_lst[0])
----> 2 pyfoal.interpolate.voiced(align, ratio = 1)

File [~\anaconda3\envs\aligner\lib\site-packages\pyfoal\interpolate.py:25](http://localhost:8888/lab/tree/~/anaconda3/envs/aligner/lib/site-packages/pyfoal/interpolate.py#line=24), in voiced(alignment, ratio)
     23 # Determine which phonemes to stretch
     24 phonemes = alignment.phonemes()
---> 25 voicing = [is_voiced(str(phoneme)) for phoneme in phonemes]
     27 # Compute the ratio of the stretch applied to voiced regions
     28 duration = alignment.duration()

File [~\anaconda3\envs\aligner\lib\site-packages\pyfoal\interpolate.py:25](http://localhost:8888/lab/tree/~/anaconda3/envs/aligner/lib/site-packages/pyfoal/interpolate.py#line=24), in <listcomp>(.0)
     23 # Determine which phonemes to stretch
     24 phonemes = alignment.phonemes()
---> 25 voicing = [is_voiced(str(phoneme)) for phoneme in phonemes]
     27 # Compute the ratio of the stretch applied to voiced regions
     28 duration = alignment.duration()

File [~\anaconda3\envs\aligner\lib\site-packages\pyfoal\interpolate.py:82](http://localhost:8888/lab/tree/~/anaconda3/envs/aligner/lib/site-packages/pyfoal/interpolate.py#line=81), in is_voiced(phoneme)
     80 def is_voiced(phoneme):
     81     """Returns True iff the phoneme is voiced"""
---> 82     return bool(pyfoal.load.voicing()[phoneme])

AttributeError: module 'pyfoal' has no attribute 'load'
---------------------------------------------------------------------------

Screenshot of the TextGrid:
image

Issues with special characters

Hey!
First of all, thanks for making such a great package available.
I seem to run into the issue that certain characters, such as umlauts, are not rendered correctly. Is this intentional? Could they alternatively just be rendered as the next-best character, i.e., a instead of ä? I would appreciate your feedback!

Best,
Caro

Remove torchaudio dependency from setup.py

Thanks for this package! I noticed that torch and torchaudio are no longer dependencies as of #1, but they are still listed in setup.py. In particular, the version pins make installing this package problematic, since I have torchaudio 2.2.0 installed. I ended up doing pip install pyfoal --no-deps and manually installing the other dependencies.

I'd be happy to make a PR if that'd make life easier for you.

How can I get the checkpoint for RADTTS

Where can I get the checkpoint file for RADTTS? pyfoal.DEFAULT_CHECKPOINT points to the checkpoints directory in the assets directory but that doesn't seem to exist. Thanks!

Possible to run this on AWS Lambda or Beam (beam.cloud)?

I've been digging into Gentle a lot and having a lot of pain in setting it up locally and even on a remote server. I did finally set it up but have been trying to find a faster way to align the text/audio if possible. I recently learned about AWS Lambda functions, but also wondered 1) if it would be hard to adapt the Gentle code to use Lambda and 2) if it would even render that much faster than just using a remote server. Then I found https://beam.cloud that seems to do serverless but with GPUs and thought that might work for Gentle.

Then I stumbled on pyfoal.

It seems 1) it'll be easier to install in general, but 2) that it might align better with the serverless function models.

Have you thought about running it on AWS Lambda or Beam, and if not, do you think it might work?

Splicing beginning and end silence

I'm trying to cut the silence out of the beginning and end of the files, but I run into some issues with this splicing. My steps are the following:

  1. run the alignment and produce the alignment.json file for a specific label.json and wav file (already provided in the drive folder)
  2. splice out the beginning and end silence if the segment is "sp"
  3. run the forced alignment again on the new spliced wav file (without the silence) with the original label.json
  4. the alignment should have no silence at the beginning or end, but I've observed an error

The following is the code I used, with the attached methods. I was hoping you could try to reproduce this and point out why I can't splice out the silence.

Here is the drive containing the files:
https://drive.google.com/drive/folders/1tj6nHljdxZbyghsUm7WNNqI6kLW_SMZV?usp=sharing

import glob
import json
import os

from pydub import AudioSegment
import pyfoal

def remove_silence(directory, ext = '.wav', json_file = 'alignment.json'):
    wav_path = sorted([os.path.abspath(path) for path in glob.glob(os.path.join(directory, f'**/*{ext}') , recursive = True)])
    json_path = sorted([os.path.abspath(path) for path in glob.glob(os.path.join(directory, f'**/*{json_file}'), recursive = True)])

    for json_file, wav_file in zip(json_path, wav_path):
        f = open(json_file)
        json_object = json.load(f)
        first_value = [value for key,value in json_object['words'][0].items()]
        last_value = [value for key, value in json_object['words'][-1].items()]
        final_sound = AudioSegment.from_file(wav_file)
        if first_value[0] == 'sp' and last_value[0] == 'sp':
            print(len(final_sound))
            print(wav_file)
            ending_time_first_silence = first_value[2]
            print("Beginning frame end sp " + str(ending_time_first_silence))
            starting_time_last_silence = last_value[1]
            print("Last frame start " + str(starting_time_last_silence))
            new_starting_time = ending_time_first_silence * 1000
            new_ending_time = starting_time_last_silence * 1000
            final_sound = final_sound[new_starting_time:new_ending_time]
            final_sound.export(wav_file, format = 'wav')
        
        elif first_value[0] == 'sp' and last_value[0] != 'sp':
            print(len(final_sound))
            print(wav_file)
            ending_time_first_silence = first_value[2]
            print("Beginning frame end sp " + str(ending_time_first_silence))
            new_starting_time = ending_time_first_silence* 1000
            final_sound = final_sound[new_starting_time:]
            final_sound.export(wav_file, format = 'wav')
    
        elif first_value[0] != 'sp' and last_value[0] == 'sp':
            print(len(final_sound))
            print(wav_file)
            starting_time_last_silence = last_value[1]
            print("Last frame start " + str(starting_time_last_silence))
            new_ending_time = starting_time_last_silence * 1000
            final_sound = final_sound[:new_ending_time]
            final_sound.export(wav_file, format = 'wav')


def clean_adjusted_alignment(directory, wav_ext = '.wav',  json_ext = 'label.json'):
    wav_path = sorted([os.path.abspath(path) for path in glob.glob(os.path.join(directory, f'**/*{wav_ext}'), recursive = True)])

    missing_wav_files = []
    missing_json_files = []
    missing_alignment_files = []

    for files in wav_path:
        if not os.path.exists(os.path.join(os.path.dirname(files), "alignment2.json")):      
            missing_wav_files.append(files)
            missing_json_files.append(os.path.join(os.path.dirname(files), 'label.json'))
            missing_alignment_files.append(os.path.join(os.path.dirname(files), 'alignment2.json'))

    print(missing_wav_files)

    pyfoal.from_files_to_files(missing_json_files, missing_wav_files, missing_alignment_files)



if __name__ == '__main__':

    remove_silence('AK2/')
    clean_adjusted_alignment('AK2/')

After the above runs, there should be no silence at the beginning or end of the file, but for some reason that's not the case. As we can see in alignment2.json, there is still silence.

TLDR:

The first forced alignment ran perfectly. When I spliced the beginning and end silence out of the wav file and reran the alignment, I got an error. I'm not sure why this happens with a few files and not with others.
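One detail worth double-checking in splicing code like the above: pypar alignments store times in seconds, while pydub's AudioSegment slices in milliseconds, so the unit conversion is worth isolating. The helper below is a hypothetical refactor shown for clarity, not part of pyfoal or the original script:

```python
def seconds_to_ms_slice(start_seconds, end_seconds):
    """Convert alignment times in seconds to the integer millisecond
    indices that pydub's AudioSegment slicing expects."""
    return int(round(start_seconds * 1000)), int(round(end_seconds * 1000))

# e.g. trim from 0.25 s to 3.0 s: segment[250:3000] in pydub terms
segment_slice = seconds_to_ms_slice(0.25, 3.0)  # (250, 3000)
```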

mp3 file not recognized

Tried running a simple script to align from a file, shown below:

import pyfoal

alignment = pyfoal.from_file("test.txt", "test.mp3")
print(alignment)

This results in a runtime error:

Traceback (most recent call last):
  File "align.py", line 3, in <module>
    alignment = pyfoal.from_file("test.txt", "test.mp3")
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/pyfoal/core.py", line 77, in from_file
    audio = pyfoal.load.audio(audio_file)
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/pyfoal/load.py", line 13, in audio
    audio, sample_rate = soundfile.read(file)
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/soundfile.py", line 256, in read
    with SoundFile(file, 'r', samplerate, channels,
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'test.mp3': Format not recognised.

Any ideas on how to resolve this? I did try converting the file to wav, and changing the input audio file accordingly, but I get the following:

Traceback (most recent call last):
  File "align.py", line 3, in <module>
    alignment = pyfoal.from_file("test.txt", "test.wav")
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/pyfoal/core.py", line 77, in from_file
    audio = pyfoal.load.audio(audio_file)
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/pyfoal/load.py", line 16, in audio
    return pyfoal.resample(audio, sample_rate)
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/pyfoal/core.py", line 296, in resample
    return resampy.resample(audio, sample_rate, SAMPLE_RATE)
  File "/Users/aneeshnaik/Documents/RTR-source-separation/venv/lib/python3.8/site-packages/resampy/core.py", line 97, in resample
    raise ValueError('Input signal length={} is too small to '
ValueError: Input signal length=2 is too small to resample from 44100->11025
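One possible explanation for the second error (this is a guess from the traceback, not confirmed): the converted wav is stereo. `soundfile.read` returns multichannel audio with shape `(frames, channels)`, and `resampy.resample` operates over the last axis by default, so a stereo array is treated as a signal of length 2. A sketch of the workaround, downmixing to mono before handing the file to pyfoal (a random array stands in for the recording):

```python
import numpy as np

# soundfile.read returns shape (frames, channels) for multichannel files;
# resampy resamples along the last axis, so (frames, 2) looks like length 2
stereo = np.random.randn(44100, 2)  # stand-in for a stereo recording

# Downmix to mono before resampling/aligning
mono = stereo.mean(axis=1)
print(mono.shape)
```

Alternatively, converting with `ffmpeg -i test.mp3 -ac 1 test.wav` produces a mono wav directly.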

Alignment of Phonemes

I am using the Allosaurus library to extract phonemes from an audio file, since Allosaurus is able to work language-independently. Is there a way to use pyfoal to align a list of phonemes rather than a text? The language independence of such an approach would be very appreciated and would open a plethora of possibilities.

Checkpoints not found

From the documentation, I use pyfoal.DEFAULT_CHECKPOINT, but it looks like this path (assets/checkpoints) doesn't exist. Are we supposed to train the model ourselves? Are there pre-trained weights available for download somewhere?

Thanks!

Not compatible with Python 3.10/3.11?

I was trying to `pip install` the package but I got the error

ERROR: Ignored the following versions that require a different python version: 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10; 1.7.2 Requires-Python >=3.7,<3.11; 1.7.3 Requires-Python >=3.7,<3.11; 1.8.0 Requires-Python >=3.8,<3.11; 1.8.0rc1 Requires-Python >=3.8,<3.11; 1.8.0rc2 Requires-Python >=3.8,<3.11; 1.8.0rc3 Requires-Python >=3.8,<3.11; 1.8.0rc4 Requires-Python >=3.8,<3.11; 1.8.1 Requires-Python >=3.8,<3.11
ERROR: Could not find a version that satisfies the requirement torch<2.0.0 (from pyfoal) (from versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0)
ERROR: No matching distribution found for torch<2.0.0
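Reading the pip output above, every published pyfoal release up to 1.8.1 declares `Requires-Python >=3.8,<3.11`, so Python 3.11+ is excluded by the package metadata itself. A quick sketch to check whether the current interpreter falls in that range:

```python
import sys

# Per the pip error above, pyfoal <= 1.8.1 supports Python >= 3.8 and < 3.11
compatible = (3, 8) <= sys.version_info[:2] < (3, 11)
print(compatible)
```

A workaround (assuming conda is available) is a dedicated environment on a supported interpreter, e.g. `conda create -n pyfoal python=3.10` before `pip install pyfoal`.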

OSError: [errorno 8] Exec format error: 'HCopy'

When trying to use pyfoal.from_files_to_files(text_files, audio_files, output_files) I get the following exception

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/site-packages/pyfoal/core.py", line 95, in from_file_to_file
    from_file(text_file, audio_file).save(output_file)
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/site-packages/pyfoal/core.py", line 80, in from_file
    return align(text, audio, SAMPLE_RATE)
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/site-packages/pyfoal/core.py", line 56, in align
    return align.aligner(text, audio, duration)
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/site-packages/pyfoal/core.py", line 143, in __call__
    self.format(directory, audio, script_file)
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/site-packages/pyfoal/core.py", line 200, in format
    subprocess.Popen(
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: 'HCopy'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/site-packages/pyfoal/core.py", line 111, in from_files_to_files
    pool.starmap(align_fn, zip(text_files, audio_files, output_files))
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/aaron/anaconda3/envs/ling120-final/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
OSError: [Errno 8] Exec format error: 'HCopy'

I can run the program without multiprocessing and it works fine, but when I use multiprocessing with pyfoal I get this exception.
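For what it's worth, `[Errno 8] Exec format error` usually means the kernel refused to execute the file at all: typically a binary built for a different architecture (e.g. 32-bit HTK on a 64-bit system without multilib, or x86 on ARM), or a script missing its `#!` shebang line. A rough diagnostic sketch (the heuristic covers ELF binaries and shebang scripts only, so it is a hint, not a full check):

```python
import shutil

def looks_runnable(path):
    """Rough check for 'Exec format error' causes: a native Linux binary
    should start with the ELF magic bytes; a script needs a '#!' shebang."""
    with open(path, 'rb') as f:
        magic = f.read(4)
    return magic.startswith(b'\x7fELF') or magic.startswith(b'#!')

# Inspect the HCopy binary on the PATH, if present
hcopy = shutil.which('HCopy')
if hcopy:
    print(hcopy, looks_runnable(hcopy))
```

`file "$(which HCopy)"` from a shell gives the same information with more detail, including the target architecture.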

any phoneme list?

Where can I find the phoneme list that pyfoal uses?
