benfmiller / audalign

Package for aligning audio files through audio fingerprinting

License: MIT License

Python 96.07% Shell 3.93%

audalign's Issues

Installation not working

Hi, I wanted to try out audalign but had some trouble with the installation. The installation instructions in the README won't let me install the module. I always get this error:

pip install audalign
ERROR: Could not find a version that satisfies the requirement audalign (from versions: none)
ERROR: No matching distribution found for audalign

So I tried it with the following commands
pip install git+https://github.com/benfmiller/audalign.git
or (after downloading from github)
pip install audalign-master.zip

In both cases pip states it "Successfully installed audalign-0.0.2", but it seems the installation is not working. I just tried to import audalign and got:
Traceback (most recent call last):
  File "aligndashit.py", line 1, in <module>
    import audalign
ModuleNotFoundError: No module named 'audalign'

Also, the Lib\site-packages\ folder of my Python env has an audalign-0.0.2.dist-info folder but no audalign folder.

Any idea what's wrong?

EDIT: I'm on Windows 10, Python 3.7.6
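
The symptom described above (a dist-info folder but no importable package) can be confirmed by asking Python where it would load the module from. This is a minimal sketch using only the standard library; the helper name `locate_module` is made up for illustration.

```python
import importlib.util


def locate_module(name):
    """Return the file path Python would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages have no single origin file, so this may also be None.
    return spec.origin


if __name__ == "__main__":
    # None here means pip recorded the distribution but no package files are importable.
    print(locate_module("audalign"))
```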

Add Documentation

[Request] Modular installation of the package

Hi! Once again, great work on this project!

Is there any chance that the requirements can be split into modular parts?

Like this:

pip install audalign[correlation,fingerprint]

At the moment the installation takes a very long time for parts of the code that I'm not using. For now, I simply extracted the correlation bits of code into my project to make it slimmer, but it would be great if this could be supported officially so I don't miss out on any helpful updates from you.

Thank you
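
One way such a split is commonly expressed is through the `extras_require` keyword of setuptools. The sketch below is only an illustration: the group names come from the request above, while the dependency lists are hypothetical placeholders, not the project's actual requirements.

```python
# Hypothetical dependency groups; the real lists would come from the project's requirements.
EXTRAS = {
    "correlation": ["scipy"],
    "fingerprint": ["scipy", "matplotlib"],
}


def with_all_extra(extras):
    """Return a copy of `extras` plus an "all" group that unions every other group."""
    merged = sorted({dep for deps in extras.values() for dep in deps})
    return {**extras, "all": merged}


# In setup.py this would be passed as:
#     setuptools.setup(..., extras_require=with_all_extra(EXTRAS))
```

With something like this in place, `pip install audalign[correlation]` would pull in only the packages listed for that group.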

Reintroduce Multiprocessing

[IDEA] Phase/Polarity adjust

Hi there,
dunno if it falls within the scope of the project, but often, after aligning, some phase/polarity "errors" can degrade the recording.

Here's a couple of interesting resources about those issues:

Dunno if these tools may help...

@csteinmetz1's (JUCE) PhaseAnalyzer;
@x42's phaserotate.lv2;
@nullstar's (VST) KickFace;
@conundrumer's A4PC;
@victormassatieze's phase_reconstruction;
@zied-mnasri's phase_retrieval;
@hgroenenboom's Phase Rotation Experiment;

Last but not least, here's some very interesting research on phase recovery by @magronp:
Phase recovery with Bregman divergences for audio source separation

Hope that inspires !
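
The simplest case mentioned above, a fully inverted polarity between two already aligned tracks, can be detected from the sign of their zero-lag correlation. The following is a minimal sketch with made-up function names; it does not address frequency-dependent phase rotation, which the linked tools target.

```python
def is_polarity_inverted(reference, other):
    """Return True if `other` looks polarity-flipped relative to `reference`.

    Both inputs are sample sequences at the same rate that are already time-aligned.
    """
    n = min(len(reference), len(other))
    # Zero-lag correlation: a negative value means the waveforms mirror each other.
    dot = sum(reference[i] * other[i] for i in range(n))
    return dot < 0


def fix_polarity(reference, other):
    """Flip the sign of `other` when it is inverted relative to `reference`."""
    if is_polarity_inverted(reference, other):
        return [-s for s in other]
    return list(other)
```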

Resolving slight offset

When recording with two microphones where one microphone picks up the sound of the other, a slight offset causes a nasty echo. Would you be interested in such examples? If so, how should I deliver them?

API changes coming soon

I was working to add in ML-based fingerprinting recognitions, but I hit a wall with the current layout of the program. It was originally designed for simplicity of use, but as more recognition techniques have been added, it has become increasingly difficult to keep track of what each parameter is actually used for in the alignments.

The recognitions will take a recognition object (one for each type of recognition technique), which will also contain a corresponding config object. This way, all configuration for a specific technique will be contained within that technique's config. Align will also be rewritten for generic objects.

This will make it much easier for those who want to extend the functionality of the program and add their own recognizers. It will also hopefully make audalign much easier to use and understand.
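
The layout described above, one recognizer object per technique with its own config object, might look roughly like the sketch below. Every class, field, and function name here is hypothetical; the issue does not specify the actual API.

```python
from dataclasses import dataclass, field


@dataclass
class FingerprintConfig:
    # Hypothetical settings; the real config fields are not specified in this issue.
    sample_rate: int = 44100
    accuracy: int = 2


@dataclass
class FingerprintRecognizer:
    # Each recognizer owns the config for its own technique.
    config: FingerprintConfig = field(default_factory=FingerprintConfig)

    def describe(self):
        return f"fingerprint @ {self.config.sample_rate} Hz, accuracy {self.config.accuracy}"


def align(directory, recognizer):
    """Stand-in for a generic align() that only talks to the recognizer object."""
    return {"directory": directory, "technique": recognizer.describe()}
```

The point of the design is that `align` no longer needs technique-specific parameters; they travel inside the recognizer's config.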

Higher Sample Rates issues

Hi,

I have issues aligning audio files with a sample rate higher than 44100 Hz (in my case 48000 Hz).

import audalign

def main():
    ada = audalign.Audalign()
    rough_alignment = ada.align(
        "./not_aligned/",
        cor_sample_rate=48000,
    )

    fine_alignment = ada.fine_align(
        rough_alignment,
        destination_path="./aligned",
        cor_sample_rate=48000,
    )

if __name__ == "__main__":
    main()

I get stuck after the fingerprinting:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/fingerprint.py", line 109, in _fingerprint_worker
    channel, _ = audalign.filehandler.read(file_path, start_end=start_end)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/filehandler.py", line 154, in read
    audiofile = create_audiosegment(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/filehandler.py", line 59, in create_audiosegment
    audiofile = AudioSegment.from_file(filepath)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/pydub/audio_segment.py", line 685, in from_file
    info = mediainfo_json(orig_file, read_ahead_limit=read_ahead_limit)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/pydub/utils.py", line 279, in mediainfo_json
    info = json.loads(output)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "aligner.py", line 17, in <module>
    main()
  File "aligner.py", line 5, in main
    rough_alignment = ada.align(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/__init__.py", line 22, in wrapper_decorator
    results = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/__init__.py", line 1141, in align
    return align._align(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/align.py", line 48, in _align
    set_ada_file_names(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/align.py", line 176, in set_ada_file_names
    ada_obj.fingerprint_directory(file_dir)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/__init__.py", line 302, in fingerprint_directory
    result = self._fingerprint_directory(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/site-packages/audalign/__init__.py", line 388, in _fingerprint_directory
    result = self.pool.map(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value

Are higher sample rates supported at all?
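
The first traceback ends in pydub's `mediainfo_json`, which parses the output of an external probe tool, and "Expecting value: line 1 column 1" suggests that output was empty. One possible cause (an assumption, not a diagnosis) is that `ffprobe` is missing from PATH or cannot read one of the files. A minimal check, with a made-up helper name:

```python
import shutil


def find_media_tools(names=("ffmpeg", "ffprobe")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_media_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```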

AudioAlign

Just wanted to let you know about https://github.com/protyposis/AudioAlign
It's coded in C# but maybe helpful for reference.

Good luck with the project

Add post-decoding exporting to read

total.wav is a single channel, why not align it in a multichannel fashion?

Currently total.wav seems to contain the cumulative result. I think it would be much more valuable to have a total file that contains all tracks correctly aligned. It could be even nicer to use a container format that does not require the 'shift' to be encoded, so the offset would only be encoded as metadata (in an edit decision list fashion).
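
As a rough illustration of the multichannel idea, the sketch below pads each mono track by its offset and writes one channel per track using the standard `wave` module. The function names, the 16-bit format, and the assumption that offsets are non-negative sample counts are all choices made for this example, not audalign behavior.

```python
import struct
import wave


def interleave_with_offsets(tracks, offsets):
    """Pad each mono track by its offset (in samples) and interleave into frames.

    `tracks` are lists of 16-bit integer samples; `offsets` are non-negative ints.
    """
    padded = [[0] * off + list(t) for t, off in zip(tracks, offsets)]
    length = max(len(p) for p in padded)
    # Right-pad so every channel has the same number of samples.
    padded = [p + [0] * (length - len(p)) for p in padded]
    # One frame = one sample from each channel, in channel order.
    return [sample for frame in zip(*padded) for sample in frame]


def write_multichannel_wav(path, tracks, offsets, sample_rate=44100):
    """Write the aligned tracks as one WAV file with one channel per track."""
    data = interleave_with_offsets(tracks, offsets)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(len(tracks))
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(data)}h", *data))
```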

Handle Non-audio-files

Address double-adding files

Update the dependencies

At the moment, many of the dependencies are becoming unsupported. It would be nice if the package used the latest versions of noisereduce, librosa, scikit, typed-ast, typing-extensions and others.

finally of _align wipes important data

Hi again, this time it's lines 114-117 in align.py

    finally:
        ada_obj.file_names = temp_file_names
        ada_obj.fingerprinted_files = temp_fingerprinted_files
        ada_obj.total_fingerprints = temp_total_fingerprints

All the temps seem to be empty when declared, and they never appear again in the code. I noticed because save_fingerprinted_files was saving 3 empty variables, which had been replaced with the empty temp_ values in this snippet. Removing these 3 lines solves the issue, but it probably isn't the intended behavior or an optimal solution.
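
If the intent of the `finally` block is to restore the state from before the alignment (an assumption; the issue does not confirm it), the snapshot has to be taken from the live attributes before anything modifies them. A minimal sketch of that pattern follows, using a stand-in class and a made-up helper name:

```python
import copy
from contextlib import contextmanager


@contextmanager
def preserve_attributes(obj, names):
    """Snapshot the named attributes on entry and restore them on exit."""
    saved = {name: copy.copy(getattr(obj, name)) for name in names}
    try:
        yield obj
    finally:
        for name, value in saved.items():
            setattr(obj, name, value)


class FakeAda:
    # Stand-in object; only the attribute names are taken from the snippet above.
    def __init__(self):
        self.file_names = ["a.wav"]
        self.fingerprinted_files = [["a.wav", {}]]
        self.total_fingerprints = 10
```

If the fingerprints computed during alignment are meant to be kept, as the report above suggests users expect, then no restore step should run at all.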

Accuracy Setting in Recognize

When aligning audio and video files the default script fails

As a user, I have multiple recording devices: in practice, a camera and two Rode Wireless GO II devices. I would like to achieve alignment between the different recordings. The source data of each device may be assumed to be sequential in nature, but the recordings of different devices may not have been continuous, and thus have different overlaps.

What I would like to see is that blocks within the same folder are not correlated with each other, but blocks from different folders are. In addition, given that the input sequence is not a 'bag of files' but a 'sorted list of files', this knowledge should be used in the alignment process: a forward search given the last prior.
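
The pairing rule described above (compare files only across folders, never within one) can be sketched with `itertools`. The function name and the folder names in the usage example are hypothetical; the forward search over the sorted order is not covered here.

```python
import itertools


def cross_folder_pairs(folders):
    """Yield (file_a, file_b) pairs whose files come from different folders.

    `folders` maps a folder name to its files in recording order.
    """
    for (_, files_a), (_, files_b) in itertools.combinations(folders.items(), 2):
        yield from itertools.product(files_a, files_b)
```

For example, `list(cross_folder_pairs({"camera": ["c1", "c2"], "rode": ["r1"]}))` pairs each camera file with the single Rode file and never pairs `c1` with `c2`.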

At the moment, I get the following error after the fingerprinting process when I try to add my video folder. I have also tried making a WAV file out of all the videos; the same error occurred.

Given the description above, I could take an iterative approach that sequentially aligns single files by initial fingerprint. My preference would obviously be an unsupervised method.

VID_20220409_115501.mp4: Finding Matches...  Aligning matches
Traceback (most recent call last):
  File "/mnt/storage/home/skinkie/Sources/audalign/run_align.py", line 284, in <module>
    main(args=args)
  File "/mnt/storage/home/skinkie/Sources/audalign/run_align.py", line 196, in main
    results = ad.align(
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/__init__.py", line 36, in wrapper_decorator
    results = func(*args, **kwargs)
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/__init__.py", line 91, in align
    return aligner._align(
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/align/__init__.py", line 48, in _align
    files_shifts = calc_final_alignments(
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/align/__init__.py", line 169, in calc_final_alignments
    files_shifts = find_matches_not_in_file_shifts(
  File "/mnt/storage/home/skinkie/Sources/audalign/audalign/align/__init__.py", line 290, in find_matches_not_in_file_shifts
    nmatch_wt_most[main_name][audalign.Audalign.OFFSET_SECS] = None
AttributeError: module 'audalign' has no attribute 'Audalign'. Did you mean: 'datalign'?

Increase align accuracy

[SUGGESTION] NN & Video synch

Hi there, audalign is very cool !

We would suggest considering some interesting fingerprint projects in order to evolve it even further:

FingerprintDNN by @carlmoore256 - Fast pitch detection using a deep neural network
neural-audio-fp by @mimbres - Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
neuralfp by @chymaera96 - Audio Fingerprinter
pfann by @stdio2016 - Neural Network Audio FingerPrint

Please also check out AudioAlign by @protyposis, a tool written for research purposes to automatically synchronize audio and video recordings that have either been recorded in parallel at the same event or contain the same aural information. It has a very cool advanced GUI.

Hope that inspires !

Audio Alignment Gains

Hello,

I've been working with your code; thank you very much, it is so useful.

My problem occurs when I try to play back the audio: the audio is amplified, and therefore the noise floor as well. Is there a way to keep the original audio gains?

Thank you!

[Feature] Add a filter for matches that are too close to each other

I need to get a list of all possible matches that an audio file could have. Looking at the code, I found that I could get a list of matches using match_len_filter; however, most of the results were around the same second, and filter_matches wasn't useful for solving that problem. So I made this function for my script, which filters out numbers that are close to each other while preserving the original order. For example, given 1.5, 1.1, 60.1, 60.5, 30.4 it would return 1.5, 60.1, 30.4. I thought it could also be useful for this program.

In my case I needed to remove matches whose absolute difference from an already kept match is 0.5 or less, but it could be changed easily.

def remove_close_numbers_by_abs_diff(nums):
    if not nums:
        return []

    output = [nums[0]]

    for num in nums[1:]:
        if all(abs(num - prev) > 0.5 for prev in output):
            output.append(num)

    return output

import unittest

class TestRemoveCloseNumbers(unittest.TestCase):

    def test_remove_close_numbers(self):
        self.assertEqual(remove_close_numbers_by_abs_diff([1.5, 1.1, 3.2, 3.9, 5, 5.9, 0.5, 3.3, 3.3]), [1.5, 3.2, 3.9, 5, 5.9, 0.5])
        self.assertEqual(remove_close_numbers_by_abs_diff([1, 1, 2.2, 2.3, 2.5, 3.5, 4.4, 4.8]), [1, 2.2, 3.5, 4.4])
        self.assertEqual(remove_close_numbers_by_abs_diff([10, 10, 2.2, 2.3, 2.5, 1.5, 1, 0.8]), [10, 2.2, 1.5, 0.8])
        self.assertEqual(remove_close_numbers_by_abs_diff([2, 3]), [2, 3])
        self.assertEqual(remove_close_numbers_by_abs_diff([1, 3, 5, 7, 9]), [1, 3, 5, 7, 9])
        self.assertEqual(remove_close_numbers_by_abs_diff([]), [])


if __name__ == '__main__':
    unittest.main()
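
Since the threshold "could be changed easily", one option is to expose it as a parameter. This is a small variation on the function above, not part of audalign; the name `remove_close_numbers` and the `min_gap` parameter are made up for the example.

```python
def remove_close_numbers(nums, min_gap=0.5):
    """Keep each number only if it is more than `min_gap` away from every kept number."""
    output = []
    for num in nums:
        # all() over an empty list is True, so the first number is always kept.
        if all(abs(num - kept) > min_gap for kept in output):
            output.append(num)
    return output
```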

Request for a feature to control audio normalization before finding offset

I need to make a script that finds the cuts in an audio file, and for that I have decided to split the audio into one-second pieces. Apparently, audalign already does that, but since I don't know how to modify it, I have not chosen that option.

After making the cuts with my script and then finding the offset with audalign, in some cases it gives bad results, as each audio piece is normalized individually. So I would like to know if there is any way to prevent the audio from being normalized, so that I can do the normalization myself.

Print recognizing details while recognizing

Multiprocessing error when aligning

Sorry to bother you again.
I used audalign to convert two video files to wav. This worked like a charm. Now I was trying to align the output:

import audalign

ada = audalign.Audalign()

...

ada.convert_audio_file(filepath1, filepath1wav)
ada.convert_audio_file(filepath2, filepath2wav)

print(ada.align(r'.\files'))

I get the following error then:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\RetroHelix\Programming\Audio\aligndashit.py", line 12, in <module>
    print(ada.align(r'.\files'))
  File "C:\Users\RetroHelix\Envs\audalignPy38\lib\site-packages\audalign\__init__.py", line 528, in align
    self.fingerprint_directory(directory_path)
  File "C:\Users\RetroHelix\Envs\audalignPy38\lib\site-packages\audalign\__init__.py", line 223, in fingerprint_directory
    result = self._fingerprint_directory(path, plot, nprocesses, extensions)
  File "C:\Users\RetroHelix\Envs\audalignPy38\lib\site-packages\audalign\__init__.py", line 289, in _fingerprint_directory
    with multiprocessing.Pool(nprocesses) as self.pool:
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\pool.py", line 212, in __init__
    self._repopulate_pool()
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "c:\users\retrohelix\appdata\local\programs\python\python38\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
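
The idiom the error message refers to can be sketched as follows. The pool workload here is a placeholder chosen so the example runs on its own; in the script above, the audalign calls (including `ada.align(...)`) would sit inside `main()` instead.

```python
import multiprocessing


def main():
    # Placeholder workload. In the script above, the audalign calls
    # (including ada.align(...)) would go here instead.
    with multiprocessing.Pool(2) as pool:
        return pool.map(len, ["ab", "abcd"])


if __name__ == "__main__":
    # On Windows, child processes re-import this file; the guard stops them
    # from re-running main() and trying to start pools of their own.
    print(main())
```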

Explanation of match_info dictionary

I have some trouble understanding the values in the match_info dictionary. Having two audio files where one file is a short snippet of the other one I got these values for the match_info dictionary:

{'match_time': 148.59945511817932, 
'match_info': 
	{'line_170.mp3': 
		{'confidence': [99], 
		'offset_samples': [-16972], 
		'locality': [[(83, 26)]], 
		'locality_setting': [4.96907], 
		'offset_seconds': [-788.17814], 
		'locality_seconds': [[(3.85451, 1.20744)]]}}}

I interpret this as follows:
For the file "line_170.mp3" a match was found that has a confidence of 99. locality_setting just states that the match was found in a ~5 second window. offset_seconds gives the offset in seconds to where the match was found. But what about the tuples in locality_seconds/locality and the offset_samples value?
Can you please explain the meaning behind these values?

Thank you very much :)
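
Independent of what each field means, the structure shown above can be walked programmatically. The sketch below picks the entry with the highest confidence; it assumes the `confidence` and `offset_seconds` lists are parallel, with one element per candidate match, which is how the posted output appears to be laid out. The helper name is made up.

```python
def best_match(results):
    """Return (file_name, confidence, offset_seconds) for the highest-confidence entry."""
    best = None
    for name, info in results["match_info"].items():
        # Assumed: each field is a list with one element per candidate match.
        for confidence, offset in zip(info["confidence"], info["offset_seconds"]):
            if best is None or confidence > best[1]:
                best = (name, confidence, offset)
    return best


# The dictionary posted above, reproduced as a Python literal.
results = {
    "match_time": 148.59945511817932,
    "match_info": {
        "line_170.mp3": {
            "confidence": [99],
            "offset_samples": [-16972],
            "locality": [[(83, 26)]],
            "locality_setting": [4.96907],
            "offset_seconds": [-788.17814],
            "locality_seconds": [[(3.85451, 1.20744)]],
        }
    },
}
```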

Add typing

I know very little about coding. I never really used GitHub that much.

Is there a way someone could align two audio files I have for me, without my having to use this command-line program (audalign)?

How to sync and align one audio file wrt another audio file?

I have a source audio file from a video and a target audio file, which is a cloned audio of the source audio file. I am trying to sync the cloned audio onto the original video. I tried align_files, but the saved final file has two channels (both source and target audio).

How do we align the target with the source?
Thanks in advance for your help; so far, this repo gives the best results for the alignment task I am trying.
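
Once the offset between the two files is known, one option is to shift only the target track and export it on its own instead of a combined file. The sketch below works on a plain list of mono samples; the function name and the sign convention (positive means the target must be delayed) are assumptions for this example, not audalign's API.

```python
def shift_samples(samples, offset, length):
    """Shift a mono track by `offset` samples and fit it to `length` samples.

    A positive offset delays the track (silence is prepended); a negative
    offset trims its start.
    """
    if offset >= 0:
        shifted = [0] * offset + list(samples)
    else:
        shifted = list(samples[-offset:])
    # Cut or pad so the result has exactly the source's length.
    shifted = shifted[:length]
    return shifted + [0] * (length - len(shifted))
```

Here `length` would be the number of samples in the source audio, so the shifted target can replace the original track on the video directly.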

No module named 'audalign.align'

Hello there,

The newest version of the library audalign==1.0.0 is giving me the following error while importing the module.

>>> import audalign

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pdaawr/Documents/project/metrics/venv.nosync/lib/python3.8/site-packages/audalign/__init__.py", line 21, in <module>
    import audalign.align.aligner as aligner
ModuleNotFoundError: No module named 'audalign.align'

There is no issue with the previous one, i.e. 0.7.2. I tested both versions on Python 3.8.

target_file never fingerprinting

def prelim_fingerprint_checks(ada_obj, target_file, directory_path):
    all_against_files = audalign.filehandler.find_files(directory_path)
    all_against_files_full = [x[0] for x in all_against_files]
    all_against_files_base = [os.path.basename(x) for x in all_against_files_full]
    if (
        # Current code:
        #     os.path.basename(target_file) in all_against_files_base
        # If the target file is outside directory_path, the line above means it is
        # never fingerprinted. Shouldn't it look like this instead?
        os.path.basename(target_file) not in all_against_files_base
        and target_file not in all_against_files_full
    ):
        ada_obj.fingerprint_file(target_file)

Could be me being bad at programming, but this change seems to solve the issue I had with target_align.
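
The proposed condition can be isolated into a small function to see how it behaves. This is a sketch of the reporter's suggestion only, with a made-up function name; it is not the library's code.

```python
import os


def needs_fingerprint(target_file, against_files):
    """Return True when `target_file` is not already among `against_files`."""
    against_base = [os.path.basename(p) for p in against_files]
    return (
        os.path.basename(target_file) not in against_base
        and target_file not in against_files
    )
```

Note that with this condition a target outside the directory that shares a base name with a file inside it is still skipped, so comparing full paths alone may be the safer check.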

Fingerprint Directory Broken

[Request/Suggestion] Support unpredictable frame drops and unmatching speed/pitch (drift correction)

I'm looking for a way to perform (potentially destructive) audio track synchronization between old (dubbed in a different language) and remastered versions of movies.

In my scenario, applying a single audio shift is not enough: sooner or later the audio tracks drift out of sync, at least due to

unpredictable frame drops in both tracks
unmatching overall average speed (often with higher pitch for faster audio)

Any interest in supporting such a scenario?

Are there any existing projects that try to solve this problem?

Any ideas what's the best way to implement it?

Naive idea for implementation:

do initial synchronization
until old dubbed audio ends
- detect whether segment potentially contains voice (with something like silero-vad) or something non-silent/non-voiced (ideally, music segment)
- somehow measure tempo difference between the old and new audio segments
  - if it's voice, recognize it (with something like whisper.cpp) and compare the time difference between the first and last word of the segment in the old and new audio
  - if it's something else, probably just compare the time difference between the two loudest points in the old and new audio segments
- shrink/stretch (speedup/slowdown) the (old, dubbed in other language) audio segments (the possible analyzed non-silent/non-voiced segment and any next N segments)
- repeat
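
The tempo-measurement step in the list above reduces to simple arithmetic once the same two anchor events have been located in both tracks. A minimal sketch, with a made-up function name and no claim about how the anchors are found:

```python
def stretch_ratio(old_first, old_last, new_first, new_last):
    """Factor by which the old segment's duration must be scaled to match the new one.

    Arguments are timestamps in seconds of the same two anchor events
    (for example the first and last recognized word) in each track.
    """
    old_span = old_last - old_first
    new_span = new_last - new_first
    if old_span <= 0 or new_span <= 0:
        raise ValueError("anchor timestamps must be increasing")
    return new_span / old_span
```

A ratio below 1 means the old segment has to be sped up, and a ratio above 1 means it has to be slowed down.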

Thanks!

Improve fingerprinting algorithm

[Request/Suggestion] Visualization of alignments

I would like to request a feature.
It's nice to be able to easily align various audio files with audalign, but it would also be nice to see how the files differ. When you explained the structure of the match dictionary to me, I posted a screenshot of an offset graph. Something like this, or a graph of waveforms with the matching parts colored, would be nice. I hope you get what I mean :D

Could you please incorporate something like this into audalign?

[REQ] fix license on GH

Hi there, we just realized that we've never asked for it: can you please "standardize" the license file ?

Although it may sound like a minor aspect, a GH "uncompliant" license file causes inconsistent generation of the corresponding badge:

(badge-generator URL: https://flat.badgen.net/github/license/benfmiller/audalign/?label=LICENSE)

You can easily set a "correct" one through the GH's license wizard tool.

Last but not least, we're revising the AUDIO category \ Tools section \ Alignment/synch subsection where your project is listed, so let us know how - in your opinion - we could improve our categorizations and links to resources in order to favor collaboration between developers (and therefore evolution) of listed projects.

Thanks in advance.