That is pretty much exactly what each of those stands for!
The samples would be more accurately described as FFT frames. The defaults for FFT are defined in the fingerprint file and are DEFAULT_FS=44100, DEFAULT_WINDOW_SIZE=4096, DEFAULT_OVERLAP_RATIO=0.5.
Every audio file is automatically resampled to 44100 Hz (FS), mixed down to one channel, and normalized when it is read in. So, the formula to convert between seconds and FFT samples/frames is:
Seconds = FFT_Frames * WINDOW_SIZE * OVERLAP_RATIO / FS
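As a quick sanity check of that formula, here is a small Python sketch using the defaults quoted above (FS=44100, WINDOW_SIZE=4096, OVERLAP_RATIO=0.5). The helper names are hypothetical, not part of audalign's API, and flooring seconds to whole frames is an assumption. With these defaults one frame is 2048 samples, about 0.046 seconds.

```python
# Defaults quoted from audalign's fingerprint file.
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 4096
DEFAULT_OVERLAP_RATIO = 0.5

# Hop between consecutive FFT frames, in audio samples (2048 with the defaults).
HOP = DEFAULT_WINDOW_SIZE * DEFAULT_OVERLAP_RATIO


def frames_to_seconds(frames: float) -> float:
    """Convert a position in FFT frames to seconds (hypothetical helper)."""
    return frames * HOP / DEFAULT_FS


def seconds_to_frames(seconds: float) -> int:
    """Convert seconds to whole FFT frames, rounding down (assumed behavior)."""
    return int(seconds * DEFAULT_FS / HOP)


print(round(frames_to_seconds(222), 5))  # 10.30966
print(round(frames_to_seconds(264), 5))  # 12.26014
# locality=5 seconds -> 107 frames -> back to seconds
print(round(frames_to_seconds(seconds_to_frames(5)), 5))  # 4.96907
```

These reproduce the offset_seconds (10.30966, 12.26014) and locality setting (4.96907) values that appear in the outputs later in this thread.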
- confidence = the count of fingerprint matches between the files
- offset_samples = alignment in FFT samples/frames
- locality = center of the matching windows of sound events in each file, in FFT samples/frames
- locality_setting = width of the locality window in seconds, after it has been converted to FFT samples/frames
- offset_seconds = alignment in seconds
- locality_seconds = center of the matching windows of sound events in each file, in seconds
The tuples are of the form (target_file, against_file), where target_file is the input to recognize and against_file is an already fingerprinted file.
Using locality could result in more than one window alignment that has the exact same confidence, so those will be added to the list. e.g. [(1,2),(1,5)] if they both have the same confidence.
Each index of the results is an alignment, with the first index having the highest confidence. So if there are multiple possible alignments, attribute[index] holds the values for that single alignment.
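Because every per-alignment list in match_info[filename] shares the same index, a per-alignment view can be built by zipping the lists together. This is a minimal sketch: `split_alignments` is a hypothetical helper, and the sample values and key names are copied from the v0.3.1 output further down this thread (older versions used offset_samples/locality instead). The locality setting holds a single value rather than one per alignment, so it is left out of the zip.

```python
def split_alignments(info: dict) -> list:
    """Turn parallel per-alignment lists into one dict per alignment.

    locality_frames_setting holds a single value rather than one per
    alignment, so it is deliberately excluded (zip would truncate to it).
    """
    keys = ["confidence", "offset_frames", "offset_seconds",
            "locality_frames", "locality_seconds"]
    present = [k for k in keys if k in info]
    return [dict(zip(present, values))
            for values in zip(*(info[k] for k in present))]


info = {
    "confidence": [137, 97],
    "offset_frames": [222, 264],
    "locality_frames": [[(620, 842, 137)], [(88, 352, 97)]],
    "locality_frames_setting": [4.96907],
    "offset_seconds": [10.30966, 12.26014],
    "locality_seconds": [[(28.79274, 39.1024, 137)], [(4.08671, 16.34685, 97)]],
}

for i, alignment in enumerate(split_alignments(info)):
    # Index 0 is the highest-confidence alignment.
    print(i, alignment["confidence"], alignment["offset_seconds"])
```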
I suppose I should put an explanation in the readme or the wiki. Do you think I should change "samples" to "frames" to be clearer about their values?
Hope this helps!
Thank you very much for the explanation.
I'm trying to plot an alignment graph that shows the offset of one recording against another over time (I'm using this to visualize where a TV recording with advertisement breaks differs from a DVD release), and I figured I need more information about all the values :D
Can't help you much about the naming. I always think of "fingerprint samples" but frame sounds equally good to me :)
Ok, I was confident I could use your explanation to build what I described, but... I still have trouble interpreting the output. :-(
Let's say I have two files:
- short_orig_intro.mp3
- short_new_intro.mp3
(same as short_orig_intro, but with the first 12.245 seconds removed and 2 seconds of silence inserted at the 12-second mark)
Fingerprinting and recognizing yields the following output:
Finished fingerprinting short_new_intro.mp3
Finished fingerprinting short_orig_intro.mp3
short_new_intro.mp3: Finding Matches... Aligning matches
{
'match_time': 0.005000114440917969,
'match_info':
{
'short_orig_intro.mp3':
{
'confidence': [152, 100],
'offset_samples': [222, 264],
'locality': [[(54, 53), (62, 55), (62, 53), (58, 55), (58, 58), (65, 55), (65, 65), (60, 66), (60, 60), (56, 58), (56, 54), (62, 62), (54, 54), (67, 68), (55, 255), (55, 54), (62, 264), (62, 62), (66, 280), (66, 49), (59, 59), (72, 215), (72, 52), (71, 237), (71, 71), (63, 267), (63, 86), (54, 267), (54, 86), (61, 45), (59, 60), (54, 54), (53, 50)], [(58, 58), (58, 165), (54, 165), (58, 58), (58, 155), (55, 155), (55, 54), (55, 149), (56, 56), (56, 145), (54, 55), (54, 54), (54, 143), (55, 55), (55, 139), (58, 58), (58, 55), (58, 57), (58, 129), (54, 58), (54, 56), (54, 54), (54, 122), (86, 58), (86, 56), (86, 54), (86, 110), (56, 54), (56, 87), (60, 54), (60, 83), (61, 54), (61, 374)]],
'locality_setting': [4.96907],
'offset_seconds': [10.30966, 12.26014],
'locality_seconds': [[(2.50776, 2.46132), (2.87927, 2.5542), (2.87927, 2.46132), (2.69351, 2.5542), (2.69351, 2.69351), (3.01859, 2.5542), (3.01859, 3.01859), (2.78639, 3.06503), (2.78639, 2.78639), (2.60063, 2.69351), (2.60063, 2.50776), (2.87927, 2.87927), (2.50776, 2.50776), (3.11147, 3.15791), (2.5542, 11.84218), (2.5542, 2.50776), (2.87927, 12.26014), (2.87927, 2.87927), (3.06503, 13.00317), (3.06503, 2.27556), (2.73995, 2.73995), (3.34367, 9.98458), (3.34367, 2.41488), (3.29723, 11.00626), (3.29723, 3.29723), (2.92571, 12.39946), (2.92571, 3.99383), (2.50776, 12.39946), (2.50776, 3.99383), (2.83283, 2.0898), (2.73995, 2.78639), (2.50776, 2.50776), (2.46132, 2.322)], [(2.69351, 2.69351), (2.69351, 7.66259), (2.50776, 7.66259), (2.69351, 2.69351), (2.69351, 7.19819), (2.5542, 7.19819), (2.5542, 2.50776), (2.5542, 6.91955), (2.60063, 2.60063), (2.60063, 6.73379), (2.50776, 2.5542), (2.50776, 2.50776), (2.50776, 6.64091), (2.5542, 2.5542), (2.5542, 6.45515), (2.69351, 2.69351), (2.69351, 2.5542), (2.69351, 2.64707), (2.69351, 5.99075), (2.50776, 2.69351), (2.50776, 2.60063), (2.50776, 2.50776), (2.50776, 5.66567), (3.99383, 2.69351), (3.99383, 2.60063), (3.99383, 2.50776), (3.99383, 5.10839), (2.60063, 2.50776), (2.60063, 4.04027), (2.78639, 2.50776), (2.78639, 3.85451), (2.83283, 2.50776), (2.83283, 17.36853)]]
}
}
}
Can you please explain where I can find the mentioned edits in this output? The 12.26014 in offset_seconds seems to be one thing I'm looking for, but what does this offset mean exactly?
That sounds like a very fun project! I hope it's otherwise going well!
Sorry for the late reply, I had family in for the weekend.
The output is saying that the best alignment is 10.309 seconds, with a confidence of 152. This offset means that 'short_orig_intro.mp3' starts 10.309 seconds before the target file, which is 'short_new_intro.mp3'. Likewise, with a confidence of 100, 'short_orig_intro.mp3' starts 12.260 seconds before 'short_new_intro.mp3'.
It's hard for fingerprinting to be completely accurate for the offset because mp3 encoding shifts the files a little, and the window size, overlap, and sample rate affect the accuracy of the spectrogram. Each frame corresponds to roughly 0.046 seconds (2048 samples at 44100 Hz), so with the current settings the alignment is only accurate to within about that much. I recently added a correlation-based alignment that uses the waveform; it is much more accurate for alignment time but more susceptible to noise. I am also planning on adding locality to that.
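Put another way, for a given alignment, a moment at time t in the target file lines up with time t + offset_seconds in the against file. A small sketch of that mapping (the helper name is hypothetical, not an audalign function), applied to the two alignments above:

```python
def against_time(target_seconds: float, offset_seconds: float) -> float:
    """Map a time in the target file to the matching time in the against file.

    Assumes offset_seconds means "the against file starts this many seconds
    before the target file", as described above.
    """
    return target_seconds + offset_seconds


# 4 seconds into short_new_intro.mp3, under each candidate alignment:
for offset in (10.30966, 12.26014):
    print(offset, round(against_time(4.0, offset), 5))
```

Which alignment applies depends on where you are in the target file: each one only holds for the stretch of audio it was matched on.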
I hope this clears the alignment up a little better! It would be interesting to hear what you find for those alignments.
Thanks for always getting back to me.
It seems that the second alignment (12.260) is for the first part of the track and the first alignment (10.309) is for the second part.
Since I removed the first ~12 seconds, the track has to be shifted 12 seconds. Then comes the part where I inserted ~2 seconds, so the offset is 12 - 2 = 10... I was perplexed by how this is ordered.
I still struggle to pinpoint where an alignment starts or ends. Some of the locality_seconds matching tuples look quite promising, but some don't make any sense to me.
You betcha! Happy to help!
The order of the matches is sorted by confidence. If you were to pass the folder to the align function and write them to a folder with the shifts, it uses the first, strongest match for the alignment. For your project, it might work better to do something like
new_match = {}
for filename, info in match["match_info"].items():
    temp_match = zip(info["offset_seconds"], info["confidence"], info["offset_samples"], info["locality"], info["locality_seconds"])
    temp_match = sorted(temp_match, key=lambda x: x[0])
    new_match[filename] = temp_match
to order it by offset_seconds.
Can I ask what version you are using? There was a bug in the locality setting that I fixed in 0.1.5. If your version is after that, then it's not quite making sense to me either.
Seems like I used an older version, since the output is different after updating to 0.2.0. Both outputs are below (I inserted some line breaks into new_result to make the output a bit easier to read).
import audalign

if __name__ == '__main__':
    ada = audalign.Audalign()
    ada.fingerprint_directory("shortintro")
    ada.save_fingerprinted_files("shortintro.json")
    # Only returns matches with total fingerprint matches greater than 50 within 5 second windows
    result = ada.recognize("short_new_intro.mp3", filter_matches=50, locality=5)
    print(result)
    print("---#####---")
    # Re-order each file's alignments by offset_seconds instead of confidence
    new_result = {}
    for filename, info in result["match_info"].items():
        temp_match = zip(info["offset_seconds"], info["confidence"], info["offset_frames"], info["locality_frames"], info["locality_seconds"])
        temp_match = sorted(temp_match, key=lambda x: x[0])
        new_result[filename] = temp_match
    print(new_result)
Fingerprinting short_new_intro.mp3
Fingerprinting short_orig_intro.mp3
Finished fingerprinting short_new_intro.mp3
Finished fingerprinting short_orig_intro.mp3
short_new_intro.mp3: Finding Matches... Aligning matches
{'match_time': 0.004990816116333008,
'match_info':
{'short_orig_intro.mp3':
{'confidence':
[138, 99],
'offset_frames':
[222, 264],
'locality_frames':
[[(62, 47), (57, 66), (57, 65), (60, 66), (60, 65), (58, 65), (58, 56), (57, 79), (57, 65), (57, 80), (57, 188), (57, 84), (57, 65), (57, 57), (57, 70), (57, 188), (54, 89), (54, 65), (54, 54), (54, 68), (54, 188), (64, 65), (64, 54), (64, 63), (64, 188), (54, 116), (54, 188), (54, 116), (54, 54), (54, 178), (65, 126), (65, 65), (65, 162), (56, 220), (56, 49), (54, 66), (54, 46), (62, 66), (62, 62), (56, 66), (56, 56), (84, 66), (84, 52), (64, 368), (64, 41), (54, 264), (54, 36), (54, 50), (59, 60), (54, 50), (50, 50)], [(58, 58), (58, 115), (58, 58), (58, 110), (55, 55), (55, 105), (55, 54), (55, 99), (55, 56), (55, 54), (55, 93), (55, 59), (55, 307), (55, 56), (55, 55), (55, 88), (55, 59), (55, 307), (58, 56), (58, 85), (58, 59), (58, 87), (54, 56), (54, 54), (54, 83), (54, 59), (54, 87), (91, 56), (91, 54), (91, 70), (91, 66), (91, 59), (91, 87), (56, 54), (56, 87), (60, 54), (60, 83), (61, 54), (61, 115)]],
'locality_frames_setting':
[4.96907],
'offset_seconds':
[10.30966, 12.26014],
'locality_seconds':
[[(2.87927, 2.18268), (2.64707, 3.06503), (2.64707, 3.01859), (2.78639, 3.06503), (2.78639, 3.01859), (2.69351, 3.01859), (2.69351, 2.60063), (2.64707, 3.66875), (2.64707, 3.01859), (2.64707, 3.71519), (2.64707, 8.7307), (2.64707, 3.90095), (2.64707, 3.01859), (2.64707, 2.64707), (2.64707, 3.25079), (2.64707, 8.7307), (2.50776, 4.13315), (2.50776, 3.01859), (2.50776, 2.50776), (2.50776, 3.15791), (2.50776, 8.7307), (2.97215, 3.01859), (2.97215, 2.50776), (2.97215, 2.92571), (2.97215, 8.7307), (2.50776, 5.38703), (2.50776, 8.7307), (2.50776, 5.38703), (2.50776, 2.50776), (2.50776, 8.2663), (3.01859, 5.85143), (3.01859, 3.01859), (3.01859, 7.52327), (2.60063, 10.21678), (2.60063, 2.27556), (2.50776, 3.06503), (2.50776, 2.13624), (2.87927, 3.06503), (2.87927, 2.87927), (2.60063, 3.06503), (2.60063, 2.60063), (3.90095, 3.06503), (3.90095, 2.41488), (2.97215, 17.08989), (2.97215, 1.90404), (2.50776, 12.26014), (2.50776, 1.67184), (2.50776, 2.322), (2.73995, 2.78639), (2.50776, 2.322), (2.322, 2.322)], [(2.69351, 2.69351), (2.69351, 5.34059), (2.69351, 2.69351), (2.69351, 5.10839), (2.5542, 2.5542), (2.5542, 4.87619), (2.5542, 2.50776), (2.5542, 4.59755), (2.5542, 2.60063), (2.5542, 2.50776), (2.5542, 4.31891), (2.5542, 2.73995), (2.5542, 14.25705), (2.5542, 2.60063), (2.5542, 2.5542), (2.5542, 4.08671), (2.5542, 2.73995), (2.5542, 14.25705), (2.69351, 2.60063), (2.69351, 3.94739), (2.69351, 2.73995), (2.69351, 4.04027), (2.50776, 2.60063), (2.50776, 2.50776), (2.50776, 3.85451), (2.50776, 2.73995), (2.50776, 4.04027), (4.22603, 2.60063), (4.22603, 2.50776), (4.22603, 3.25079), (4.22603, 3.06503), (4.22603, 2.73995), (4.22603, 4.04027), (2.60063, 2.50776), (2.60063, 4.04027), (2.78639, 2.50776), (2.78639, 3.85451), (2.83283, 2.50776), (2.83283, 5.34059)]]}}}
{'short_orig_intro.mp3': [
(10.30966, 138, 222, [(62, 47), (57, 66), (57, 65), (60, 66), (60, 65), (58, 65), (58, 56), (57, 79), (57, 65), (57, 80), (57, 188), (57, 84), (57, 65), (57, 57), (57, 70), (57, 188), (54, 89), (54, 65), (54, 54), (54, 68), (54, 188), (64, 65), (64, 54), (64, 63), (64, 188), (54, 116), (54, 188), (54, 116), (54, 54), (54, 178), (65, 126), (65, 65), (65, 162), (56, 220), (56, 49), (54, 66), (54, 46), (62, 66), (62, 62), (56, 66), (56, 56), (84, 66), (84, 52), (64, 368), (64, 41), (54, 264), (54, 36), (54, 50), (59, 60), (54, 50), (50, 50)],
[(2.87927, 2.18268), (2.64707, 3.06503), (2.64707, 3.01859), (2.78639, 3.06503), (2.78639, 3.01859), (2.69351, 3.01859), (2.69351, 2.60063), (2.64707, 3.66875), (2.64707, 3.01859), (2.64707, 3.71519), (2.64707, 8.7307), (2.64707, 3.90095), (2.64707, 3.01859), (2.64707, 2.64707), (2.64707, 3.25079), (2.64707, 8.7307), (2.50776, 4.13315), (2.50776, 3.01859), (2.50776, 2.50776), (2.50776, 3.15791), (2.50776, 8.7307), (2.97215, 3.01859), (2.97215, 2.50776), (2.97215, 2.92571), (2.97215, 8.7307), (2.50776, 5.38703), (2.50776, 8.7307), (2.50776, 5.38703), (2.50776, 2.50776), (2.50776, 8.2663), (3.01859, 5.85143), (3.01859, 3.01859), (3.01859, 7.52327), (2.60063, 10.21678), (2.60063, 2.27556), (2.50776, 3.06503), (2.50776, 2.13624), (2.87927, 3.06503), (2.87927, 2.87927), (2.60063, 3.06503), (2.60063, 2.60063), (3.90095, 3.06503), (3.90095, 2.41488), (2.97215, 17.08989), (2.97215, 1.90404), (2.50776, 12.26014), (2.50776, 1.67184), (2.50776, 2.322), (2.73995, 2.78639), (2.50776, 2.322), (2.322, 2.322)]),
(12.26014, 99, 264, [(58, 58), (58, 115), (58, 58), (58, 110), (55, 55), (55, 105), (55, 54), (55, 99), (55, 56), (55, 54), (55, 93), (55, 59), (55, 307), (55, 56), (55, 55), (55, 88), (55, 59), (55, 307), (58, 56), (58, 85), (58, 59), (58, 87), (54, 56), (54, 54), (54, 83), (54, 59), (54, 87), (91, 56), (91, 54), (91, 70), (91, 66), (91, 59), (91, 87), (56, 54), (56, 87), (60, 54), (60, 83), (61, 54), (61, 115)],
[(2.69351, 2.69351), (2.69351, 5.34059), (2.69351, 2.69351), (2.69351, 5.10839), (2.5542, 2.5542), (2.5542, 4.87619), (2.5542, 2.50776), (2.5542, 4.59755), (2.5542, 2.60063), (2.5542, 2.50776), (2.5542, 4.31891), (2.5542, 2.73995), (2.5542, 14.25705), (2.5542, 2.60063), (2.5542, 2.5542), (2.5542, 4.08671), (2.5542, 2.73995), (2.5542, 14.25705), (2.69351, 2.60063), (2.69351, 3.94739), (2.69351, 2.73995), (2.69351, 4.04027), (2.50776, 2.60063), (2.50776, 2.50776), (2.50776, 3.85451), (2.50776, 2.73995), (2.50776, 4.04027), (4.22603, 2.60063), (4.22603, 2.50776), (4.22603, 3.25079), (4.22603, 3.06503), (4.22603, 2.73995), (4.22603, 4.04027), (2.60063, 2.50776), (2.60063, 4.04027), (2.78639, 2.50776), (2.78639, 3.85451), (2.83283, 2.50776), (2.83283, 5.34059)])]}
To be honest, the new output is not helping me much :( I still wonder how to read the actual times where a match starts and ends. Do I have to add the locality_frames_setting seconds to the tuples, or do I have to multiply locality_frames_setting by the length of locality_seconds or something? :D
I totally getcha. Sorry, there is definitely a bug. I will have it fixed and a new release out by tonight!!
Each alignment has a list of locality tuples where each tuple has the same confidence. The tuples are of the form (target_file, against_file), where target_file is the input to recognize and against_file is an already fingerprinted file. Each number in the tuple is the position in seconds of the center of the locality window for the respective file.
It's not calculating the tuples correctly right now, but I'll fix that up.
new_result = {}
for filename, info in result["match_info"].items():
    temp_match = zip(
        info["offset_seconds"],
        info["confidence"],
        info["offset_frames"],
        info["locality_frames"],
        info["locality_seconds"],
    )
    # Sort alignments by offset, then transpose back into one tuple per field
    temp_match = sorted(temp_match, key=lambda x: x[0])
    temp_match = list(zip(*temp_match))
    new_result[filename] = {}
    new_result[filename]["offset_seconds"] = temp_match[0]
    new_result[filename]["confidence"] = temp_match[1]
    new_result[filename]["offset_frames"] = temp_match[2]
    new_result[filename]["locality_frames"] = temp_match[3]
    new_result[filename]["locality_seconds"] = temp_match[4]
If you use this, it puts it back in the same dictionary format, just sorted by offset_seconds.
👍
Wow, that's nice to hear but take your time. There really is no need to rush this.
It's good that a bug was found. I'll make sure to verify the fix :)
It works well for me, now! I uploaded a fix to pypi, v0.2.1, so it should be all fixed.
Thanks! It'd be awesome to know if it works or you find anything fishy.
Well done. It works for me too :)
Here is the actual output:
short_new_intro.mp3: Finding Matches... Aligning matches
confidence
[97, 137]
offset_frames
[222, 264]
locality_frames
[[(88, 352), (115, 379), (129, 393), (138, 401), (157, 411), (157, 423), (157, 510), (157, 527), (164, 411), (164, 428), (164, 510), (164, 527), (169, 411), (169, 434), (169, 510), (169, 527), (180, 411), (180, 434), (180, 444), (180, 510), (180, 527), (183, 411), (183, 434), (183, 444), (183, 447), (183, 510), (183, 527), (263, 481), (263, 506), (270, 481), (270, 513), (286, 481), (286, 528)], [(620, 842), (643, 842), (653, 842), (653, 875), (671, 842), (671, 895), (683, 842), (683, 905), (692, 842), (692, 910), (692, 969), (702, 842), (702, 924), (702, 969), (708, 842), (708, 928), (708, 931), (708, 969), (750, 976), (755, 977), (773, 995), (804, 1026), (814, 1030), (836, 993), (836, 1041), (856, 993), (856, 1078), (864, 993), (864, 1086), (939, 1171), (966, 1175), (1033, 1259), (1040, 1262), (1063, 1289), (1075, 1296)]]
locality_frames_setting
[4.96907]
offset_seconds
[10.30966, 12.26014]
locality_seconds
[[(4.08671, 16.34685), (5.34059, 17.60073), (5.99075, 18.25088), (6.40871, 18.6224), (7.29107, 19.0868), (7.29107, 19.64408), (7.29107, 23.68435), (7.29107, 24.47383), (7.61615, 19.0868), (7.61615, 19.87628), (7.61615, 23.68435), (7.61615, 24.47383), (7.84834, 19.0868), (7.84834, 20.15492), (7.84834, 23.68435), (7.84834, 24.47383), (8.35918, 19.0868), (8.35918, 20.15492), (8.35918, 20.61932), (8.35918, 23.68435), (8.35918, 24.47383), (8.4985, 19.0868), (8.4985, 20.15492), (8.4985, 20.61932), (8.4985, 20.75864), (8.4985, 23.68435), (8.4985, 24.47383), (12.2137, 22.3376), (12.2137, 23.49859), (12.53878, 22.3376), (12.53878, 23.82367), (13.28181, 22.3376), (13.28181, 24.52027)], [(28.79274, 39.1024), (29.86086, 39.1024), (30.32526, 39.1024), (30.32526, 40.63492), (31.16118, 39.1024), (31.16118, 41.56372), (31.71846, 39.1024), (31.71846, 42.02812), (32.13642, 39.1024), (32.13642, 42.26032), (32.13642, 45.00027), (32.60082, 39.1024), (32.60082, 42.91048), (32.60082, 45.00027), (32.87946, 39.1024), (32.87946, 43.09624), (32.87946, 43.23556), (32.87946, 45.00027), (34.82993, 45.32535), (35.06213, 45.37179), (35.89805, 46.20771), (37.33769, 47.64735), (37.80209, 47.83311), (38.82376, 46.11483), (38.82376, 48.34395), (39.75256, 46.11483), (39.75256, 50.06222), (40.12408, 46.11483), (40.12408, 50.43374), (43.60707, 54.38113), (44.86095, 54.56689), (47.97243, 58.46785), (48.29751, 58.60717), (49.36562, 59.86104), (49.9229, 60.18612)]]
I tried to visualize the output to understand it better. There are two matches, with offsets 10.30966 and 12.26014. The graphic shows four tracks: the original audio, the new audio (the one I try to recognize), and each of the two matches separately. The offset seconds match very well.
Where the matches start and end can be read from the locality_seconds output, e.g. (4.08671, 16.34685). This is the middle of one of the locality windows for the first match of the target file (the one I want to recognize). So the match starts at 4.08671 - locality_frames_setting/2 (half, since it's the middle); is that right?
In blue, I also outlined two locality_seconds tuples (the last one of each match).
Thank you for your support.
PS: The code you posted gives me an index out of range error :)
I fixed the code I posted above; it was missing a line to unzip it. Thanks for the notice!
The tuple (4.08671, 16.34685) means that the center of the locality window for your target file (short_new_intro) is at 4.08671 seconds and the center of the locality window for the against file (short_orig_intro) is at 16.34685 seconds. With the locality setting at 5, the windows are up to 5 seconds wide (2.5 seconds on either side), but they can be smaller if there aren't many fingerprints at that location.
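Following that description, the widest possible window for a tuple can be estimated as center ± locality/2. This is only an upper bound, since the real window may be narrower when there are few fingerprints. `window_bounds` is a hypothetical helper, not an audalign function, and clipping the start at 0 is an assumption (it does not clip at the end of the file, whose length isn't known here).

```python
def window_bounds(center: float, width_seconds: float) -> tuple:
    """Widest possible locality window around a center, in seconds.

    The actual window may be narrower if few fingerprints fall inside it.
    The start is clipped at 0, since a window cannot begin before the file.
    """
    half = width_seconds / 2
    return max(0.0, center - half), center + half


target_center, against_center = 4.08671, 16.34685
print(window_bounds(target_center, 5))   # window in short_new_intro.mp3
print(window_bounds(against_center, 5))  # window in short_orig_intro.mp3
```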
Each tuple is a separate "match" as I've been calling it, and each offset is a separate "alignment."
I realized that the locality tuples were not being added correctly by confidence. I'm so sorry for all the bugs and frustration. I am pretty confident that it is all working correctly now. I just pushed v0.2.2, which ensures that all tuples have the same confidence. I also just pushed v0.3.0, which lets you specify the locality_filter_prop. It filters the tuples by the proportion of tuple confidence over the highest confidence for that offset. This also gives the confidence of each tuple as a third number in the tuple, so you can tell exactly what each tuple's confidence is. Again, so sorry for the bugs.
Thanks for trying it all out!! I hope this version and answer helps you out!
Hi, I'm sorry for the late answer. But I was finally able to try the new version (0.3.1).
At first I thought something was not working right, because when printing the recognize results I only get two locality frames. The reason seems to be the locality_filter_prop: not defining a locality filter probability results in a much smaller list of frames than before the feature was introduced:
{
'match_time': 0.005000114440917969,
'match_info':
{
'short_orig_intro.mp3':
{
'confidence':
[137, 97],
'offset_frames':
[222, 264],
'locality_frames':
[[(620, 842, 137)], [(88, 352, 97)]],
'locality_frames_setting':
[4.96907],
'offset_seconds':
[10.30966, 12.26014],
'locality_seconds':
[[(28.79274, 39.1024, 137)], [(4.08671, 16.34685, 97)]]}}}
Now I wonder: which locality filter probability would yield the same results as before? And which probability would make the most sense in general?
EDIT
I was able to finish what I wanted to do (visualizing the difference between two audio files by plotting the offset of one against the other over time). This is the result with a locality_filter_prop of 0.1 and with 0.9:
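For this kind of plot, one way to get points is to treat each locality tuple as (time in target, time in against file, confidence) and use the difference of the two times as the local offset. In the v0.3.1 output above, that difference reproduces offset_seconds exactly (39.1024 - 28.79274 = 10.30966 and 16.34685 - 4.08671 = 12.26014). This is a sketch of the idea, not the code used for the plots; `offset_points` is hypothetical and the plotting itself is left out.

```python
def offset_points(locality_seconds: list) -> list:
    """Flatten locality tuples into (target_time, local_offset, confidence) points.

    Each tuple is assumed to be (target_center, against_center, confidence),
    as in v0.3.x. The local offset is how far ahead the against file's window
    is. Points are sorted by target time so they can be drawn as a line.
    """
    points = []
    for alignment in locality_seconds:
        for target_center, against_center, confidence in alignment:
            offset = round(against_center - target_center, 5)
            points.append((target_center, offset, confidence))
    return sorted(points)


locality_seconds = [[(28.79274, 39.1024, 137)], [(4.08671, 16.34685, 97)]]
for t, offset, conf in offset_points(locality_seconds):
    print(f"t={t:.2f}s offset={offset:.2f}s confidence={conf}")
```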
So I think a very low locality_filter_prop works best for me 👍
Yep, that's all working right and as intended. It was supposed to include tuples that all had the same alignment confidence, but that's not how it was working before. There was no way to tell the confidence of each tuple, and most of them were irrelevant. Now it reports the confidence of each tuple as the third value.
There is no probability involved. The locality filter proportion includes only tuples whose proportion (tuple confidence / alignment confidence) is higher than the given proportion, which ranges from 0 to 1.0.
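A minimal sketch of that filtering rule, assuming each tuple carries its confidence as the third element (v0.3.0+). `filter_locality` is hypothetical; audalign applies this internally through locality_filter_prop. This sketch treats the boundary as inclusive (>=) so that a proportion of 1.0 still keeps the top tuple; audalign's own boundary handling may differ. The tuples below are made-up example values.

```python
def filter_locality(tuples: list, alignment_confidence: int, prop: float) -> list:
    """Keep tuples whose confidence / alignment_confidence ratio is >= prop.

    prop is a proportion between 0 and 1.0, not a probability.
    """
    if not 0 <= prop <= 1.0:
        raise ValueError("prop must be between 0 and 1.0")
    return [t for t in tuples if t[2] / alignment_confidence >= prop]


# Made-up (target_center, against_center, confidence) tuples for an
# alignment whose confidence is 137.
tuples = [(28.79, 39.10, 137), (30.33, 40.63, 90), (31.72, 42.03, 20)]
print(filter_locality(tuples, 137, 0.5))  # keeps the first two
print(filter_locality(tuples, 137, 0.1))  # keeps all three
```

Lower proportions keep more tuples (including weaker, possibly noisy ones); higher proportions keep only the strongest.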
A good value to use might be 0.5, but it really depends on how many tuples you want and how concerned you are about noise.