Hi,
Thanks for sharing the code. It is really helpful.
Could you please share the preprocessing code used to prepare the YT-MUSIC data? I understand the dataset was proposed in earlier work, but there is no public implementation for converting the ambisonic audio to binaural audio, or for segmenting the long videos. It would be very helpful if you could share this code as well.
Hello, I am also a researcher working on sound source separation, and I have benefited a lot from your work.
However, I have some questions about the audio processing.
Why should the raw audio samples be limited to the range [-1, 1]?
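For context, here is a minimal illustration (assuming NumPy, with made-up sample values) of why samples outside [-1, 1] matter: many audio writers clip to that range, so an unscaled mix of two in-range signals can be distorted, whereas peak-normalizing preserves the waveform's shape.

```python
import numpy as np

# Hypothetical short waveforms, each already within [-1, 1].
audio1 = np.array([0.8, -0.6, 0.9])
audio2 = np.array([0.5, -0.7, 0.4])

mix = audio1 + audio2                  # peaks at 1.3, outside [-1, 1]
clipped = np.clip(mix, -1.0, 1.0)      # what clipping on write would produce
rescaled = mix / np.max(np.abs(mix))   # peak-normalize instead, shape preserved

print(mix)       # [ 1.3 -1.3  1.3]
print(clipped)   # [ 1.  -1.   1. ]
```

Clipping flattens the peaks (a nonlinear distortion), while rescaling keeps the mix proportional to the original sum.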
I did some verification: performing the STFT after mixing the waveforms gives a different result from performing the STFT first and then mixing the spectra, i.e. STFT(audio1 + audio2) ≠ STFT(audio1) + STFT(audio2).
How can this be explained?
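A minimal sketch of this verification (assuming SciPy's `stft` and random test signals, not the repository's actual pipeline): the complex STFT is a linear transform, so mixing before or after the transform agrees up to floating-point error; magnitude spectrograms, however, are not additive, which is one common source of the observed discrepancy.

```python
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
audio1 = rng.uniform(-0.5, 0.5, 16000)  # hypothetical 1 s clips at 16 kHz
audio2 = rng.uniform(-0.5, 0.5, 16000)

_, _, Z_mix = stft(audio1 + audio2, fs=16000)
_, _, Z1 = stft(audio1, fs=16000)
_, _, Z2 = stft(audio2, fs=16000)

# Complex STFTs do add (linearity), up to floating-point rounding:
print(np.allclose(Z_mix, Z1 + Z2))                          # True

# Magnitude spectrograms do not, because phases interfere:
print(np.allclose(np.abs(Z_mix), np.abs(Z1) + np.abs(Z2)))  # False
```

So if the comparison was done on magnitudes, or if the mixed waveform was clipped or renormalized before the STFT, a difference is expected even though the transform itself is linear.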
Looking forward to your reply.