sheldontsui / pseudobinaural_cvpr2021

Codebase for the paper "Visually Informed Binaural Audio Generation without Binaural Audios" (CVPR 2021)
License: Creative Commons Attribution 4.0 International
When I use APNet_train_crop.sh or APNet_train.sh to run the code, I get this error:
Traceback (most recent call last):
File "train.py", line 210, in <module>
for i, data in enumerate(data_loader):
File "/home/yzx/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/yzx/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/yzx/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/yzx/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/yzx/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/yzx/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yzx/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/yzx/lunwen/PseudoBinaural_CVPR2021-master/data/Augment_dataset.py", line 66, in __getitem__
data_ret, data_ret_sep = self._get_pseudo_item(index)
File "/home/yzx/lunwen/PseudoBinaural_CVPR2021-master/data/Pseudo_dataset.py", line 263, in _get_pseudo_item
stereo = self.construct_stereo_ambi(pst_sources)
File "/home/yzx/lunwen/PseudoBinaural_CVPR2021-master/data/Pseudo_dataset.py", line 120, in construct_stereo_ambi
signals = np.stack([src.signal for src in pst_sources], axis=1) # signals shape: [Len, n_signals]
File "<array_function internals>", line 5, in stack
File "/home/yzx/anaconda3/envs/pytorch/lib/python3.8/site-packages/numpy/core/shape_base.py", line 423, in stack
raise ValueError('need at least one array to stack')
Before running, I re-made the hdf5 file and train_FAIR_data.txt; their ordering follows the contents of ./new_splits/split1. For example, the first line of train_FAIR_data.txt is "/home/yzx/lunwen/datasets/FAIR-PLAY/binaural_audios/000383.wav,/home/yzx/lunwen/datasets/FAIR-PLAY/frames/000383". The former is the path of each binaural audio, and the latter is the path of the frames corresponding to that audio. For each video, I extracted 10 frames per second, as in your code. I used this format because I found that in the function "_get_pseudo_item" in Pseudo_dataset.py, the lines are split into audio_file and img_folder respectively. But I don't know why this error still occurs. I am eager for your help.
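For anyone hitting the same traceback: "need at least one array to stack" is exactly what np.stack raises when given an empty sequence, so `pst_sources` is ending up empty for some sample before construct_stereo_ambi is called. A minimal reproduction, plus a defensive guard (stack_source_signals is a hypothetical helper of my own, not part of the repo), might look like:

```python
import numpy as np

# np.stack raises this exact error when given an empty sequence:
try:
    np.stack([], axis=1)
except ValueError as e:
    print(e)  # -> need at least one array to stack

# Hypothetical guard around the failing line of construct_stereo_ambi,
# to surface which sample produced no sources instead of crashing a worker:
def stack_source_signals(pst_sources):
    if not pst_sources:
        raise RuntimeError(
            "pst_sources is empty -- check that the hdf5/train_FAIR_data.txt "
            "paths resolve and that sources were built for this sample")
    # signals shape: [Len, n_signals], as in the original code
    return np.stack([src.signal for src in pst_sources], axis=1)
```

Printing the offending index or file path in such a guard usually narrows the problem down to a path or preprocessing mismatch much faster than the DataLoader's re-raised traceback.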
Hello, I see that you mentioned a 'new_patches' folder, but I can't find it in this repo. Should I create it myself, or can you provide it? Thank you!
Hi, I have some confusion about the evaluation metric used in the paper.
Thanks.
Hi, I have a question regarding the HRIR used in the experiment. Is the HRIR data from the CIPIC HRTF Database? It seems like you are using only the subject 3 data. I am wondering whether there will be a big difference if you are using HRIR for other subjects. Thank you.
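For context on this question: rendering a mono source to binaural with an HRIR pair is just a per-ear convolution, so swapping CIPIC subjects means swapping the two impulse responses. A minimal sketch (the random signals and HRIR length here are purely illustrative; the repo's actual HRIR handling may differ):

```python
import numpy as np

# Minimal binaural rendering sketch: convolve a mono source with the
# left/right HRIRs of one subject (e.g. one CIPIC subject).
def render_binaural(mono, hrir_left, hrir_right):
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    # full convolution length is len(mono) + len(hrir) - 1
    return np.stack([left, right], axis=0)

mono = np.random.randn(1000)
hrir_l = np.random.randn(200)  # illustrative length, not CIPIC's actual one
hrir_r = np.random.randn(200)
out = render_binaural(mono, hrir_l, hrir_r)
print(out.shape)  # (2, 1199)
```

Comparing outputs rendered with different subjects' HRIRs this way would be one direct check of how much subject choice matters.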
Thank you for your great research. I have read your paper and I am reading the code. I have a question on how to calculate the “virtual speakers” when making pseudo binaural sound.
In your paper, you calculate the “virtual array” in equation (7).
I think the same calculation is performed by the construct_stereo_ambi function in data/Pseudo_dataset.py in your code.
I understand that variables in this function correspond to equation (7) as follows.
Here, at line 123 of data/Pseudo_dataset.py, we have
array_speakers_sound = np.dot(ambisonic, self.sph_mat.T)
but I cannot understand how the corresponding term is taken into account.
I would like to know how you reflect the calculation of equation (7) in your code.
Regards.
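For readers puzzling over the same line: `np.dot(ambisonic, self.sph_mat.T)` is a standard ambisonic decode, where each row of the spherical-harmonic matrix holds one virtual speaker's weights. A first-order, horizontal-only sketch (the azimuths and the W/X/Y channel layout are my own illustrative assumptions, not necessarily the repo's exact sph_mat):

```python
import numpy as np

# Decode first-order ambisonics (channels W, X, Y) to a virtual speaker
# array, mirroring: array_speakers_sound = np.dot(ambisonic, self.sph_mat.T)
def decode_foa(ambisonic, azimuths_deg):
    az = np.deg2rad(np.asarray(azimuths_deg, dtype=float))
    # One row per speaker: SH weights [W, X, Y] evaluated at that azimuth.
    sph_mat = np.stack([np.ones_like(az), np.cos(az), np.sin(az)], axis=1)
    # [T, 3] @ [3, n_speakers] -> [T, n_speakers]
    return ambisonic @ sph_mat.T

foa = np.random.randn(16, 3)  # [samples, (W, X, Y)]
out = decode_foa(foa, [45, 135, 225, 315])
print(out.shape)  # (16, 4)
```

In this view, any per-speaker gain from equation (7) would have to live inside the rows of sph_mat itself, which may be the source of the confusion above.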
Hi, thanks for sharing the code. Could you please clarify which script to run to reproduce the results in Table 1? In the scripts folder there are two types of scripts: one with the simple audionet backbone and the other with the APNet backbone. As per the paper, the APNet backbone is used only for the separation task and not for the stereo task, but I could not find a setting that satisfies this condition. Please clarify.
Hi. I tested the model that you have listed in the README, but the result is not as good as that in the paper. I also trained a model myself with ./scripts/sepstereo_Augment/audionet_train_crop.sh, and its result is even worse. Could you tell me how to improve the result? I am sincerely eager for your help.