swimmiing / acl-ssl
Repository of the WACV'24 paper "Can CLIP Help Sound Source Localization?"
Hi,
Thanks for making the code public for this interesting work.
I was trying to train the network on VGG. I noticed a difference between the batch size and learning rate reported in the paper and those in the config file here. The paper gives a learning rate of `1e-3` and a batch size of `16`, whereas the config file uses `1e-4` and `8`. I tried training with both settings: with the paper's values (`1e-3` and `16`) the training loss diverges, whereas the config-file values seem to perform well. I am attaching screenshots of both curves below for your reference.
Also, in practice, the batch size for InfoNCE loss is usually higher (~128). Is there any specific reason you chose a small batch size (8/16)? And what could cause the curve to diverge at the higher batch size?
Thanks.
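To make the batch-size question concrete, here is a minimal sketch of a generic InfoNCE loss over a square audio-image similarity matrix. It is not the loss implementation from this repository; the function names and the temperature of 0.07 are assumptions for illustration only. It shows that each positive pair is contrasted against `batch_size - 1` in-batch negatives, so an uninformative model scores `log(batch_size)`.

```python
import math


def info_nce(sim, temperature=0.07):
    """Mean InfoNCE loss for a square similarity matrix.

    sim[i][j] is the similarity between audio i and image j. The diagonal
    holds the positive pairs; every other entry in row i is a negative.
    The temperature value is an assumption, not taken from the paper.
    """
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        top = max(logits)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def uninformative_loss(batch_size):
    """Loss when every pair looks equally similar (all-zero similarities)."""
    sim = [[0.0] * batch_size for _ in range(batch_size)]
    return info_nce(sim)


def aligned_loss(batch_size):
    """Loss when only the positive pairs have similarity 1."""
    sim = [[1.0 if i == j else 0.0 for j in range(batch_size)]
           for i in range(batch_size)]
    return info_nce(sim)


for batch_size in (8, 16, 128):
    print(batch_size, uninformative_loss(batch_size), aligned_loss(batch_size))
```

The starting loss therefore grows with the batch size, so curves from runs with different batch sizes are not directly comparable in absolute value. This does not by itself explain the divergence, which may equally come from the tenfold larger learning rate.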
I noticed a difference in how the outputs are converted into logits for training and evaluation. Here, during training, the logits are obtained by multiplying the output by `w` and adding `b`. During evaluation the operation is different: `b/w` is added to the output. Can you please clarify this? Thanks in advance.
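One possible reading, which the authors would need to confirm, is that the two forms differ only by a positive scale factor, since `w * (x + b/w) = w * x + b`. The sketch below checks this numerically. The helper names and the values of `w` and `b` are hypothetical, and the check assumes `w > 0`.

```python
import math


def train_logit(x, w, b):
    """Training-time form described in the question: scale, then shift."""
    return w * x + b


def eval_score(x, w, b):
    """Evaluation-time form described in the question: shift by b / w."""
    return x + b / w


# Hypothetical values; the real w and b are learned parameters.
w, b = 10.0, -5.0
outputs = [-1.0, 0.0, 0.3, 0.5, 0.8, 1.0]

for x in outputs:
    t = train_logit(x, w, b)
    e = eval_score(x, w, b)
    # The training logit is the evaluation score scaled by w ...
    assert math.isclose(t, w * e, rel_tol=1e-9, abs_tol=1e-12)
    # ... so for w > 0 both cross zero at the same x.
    assert (t > 0) == (e > 0)
    print(x, t, e)
```

Under that assumption, thresholding either quantity at zero selects the same locations; only the scale of the values differs, which matters if a sigmoid or a fixed nonzero threshold is applied afterwards.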
Dear authors,
Thanks for open-sourcing this project. Generally, the codebase is well organized.
However, I can't reproduce the numbers reported in the paper, or get close to them. I suspect I made some basic mistake.
Below is my "test_rst.txt" on the VGGSound dataset; the AP is worse than that of the paper.
ACL (vggss_test with thr = 0.05) AP50(cIoU)=30.579964850615116, AUC=36.1779925795743
ACL (vggss_test with thr = 0.1) AP50(cIoU)=36.711579769576254, AUC=39.26381566100371
ACL (vggss_test with thr = 0.15) AP50(cIoU)=39.58211286858035, AUC=40.89728568638938
ACL (vggss_test with thr = 0.2) AP50(cIoU)=41.08572544424917, AUC=41.67350126928334
ACL (vggss_test with thr = 0.25) AP50(cIoU)=41.59343878148799, AUC=41.87561023237649
ACL (vggss_test with thr = 0.3) AP50(cIoU)=41.515329037297406, AUC=41.8140988088264
ACL (vggss_test with thr = 0.35) AP50(cIoU)=41.67154852567858, AUC=41.55828939660222
ACL (vggss_test with thr = 0.4) AP50(cIoU)=41.49580160124975, AUC=41.17555165006834
ACL (vggss_test with thr = 0.45) AP50(cIoU)=40.81234133958211, AUC=40.65026362038665
ACL (vggss_test with thr = 0.5) AP50(cIoU)=39.660222612770944, AUC=40.068346026166765
ACL (vggss_test with thr = 0.55) AP50(cIoU)=38.6838508103886, AUC=39.41515329037297
ACL (vggss_test with thr = 0.6) AP50(cIoU)=37.609841827768015, AUC=38.604764694395634
ACL (vggss_test with thr = 0.65) AP50(cIoU)=36.1452841241945, AUC=37.68892794376099
ACL (vggss_test with thr = 0.7) AP50(cIoU)=34.876000781097446, AUC=36.6373755125952
ACL (vggss_test with thr = 0.75) AP50(cIoU)=32.98183948447569, AUC=35.32024995118142
ACL (vggss_test with thr = 0.8) AP50(cIoU)=30.931458699472756, AUC=33.86447959382934
ACL (vggss_test with thr = 0.85) AP50(cIoU)=27.514157391134543, AUC=31.9859402460457
ACL (vggss_test with thr = 0.9) AP50(cIoU)=23.667252489748098, AUC=29.254051942979892
ACL (vggss_test with thr = 0.95) AP50(cIoU)=16.285881663737552, AUC=23.326498730716654
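In the results above, cIoU peaks at thr = 0.35 (about 41.67) and AUC peaks at thr = 0.25 (about 41.88). A small sketch for finding the best threshold from such a file follows. The regular expression assumes the record format shown above, and the names `results_text`, `RECORD` and `parse_results` are illustrative, not part of the repository.

```python
import re

# Three records copied from the results above; paste the full contents of
# test_rst.txt here to scan every threshold.
results_text = (
    "ACL (vggss_test with thr = 0.25) AP50(cIoU)=41.59343878148799, AUC=41.87561023237649 "
    "ACL (vggss_test with thr = 0.3) AP50(cIoU)=41.515329037297406, AUC=41.8140988088264 "
    "ACL (vggss_test with thr = 0.35) AP50(cIoU)=41.67154852567858, AUC=41.55828939660222"
)

# Captures threshold, cIoU and AUC from one record, whether the records are
# separated by spaces or by newlines.
RECORD = re.compile(r"thr = ([\d.]+)\) AP50\(cIoU\)=([\d.]+), AUC=([\d.]+)")


def parse_results(text):
    """Return a list of (threshold, ciou, auc) tuples as floats."""
    return [tuple(float(g) for g in m.groups()) for m in RECORD.finditer(text)]


records = parse_results(results_text)
best_ciou = max(records, key=lambda r: r[1])
best_auc = max(records, key=lambda r: r[2])
print("best cIoU: thr =", best_ciou[0], "value =", best_ciou[1])
print("best AUC:  thr =", best_auc[0], "value =", best_auc[2])
```

Reporting the threshold alongside the metric makes it easier to compare against the paper, since the two metrics do not peak at the same threshold here.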
I have attached one of the visualization outputs, and it seems reasonable.
I have a few doubts:
Do you know of anything else that could cause this drop in accuracy?
Thanks for your help!