jpthu17 / hbi
[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
License: Apache License 2.0
Thanks for sharing this nice work! When I checked the code, I found that config.estimator is None, and this is the weight used for self.banzhafteacher. How can I get the checkpoint for banzhafteacher?
Thank you for your excellent work!
I have a question: is the following line, "banzhaf[:, i, j] = self.banzhaf_interaction(retrieve_logits, text_mask, video_mask, text_weight, video_weight, i, j)", missing a plus sign (that is, should "=" be "+=")?
for i in range(self.t_len):
    for j in range(self.v_len):
        for _ in range(self.num):
            banzhaf[:, i, j] = self.banzhaf_interaction(retrieve_logits, text_mask, video_mask, text_weight,
                                                        video_weight, i, j)
banzhaf = banzhaf / self.num
banzhaf = torch.einsum('btv,bt->btv', [banzhaf, text_mask])
banzhaf = torch.einsum('btv,bv->btv', [banzhaf, video_mask])
return banzhaf
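The difference the question points at can be illustrated with a minimal, library-free sketch. It assumes the intent of the inner loop is to average `self.num` random samples; the source does not confirm this. The function `noisy_estimate` is a hypothetical stand-in for one sampled interaction value and is not part of the repository.

```python
import random


def noisy_estimate(rng):
    """Hypothetical stand-in for one sampled value; its true mean is 1.0."""
    return 1.0 + rng.uniform(-0.5, 0.5)


def average_with_overwrite(num, seed=0):
    """Uses '=' inside the loop, so only the last sample survives."""
    rng = random.Random(seed)
    value = 0.0
    for _ in range(num):
        value = noisy_estimate(rng)
    return value / num  # one sample divided by num: shrinks toward 0


def average_with_accumulate(num, seed=0):
    """Uses '+=' inside the loop, so the division yields a sample mean."""
    rng = random.Random(seed)
    value = 0.0
    for _ in range(num):
        value += noisy_estimate(rng)
    return value / num


overwrite_result = average_with_overwrite(100)
accumulate_result = average_with_accumulate(100)
print(overwrite_result, accumulate_result)
```

With `=`, the result is a single sample scaled down by `num`; with `+=`, it is the mean of `num` samples. Which behavior the authors intended is for them to confirm.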
Thank you for your great work!
But I'm still wondering how the Banzhaf Interaction works in the following code:
s_t = (torch.rand((self.t_len)) > 0.5).long().to(retrieve_logits.device)
s_j = (torch.rand((self.v_len)) > 0.5).long().to(retrieve_logits.device)
s_t[i], s_j[j] = 0, 0
_text_mask, _video_mask = text_mask.clone(), video_mask.clone()
_text_mask[:, s_t] = 0
_video_mask[:, s_j] = 0
Since the values in s_t and s_j are only 0 and 1, does _text_mask[:, s_t] = 0 mean that only the first and second word tokens are masked? Or have I misunderstood it?
Any reply would be helpful!
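The distinction behind this question can be sketched without any library. In PyTorch, an index tensor of dtype long is read as a list of positions, while a tensor of dtype bool is read as a mask. The two helpers below are hypothetical pure-Python emulations of those two behaviors on a single row; they do not call PyTorch and are not taken from the repository.

```python
def zero_by_integer_index(row, index_values):
    """Mimic row[index] = 0 when index holds integers: each value is a position."""
    out = list(row)
    for position in index_values:
        out[position] = 0
    return out


def zero_by_boolean_mask(row, flags):
    """Mimic row[mask] = 0 when mask holds booleans: True marks a position."""
    return [0 if flag else value for value, flag in zip(row, flags)]


text_mask_row = [1, 1, 1, 1, 1]
s_t = [0, 1, 1, 0, 1]  # a 0/1 sample, shaped like (rand > 0.5).long()

as_integer_index = zero_by_integer_index(text_mask_row, s_t)
as_boolean_mask = zero_by_boolean_mask(text_mask_row, [bool(x) for x in s_t])

print(as_integer_index)  # [0, 0, 1, 1, 1]
print(as_boolean_mask)   # [1, 0, 0, 1, 0]
```

Under integer indexing, a 0/1 index can only ever touch positions 0 and 1, which matches the reading in the question. Whether the authors intended mask semantics instead is not stated in the source.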
Hello, for the Activity-Net dataset, with max_words and max_frames both set to 64, do v_rate0 through t_rate1 all keep the original MSR-VTT settings? Also, is the training batch size for Activity-Net 64 or 128?
Did you ignore that _text_mask[:, i] = 0 when calculating banzhaf_value3?
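As background for this question, the four coalition values that enter a pairwise Banzhaf interaction can be sketched on a toy game. This is a generic illustration of the standard definition, v(S ∪ {i, j}) − v(S ∪ {i}) − v(S ∪ {j}) + v(S) averaged over random S that contain neither i nor j. It is not the repository's `banzhaf_interaction`, and `toy_value`, `WEIGHTS`, and `BONUS` are hypothetical names.

```python
import random

WEIGHTS = [1, 2, 3, 4]  # hypothetical per-player contributions
BONUS = 5               # extra value only when players 0 and 1 cooperate


def toy_value(coalition):
    """Toy game: additive weights plus a bonus if both 0 and 1 are present."""
    total = sum(WEIGHTS[p] for p in coalition)
    if 0 in coalition and 1 in coalition:
        total += BONUS
    return total


def sampled_banzhaf_interaction(i, j, num_samples=200, seed=0):
    """Monte Carlo estimate; i and j are always excluded from the sampled S."""
    rng = random.Random(seed)
    others = [p for p in range(len(WEIGHTS)) if p not in (i, j)]
    total = 0
    for _ in range(num_samples):
        s = {p for p in others if rng.random() > 0.5}
        total += (toy_value(s | {i, j})   # both players present
                  - toy_value(s | {i})    # only i present, j absent
                  - toy_value(s | {j})    # only j present, i absent
                  + toy_value(s))         # neither present
    return total / num_samples


print(sampled_banzhaf_interaction(0, 1))  # 5.0
print(sampled_banzhaf_interaction(0, 2))  # 0.0
```

Each of the four terms requires i and j to be explicitly present or absent, so in a mask-based implementation every term needs the corresponding mask entries set accordingly before the value is computed.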
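Hello, may I ask whether you will post an arXiv version soon? I am very interested in your paper and, if possible, hope to read it as soon as I can. Thank you!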