
Why do we average out correlation matrices from different GPUs? Is this mathematically valid? (barlowtwins, 4 comments, closed)

facebookresearch commented on July 3, 2024

Why do we average out correlation matrices from different GPUs? Is this mathematically valid?

from barlowtwins.

Comments (4)

jzbontar commented on July 3, 2024

> Or even better, why not compute the full cross-correlation matrix (i.e., gather all embedding vectors onto one device and compute the cross-correlation there)?

Summing cross-correlation matrices (as we do in our code) is equivalent to computing the full cross-correlation matrix after gathering all embedding vectors onto one device. The two give the same result, up to floating-point rounding.
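To spell out the equivalence, let $N$ be the total batch size summed over all GPUs, $G$ the number of GPUs, $\mathcal{B}_g$ the set of samples held by GPU $g$, and $\hat z^A$, $\hat z^B$ the batch-normalized embeddings of the two views. This assumes two things: the normalization uses statistics of the full batch (the repo appears to convert its BatchNorm layers to SyncBatchNorm for this, so check `main.py`), and `args.batch_size` is the total batch size. Under those assumptions, the sum over the batch splits into per-GPU partial sums:

$$
\mathcal{C}_{ij} = \frac{1}{N}\sum_{b=1}^{N} \hat z^A_{b,i}\,\hat z^B_{b,j}
= \sum_{g=1}^{G} \underbrace{\frac{1}{N}\sum_{b\in\mathcal{B}_g} \hat z^A_{b,i}\,\hat z^B_{b,j}}_{\text{computed on GPU } g}
$$

Each GPU only needs its own rows to compute its term, which corresponds to `c.div_(self.args.batch_size)`. The outer sum is what `torch.distributed.all_reduce(c)` performs. So the code sums partial matrices that are each already scaled by $1/N$. It does not average independently normalized matrices.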

Think about how to distribute a dot product across n machines (computing the cross-correlation matrix is essentially a collection of dot products, one for each pair of features). You could split the vectors into n chunks, compute n smaller dot products (one per chunk), and sum them to get the final result. Or, if you prefer code:

```python
>>> import torch
>>> x = torch.Tensor(8).normal_()
>>> y = torch.Tensor(8).normal_()
>>> torch.allclose(x[:4] @ y[:4] + x[4:] @ y[4:], x @ y)
True
```
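The same argument carries over from a single dot product to the whole matrix. Below is a small single-process sketch (not the repo's code) that simulates k GPUs by splitting one batch into chunks. It assumes the embeddings are standardized with statistics of the full batch, which is what SyncBatchNorm would provide, and it ignores BatchNorm's `eps` term. The names `standardize` and `batch_stats` and the sizes are illustrative.

```python
import torch

torch.manual_seed(0)

N, D, k = 32, 6, 4  # global batch size, embedding dimension, simulated GPUs
z1 = torch.randn(N, D)
z2 = torch.randn(N, D)


def standardize(z, mean, std):
    # Per-feature standardization, as BatchNorm1d(affine=False) does in training mode (eps ignored)
    return (z - mean) / std


def batch_stats(z):
    # Mean and biased standard deviation of each feature over the batch dimension
    mean = z.mean(0)
    std = (z - mean).pow(2).mean(0).sqrt()
    return mean, std


m1, s1 = batch_stats(z1)
m2, s2 = batch_stats(z2)

# Full cross-correlation matrix, computed as if all embeddings were on one device
c_full = standardize(z1, m1, s1).T @ standardize(z2, m2, s2) / N

# "Distributed" version: each chunk uses the shared global statistics, divides its
# partial matrix by the global batch size N, and the partials are summed (like all_reduce)
c_sum = torch.zeros(D, D)
for a, b in zip(z1.chunk(k), z2.chunk(k)):
    c_sum += standardize(a, m1, s1).T @ standardize(b, m2, s2) / N

# Counter-example: each chunk normalized with its own local statistics
c_local = torch.zeros(D, D)
for a, b in zip(z1.chunk(k), z2.chunk(k)):
    c_local += standardize(a, *batch_stats(a)).T @ standardize(b, *batch_stats(b)) / N

print(torch.allclose(c_full, c_sum, atol=1e-5))    # shared statistics: matches
print(torch.allclose(c_full, c_local, atol=1e-5))  # local statistics: generally does not
```

Suppose each chunk were instead normalized with its own mean and standard deviation, as plain per-GPU BatchNorm would do. The summed matrix would then generally differ from the full one. For the equivalence to hold exactly, the normalization statistics have to be shared across devices.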

radekd91 commented on July 3, 2024

Thanks for the explanation. It's clear to me now.

ltnghia commented on July 3, 2024

Hi, how do we do this on a single GPU? torch.distributed does not seem to work on a single GPU. I use torch.nn.DataParallel to run the code on a single GPU:

```python
# empirical cross-correlation matrix
c = self.bn(z1).T @ self.bn(z2)

# sum the cross-correlation matrix between all gpus
c.div_(self.args.batch_size)
torch.distributed.all_reduce(c)

on_diag = torch.diagonal(c).add_(-1).pow_(2).sum()
off_diag = off_diagonal(c).pow_(2).sum()
loss = on_diag + self.args.lambd * off_diag
```
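One way to make this snippet work both with and without a process group is to call `all_reduce` only when `torch.distributed` is actually initialized. When the whole batch is processed in a single process on one GPU, the division by the batch size still has to stay. It turns the sum of products into a correlation, which is what makes the diagonal target of 1 meaningful. The sketch below is not the repo's code. `barlow_twins_loss` is a hypothetical standalone function, `off_diagonal` follows the flatten trick the repo's helper uses as far as I know, and `lambd=0.0051` is only an illustrative value. It also uses out-of-place ops (`- 1`, `.pow(2)`) instead of `add_`/`pow_`, so `c` is not modified while the loss is being built.

```python
import torch
import torch.distributed as dist
import torch.nn as nn


def off_diagonal(x):
    # Flattened copy of all off-diagonal elements of a square matrix (row-major order)
    n, m = x.shape
    assert n == m
    return x.flatten()[:-1].view(n - 1, n + 1)[:, 1:].flatten()


def barlow_twins_loss(z1, z2, bn, lambd, batch_size):
    # Hypothetical helper. bn: BatchNorm1d(D, affine=False);
    # batch_size: total batch size over all processes (just z1.shape[0] on one GPU)
    c = bn(z1).T @ bn(z2)
    c = c / batch_size  # keep this even on a single GPU
    # Sum the partial matrices only if a process group exists (multi-GPU DDP run)
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(c)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = off_diagonal(c).pow(2).sum()
    return on_diag + lambd * off_diag


# Single-process usage (runs on CPU here for illustration)
torch.manual_seed(0)
N, D = 16, 8
bn = nn.BatchNorm1d(D, affine=False)
z1 = torch.randn(N, D, requires_grad=True)
z2 = torch.randn(N, D, requires_grad=True)
loss = barlow_twins_loss(z1, z2, bn, lambd=0.0051, batch_size=N)
loss.backward()
print(loss.item())
```

Guarding on `dist.is_initialized()` lets the same function run unchanged under a multi-GPU distributed launch and in a plain single-GPU script.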

TQi-Yang commented on July 3, 2024

Hi @ltnghia,
Has the problem been solved? I think the following code may not be needed, right?

```python
# sum the cross-correlation matrix between all gpus
c.div_(self.args.batch_size)
torch.distributed.all_reduce(c)
```
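Only the `all_reduce` line depends on `torch.distributed`. The division by the batch size is what turns `bn(z1).T @ bn(z2)` into a correlation, so it is still needed on a single GPU. The quick single-process check below feeds the same view to both branches, with illustrative sizes on CPU. The diagonal of the undivided matrix comes out close to N, and after dividing by N it is close to the target value 1 used in `on_diag`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D = 64, 4
bn = nn.BatchNorm1d(D, affine=False)  # training mode: normalizes with batch statistics
z = torch.randn(N, D)

with torch.no_grad():
    zn = bn(z)         # standardized embeddings (same view used twice)
    c_raw = zn.T @ zn  # no division: diagonal entries are close to N
    c = c_raw / N      # divided by the batch size: diagonal entries are close to 1

print(torch.diagonal(c_raw))
print(torch.diagonal(c))
```

Without the division, the on-diagonal term would push each entry toward 1 while it naturally sits near N. The loss would then be dominated by that mismatch.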
