Hi, Thank you for your implementation, it is really helpful since the original ver

Data Parallel for PWC-Net about pytorch-pwc HOT 9 CLOSED

sniklaus commented on June 27, 2024 1

Data Parallel for PWC-Net

from pytorch-pwc.

Comments (9)

TengdaHan commented on June 27, 2024 2

Another potentially efficient solution is to add device id into the dictionary keys. Like this:

Backward_tensorGrid = {}
Backward_tensorPartial = {}

def Backward(tensorInput, tensorFlow):
    if str(tensorFlow.size())+str(tensorFlow.device) not in Backward_tensorGrid:
        tensorHorizontal = torch.linspace(-1.0, 1.0, tensorFlow.size(3)).view(1, 1, 1, tensorFlow.size(3)).expand(tensorFlow.size(0), -1, tensorFlow.size(2), -1)
        tensorVertical = torch.linspace(-1.0, 1.0, tensorFlow.size(2)).view(1, 1, tensorFlow.size(2), 1).expand(tensorFlow.size(0), -1, -1, tensorFlow.size(3))

        Backward_tensorGrid[str(tensorFlow.size())+str(tensorFlow.device)] = torch.cat([ tensorHorizontal, tensorVertical ], 1).to(tensorFlow.device)
    # end

    if str(tensorFlow.size())+str(tensorFlow.device) not in Backward_tensorPartial:
        Backward_tensorPartial[str(tensorFlow.size())+str(tensorFlow.device)] = tensorFlow.new_ones([ tensorFlow.size(0), 1, tensorFlow.size(2), tensorFlow.size(3) ])
    # end

    tensorFlow = torch.cat([ tensorFlow[:, 0:1, :, :] / ((tensorInput.size(3) - 1.0) / 2.0), tensorFlow[:, 1:2, :, :] / ((tensorInput.size(2) - 1.0) / 2.0) ], 1)
    tensorInput = torch.cat([ tensorInput, Backward_tensorPartial[str(tensorFlow.size())+str(tensorFlow.device)] ], 1)

    tensorOutput = torch.nn.functional.grid_sample(input=tensorInput, grid=(Backward_tensorGrid[str(tensorFlow.size())+str(tensorFlow.device)] + tensorFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros')

    tensorMask = tensorOutput[:, -1:, :, :]; tensorMask[tensorMask > 0.999] = 1.0; tensorMask[tensorMask < 1.0] = 0.0

    return tensorOutput[:, :-1, :, :] * tensorMask
# end

v-iashin commented on June 27, 2024 1

Did exactly this quite recently.

1. Put

    Backward_tensorGrid = {}
    Backward_tensorPartial = {}

inside of the Backward function.
2. Repeat the things I described in #22.
3. Wrap the code in run.py in a module so that you can call its forward method. The forward call should take a device argument.
4. Write another module that does what you need with each chunk of the dataset you want to parallelize (a tensor with indices in my case). Inside this module you can get the GPU id from the input chunk (for example via chunk_tensor.device) and pass it as the device argument to the module defined in step 3.
5. Initialize this module, replicate it, scatter your data, and apply.

Mind the preprocessing.
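Steps 4 and 5 could look roughly like the sketch below. It is not the code I actually used: `ChunkWorker` and `run_on_all_gpus` are hypothetical names, the worker only multiplies its chunk by a parameter instead of calling the wrapped run.py module, and a fallback is included so the sketch also runs with fewer than two GPUs. The functions from `torch.nn.parallel` are the ones that DataParallel uses internally.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import replicate, scatter, parallel_apply, gather

class ChunkWorker(nn.Module):
    """Hypothetical stand-in for the module from step 4."""
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, chunk):
        device = chunk.device  # each replica reads its device from the chunk it received
        # A real worker would call the wrapped run.py module here and pass `device` on.
        return chunk * self.scale.to(device)

def run_on_all_gpus(module, batch):
    device_ids = list(range(torch.cuda.device_count()))
    if len(device_ids) < 2:
        # Fallback (an assumption of this sketch): run directly on the current device.
        return module(batch)
    module = module.cuda(device_ids[0])
    replicas = replicate(module, device_ids)   # one copy of the module per GPU
    chunks = scatter(batch, device_ids)        # split the batch along dim 0
    replicas = replicas[:len(chunks)]          # small batches may yield fewer chunks
    outputs = parallel_apply(replicas, chunks, devices=device_ids[:len(chunks)])
    return gather(outputs, device_ids[0])      # collect the results on the first GPU

result = run_on_all_gpus(ChunkWorker(), torch.arange(8.0).view(4, 2))
```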

v-iashin commented on June 27, 2024 1

Now I think that another solution would be to remove the lines that check whether a value with this key already exists in the dict, so that the tensors are rebuilt on every call. The checks are here

pytorch-pwc/run.py

Line 47 in da45293

if str(tensorFlow.size()) not in Backward_tensorGrid:

and here

pytorch-pwc/run.py

Line 54 in da45293

if str(tensorFlow.size()) not in Backward_tensorPartial:
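Without the two checks, the warping function no longer needs the dicts at all. A minimal sketch of what that might look like follows; `backward_no_cache` is a hypothetical name, and `align_corners=True` is my assumption to match the linspace(-1.0, 1.0) grid, since the code in run.py at that commit predates this argument.

```python
import torch

def backward_no_cache(tensorInput, tensorFlow):
    """Backward-warp tensorInput by tensorFlow, rebuilding the grid on every call."""
    n, _, h, w = tensorFlow.size()
    device = tensorFlow.device

    # Built directly on the flow's device, so nothing is shared between GPUs.
    horizontal = torch.linspace(-1.0, 1.0, w, device=device).view(1, 1, 1, w).expand(n, -1, h, -1)
    vertical = torch.linspace(-1.0, 1.0, h, device=device).view(1, 1, h, 1).expand(n, -1, -1, w)
    grid = torch.cat([horizontal, vertical], 1)

    # Extra channel of ones, used to detect samples that fall outside the image.
    partial = tensorFlow.new_ones([n, 1, h, w])

    # Convert the flow from pixels to the normalized [-1, 1] coordinate range.
    flow = torch.cat([tensorFlow[:, 0:1] / ((tensorInput.size(3) - 1.0) / 2.0),
                      tensorFlow[:, 1:2] / ((tensorInput.size(2) - 1.0) / 2.0)], 1)

    stacked = torch.cat([tensorInput, partial], 1)
    output = torch.nn.functional.grid_sample(input=stacked, grid=(grid + flow).permute(0, 2, 3, 1),
                                             mode='bilinear', padding_mode='zeros', align_corners=True)

    mask = (output[:, -1:] > 0.999).to(output.dtype)
    return output[:, :-1] * mask

# With zero flow the warped image should match the input.
image = torch.rand(1, 3, 5, 7)
warped = backward_no_cache(image, torch.zeros(1, 2, 5, 7))
```

The price is that the grid and the ones tensor are allocated on every forward pass, so this trades some speed for simplicity.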

sniklaus commented on June 27, 2024

Thank you for the instructions @v-iashin! Would you mind sharing the error message, @ZacharyGong?

ZacharyGong commented on June 27, 2024

Hi @sniklaus ,

I tried step 1 from v-iashin's reply, and the code works in my case. I checked the flow generated in the multi-GPU environment, and it seems correct. If you could check whether this might cause any mistakes, I would be very grateful.

I leave my original snippet below for future review:

I wrote a small code snippet to reproduce the problem that I ran into.

import torch
import torch.nn as nn
from flow_generation_model.model import Network as flow_generation_net
import math

def estimate(moduleNetwork,tensorFirst,tensorSecond):
    intWidth = tensorFirst.size(-1)
    intHeight = tensorFirst.size(-2)

    tensorPreprocessedFirst = tensorFirst
    tensorPreprocessedSecond = tensorSecond

    intPreprocessedWidth = int(math.floor(math.ceil(intWidth / 64.0) * 64.0))
    intPreprocessedHeight = int(math.floor(math.ceil(intHeight / 64.0) * 64.0))

    tensorPreprocessedFirst = torch.nn.functional.interpolate(input=tensorPreprocessedFirst, size=(intPreprocessedHeight, intPreprocessedWidth), mode='bilinear', align_corners=False)
    tensorPreprocessedSecond = torch.nn.functional.interpolate(input=tensorPreprocessedSecond, size=(intPreprocessedHeight, intPreprocessedWidth), mode='bilinear', align_corners=False)

    tensorFlow = 20.0 * torch.nn.functional.interpolate(input=moduleNetwork(tensorPreprocessedFirst, tensorPreprocessedSecond), size=(intHeight, intWidth), mode='bilinear', align_corners=False)

    tensorFlow[:, 0, :, :] *= float(intWidth) / float(intPreprocessedWidth)
    tensorFlow[:, 1, :, :] *= float(intHeight) / float(intPreprocessedHeight)
    return tensorFlow

class Network(nn.Module):
	def __init__(self):
		super(Network, self).__init__()
		self.flow_generation_net = flow_generation_net()

	def forward(self, x1, x2):
		flow = estimate(self.flow_generation_net, x1, x2)
		return flow

pwc_net = Network().cuda()
gpus = [0,1]
pwc_net = nn.DataParallel(pwc_net, device_ids=gpus)
MSE = nn.MSELoss()

x1 = torch.rand([4,3,128,128]).cuda()
x2 = torch.rand([4,3,128,128]).cuda()
flow_gt = torch.rand([4,2,128,128]).cuda()

output = pwc_net(x1, x2)
#print(output.shape)
loss = MSE(output, flow_gt)
loss.backward()

Here flow_generation_net is your network from run.py. Running this code, I got the following error:

Traceback (most recent call last):
  File "snippet.py", line 43, in <module>
    output = pwc_net(x1, x2)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
    raise output
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "snippet.py", line 31, in forward
    flow = estimate(self.flow_generation_net, x1, x2)
  File "snippet.py", line 19, in estimate
    tensorFlow = 20.0 * torch.nn.functional.interpolate(input=moduleNetwork(tensorPreprocessedFirst, tensorPreprocessedSecond), size=(intHeight, intWidth), mode='bilinear', align_corners=False)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/video/flow_generation_test/flow_generation_model/model.py", line 242, in forward
    objectEstimate = self.moduleFiv(tensorFirst[-2], tensorSecond[-2], objectEstimate)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/video/flow_generation_test/flow_generation_model/model.py", line 177, in forward
    tensorVolume = torch.nn.functional.leaky_relu(input=correlation.FunctionCorrelation(tensorFirst=tensorFirst, tensorSecond=Backward(tensorInput=tensorSecond, tensorFlow=tensorFlow * self.dblBackward)), negative_slope=0.1, inplace=False)
  File "/mnt/video/flow_generation_test/flow_generation_model/model.py", line 36, in Backward
    tensorOutput = torch.nn.functional.grid_sample(input=tensorInput, grid=(Backward_tensorGrid[str(tensorFlow.size())] + tensorFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros')
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:234

v-iashin commented on June 27, 2024

Hopefully, it can save some time for you.

The problem is that you import this run.py module, and these two dicts are created only once, when the module is imported. All replicas therefore share them.

Now imagine the following scenario. The first GPU adds a tensor to the Backward_tensorGrid dict, and the second GPU then accesses the dict with the same key (Backward_tensorGrid[str(tensorFlow.size())]), which maps to the tensor that lives on the first GPU. Hence the error "arguments are located on different GPUs".

To verify this, try printing the contents of these dicts inside of Backward.
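The collision can be reproduced without any GPU. In the sketch below, the "meta" device is used purely as a stand-in for a second GPU, and `size_only_key`, `size_and_device_key`, and `describe_cache` are hypothetical helpers written for this illustration, not functions from run.py.

```python
import torch

def size_only_key(tensor):
    # Key as built in run.py: the shape only, so it is identical on every device.
    return str(tensor.size())

def size_and_device_key(tensor):
    # Key that also contains the device, so each device gets its own entry.
    return str(tensor.size()) + str(tensor.device)

def describe_cache(cache):
    """Map each cache key to the device of the tensor stored under it."""
    return {key: str(value.device) for key, value in cache.items()}

flow_a = torch.zeros(4, 2, 8, 8, device='cpu')   # plays the role of the first GPU
flow_b = torch.empty(4, 2, 8, 8, device='meta')  # plays the role of the second GPU

shared = {}
separate = {}
for flow in (flow_a, flow_b):
    if size_only_key(flow) not in shared:
        shared[size_only_key(flow)] = flow.new_ones(flow.size())
    if size_and_device_key(flow) not in separate:
        separate[size_and_device_key(flow)] = flow.new_ones(flow.size())

# The shared dict holds a single entry that lives on the first device only.
print(describe_cache(shared))
# With the device in the key there is one entry per device.
print(describe_cache(separate))
```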

sniklaus commented on June 27, 2024

Thank you for the code @ZacharyGong and thank you for sharing your thoughts with us @v-iashin!

The easiest workaround is probably to not use the dictionary, as suggested by @v-iashin in his most recent post. That may have a negative impact on performance, though I am unable to predict how large. Another solution would be to keep the tensors in the dict / cache on the CPU instead of the GPU, again with a negative effect on performance. The backward warping would then be as follows.

def Backward(tensorInput, tensorFlow):
	if str(tensorFlow.size()) not in Backward_tensorGrid:
		tensorHorizontal = torch.linspace(-1.0, 1.0, tensorFlow.size(3)).view(1, 1, 1, tensorFlow.size(3)).expand(tensorFlow.size(0), -1, tensorFlow.size(2), -1)
		tensorVertical = torch.linspace(-1.0, 1.0, tensorFlow.size(2)).view(1, 1, tensorFlow.size(2), 1).expand(tensorFlow.size(0), -1, -1, tensorFlow.size(3))

		Backward_tensorGrid[str(tensorFlow.size())] = torch.cat([ tensorHorizontal, tensorVertical ], 1)
	# end

	if str(tensorFlow.size()) not in Backward_tensorPartial:
		# torch.ones takes the shape; torch.FloatTensor([...]) would build a 1-D tensor from the list instead
		Backward_tensorPartial[str(tensorFlow.size())] = torch.ones([ tensorFlow.size(0), 1, tensorFlow.size(2), tensorFlow.size(3) ])
	# end

	tensorFlow = torch.cat([ tensorFlow[:, 0:1, :, :] / ((tensorInput.size(3) - 1.0) / 2.0), tensorFlow[:, 1:2, :, :] / ((tensorInput.size(2) - 1.0) / 2.0) ], 1)
	# the cached tensors live on the CPU and are copied to the GPU of the current replica on every call
	tensorInput = torch.cat([ tensorInput, Backward_tensorPartial[str(tensorFlow.size())].to(tensorInput.device) ], 1)

	tensorOutput = torch.nn.functional.grid_sample(input=tensorInput, grid=(Backward_tensorGrid[str(tensorFlow.size())].to(tensorFlow.device) + tensorFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros')

	tensorMask = tensorOutput[:, -1:, :, :]; tensorMask[tensorMask > 0.999] = 1.0; tensorMask[tensorMask < 1.0] = 0.0

	return tensorOutput[:, :-1, :, :] * tensorMask
# end

ZacharyGong commented on June 27, 2024

Thank you both, these are good solutions!

sniklaus commented on June 27, 2024

Interesting approach, thank you for sharing!
