Another potentially efficient solution is to add the device id to the dictionary keys, like this:
import torch

Backward_tensorGrid = {}
Backward_tensorPartial = {}

def Backward(tensorInput, tensorFlow):
    # Key the caches on both the size and the device, so that each GPU keeps its own copy.
    strKey = str(tensorFlow.size()) + str(tensorFlow.device)

    if strKey not in Backward_tensorGrid:
        tensorHorizontal = torch.linspace(-1.0, 1.0, tensorFlow.size(3)).view(1, 1, 1, tensorFlow.size(3)).expand(tensorFlow.size(0), -1, tensorFlow.size(2), -1)
        tensorVertical = torch.linspace(-1.0, 1.0, tensorFlow.size(2)).view(1, 1, tensorFlow.size(2), 1).expand(tensorFlow.size(0), -1, -1, tensorFlow.size(3))
        # Move the grid to the flow's device instead of whatever GPU .cuda() would pick.
        Backward_tensorGrid[strKey] = torch.cat([ tensorHorizontal, tensorVertical ], 1).to(tensorFlow.device)
    # end

    if strKey not in Backward_tensorPartial:
        Backward_tensorPartial[strKey] = tensorFlow.new_ones([ tensorFlow.size(0), 1, tensorFlow.size(2), tensorFlow.size(3) ])
    # end

    tensorFlow = torch.cat([ tensorFlow[:, 0:1, :, :] / ((tensorInput.size(3) - 1.0) / 2.0), tensorFlow[:, 1:2, :, :] / ((tensorInput.size(2) - 1.0) / 2.0) ], 1)
    tensorInput = torch.cat([ tensorInput, Backward_tensorPartial[strKey] ], 1)
    tensorOutput = torch.nn.functional.grid_sample(input=tensorInput, grid=(Backward_tensorGrid[strKey] + tensorFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros')
    tensorMask = tensorOutput[:, -1:, :, :]; tensorMask[tensorMask > 0.999] = 1.0; tensorMask[tensorMask < 1.0] = 0.0
    return tensorOutput[:, :-1, :, :] * tensorMask
# end
from pytorch-pwc.
Did exactly this quite recently.
1. Put Backward_tensorGrid = {} and Backward_tensorPartial = {} inside the Backward function.
2. Repeat the things I described in #22.
3. Wrap the code in run.py in a module so you can do a forward call. The forward call should have a device argument.
4. Write another module that does what you need with each chunk of your dataset you want to parallelize (a tensor with indices in my case). Inside this module you can get the GPU id from the input chunk (for example, by using chunk_tensor.device). You can use this GPU index as the device input argument for the module defined in step 3.
5. Initialize this module, replicate it, scatter your data, and apply.
Mind the preprocessing.
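A minimal sketch of steps 4 and 5, assuming a hypothetical estimator callable that stands in for the wrapped run.py module from step 3 and is called as estimator(indices, device). ChunkWorker and run_on_gpus are illustrative names; the replicate, scatter, parallel_apply and gather functions come from torch.nn.parallel. run_on_gpus needs as many CUDA devices as listed in device_ids, while ChunkWorker on its own also runs on the CPU.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import gather, parallel_apply, replicate, scatter


class ChunkWorker(nn.Module):
    """Hypothetical per-chunk module (step 4): reads the device from its input chunk."""

    def __init__(self, estimator):
        super().__init__()
        # Hypothetical stand-in for the run.py wrapper from step 3,
        # called as estimator(indices, device).
        self.estimator = estimator

    def forward(self, chunk):
        device = chunk.device  # e.g. cuda:1 for the replica running on GPU 1
        return self.estimator(chunk, device)


def run_on_gpus(worker, indices, device_ids):
    """Step 5: scatter the data, replicate the module, apply each replica, then gather."""
    chunks = scatter(indices, device_ids)  # one slice of indices per GPU
    used_ids = device_ids[:len(chunks)]
    replicas = replicate(worker, used_ids)
    outputs = parallel_apply(replicas, chunks, devices=used_ids)
    return gather(outputs, device_ids[0])  # concatenate the results on the first GPU


# CPU check of the worker alone; no GPU required.
worker = ChunkWorker(lambda idx, device: idx.float().to(device) * 2)
print(worker(torch.arange(4)))
```

With two GPUs, something like run_on_gpus(ChunkWorker(estimator), torch.arange(1000).cuda(0), [0, 1]) would hand the first 500 indices to GPU 0 and the rest to GPU 1. Because each replica derives its device from its own chunk, it never has to consult a shared global to know where it runs.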
Now, I think that another solution would be to remove the lines that check whether a value with this key already exists in the dict, so that the tensors are recomputed on every call. Here:
Line 47 in da45293
and here:
Line 54 in da45293
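A minimal sketch of this cache-free variant as a standalone function (backward_warp_uncached is a hypothetical name). It keeps the grid and the ones-channel in local variables rather than writing them back into a shared dict, so concurrent replicas cannot overwrite each other's entries. align_corners=True is an assumption added here: it makes the linspace(-1, 1) grid line up with the corner pixel centres on PyTorch 1.3 and later, where the default changed to False.

```python
import torch
import torch.nn.functional as F


def backward_warp_uncached(tensor_input, tensor_flow):
    """Backward-warp tensor_input by tensor_flow (in pixels) without any module-level cache."""
    n, _, h, w = tensor_flow.shape
    device = tensor_flow.device

    # Rebuilt on every call, directly on the flow's device.
    horizontal = torch.linspace(-1.0, 1.0, w, device=device).view(1, 1, 1, w).expand(n, -1, h, -1)
    vertical = torch.linspace(-1.0, 1.0, h, device=device).view(1, 1, h, 1).expand(n, -1, -1, w)
    grid = torch.cat([horizontal, vertical], 1)
    partial = tensor_flow.new_ones([n, 1, h, w])

    # Convert the flow from pixels to the [-1, 1] range expected by grid_sample.
    flow = torch.cat([tensor_flow[:, 0:1] / ((tensor_input.size(3) - 1.0) / 2.0),
                      tensor_flow[:, 1:2] / ((tensor_input.size(2) - 1.0) / 2.0)], 1)

    output = F.grid_sample(torch.cat([tensor_input, partial], 1),
                           (grid + flow).permute(0, 2, 3, 1),
                           mode='bilinear', padding_mode='zeros', align_corners=True)

    # The warped ones-channel drops below 1 where samples fall outside the image.
    mask = (output[:, -1:] > 0.999).float()
    return output[:, :-1] * mask


x = torch.rand(1, 3, 6, 7)
print(torch.allclose(backward_warp_uncached(x, torch.zeros(1, 2, 6, 7)), x, atol=1e-5))
```

The price is rebuilding two small tensors per call; whether that matters relative to the rest of the network would need to be measured.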
Thank you for the instructions @v-iashin! Would you mind sharing the error message, @ZacharyGong?
Hi @sniklaus,
I tried step 1 of v-iashin's reply, and in my case the code works. I checked the flow generated in the multi-GPU environment, and it looks correct. If you could check whether this change might cause any mistakes, I would be very grateful.
I leave my original snippet below for future reference.
I wrote a small code snippet to reproduce the problem I ran into:
import torch
import torch.nn as nn
from flow_generation_model.model import Network as flow_generation_net
import math

def estimate(moduleNetwork, tensorFirst, tensorSecond):
    intWidth = tensorFirst.size(-1)
    intHeight = tensorFirst.size(-2)
    tensorPreprocessedFirst = tensorFirst
    tensorPreprocessedSecond = tensorSecond
    intPreprocessedWidth = int(math.floor(math.ceil(intWidth / 64.0) * 64.0))
    intPreprocessedHeight = int(math.floor(math.ceil(intHeight / 64.0) * 64.0))
    tensorPreprocessedFirst = torch.nn.functional.interpolate(input=tensorPreprocessedFirst, size=(intPreprocessedHeight, intPreprocessedWidth), mode='bilinear', align_corners=False)
    tensorPreprocessedSecond = torch.nn.functional.interpolate(input=tensorPreprocessedSecond, size=(intPreprocessedHeight, intPreprocessedWidth), mode='bilinear', align_corners=False)
    tensorFlow = 20.0 * torch.nn.functional.interpolate(input=moduleNetwork(tensorPreprocessedFirst, tensorPreprocessedSecond), size=(intHeight, intWidth), mode='bilinear', align_corners=False)
    tensorFlow[:, 0, :, :] *= float(intWidth) / float(intPreprocessedWidth)
    tensorFlow[:, 1, :, :] *= float(intHeight) / float(intPreprocessedHeight)
    return tensorFlow

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.flow_generation_net = flow_generation_net()

    def forward(self, x1, x2):
        flow = estimate(self.flow_generation_net, x1, x2)
        return flow

pwc_net = Network().cuda()
gpus = [0, 1]
pwc_net = nn.DataParallel(pwc_net, device_ids=gpus)
MSE = nn.MSELoss()
x1 = torch.rand([4, 3, 128, 128]).cuda()
x2 = torch.rand([4, 3, 128, 128]).cuda()
flow_gt = torch.rand([4, 2, 128, 128]).cuda()
output = pwc_net(x1, x2)
#print(output.shape)
loss = MSE(output, flow_gt)
loss.backward()
where flow_generation_net is your network in run.py. Running this code, I got the following error:
Traceback (most recent call last):
  File "snippet.py", line 43, in <module>
    output = pwc_net(x1, x2)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
    raise output
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "snippet.py", line 31, in forward
    flow = estimate(self.flow_generation_net, x1, x2)
  File "snippet.py", line 19, in estimate
    tensorFlow = 20.0 * torch.nn.functional.interpolate(input=moduleNetwork(tensorPreprocessedFirst, tensorPreprocessedSecond), size=(intHeight, intWidth), mode='bilinear', align_corners=False)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/video/flow_generation_test/flow_generation_model/model.py", line 242, in forward
    objectEstimate = self.moduleFiv(tensorFirst[-2], tensorSecond[-2], objectEstimate)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/video/flow_generation_test/flow_generation_model/model.py", line 177, in forward
    tensorVolume = torch.nn.functional.leaky_relu(input=correlation.FunctionCorrelation(tensorFirst=tensorFirst, tensorSecond=Backward(tensorInput=tensorSecond, tensorFlow=tensorFlow * self.dblBackward)), negative_slope=0.1, inplace=False)
  File "/mnt/video/flow_generation_test/flow_generation_model/model.py", line 36, in Backward
    tensorOutput = torch.nn.functional.grid_sample(input=tensorInput, grid=(Backward_tensorGrid[str(tensorFlow.size())] + tensorFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros')
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:234
Hopefully, it can save some time for you.
The problem is that you import this run.py module, and these two dicts are created once, when the module is imported. Because nn.DataParallel runs all replicas as threads of the same Python process, every replica shares these two dicts.
Now imagine the following scenario. The first GPU adds a tensor to the Backward_tensorGrid dict, and the second GPU then looks up the same key (Backward_tensorGrid[str(tensorFlow.size())]), which maps to the tensor that lives on the first GPU. Hence the error "arguments are located on different GPUs".
To verify this, try printing the contents of these dicts inside Backward.
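The sharing described above can be reproduced without any GPU. This is a simplified simulation, not DataParallel itself: the device names are plain string labels, and the two threads run one after another so the result is deterministic, whereas real replicas run concurrently and either GPU may fill the cache first.

```python
import threading

# Simulated module-level cache, standing in for Backward_tensorGrid in run.py.
grid_cache = {}


def backward_stub(size, device, results):
    # Keyed on size only, as in the original Backward function.
    if size not in grid_cache:
        grid_cache[size] = "grid on " + device
    results[device] = grid_cache[size]


results = {}
for device in ["cuda:0", "cuda:1"]:
    worker = threading.Thread(target=backward_stub, args=((4, 2, 128, 128), device, results))
    worker.start()
    worker.join()  # run one after another so the outcome is deterministic

print(results)  # {'cuda:0': 'grid on cuda:0', 'cuda:1': 'grid on cuda:0'}
```

The second "GPU" receives the entry created for the first one, which is exactly the situation that triggers the error in the real code.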
Thank you for the code @ZacharyGong and thank you for sharing your thoughts with us @v-iashin!
The easiest workaround is probably to not use the dictionary at all, as suggested by @v-iashin in his most recent post. That may hurt performance, though I am unable to predict by how much. Another solution would be to keep the tensors in the dict / cache on the CPU instead of the GPU and copy them to the flow's GPU on every call, again at some cost in performance. The backward warping would then be as follows.
import torch

Backward_tensorGrid = {}
Backward_tensorPartial = {}

def Backward(tensorInput, tensorFlow):
    # The cached tensors stay on the CPU; they are copied to the flow's device on every call.
    if str(tensorFlow.size()) not in Backward_tensorGrid:
        tensorHorizontal = torch.linspace(-1.0, 1.0, tensorFlow.size(3)).view(1, 1, 1, tensorFlow.size(3)).expand(tensorFlow.size(0), -1, tensorFlow.size(2), -1)
        tensorVertical = torch.linspace(-1.0, 1.0, tensorFlow.size(2)).view(1, 1, tensorFlow.size(2), 1).expand(tensorFlow.size(0), -1, -1, tensorFlow.size(3))
        Backward_tensorGrid[str(tensorFlow.size())] = torch.cat([ tensorHorizontal, tensorVertical ], 1)
    # end
    if str(tensorFlow.size()) not in Backward_tensorPartial:
        # torch.ones with a shape; torch.FloatTensor([ ... ]) would create a 1-D tensor holding those four numbers.
        Backward_tensorPartial[str(tensorFlow.size())] = torch.ones([ tensorFlow.size(0), 1, tensorFlow.size(2), tensorFlow.size(3) ])
    # end
    tensorFlow = torch.cat([ tensorFlow[:, 0:1, :, :] / ((tensorInput.size(3) - 1.0) / 2.0), tensorFlow[:, 1:2, :, :] / ((tensorInput.size(2) - 1.0) / 2.0) ], 1)
    tensorInput = torch.cat([ tensorInput, Backward_tensorPartial[str(tensorFlow.size())].to(tensorFlow.device) ], 1)
    tensorOutput = torch.nn.functional.grid_sample(input=tensorInput, grid=(Backward_tensorGrid[str(tensorFlow.size())].to(tensorFlow.device) + tensorFlow).permute(0, 2, 3, 1), mode='bilinear', padding_mode='zeros')
    tensorMask = tensorOutput[:, -1:, :, :]; tensorMask[tensorMask > 0.999] = 1.0; tensorMask[tensorMask < 1.0] = 0.0
    return tensorOutput[:, :-1, :, :] * tensorMask
# end
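Since the performance impact of each workaround is hard to predict, a rough micro-benchmark is one way to measure it instead. The sketch below is an assumption-laden illustration (make_grid, seconds_per_call and compare_grid_strategies are hypothetical names): it only times building or copying the grid, not the full warp or the dict lookup, and the numbers are only meaningful when run on a GPU.

```python
import time

import torch


def make_grid(n, h, w, device):
    horizontal = torch.linspace(-1.0, 1.0, w, device=device).view(1, 1, 1, w).expand(n, -1, h, -1)
    vertical = torch.linspace(-1.0, 1.0, h, device=device).view(1, 1, h, 1).expand(n, -1, -1, w)
    return torch.cat([horizontal, vertical], 1)


def seconds_per_call(fn, device, repeats=20):
    # CUDA work is asynchronous, so synchronize before reading the clock.
    if device.type == 'cuda':
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if device.type == 'cuda':
        torch.cuda.synchronize(device)
    return (time.perf_counter() - start) / repeats


def compare_grid_strategies(n, h, w, device):
    device = torch.device(device)
    cpu_cache = make_grid(n, h, w, 'cpu')        # cached on the CPU, copied per call
    device_cache = make_grid(n, h, w, device)    # cached per device, reused as-is
    return {
        'rebuild every call': seconds_per_call(lambda: make_grid(n, h, w, device), device),
        'CPU cache + copy': seconds_per_call(lambda: cpu_cache.to(device), device),
        'per-device cache': seconds_per_call(lambda: device_cache, device),
    }


print(compare_grid_strategies(4, 128, 128, 'cuda' if torch.cuda.is_available() else 'cpu'))
```

On the CPU, .to('cpu') returns the same tensor without copying, so only a GPU run shows the real transfer cost.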
Thank you, both of you gave good solutions!
Interesting approach, thank you for sharing!