GPU problem about minkowskiengine (CLOSED)

nvidia commented on May 16, 2024
GPU problem

from minkowskiengine.

Comments (4)

chrischoy commented on May 16, 2024

This seems related to #40.

chrischoy commented on May 16, 2024

This definitely should not happen. I'll dig deeper into it after some deadlines.

But is the GPU mapping different from the physical GPU ID on your system?
I have found that on some servers the GPU ID and the GPU actually used are mapped differently.

Also, have you tried running export CUDA_VISIBLE_DEVICES=0 (or any other GPU ID) to restrict which GPUs are visible to the program?
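
For anyone who prefers to apply the mask from a launcher script rather than the shell, here is a minimal sketch. The helper names masked_env and run_masked are made up for illustration and are not part of MinkowskiEngine or PyTorch. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID is an addition not mentioned above; it asks CUDA to enumerate GPUs in PCI bus order, which is the order nvidia-smi reports, and may help if the IDs look shuffled.

```python
import os
import subprocess


def masked_env(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only `gpu_ids` to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # Enumerate GPUs in PCI bus order so the IDs match what nvidia-smi shows.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Only the listed GPUs are visible; the child process sees them renumbered from 0.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_masked(cmd, gpu_ids):
    """Run `cmd` (a list of arguments) in a child process with the mask applied."""
    return subprocess.run(cmd, env=masked_env(gpu_ids), check=True)
```

Something like run_masked(["python", "test.py"], gpu_ids=[1]) would then run the script with only physical GPU 1 visible. Note that inside the masked process that GPU shows up as cuda:0, so the script itself should use the default device. The variables need to be in place before CUDA is initialized, which is why the sketch sets them for a child process.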

murrdpirate commented on May 16, 2024

Edit: I did just try setting export CUDA_VISIBLE_DEVICES=1 as you mentioned, and that does make it work. That's probably a good enough solution for me. See below for original post.

I seem to be having this issue as well. Using the default CUDA device works without issues.

I'm running:

  • MinkowskiEngine 0.4.3
  • CUDA 10.2
  • Python 3.7.7
  • PyTorch 1.5.0

I'm actually running FCGF, but I figured this was the best place to mention the issue.

Here's the output (basically the same as @GaoQiyu's):

Traceback (most recent call last):
  File "test.py", line 80, in <module>
    main(config)
  File "test.py", line 55, in main
    trainer._valid_epoch()
  File "/home/km/workspaces/FCGF/lib/trainer.py", line 338, in _valid_epoch
    F0 = self.model(sinput0).F
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/km/workspaces/FCGF/model/resunet.py", line 143, in forward
    out_s1 = self.conv1(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/MinkowskiEngine/MinkowskiConvolution.py", line 281, in forward
    input.coords_man)
  File "/usr/local/lib/python3.7/dist-packages/MinkowskiEngine/MinkowskiConvolution.py", line 92, in forward
    ctx.coords_man.CPPCoordsManager)
RuntimeError: an illegal memory access was encountered at src/convolution.cu:285

chrischoy commented on May 16, 2024

It seems that the default CUDA calls for cudaMalloc and the PyTorch default GPU are using different GPU IDs.
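
If that is the cause, one thing that might be worth trying (untested here, and not confirmed anywhere in this thread as a fix) is to make the target GPU the current CUDA device around the forward pass, so that raw CUDA allocations and the PyTorch tensors land on the same GPU. Below is a rough sketch. The helpers parse_cuda_index and on_device are hypothetical names; torch.cuda.device and torch.cuda.is_available are existing PyTorch calls.

```python
import contextlib


def parse_cuda_index(device):
    """Return the GPU index in a string such as 'cuda:1', or None for bare 'cuda'."""
    kind, _, index = device.partition(":")
    if kind != "cuda":
        raise ValueError("expected a CUDA device string, got %r" % device)
    return int(index) if index else None


@contextlib.contextmanager
def on_device(device):
    """Temporarily make `device` the current CUDA device (no-op without CUDA)."""
    import torch  # imported lazily so the parsing helper works without PyTorch

    index = parse_cuda_index(device)
    if index is None or not torch.cuda.is_available():
        # Nothing to switch: keep whatever device is already current.
        yield
        return
    # torch.cuda.device switches the current device and restores it on exit.
    with torch.cuda.device(index):
        yield
```

Usage would look roughly like wrapping the failing call, for example "with on_device("cuda:1"): F0 = self.model(sinput0).F", assuming the model and inputs already live on cuda:1. The CUDA_VISIBLE_DEVICES workaround reported above remains the only one confirmed to work in this thread.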
