
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (minigpt-4, 11 comments, closed)

vision-cair commented on August 21, 2024
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!


Comments (11)

aamir-gmail commented on August 21, 2024 (3 upvotes)

> Hello, thanks for your interest in our work! I guess the issue comes from the 8-bit loading. As a temporary solution, you can go to this line and change the device_map from 'auto' to {'': THE_CUDA_ID_YOU_WANT}. I'll update the code later to fix this issue. Thanks!

Hey mate, it's the community that should be saying a big thank you. Great work, best of all open source. Awesome!


TsuTikgiau commented on August 21, 2024 (2 upvotes)

@Marcophono2 The device id should be an int. For example, if you want it to run on cuda:2, set it as {'': 2}. I have updated the code now, so you can simply run

python demo.py --cfg-path path/to/config --gpu-device 2

to run the demo on GPU 2.
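For readers hitting the same error, the device_map trick above can be sketched in plain Python. This is an illustrative sketch, not the repo's exact code; the helper name and the from_pretrained call in the comment are assumptions based on the thread.

```python
def single_gpu_device_map(gpu_id):
    """Build a Hugging Face device_map that pins the whole model to one GPU.

    The empty-string key means "the entire module tree"; passing this dict
    instead of the string 'auto' stops Accelerate from sharding layers
    across cuda:0 and cuda:1, which is what triggers the RuntimeError.
    """
    return {'': gpu_id}

# Hypothetical usage, e.g.:
#   model = LlamaForCausalLM.from_pretrained(
#       checkpoint, load_in_8bit=True,
#       device_map=single_gpu_device_map(2))
print(single_gpu_device_map(2))
```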


Iceland-Leo commented on August 21, 2024 (2 upvotes)

@TsuTikgiau There are still some other errors:

  File "/root/miniconda3/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.

@WangRongsheng How did you fix it?


Marcophono2 commented on August 21, 2024

Same here! I have three GPUs and want to run the model on cuda:2. I could only find one location where I can set the CUDA device, and that is in demo.py.


aamir-gmail commented on August 21, 2024

Same here: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
One possible fix would be CUDA_VISIBLE_DEVICES='1'; that way only one GPU is visible. However, the developers of this repo can provide a more permanent fix. On Linux you can set this variable with export CUDA_VISIBLE_DEVICES=1 before you run the demo script.
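The environment-variable workaround above would look like this on Linux (a sketch assuming the demo.py flags used elsewhere in this thread; note the visible GPU is renumbered to 0 inside the process):

```shell
# Expose only physical GPU 1 to the process. Inside the process it
# appears as cuda:0, so all tensors land on the same (only) device.
export CUDA_VISIBLE_DEVICES=1

# Then run the demo on that single visible GPU, e.g.:
# python demo.py --cfg-path path/to/config --gpu-device 0
```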


TsuTikgiau commented on August 21, 2024

Hello, thanks for your interest in our work! I guess the issue comes from the 8-bit loading. As a temporary solution, you can go to this line and change the device_map from 'auto' to {'': THE_CUDA_ID_YOU_WANT}. I'll update the code later to fix this issue. Thanks!


Marcophono2 commented on August 21, 2024

Great! Thank you very much, @TsuTikgiau! I have everything set up now and am looking forward to testing it! :)


WangRongsheng commented on August 21, 2024

@TsuTikgiau There are still some other errors:

  File "/root/miniconda3/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.


Marcophono2 commented on August 21, 2024

@TsuTikgiau

I tried it now as you recommended, but then I get an error from transformers:

/home/marc/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/modeling_utils.py:2706 in from_pretrained

  2703                 raise ValueError(f"{model.__class__.__name__} does not support `device_m
  2704             no_split_modules = model._no_split_modules
  2705             if device_map not in ["auto", "balanced", "balanced_low_0", "sequential"]:
> 2706                 raise ValueError(
  2707                     "If passing a string for `device_map`, please choose 'auto', 'balanc
  2708                     "'sequential'."
  2709                 )

transformers version 4.29.0.dev0

I don't think it's enough to just disable that error check. 😶‍🌫️ (sorry, I just wanted to be the very first human to use that emoji 😄)


Marcophono2 commented on August 21, 2024

I tried it now exactly as you mentioned:

device_map={'': 2}

It starts, but when I upload a picture, I again get the message

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
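To track down which weights ended up on which GPU in a split like the one above, it can help to group parameters by device. This is a generic diagnostic sketch, not part of MiniGPT-4; the parameter names in the toy data and the model-feeding snippet in the comment are illustrative assumptions.

```python
def devices_in_use(named_devices):
    """Group (name, device) pairs by device string.

    More than one key in the result means tensors live on different
    devices, which is exactly what makes addmm raise the RuntimeError
    above. With a real torch model you could feed it, e.g.:
        devices_in_use((n, str(p.device))
                       for n, p in model.named_parameters())
    """
    by_device = {}
    for name, dev in named_devices:
        by_device.setdefault(dev, []).append(name)
    return by_device

# Toy data illustrating the cuda:2 / cuda:0 mismatch reported above:
split = devices_in_use([('llama_proj.weight', 'cuda:0'),
                        ('visual_encoder.patch_embed', 'cuda:2')])
print(sorted(split))  # more than one device means the matmul will fail
```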


Marcophono2 commented on August 21, 2024

You are the king, @TsuTikgiau!! So courteous! I really appreciate this!

But I couldn't wait that long, so I had already swapped the running models from cuda:0 to cuda:2 and used cuda:0. :-) It is working without issues now! I need to play around a bit more today, but some first impressions: it is really nice that your model has a built-in memory, which BLIP-2 does not (sure, forwarding the previous conversation helps, and you surely do the same, but in BLIP-2 this is not built in). It seems that your model does not extract more information from an image than BLIP-2, if I'm right?! But the combination with an LLM is really fun. I uploaded a picture of myself and told your model to write the lyrics for a rap song about me. Then it asked me what my name is, then what my profession is. I thought your model had lost its memory, but in the third step it started to build the text, with my name and profession worked into it. :-)

I noticed that the standard captioning ("What do you see in the image?", which is the default setting in BLIP-2) takes much more time than "my" BLIP-2. My BLIP-2 needs 0.6 seconds on my 4090 for an output with a minimum of 25 tokens. On the other hand, I use the slimmer opt-2.7 model. That was much better in output quality than the T5, even better than the T5 XXL and also than the opt-6.7. I just wanted to share my experience. As far as I remember from your code, you are using the T5 XXL model.

