
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (minigpt-4, 11 comments, closed)

vision-cair commented on August 21, 2024
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!


Comments (11)

aamir-gmail commented on August 21, 2024 (3 upvotes)

> Hello, thanks for your interest in our work! I guess the issue comes from the 8-bit loading. As a temporary solution, you can go to this line and change the device_map from 'auto' to {'': THE_CUDA_ID_YOU_WANT}. I'll update the code later to fix this issue. Thanks!

Hey mate, it's the community that should be saying a big thank you. Great work, best of all open source. Awesome!


TsuTikgiau commented on August 21, 2024 (2 upvotes)

@Marcophono2 The device id should be an int. For example, if you want it to run on cuda:2, set it as {'': 2}. I have updated the code now, so you can simply run

python demo.py --cfg-path path/to/config --gpu-device 2

to run the demo on GPU 2.
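For readers hitting the same error, the device_map trick above can be sketched in plain Python. This is an illustrative sketch, not the repo's exact code; the helper name and the from_pretrained call in the comment are assumptions based on the thread.

```python
def single_gpu_device_map(gpu_id):
    """Build a Hugging Face device_map that pins the whole model to one GPU.

    The empty-string key means "the entire module tree"; passing this dict
    instead of the string 'auto' stops Accelerate from sharding layers
    across cuda:0 and cuda:1, which is what triggers the RuntimeError.
    """
    return {'': gpu_id}

# Hypothetical usage, e.g.:
#   model = LlamaForCausalLM.from_pretrained(
#       checkpoint, load_in_8bit=True,
#       device_map=single_gpu_device_map(2))
print(single_gpu_device_map(2))
```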


Iceland-Leo commented on August 21, 2024 (2 upvotes)

@TsuTikgiau There are still some other errors:

  File "/root/miniconda3/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.

@WangRongsheng How did you fix it?


Marcophono2 commented on August 21, 2024

Same here! I have three GPUs and want to run the model on cuda:2. I could only find one location where I can set the CUDA device, and that is in demo.py.


aamir-gmail commented on August 21, 2024

Same here: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
One possible fix would be CUDA_VISIBLE_DEVICES='1'; that way only one GPU is visible. However, the developers of this repo can provide a more permanent fix. On Linux you can set this variable with export CUDA_VISIBLE_DEVICES=1 before you run the demo script.
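The environment-variable workaround above would look like this on Linux (a sketch assuming the demo.py flags used elsewhere in this thread; note the visible GPU is renumbered to 0 inside the process):

```shell
# Expose only physical GPU 1 to the process. Inside the process it
# appears as cuda:0, so all tensors land on the same (only) device.
export CUDA_VISIBLE_DEVICES=1

# Then run the demo on that single visible GPU, e.g.:
# python demo.py --cfg-path path/to/config --gpu-device 0
```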


TsuTikgiau commented on August 21, 2024

Hello, thanks for your interest in our work! I guess the issue comes from the 8-bit loading. As a temporary solution, you can go to this line and change the device_map from 'auto' to {'': THE_CUDA_ID_YOU_WANT}. I'll update the code later to fix this issue. Thanks!


Marcophono2 commented on August 21, 2024

Great! Thank you very much, @TsuTikgiau! I have everything set up now and am looking forward to testing it! :)


WangRongsheng commented on August 21, 2024

@TsuTikgiau There are still some other errors:

  File "/root/miniconda3/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.


Marcophono2 commented on August 21, 2024

@TsuTikgiau

I tried it now as you recommended, but then I get an error from transformers:

/home/marc/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/modeling_utils.py:2706 in from_pretrained

  2703                 raise ValueError(f"{model.__class__.__name__} does not support `device_m
  2704             no_split_modules = model._no_split_modules
  2705             if device_map not in ["auto", "balanced", "balanced_low_0", "sequential"]:
> 2706                 raise ValueError(
  2707                     "If passing a string for `device_map`, please choose 'auto', 'balanc
  2708                     "'sequential'."
  2709                 )

transformers version 4.29.0.dev0

I don't think it's enough to just disable that error check. 😶‍🌫️ (sorry, I just wanted to be the very first human to use that emoji 😄)


Marcophono2 commented on August 21, 2024

I tried it now exactly as you mentioned:

device_map={'': 2}

It starts, but when I upload a picture, I again get the message

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
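To track down which weights ended up on which GPU in a split like the one above, it can help to group parameters by device. This is a generic diagnostic sketch, not part of MiniGPT-4; the parameter names in the toy data and the model-feeding snippet in the comment are illustrative assumptions.

```python
def devices_in_use(named_devices):
    """Group (name, device) pairs by device string.

    More than one key in the result means tensors live on different
    devices, which is exactly what makes addmm raise the RuntimeError
    above. With a real torch model you could feed it, e.g.:
        devices_in_use((n, str(p.device))
                       for n, p in model.named_parameters())
    """
    by_device = {}
    for name, dev in named_devices:
        by_device.setdefault(dev, []).append(name)
    return by_device

# Toy data illustrating the cuda:2 / cuda:0 mismatch reported above:
split = devices_in_use([('llama_proj.weight', 'cuda:0'),
                        ('visual_encoder.patch_embed', 'cuda:2')])
print(sorted(split))  # more than one device means the matmul will fail
```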


Marcophono2 commented on August 21, 2024

You are the king, @TsuTikgiau!! So courteous! I really appreciate this!

But I couldn't wait that long, so I had already swapped the running models from cuda:0 to cuda:2 and used cuda:0. :-) It is working without issues now! I need to play around a bit more today, but some first impressions: it is really nice that your model has a built-in memory, which BLIP-2 does not (sure, forwarding the previous conversation helps, and you surely do the same, but in BLIP-2 this is not built in). It seems that your model does not extract more information from an image than BLIP-2, if I'm right?! But the combination with an LLM is really fun. I uploaded a picture of myself and told your model to write the lyrics for a rap song about me. Then it asked me what my name is, then what my profession is. I thought your model had lost its memory, but in the third step it started to build the text, with my name and profession worked into it. :-)

I noticed that the standard captioning ("What do you see in the image?", which is the default setting in BLIP-2) takes much more time than "my" BLIP-2. My BLIP-2 needs 0.6 seconds on my 4090 for an output with a minimum of 25 tokens. On the other hand, I use the slimmer opt-2.7 model. That was much better in output quality than the T5, even better than the T5 XXL and also than the opt-6.7. I just wanted to share my experience. As far as I remember from your code, you are using the T5 XXL model.

