
Comments (17)

thcheung commented on August 21, 2024

Set the parameter device_map='auto' when loading LlamaForCausalLM.from_pretrained().
Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')
It runs on two RTX 2080Ti cards on my machine.

It seems the model is split across two devices, but during inference tensors flow between both devices and it throws the two-devices error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

(1) Load the LLaMA with device_map set to 'auto'. Replace:

device_map={'': device_8bit}

with:

device_map = 'auto'
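
For context, the surrounding call in minigpt4/models/mini_gpt4.py looks roughly like this; a minimal sketch assuming the repo's 8-bit loading path (exact arguments may differ across versions):

self.llama_model = LlamaForCausalLM.from_pretrained(
    llama_model,                 # path to the Vicuna weights from the config
    torch_dtype=torch.float16,
    load_in_8bit=True,
    device_map='auto',           # was: device_map={'': device_8bit}
)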

(2) Modify the line below from 'cuda:{}'.format(args.gpu_id) to 'cuda'; it will then automatically use device 0 or device 1 if you have two devices. Replace:

chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))

with:

chat = Chat(model, vis_processor, device='cuda')

(3) The .to(device) call can be removed from the line below, because the LLaMA has already been loaded onto the GPUs automatically. Replace:

model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))

with:

model = model_cls.from_config(model_config)

(4) When encoding the image, we can run the encoder on the CPU and then move the resulting embedding to the GPU. Replace:

image_emb, _ = self.model.encode_img(image)
img_list.append(image_emb)

with:

image_emb, _ = self.model.encode_img(image.to('cpu'))
img_list.append(image_emb.to('cuda'))

The model should now work if you have multiple GPUs with limited memory each.
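
Before launching the demo, it can help to verify how accelerate actually sharded the layers; a small diagnostic sketch (it assumes the wrapper exposes the language model as model.llama_model, as in the repo):

import torch

# hf_device_map records which GPU each LLaMA submodule was assigned to
print(model.llama_model.hf_device_map)

# per-GPU memory actually allocated by this process
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i}: {torch.cuda.memory_allocated(i) / 2**30:.1f} GiB allocated")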

JainitBITW commented on August 21, 2024

Yes, I just restarted my CUDA.

JainitBITW commented on August 21, 2024

Nope, exactly the same.

JainitBITW commented on August 21, 2024

I think you can go ahead.

CyberTimon commented on August 21, 2024

I would also like to know how to do this.
I have 2x RTX 3060 12 GB, so I could load the 13B model, but multi-GPU inference doesn't seem to be implemented.

taomanwai commented on August 21, 2024

I have the same request.

wJc-cn commented on August 21, 2024

I have the same request too.

thcheung commented on August 21, 2024

  1. Set the parameter device_map='auto' when loading LlamaForCausalLM.from_pretrained().

  2. Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')

It runs on two RTX 2080Ti cards on my machine.

sinsauzero commented on August 21, 2024

Set the parameter device_map='auto' when loading LlamaForCausalLM.from_pretrained().
Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')
It runs on two RTX 2080Ti cards on my machine.

It seems the model is split across two devices, but during inference tensors flow between both devices and it throws the two-devices error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

JainitBITW commented on August 21, 2024

I did all of these steps, but I still get:

Traceback (most recent call last):
  File "/home2/jainit/MiniGPT-4/demo.py", line 61, in <module>
    model = model_cls.from_config(model_config)
  File "/home2/jainit/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 243, in from_config
    model = cls(
  File "/home2/jainit/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 90, in __init__
    self.llama_model = LlamaForCausalLM.from_pretrained(
  File "/home2/jainit/torchy/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2722, in from_pretrained
    max_memory = get_balanced_memory(
  File "/home2/jainit/torchy/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 731, in get_balanced_memory
    max_memory = get_max_memory(max_memory)
  File "/home2/jainit/torchy/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 624, in get_max_memory
    _ = torch.tensor([0], device=i)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
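
This traceback fails before any MiniGPT-4 weights are loaded: accelerate's get_max_memory probes every visible GPU by allocating a tiny tensor on it, so the out-of-memory error usually means one of the visible GPUs is already occupied by another process. A quick diagnostic sketch (not part of the repo) to check each device:

import torch

# Probe each visible GPU the same way accelerate does; a device that is
# already full will raise the same out-of-memory error here.
for i in range(torch.cuda.device_count()):
    try:
        free, total = torch.cuda.mem_get_info(i)
        print(f"cuda:{i}: {free / 2**30:.1f} of {total / 2**30:.1f} GiB free")
    except RuntimeError as err:
        print(f"cuda:{i}: {err}")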

sushilkhadkaanon commented on August 21, 2024

@JainitBITW Is it working now for you?

sushilkhadkaanon commented on August 21, 2024

@JainitBITW Did you do anything apart from @thcheung's instructions?
Thanks anyway!

JainitBITW commented on August 21, 2024

What error are you getting?

sushilkhadkaanon commented on August 21, 2024

I'm trying to run the 13B model on multiple GPUs. The author has written that they currently don't support multi-GPU inference. So, I want to be sure that inference on multiple GPUs is possible before provisioning the EC2 instance.

sushilkhadkaanon commented on August 21, 2024

@JainitBITW @thcheung Thanks, it worked for me (8-bit). Any idea how to do it for 16-bit (low_resource = False)?
It throws this error:
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

daniellandau commented on August 21, 2024

RuntimeError: Input type (float) and bias type (c10::Half) should be the same

I got past this error by setting vit_precision: "fp32" in minigpt_v2.yaml, but I didn't figure out what would need to be done to make the new input fp16 (half precision) as well, instead of making everything fp32.
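
Presumably the fp16 route would instead cast the image tensor itself to half precision before it enters the ViT; a hypothetical sketch (where exactly to apply it in MiniGPT-4's upload path may vary):

# Hypothetical: cast the preprocessed image to match the fp16 (c10::Half) weights
image = vis_processor(raw_image).unsqueeze(0).to(device)
image = image.half()   # fp32 -> fp16, resolving the dtype mismatch
image_emb, _ = model.encode_img(image)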

uiyo commented on August 21, 2024

My solution is:

CUDA_VISIBLE_DEVICES=1 python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0

This hides GPU 0 from the process, so PyTorch sees only the second physical GPU (exposed as cuda:0) and never touches the busy one.
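
The same effect is available from inside Python, as long as the variable is set before torch initializes CUDA (a sketch; which physical GPU index to keep is machine-specific):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # must run before any CUDA work
import torch
print(torch.cuda.device_count())           # now 1; physical GPU 1 appears as cuda:0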
