
Comments (17)

thcheung commented on August 21, 2024

Set the parameter device_map='auto' when loading LlamaForCausalLM.from_pretrained().
Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')
It runs on two RTX 2080Ti cards on my machine.

It seems the model is split across two devices, but during inference tensors flow between both devices and it throws the two-devices error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

(1) Load the LLaMA with device_map set to 'auto'. Replace:

device_map={'': device_8bit}

with:

device_map = 'auto'
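
For context, the surrounding call in minigpt4/models/mini_gpt4.py looks roughly like this; a minimal sketch assuming the repo's 8-bit loading path (exact arguments may differ across versions):

self.llama_model = LlamaForCausalLM.from_pretrained(
    llama_model,                 # path to the Vicuna weights from the config
    torch_dtype=torch.float16,
    load_in_8bit=True,
    device_map='auto',           # was: device_map={'': device_8bit}
)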

(2) Modify the line below from 'cuda:{}'.format(args.gpu_id) to 'cuda'; it will then automatically use device 0 or device 1 if you have two devices. Replace:

chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))

with:

chat = Chat(model, vis_processor, device='cuda')

(3) The .to(device) call can be removed from the line below, because the LLaMA has already been loaded onto the GPUs automatically. Replace:

model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))

with:

model = model_cls.from_config(model_config)

(4) When encoding the image, we can run the encoder on the CPU and then move the resulting embedding to the GPU. Replace:

image_emb, _ = self.model.encode_img(image)
img_list.append(image_emb)

with:

image_emb, _ = self.model.encode_img(image.to('cpu'))
img_list.append(image_emb.to('cuda'))

The model should now work if you have multiple GPUs with limited memory each.
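
Before launching the demo, it can help to verify how accelerate actually sharded the layers; a small diagnostic sketch (it assumes the wrapper exposes the language model as model.llama_model, as in the repo):

import torch

# hf_device_map records which GPU each LLaMA submodule was assigned to
print(model.llama_model.hf_device_map)

# per-GPU memory actually allocated by this process
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i}: {torch.cuda.memory_allocated(i) / 2**30:.1f} GiB allocated")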

JainitBITW commented on August 21, 2024

Yes, I just restarted my CUDA.

JainitBITW commented on August 21, 2024

Nope, exactly the same.

JainitBITW commented on August 21, 2024

I think you can go ahead.

CyberTimon commented on August 21, 2024

I would also like to know how to do this.
I have 2x RTX 3060 12 GB, so I could load the 13B model, but multi-GPU inference doesn't seem to be implemented.

taomanwai commented on August 21, 2024

I have the same request.

wJc-cn commented on August 21, 2024

I have the same request too.

thcheung commented on August 21, 2024

  1. Set the parameter device_map='auto' when loading LlamaForCausalLM.from_pretrained().

  2. Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')

It runs on two RTX 2080Ti cards on my machine.

sinsauzero commented on August 21, 2024

Set the parameter device_map='auto' when loading LlamaForCausalLM.from_pretrained().
Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')
It runs on two RTX 2080Ti cards on my machine.

It seems the model is split across two devices, but during inference tensors flow between both devices and it throws the two-devices error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

JainitBITW commented on August 21, 2024

I did all of these steps, but I still get:

Traceback (most recent call last):
  File "/home2/jainit/MiniGPT-4/demo.py", line 61, in <module>
    model = model_cls.from_config(model_config)
  File "/home2/jainit/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 243, in from_config
    model = cls(
  File "/home2/jainit/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 90, in __init__
    self.llama_model = LlamaForCausalLM.from_pretrained(
  File "/home2/jainit/torchy/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2722, in from_pretrained
    max_memory = get_balanced_memory(
  File "/home2/jainit/torchy/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 731, in get_balanced_memory
    max_memory = get_max_memory(max_memory)
  File "/home2/jainit/torchy/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 624, in get_max_memory
    _ = torch.tensor([0], device=i)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
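
This traceback fails before any MiniGPT-4 weights are loaded: accelerate's get_max_memory probes every visible GPU by allocating a tiny tensor on it, so the out-of-memory error usually means one of the visible GPUs is already occupied by another process. A quick diagnostic sketch (not part of the repo) to check each device:

import torch

# Probe each visible GPU the same way accelerate does; a device that is
# already full will raise the same out-of-memory error here.
for i in range(torch.cuda.device_count()):
    try:
        free, total = torch.cuda.mem_get_info(i)
        print(f"cuda:{i}: {free / 2**30:.1f} of {total / 2**30:.1f} GiB free")
    except RuntimeError as err:
        print(f"cuda:{i}: {err}")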

sushilkhadkaanon commented on August 21, 2024

@JainitBITW Is it working now for you?

sushilkhadkaanon commented on August 21, 2024

@JainitBITW Did you do anything apart from @thcheung's instructions?
Thanks anyway!

JainitBITW commented on August 21, 2024

What error are you getting?

sushilkhadkaanon commented on August 21, 2024

I'm trying to run the 13B model on multiple GPUs. The author has written that they currently don't support multi-GPU inference. So, I want to be sure that inference on multiple GPUs is possible before provisioning the EC2 instance.

sushilkhadkaanon commented on August 21, 2024

@JainitBITW @thcheung Thanks, it worked for me (8-bit). Any idea how to do it for 16-bit (low_resource = False)?
It throws this error:
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

daniellandau commented on August 21, 2024

RuntimeError: Input type (float) and bias type (c10::Half) should be the same

I got past this error by setting vit_precision: "fp32" in minigpt_v2.yaml, but I didn't figure out what would need to be done to make the new input fp16 (half precision) as well, instead of making everything fp32.
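
Presumably the fp16 route would instead cast the image tensor itself to half precision before it enters the ViT; a hypothetical sketch (where exactly to apply it in MiniGPT-4's upload path may vary):

# Hypothetical: cast the preprocessed image to match the fp16 (c10::Half) weights
image = vis_processor(raw_image).unsqueeze(0).to(device)
image = image.half()   # fp32 -> fp16, resolving the dtype mismatch
image_emb, _ = model.encode_img(image)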

uiyo commented on August 21, 2024

My solution is:

CUDA_VISIBLE_DEVICES=1 python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0

This hides GPU 0 from the process, so PyTorch sees only the second physical GPU (exposed as cuda:0) and never touches the busy one.
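
The same effect is available from inside Python, as long as the variable is set before torch initializes CUDA (a sketch; which physical GPU index to keep is machine-specific):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # must run before any CUDA work
import torch
print(torch.cuda.device_count())           # now 1; physical GPU 1 appears as cuda:0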
