The llama_index_ray from amogkam

Error while loading the model from Huggingface with Ray Serve

@amogkam I was trying to use Ray Serve to improve inference speed, but while loading the model I got an error saying that load_in_8bit requires accelerate and bitsandbytes, both of which I already have installed.

Below is the code:

serve_llm.py

from ray import serve
from starlette.requests import Request
import ray
from llama_index import (
    StorageContext,
    ServiceContext,
    load_index_from_storage,
    set_global_service_context,
)
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
from initialize_query_engine import prepare_model


@serve.deployment
class DeployLLM:
    def __init__(self):
        # Load the quantized LLM and the embedding model once per replica
        llm = prepare_model()
        model_name = "sentence-transformers/all-mpnet-base-v2"
        model_kwargs = {"device": "cuda"}
        embeddings = LangchainEmbedding(
            HuggingFaceEmbeddings(model_name=model_name)
        )
        service_context = ServiceContext.from_defaults(
            chunk_size=1024,
            llm=llm,
            embed_model=embeddings,
        )
        # Set the service context globally
        set_global_service_context(service_context)
        storage_context = StorageContext.from_defaults(persist_dir="doc_index")
        # Load the index from the storage context
        new_index = load_index_from_storage(storage_context)
        self.query_engine = new_index.as_query_engine()

    def run_index(self, query):
        return self.query_engine.query(query)

    async def __call__(self, request: Request):
        # query_params is a mapping, so it is indexed rather than called
        query = request.query_params["text"]
        response = self.run_index(query)
        # The query engine returns a Response object; str() gives its text
        return str(response)


deployment = DeployLLM.bind()

Code to load model from initialize_query_engine.py:

import torch
from transformers import BitsAndBytesConfig
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt


def prepare_model():
    compute_dtype = torch.float16

    # 4-bit NF4 quantization with double quantization
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=True,
    )

    model_name = "meta-llama/Llama-2-7b-chat-hf"
    system_prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as
helpfully as possible, while being safe. Your answers should not include
any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain
why instead of answering something not correct. If you don't know the answer
to a question, please don't share false information.

<</SYS>>
"""
    # Throw together the query wrapper
    query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")

    llm = HuggingFaceLLM(
        context_window=3800,
        max_new_tokens=800,
        generate_kwargs={
            "temperature": 0.5,
            # "return_full_text": True,
            "do_sample": True,
            "repetition_penalty": 1.1,
            "top_p": 0.7,
            "top_k": 50,
            # "return_dict_in_generate": True,
        },
        query_wrapper_prompt=query_wrapper_prompt,
        tokenizer_name=model_name,
        model_name=model_name,
        device_map="auto",
        # change these settings below depending on your GPU
        model_kwargs={"quantization_config": bnb_config},
        tokenizer_outputs_to_remove=["token_type_ids"],
    )

    return llm

Error

2023-10-23 08:06:09,375 INFO scripts.py:471 -- Running import path: 'serve_llm:deployment'.
2023-10-23 08:06:17,019 INFO worker.py:1633 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(ServeController pid=30685) INFO 2023-10-23 08:06:20,341 controller 30685 deployment_state.py:1390 - Deploying new version of deployment DeployLLM in application 'default'.
(HTTPProxyActor pid=30737) INFO 2023-10-23 08:06:20,296 http_proxy 172.16.76.141 http_proxy.py:1433 - Proxy actor d36cd5783a9a7e95dee6f00601000000 starting on node 186eb3c957b3138fddc979d5862985edc066388e74fedbfc3e77a83e.
(HTTPProxyActor pid=30737) INFO 2023-10-23 08:06:20,303 http_proxy 172.16.76.141 http_proxy.py:1617 - Starting HTTP server on node: 186eb3c957b3138fddc979d5862985edc066388e74fedbfc3e77a83e listening on port 8000
(HTTPProxyActor pid=30737) INFO: Started server process [30737]
(ServeController pid=30685) INFO 2023-10-23 08:06:20,444 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:27,590 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#Ekjjks', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=30776, ip=172.16.76.141, actor_id=2f688bdb2fbeec4f34b4760a01000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7f5f47f1a7a0>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using load_in_8bit=True requires Accelerate: pip install accelerate and the latest version of bitsandbytes pip install -i https://test.pypi.org/simple/ bitsandbytes or pip install bitsandbytes
(ServeController pid=30685) INFO 2023-10-23 08:06:27,696 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#Ekjjks is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:27,697 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:34,719 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#WPbqiH', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=30913, ip=172.16.76.141, actor_id=e8bfd47caf158369c340f6d201000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7f8e642ae770>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using load_in_8bit=True requires Accelerate: pip install accelerate and the latest version of bitsandbytes pip install -i https://test.pypi.org/simple/ bitsandbytes or pip install bitsandbytes
(ServeController pid=30685) INFO 2023-10-23 08:06:34,823 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#WPbqiH is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:34,823 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:41,951 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#rLAxJo', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=31032, ip=172.16.76.141, actor_id=7ae516023df7b096d885a12b01000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7fc14717e740>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using load_in_8bit=True requires Accelerate: pip install accelerate and the latest version of bitsandbytes pip install -i https://test.pypi.org/simple/ bitsandbytes or pip install bitsandbytes`
(ServeController pid=30685) WARNING 2023-10-23 08:06:41,953 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,055 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#rLAxJo is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,055 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,058 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,160 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,263 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/scripts.py", line 518, in run
handle = serve.run(app, host=host, port=port)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/api.py", line 574, in run
client.deploy_application(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 47, in check
return f(self, *args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 330, in deploy_application
self._wait_for_application_running(name)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 255, in _wait_for_application_running
raise RuntimeError(
RuntimeError: Deploying application default failed: The deployments ['DeployLLM'] are UNHEALTHY.
2023-10-23 08:06:42,406 ERR scripts.py:564 -- Received unexpected error, see console logs for more details. Shutting down...
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,365 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,473 controller 30685 deployment_state.py:1707 - Removing 1 replica from deployment 'DeployLLM' in application 'default'.
(ServeController pid=30685) INFO 2023-10-23 08:06:48,914 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#xBCgBJ is stopped.

Can you help me with this?
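
A diagnostic that may help narrow this down (a sketch under assumptions, not a confirmed fix): the replica runs in a separate Ray worker process, so what matters is whether accelerate and bitsandbytes are importable from that process and whether it can see a GPU. Ray controls CUDA_VISIBLE_DEVICES for its workers, and a replica that did not request a GPU may see none. The helper name describe_environment below is hypothetical and uses only the standard library.

```python
import importlib.util
import os
import sys


def describe_environment(packages=("accelerate", "bitsandbytes", "torch")):
    """Report the running interpreter, GPU visibility, and package availability."""
    report = {
        # Which Python executable this process is actually using
        "python": sys.executable,
        # None means the variable is unset; "" means no GPU is exposed
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    for name in packages:
        # find_spec returns None when a top-level package cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Calling describe_environment() at the top of DeployLLM.__init__ and printing the result, then comparing it with the output of the same script run from the shell, would show whether the replica differs from the shell environment. If CUDA_VISIBLE_DEVICES turns out to be empty inside the replica, requesting a GPU with @serve.deployment(ray_actor_options={"num_gpus": 1}) is one thing to try.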
