amogkam / llama_index_ray
Using LlamaIndex with Ray for productionizing LLM applications
@amogkam I was trying to use Ray Serve to speed up inference, but during deployment the model fails to load with an error saying that load_in_8bit requires accelerate and bitsandbytes, both of which I already have installed.
Below is the code:
serve_llm.py
from ray import serve
from starlette.requests import Request
import ray
from llama_index import (
    StorageContext,
    ServiceContext,
    load_index_from_storage,
    set_global_service_context,
)
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
from initialize_query_engine import prepare_model


@serve.deployment
class DeployLLM:
    def __init__(self):
        # Load the quantized LLM (see initialize_query_engine.py)
        llm = prepare_model()
        model_name = "sentence-transformers/all-mpnet-base-v2"
        model_kwargs = {"device": "cuda"}
        embeddings = LangchainEmbedding(
            HuggingFaceEmbeddings(model_name=model_name)
        )
        service_context = ServiceContext.from_defaults(
            chunk_size=1024,
            llm=llm,
            embed_model=embeddings,
        )
        # And set the service context
        set_global_service_context(service_context)
        storage_context = StorageContext.from_defaults(persist_dir="doc_index")
        # Load index from the storage context
        new_index = load_index_from_storage(storage_context)
        self.query_engine = new_index.as_query_engine()

    def run_index(self, prompt):
        # Query the index with the prompt that was passed in
        return self.query_engine.query(prompt)

    async def __call__(self, request: Request):
        # query_params is a mapping, so it is indexed rather than called
        query = request.query_params["text"]
        response = self.run_index(query)
        # The response object is not subscriptable; str() yields its text
        return str(response)


deployment = DeployLLM.bind()
Code to load model from initialize_query_engine.py:
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt


def prepare_model():
    # 4-bit NF4 quantization with float16 compute
    compute_dtype = getattr(torch, "float16")
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=True,
    )
    model_name = "meta-llama/Llama-2-7b-chat-hf"
    system_prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as
helpfully as possible, while being safe. Your answers should not include
any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain
why instead of answering something not correct. If you don't know the answer
to a question, please don't share false information.
<</SYS>>
"""
    # Throw together the query wrapper
    query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")
    llm = HuggingFaceLLM(
        context_window=3800,
        max_new_tokens=800,
        generate_kwargs={
            "temperature": 0.5,
            # "return_full_text": True,
            "do_sample": True,
            "repetition_penalty": 1.1,
            "top_p": 0.7,
            "top_k": 50,
            # "return_dict_in_generate": True,
        },
        query_wrapper_prompt=query_wrapper_prompt,
        tokenizer_name=model_name,
        model_name=model_name,
        device_map="auto",
        # change these settings below depending on your GPU
        model_kwargs={"quantization_config": bnb_config},
        tokenizer_outputs_to_remove=["token_type_ids"],
    )
    return llm
Error
2023-10-23 08:06:09,375 INFO scripts.py:471 -- Running import path: 'serve_llm:deployment'.
2023-10-23 08:06:17,019 INFO worker.py:1633 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(ServeController pid=30685) INFO 2023-10-23 08:06:20,341 controller 30685 deployment_state.py:1390 - Deploying new version of deployment DeployLLM in application 'default'.
(HTTPProxyActor pid=30737) INFO 2023-10-23 08:06:20,296 http_proxy 172.16.76.141 http_proxy.py:1433 - Proxy actor d36cd5783a9a7e95dee6f00601000000 starting on node 186eb3c957b3138fddc979d5862985edc066388e74fedbfc3e77a83e.
(HTTPProxyActor pid=30737) INFO 2023-10-23 08:06:20,303 http_proxy 172.16.76.141 http_proxy.py:1617 - Starting HTTP server on node: 186eb3c957b3138fddc979d5862985edc066388e74fedbfc3e77a83e listening on port 8000
(HTTPProxyActor pid=30737) INFO: Started server process [30737]
(ServeController pid=30685) INFO 2023-10-23 08:06:20,444 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:27,590 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#Ekjjks', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=30776, ip=172.16.76.141, actor_id=2f688bdb2fbeec4f34b4760a01000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7f5f47f1a7a0>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or pip install bitsandbytes`
(ServeController pid=30685) INFO 2023-10-23 08:06:27,696 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#Ekjjks is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:27,697 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:34,719 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#WPbqiH', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=30913, ip=172.16.76.141, actor_id=e8bfd47caf158369c340f6d201000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7f8e642ae770>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or pip install bitsandbytes`
(ServeController pid=30685) INFO 2023-10-23 08:06:34,823 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#WPbqiH is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:34,823 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:41,951 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#rLAxJo', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=31032, ip=172.16.76.141, actor_id=7ae516023df7b096d885a12b01000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7fc14717e740>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or pip install bitsandbytes`
(ServeController pid=30685) WARNING 2023-10-23 08:06:41,953 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,055 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#rLAxJo is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,055 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,058 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,160 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,263 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/scripts.py", line 518, in run
handle = serve.run(app, host=host, port=port)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/api.py", line 574, in run
client.deploy_application(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 47, in check
return f(self, *args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 330, in deploy_application
self._wait_for_application_running(name)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 255, in _wait_for_application_running
raise RuntimeError(
RuntimeError: Deploying application default failed: The deployments ['DeployLLM'] are UNHEALTHY.
2023-10-23 08:06:42,406 ERR scripts.py:564 -- Received unexpected error, see console logs for more details. Shutting down...
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,365 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,473 controller 30685 deployment_state.py:1707 - Removing 1 replica from deployment 'DeployLLM' in application 'default'.
(ServeController pid=30685) INFO 2023-10-23 08:06:48,914 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#xBCgBJ is stopped.
Can you help me with this?
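Since the packages are installed but the replica still raises this ImportError, it may help to check what the replica process itself can see. The sketch below is only a diagnostic under assumptions, not a confirmed fix: the helper name quantization_env_report is hypothetical, and the idea that the replica's environment differs from the shell (for example, no visible GPU) is a guess that the logs above do not confirm. In some transformers versions the bitsandbytes check also depends on CUDA being available, which is why the report includes it.

```python
import importlib.util
import os


def quantization_env_report(packages=("accelerate", "bitsandbytes", "torch")):
    """Collect facts about the current process that can affect quantized loading.

    Hypothetical helper for debugging; it only inspects the environment.
    """
    report = {
        # True if this interpreter's import system can locate the package.
        "importable": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
        # None means the variable is unset; "" means no GPU is exposed.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # Stays None unless torch was requested and can be located.
        "cuda_available": None,
    }
    if report["importable"].get("torch"):
        import torch  # imported lazily so the helper also runs without torch

        report["cuda_available"] = torch.cuda.is_available()
    return report
```

Printing quantization_env_report() as the first line of DeployLLM.__init__, before prepare_model() is called, and again in the shell where pip install was run, would show whether the two environments differ. If the replica reports no GPU, requesting one for the deployment, for example with @serve.deployment(ray_actor_options={"num_gpus": 1}), is one thing to try.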