
llama_index_ray's People

Contributors

amogkam, angelinalg


llama_index_ray's Issues

Error while loading the model from Huggingface with Ray Serve

@amogkam I was trying to use Ray Serve to speed up inference. During implementation, model loading failed with an error saying that load_in_8bit requires accelerate and bitsandbytes, both of which I already have installed.

Below is the code:

serve_llm.py

from ray import serve
from starlette.requests import Request
import ray
from llama_index import (
    StorageContext,
    ServiceContext,
    load_index_from_storage,
    set_global_service_context,
)
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
from initialize_query_engine import prepare_model


@serve.deployment
class DeployLLM:
    def __init__(self):
        # Load the quantized LLM and the sentence-embedding model
        llm = prepare_model()
        model_name = "sentence-transformers/all-mpnet-base-v2"
        model_kwargs = {"device": "cuda"}
        embeddings = LangchainEmbedding(
            HuggingFaceEmbeddings(model_name=model_name)
        )
        service_context = ServiceContext.from_defaults(
            chunk_size=1024,
            llm=llm,
            embed_model=embeddings,
        )
        # Set the service context globally
        set_global_service_context(service_context)
        storage_context = StorageContext.from_defaults(persist_dir="doc_index")
        # Load the index from the storage context
        new_index = load_index_from_storage(storage_context)
        self.query_engine = new_index.as_query_engine()

    def run_index(self, prompt):
        # Query the persisted index with the incoming prompt
        return self.query_engine.query(prompt)

    async def __call__(self, request: Request):
        # Read the prompt from the "text" query parameter
        query = request.query_params["text"]
        response = self.run_index(query)
        # Return the response text
        return str(response)


deployment = DeployLLM.bind()

Code to load model from initialize_query_engine.py:

# Imports inferred from the names used below (not shown in the original post)
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt


def prepare_model():
    compute_dtype = getattr(torch, "float16")

    # 4-bit NF4 quantization with double quantization
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=True,
    )

    model_name = "meta-llama/Llama-2-7b-chat-hf"
    # Note: system_prompt is defined here but is not passed to HuggingFaceLLM below
    system_prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as
helpfully as possible, while being safe. Your answers should not include
any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain
why instead of answering something not correct. If you don't know the answer
to a question, please don't share false information.

<</SYS>>
"""
    # Throw together the query wrapper
    query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")

    llm = HuggingFaceLLM(
        context_window=3800,
        max_new_tokens=800,
        generate_kwargs={
            "temperature": 0.5,
            # "return_full_text": True,
            "do_sample": True,
            "repetition_penalty": 1.1,
            "top_p": 0.7,
            "top_k": 50,
            # "return_dict_in_generate": True,
        },
        query_wrapper_prompt=query_wrapper_prompt,
        tokenizer_name=model_name,
        model_name=model_name,
        device_map="auto",
        # change these settings below depending on your GPU
        model_kwargs={"quantization_config": bnb_config},
        tokenizer_outputs_to_remove=["token_type_ids"],
    )

    return llm

Error

2023-10-23 08:06:09,375 INFO scripts.py:471 -- Running import path: 'serve_llm:deployment'.
2023-10-23 08:06:17,019 INFO worker.py:1633 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(ServeController pid=30685) INFO 2023-10-23 08:06:20,341 controller 30685 deployment_state.py:1390 - Deploying new version of deployment DeployLLM in application 'default'.
(HTTPProxyActor pid=30737) INFO 2023-10-23 08:06:20,296 http_proxy 172.16.76.141 http_proxy.py:1433 - Proxy actor d36cd5783a9a7e95dee6f00601000000 starting on node 186eb3c957b3138fddc979d5862985edc066388e74fedbfc3e77a83e.
(HTTPProxyActor pid=30737) INFO 2023-10-23 08:06:20,303 http_proxy 172.16.76.141 http_proxy.py:1617 - Starting HTTP server on node: 186eb3c957b3138fddc979d5862985edc066388e74fedbfc3e77a83e listening on port 8000
(HTTPProxyActor pid=30737) INFO: Started server process [30737]
(ServeController pid=30685) INFO 2023-10-23 08:06:20,444 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:27,590 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#Ekjjks', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=30776, ip=172.16.76.141, actor_id=2f688bdb2fbeec4f34b4760a01000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7f5f47f1a7a0>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using load_in_8bit=True requires Accelerate: pip install accelerate and the latest version of bitsandbytes pip install -i https://test.pypi.org/simple/ bitsandbytes or pip install bitsandbytes
(ServeController pid=30685) INFO 2023-10-23 08:06:27,696 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#Ekjjks is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:27,697 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:34,719 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#WPbqiH', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=30913, ip=172.16.76.141, actor_id=e8bfd47caf158369c340f6d201000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7f8e642ae770>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using load_in_8bit=True requires Accelerate: pip install accelerate and the latest version of bitsandbytes pip install -i https://test.pypi.org/simple/ bitsandbytes or pip install bitsandbytes
(ServeController pid=30685) INFO 2023-10-23 08:06:34,823 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#WPbqiH is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:34,823 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) ERROR 2023-10-23 08:06:41,951 controller 30685 deployment_state.py:617 - Exception in replica 'default#DeployLLM#rLAxJo', the replica will be stopped.
(ServeController pid=30685) Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=30685) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=30685) return fn(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=30685) return func(*args, **kwargs)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=30685) raise value.as_instanceof_cause()
(ServeController pid=30685) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=31032, ip=172.16.76.141, actor_id=7ae516023df7b096d885a12b01000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7fc14717e740>)
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=30685) return self.__get_result()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=30685) raise self._exception
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=30685) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=30685) RuntimeError: Traceback (most recent call last):
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=30685) await self._initialize_replica()
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=30685) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 19, in __init__
(ServeController pid=30685) llm=prepare_model()
(ServeController pid=30685) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=30685) llm = HuggingFaceLLM(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=30685) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=30685) return model_class.from_pretrained(
(ServeController pid=30685) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=30685) raise ImportError(
(ServeController pid=30685) ImportError: Using load_in_8bit=True requires Accelerate: pip install accelerate and the latest version of bitsandbytes pip install -i https://test.pypi.org/simple/ bitsandbytes or pip install bitsandbytes
(ServeController pid=30685) WARNING 2023-10-23 08:06:41,953 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,055 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#rLAxJo is stopped.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,055 controller 30685 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,058 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,160 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,263 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/scripts.py", line 518, in run
handle = serve.run(app, host=host, port=port)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/api.py", line 574, in run
client.deploy_application(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 47, in check
return f(self, *args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 330, in deploy_application
self._wait_for_application_running(name)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 255, in _wait_for_application_running
raise RuntimeError(
RuntimeError: Deploying application default failed: The deployments ['DeployLLM'] are UNHEALTHY.
2023-10-23 08:06:42,406 ERR scripts.py:564 -- Received unexpected error, see console logs for more details. Shutting down...
(ServeController pid=30685) WARNING 2023-10-23 08:06:42,365 controller 30685 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=30685) INFO 2023-10-23 08:06:42,473 controller 30685 deployment_state.py:1707 - Removing 1 replica from deployment 'DeployLLM' in application 'default'.
(ServeController pid=30685) INFO 2023-10-23 08:06:48,914 controller 30685 deployment_state.py:2027 - Replica default#DeployLLM#xBCgBJ is stopped.

Can you help me with this?
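
One way to narrow this down is to check what the replica process itself sees. Having the packages installed in the shell does not guarantee that the Ray Serve replica imports them from the same interpreter, or that it has a GPU visible. The sketch below is a hypothetical helper (`environment_report` is not part of Ray, transformers, or llama_index) that uses only the standard library to record the interpreter path, the value of CUDA_VISIBLE_DEVICES, and whether each package can be found. It is a diagnostic aid under those assumptions, not a confirmed fix.

```python
import importlib.util
import os
import sys


def environment_report(packages=("accelerate", "bitsandbytes", "torch")):
    """Collect facts about the interpreter that is running this code.

    Hypothetical helper for debugging; the default package names are taken
    from the error message and the code in this issue.
    """
    report = {
        # Path of the Python executable running this process
        "python": sys.executable,
        # None if unset; an empty string usually means no GPU is exposed
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    for name in packages:
        # find_spec locates a top-level package without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Calling `print(environment_report())` as the first statement of `DeployLLM.__init__`, and again in the shell where `pip install` was run, would show whether the two environments differ. If the packages are found but CUDA_VISIBLE_DEVICES is empty inside the replica, one possibility (an assumption, not verified against this setup) is that the deployment was not assigned a GPU. Ray Serve accepts `@serve.deployment(ray_actor_options={"num_gpus": 1})` to request one, and some transformers releases treat bitsandbytes as unavailable when CUDA is not visible, which can surface as this ImportError.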
