
localllm's People

Contributors

bdmorgan, bobcatfish, jerop, jordanh, kmontg, wauplin


localllm's Issues

Suggest picking a name other than "llm" for the CLI tool

setup(
    name='llm',
    version='0.0.1',
    py_modules=[
        'llm',

And:

    entry_points={
        'console_scripts': [
            'llm = llm:cli',
        ],
    },

I'm the author of https://pypi.org/project/llm/ which installs a package called llm and a CLI tool called llm as well. My llm tool is similar to localllm in that it lets you execute prompts in the terminal, against both remote models and local models (using llama-cpp-python).

As it stands, using my tool and this tool in the same environment won't work because of the namespace clash.

If you pick a different name for this tool, you can also publish it to PyPI, which would make for a more convenient installation experience for end users.

https://llm.datasette.io/ has more about how my tool works and what it does.
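
For illustration, avoiding the clash would mean picking a distinct distribution name, module name, and console-script name. A hypothetical rename of llm-tool/setup.py might look roughly like this (the name localllm is only an example here, not a decision by the maintainers):

# setup.py -- hypothetical rename sketch
from setuptools import setup

setup(
    name='localllm',              # distinct PyPI distribution name
    version='0.0.1',
    py_modules=['localllm'],      # module renamed to match
    entry_points={
        'console_scripts': [
            'localllm = localllm:cli',  # CLI entry point no longer shadows `llm`
        ],
    },
)

With a rename along these lines, both tools can be installed side by side and published to PyPI without colliding.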

Error: Maven config is not supported for format "DOCKER"

Could you please check what could be wrong here?
I am running this on a Mac M1.

MacBook-M1 % cd localllm
MacBook-M1 localllm % gcloud artifacts repositories create $LOCALLLM_REGISTRY \
  --repository-format=docker \
  --location=$REGION \
  --description="DESCRIPTION"
ERROR: (gcloud.artifacts.repositories.create) INVALID_ARGUMENT: Maven config is not supported for format "DOCKER"

ERROR: gcloud crashed (AttributeError): May not assign arbitrary value disableSsh to message GceInstance

I was running the standard command as given in the tutorial when I hit this error. I tried googling it and didn't find much. Could anyone help me out?

iampoppyxx@iampoppyxx:~/new/localllm$ gcloud workstations configs create $LOCALLLM_WORKSTATION \
--region=$REGION \
--cluster=$CLUSTER \
--machine-type=e2-standard-32 \
--container-custom-image=us-central1-docker.pkg.dev/${PROJECT_ID}/${LOCALLLM_REGISTRY}/${LOCALLLM_IMAGE_NAME}:latest

[screenshot of the gcloud error output]

Capture (stderr) logs from llama-cpp-python cleanly

When we start the process running llama-cpp-python, we provide a pipe for stderr and then promptly close it. This means that if llama-cpp-python tries to write to stderr, a broken pipe exception is thrown, which happens, for example, when there is a prefix cache hit while processing a prompt.

#19 is a quick fix for this, but it's a bit icky because we're still breaking stderr.

What we need to do here is:

  1. not provide a broken pipe for stderr
  2. actually capture logs from llama-cpp-python so they end up in the same place as the logs from uvicorn (added in #18)

Some ideas for how to do this:

  • Contribute a fix back to llama-cpp-python that updates writes to stderr to write to a logger instead so the logger can be configured
  • Instead of just spawning the llama-cpp-python process, fork another process that itself spawns that process, captures stderr as the process runs, and streams it to a log (a sketch of this approach follows below)
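
A minimal sketch of the second idea, assuming the server is launched via subprocess (the function and logger names are illustrative, not the repo's actual code). It uses a background thread rather than a separate forked process, but achieves the same goal: keep the stderr pipe open and drain it into the same logging setup used for the uvicorn logs.

import logging
import subprocess
import threading

logger = logging.getLogger("llama_cpp_python")

def _pump_stderr(pipe):
    # Forward each stderr line from the child process into our logger
    # instead of letting writes hit a closed pipe.
    with pipe:
        for line in iter(pipe.readline, b""):
            logger.info(line.decode("utf-8", errors="replace").rstrip())

def start_model_server(cmd):
    # Spawn the llama-cpp-python server and stream its stderr into logging.
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    threading.Thread(target=_pump_stderr, args=(proc.stderr,), daemon=True).start()
    return proc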

Followed the instructions, running locally. Runs once, then fails afterward

Install the tools

pip3 install openai
pip3 install ./llm-tool/.

llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000

python3 querylocal.py

Actual Result: Works!

Run python3 querylocal.py again

Fails

http://localhost:8000/v1
Traceback (most recent call last):
  File "/home/username/localllm/querylocal.py", line 40, in <module>
    chat_completion = client.chat.completions.create(
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/_utils/_utils.py", line 271, in wrapper
    return func(*args, **kwargs)
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 659, in create
    return self._post(
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/_base_client.py", line 1200, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/_base_client.py", line 889, in request
    return self._request(
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/_base_client.py", line 965, in _request
    return self._retry_request(
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/_base_client.py", line 1013, in _retry_request
    return self._request(
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/_base_client.py", line 965, in _request
    return self._retry_request(
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/_base_client.py", line 1013, in _retry_request
    return self._request(
  File "/home/username/miniconda3/envs/localllm/lib/python3.10/site-packages/openai/_base_client.py", line 980, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error
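
For context, querylocal.py is making an OpenAI-compatible chat completion request against the locally served model. A minimal sketch of the failing call, reconstructed from the report above (the model name and port come from the commands shown; the actual script may differ):

from openai import OpenAI

# Point the OpenAI client at the local server started by `llm run ... 8000`.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# The first run of this request succeeds; the second returns a 500
# (openai.InternalServerError), so the failure is on the server side.
chat_completion = client.chat.completions.create(
    model="TheBloke/Llama-2-13B-Ensemble-v5-GGUF",
    messages=[{"role": "user", "content": "Hello"}],
)
print(chat_completion.choices[0].message.content)

Checking the server-side logs from llama-cpp-python (see the stderr-capture issue above) would likely reveal what the Internal Server Error actually is.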

llm commands do not gracefully handle zombie processes in ps list

I'll submit a PR shortly for this trivial fix.

Running llm ps or llm kill on my poor, tired development system resulted in:

$ llm ps
Traceback (most recent call last):
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/psutil/_psosx.py", line 352, in wrapper
    return fun(self, *args, **kwargs)
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/psutil/_psosx.py", line 413, in environ
    return parse_environ_block(cext.proc_environ(self.pid))
ProcessLookupError: [Errno 3] assume no such process (originated from sysctl(KERN_PROCARGS2) -> EINVAL)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jrhusney/.miniforge3/bin/llm", line 8, in <module>
    sys.exit(cli())
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/llm.py", line 115, in ps
    m = modelserving.running_models()
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/modelserving.py", line 39, in running_models
    env = p.environ()
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/psutil/__init__.py", line 889, in environ
    return self._proc.environ()
  File "/Users/jrhusney/.miniforge3/lib/python3.10/site-packages/psutil/_psosx.py", line 355, in wrapper
    raise ZombieProcess(self.pid, self._name, self._ppid)
psutil.ZombieProcess: PID still exists but it's a zombie (pid=2599, ppid=1071, name='launcher')

This exception needs to be caught and ignored.

Offending process looked like:

ps ax | grep 2599
 2599   ??  Z      0:00.00 <defunct>
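
The fix is presumably along these lines; a hedged sketch based on the traceback above (running_models and the environment-variable check are assumptions about what modelserving.py does, and the exact code may differ):

import psutil

def running_models():
    # List llm-managed model processes, skipping zombies and any process
    # whose environment can no longer be read.
    models = []
    for p in psutil.process_iter():
        try:
            env = p.environ()
        except (psutil.ZombieProcess, psutil.NoSuchProcess, psutil.AccessDenied):
            # A zombie or vanished process can't tell us anything useful; ignore it.
            continue
        if "LOCALLLM_MODEL" in env:  # hypothetical marker variable, for illustration only
            models.append(p)
    return models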

Option to enable the GPU

Hi All,

First of all, thank you for this excellent tool, which makes it very easy to run LLM models without any hassle.

I am aware that the main purpose of localllm is to eliminate the dependency on GPUs and run models on the CPU. However, I wanted to know if there is an option to offload layers to the GPU.

Machine : Compute engine created in GCP
OS : Ubuntu 22.04 LTS
GPU : Tesla T4

The steps I followed thus far are given below:

  1. Installed the NVIDIA driver on the Compute Engine instance. nvidia-smi output is given below.
    [screenshot of nvidia-smi output]
  2. Assuming localllm does not directly provide an option to enable the GPU (I may be wrong here), I cloned the llama-cpp-python repository and updated n_gpu_layers to 4 in llama_cpp/server/settings.py.
  3. Built the package by running pip install -e . (the complete step is given here).
  4. Killed localllm and started it again.

However, I still see that the GPU is not being utilized.

Are the above steps correct or did I miss anything here?

Thank you,
KK
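
For reference, layer offload in llama-cpp-python requires both a build compiled with GPU support and a non-zero n_gpu_layers passed to the model; if the installed wheel was built CPU-only, the setting is silently ignored. A minimal check, with placeholder paths and values (localllm itself may wire these settings up differently):

# Rebuild llama-cpp-python with CUDA support first, e.g.:
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model.gguf",  # placeholder path
    n_gpu_layers=32,                   # layers to offload; 0 keeps everything on the CPU
    verbose=True,                      # startup log should report layers offloaded to the GPU
)
print(llm("Q: Is the GPU in use? A:", max_tokens=32))

If nvidia-smi still shows no utilization after a rebuild like this, the wheel most likely compiled without CUDA support.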
