
Comments (7)

snexus commented on August 23, 2024

It is the absolute number of layers and depends on the model architecture. When the model is loaded, in this case using llamacpp, you can see it in the log (see the screenshot attached).

So in the example below, the model consists of 43 layers, and 15 were offloaded to the GPU. You can then check VRAM usage and adjust n_gpu_layers accordingly. You may need more memory than currently stated, depending on the context length and the embedding model used (which also requires GPU memory in most cases).

(screenshot: llamacpp load log showing 43 layers, 15 offloaded to GPU)
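The layer counts can also be pulled out of a captured load log with a small script. This is a sketch; the exact log wording shown here is an assumption, as it varies between llamacpp versions:

```python
import re

def parse_offload(log_text):
    """Extract (offloaded, total) GPU layer counts from a llamacpp load log.

    Assumes a line like 'llm_load_tensors: offloaded 15/43 layers to GPU';
    the exact wording differs between llamacpp versions.
    """
    m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_text)
    if not m:
        return None  # no offload line found - likely a CPU-only build
    return int(m.group(1)), int(m.group(2))

log = "llm_load_tensors: offloaded 15/43 layers to GPU"
print(parse_offload(log))  # (15, 43)
```

A `None` result is a quick hint that the build has no GPU support at all, which matches the symptom discussed later in this thread.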

from llm-search.

snexus commented on August 23, 2024

I've created a demo notebook showing how to run it on Google Colab (free tier): https://github.com/snexus/llm-search/blob/main/notebooks/llmsearch_google_colab_demo.ipynb


ziptron commented on August 23, 2024

Wow thanks so much! I tried this out this morning and it works well! I may not have been setting the variables (below) correctly, or at all to be honest.

%env CMAKE_ARGS="-DLLAMA_CUBLAS=on"

%env FORCE_CMAKE=1
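One possible reason these lines had no effect (an assumption about this setup): with IPython's `%env VAR="value"`, the quotes can end up inside the stored value itself. Setting the variables from Python sidesteps that:

```python
import os

# Set the build flags from Python instead of %env, so no stray quotes
# become part of the value.
os.environ["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"
os.environ["FORCE_CMAKE"] = "1"

# Then rebuild from source in a later cell so the flags take effect:
# !pip install llama-cpp-python --force-reinstall --no-cache-dir
print(os.environ["CMAKE_ARGS"])  # -DLLAMA_CUBLAS=on
```

The `--force-reinstall --no-cache-dir` flags matter because a cached CPU-only wheel would otherwise be reused without rebuilding.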

Thanks for making this project and for your help.


snexus commented on August 23, 2024

Hi,

I've never tried to run it on Google Colab. 15GB should be enough for this model - I can run it locally on a 10GB VRAM card (with half of the layers offloaded to the CPU).
If you are still stuck, do you mind posting the model section of your config.yaml so I can try to reproduce it?


ziptron commented on August 23, 2024

Thanks for responding. I do think this may be a Colab issue, so I'll keep trying today and post results later.

By the way, stupid question, but how do you know how many "layers" there are? I've been fiddling with the n_gpu_layers parameter, but I can't quite understand what it means. Does 50 mean 50% (half), or is it an absolute number of layers? If you could point me towards some info on that, I'd much appreciate it.

Thanks!


ziptron commented on August 23, 2024

This screenshot made me realize that I am not offloading anything to the GPU. See mine below.

I had some errors while installing (see below). Do you think I should try to resolve these errors? Or is there a different way to diagnose why I'm not offloading to the GPU?

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.27.1, but you have requests 2.29.0 which is incompatible.
tensorflow 2.12.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.2 which is incompatible.
tensorflow-metadata 1.13.1 requires protobuf<5,>=3.20.3, but you have protobuf 3.20.2 which is incompatible.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
Successfully installed InstructorEmbedding-1.0.1 XlsxWriter-3.1.2 accelerate-0.19.0 argilla-1.13.3 auto-gptq-0.3.0 backoff-2.2.1 bitsandbytes-0.41.0 chromadb-0.3.26 clickhouse-connect-0.6.8 coloredlogs-15.0.1 cryptography-41.0.2 dataclasses-json-0.5.14 datasets-2.14.2 deprecated-1.2.14 dill-0.3.7 diskcache-5.6.1 einops-0.6.1 fastapi-0.95.1 filetype-1.2.0 gitdb-4.0.10 gitpython-3.1.32 h11-0.14.0 hnswlib-0.7.0 httpcore-0.16.3 httptools-0.6.0 httpx-0.23.3 huggingface-hub-0.16.4 humanfriendly-10.0 langchain-0.0.219 langchainplus-sdk-0.0.20 llama-cpp-python-0.1.77 llama-index-0.6.9 llmsearch-0.1.dev74+g7207a16.d20230801 loguru-0.7.0 lz4-4.3.2 marshmallow-3.20.1 monotonic-1.6 msg-parser-1.2.0 multiprocess-0.70.15 mypy-extensions-1.0.0 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 olefile-0.46 onnxruntime-1.15.1 openai-0.27.8 openapi-schema-pydantic-1.2.4 overrides-7.3.1 pdf2image-1.16.3 pdfminer.six-20221105 peft-0.4.0 posthog-3.0.1 protobuf-3.20.2 pulsar-client-3.2.0 pydeck-0.8.1b0 pympler-1.0.1 pymupdf-1.22.5 pypandoc-1.11 pypdf2-3.0.1 python-docx-0.8.11 python-dotenv-1.0.0 python-magic-0.4.27 python-pptx-0.6.21 pytz-deprecation-shim-0.1.0.post0 requests-2.29.0 rfc3986-1.5.0 rouge-1.0.1 safetensors-0.3.1 sentence-transformers-2.2.2 sentencepiece-0.1.99 smmap-5.0.0 sqlalchemy-1.4.48 starlette-0.26.1 streamlit-1.24.1 threadpoolctl-3.1.0 tiktoken-0.3.3 tokenizers-0.13.3 torch-2.0.0 torchvision-0.15.1 transformers-4.29.2 typer-0.7.0 typing-inspect-0.9.0 tzdata-2023.3 tzlocal-4.3.1 unstructured-0.7.8 uvicorn-0.23.2 uvloop-0.17.0 validators-0.20.0 watchdog-3.0.0 watchfiles-0.19.0 websockets-11.0.3 xxhash-3.3.0 zstandard-0.21.0

WARNING: The following packages were previously imported in this runtime:
  [google]
You must restart the runtime in order to use newly installed versions.

(screenshot: llamacpp load log with no layers offloaded to GPU)


snexus commented on August 23, 2024

Sorry that you are facing problems.

It looks like llamacpp was built without GPU support during the installation, which is why you don't see it in the output. I'll need to investigate how to enable it in the Colab environment.

On a local GPU-enabled computer, assuming all the prerequisites are installed, llamacpp needs the flags described in https://github.com/ggerganov/llama.cpp#cublas in order to build with GPU support.

In this repository, these flags are set using setvars.sh before the installation (this is also described in the README).
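For a local install, the same flags can be exported in the shell before building. This is a sketch based on the llama.cpp cuBLAS instructions linked above; it assumes CUDA and a compiler toolchain are already installed:

```shell
# Flags the llama-cpp-python build reads to enable cuBLAS GPU support.
export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
export FORCE_CMAKE=1

# Force a rebuild from source so the flags are actually used:
# pip install llama-cpp-python --force-reinstall --no-cache-dir
echo "$CMAKE_ARGS $FORCE_CMAKE"
```

After reinstalling, the model load log should show layers being offloaded to the GPU; if it still doesn't, the build most likely picked up a cached CPU-only wheel.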

