
Apple silicon support? · fauxpilot · OPEN · 12 comments

moyix commented on April 25, 2024
Apple silicon support?


Comments (12)

moyix commented on April 25, 2024

Yep, unfortunately the FasterTransformer code is very tied to NVIDIA cards (perhaps unsurprising since FasterTransformer is made by... NVIDIA).

However, there have been some really exciting improvements in low-latency inference recently via DeepSpeed and INT8 quantization that might allow us to replace the FT backend with something that works on a wider variety of hardware (with less memory usage too!) without sacrificing performance:

https://huggingface.co/blog/bloom-inference-pytorch-scripts
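For illustration, a minimal sketch of what int8 loading looks like with transformers + bitsandbytes, assuming a CUDA device (the bitsandbytes int8 path was CUDA-only at the time of writing); the model name is just an example, not fauxpilot's actual backend code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"  # example model, not a fixed choice
tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit quantizes the linear layers to INT8 via bitsandbytes,
# roughly halving memory use versus fp16.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("def fib(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))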


eous commented on April 25, 2024

For what it's worth, PyTorch supports an mps backend that one can query for and select, which drastically improves performance on Apple Silicon. For most things it's as simple as setting torch.device("mps").

https://pytorch.org/docs/stable/notes/mps.html
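For example, a minimal sketch of querying and selecting the backend per the docs above (nothing fauxpilot-specific):

import torch

# Fall back to CPU when the MPS backend is unavailable or PyTorch
# was built without it.
if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.randn(3, 3, device=device)
print(x.device)  # prints mps:0 on Apple Silicon with a recent PyTorch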


thakkarparth007 commented on April 25, 2024

Looks like nvidia-docker still isn't fully supported on Apple M1 (see NVIDIA/nvidia-docker#101 and nathanwbrei/phasm#8 (comment)). I don't have a Mac with me currently, so unfortunately I can't try this out myself.


quirtt commented on April 25, 2024

Any progress on this? Or other alternatives?


TechnologyClassroom commented on April 25, 2024

Does the new Apple hardware include NVIDIA graphics cards? If not, this repo will not work for you. For more information, see issue #4


old-syniex commented on April 25, 2024

> Does the new Apple hardware include NVIDIA graphics cards? If not, this repo will not work for you. For more information, see issue #4

Did you manage to run it?

I tried but never succeeded.


dslandry commented on April 25, 2024

> Does the new Apple hardware include NVIDIA graphics cards? If not, this repo will not work for you. For more information, see issue #4

Unfortunately, the new MacBooks with Apple Silicon do not contain NVIDIA cards. Maybe we can consider using the Neural Engine on the M1 chips.


moyix commented on April 25, 2024

Now that we have a Python backend, you may be able to get this working?


romainr commented on April 25, 2024

On a MacBook Pro Ventura 13.1 with the Python backend:

Choose your backend:
[1] FasterTransformer backend (faster, but limited models)
[2] Python backend (slower, but more models, and allows loading with int8)
Enter your choice [1]: 2
Models available:
[1] codegen-350M-mono (1GB total VRAM required; Python-only)
[2] codegen-350M-multi (1GB total VRAM required; multi-language)
[3] codegen-2B-mono (4GB total VRAM required; Python-only)
[4] codegen-2B-multi (4GB total VRAM required; multi-language)
Enter your choice [4]: 1

it still seems to fail on libnvidia-ml.so.1:

Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown

Full trace:

./setup.sh
Checking for curl ...
/usr/bin/curl
Checking for zstd ...
/usr/local/bin/zstd
Checking for docker ...
/usr/local/bin/docker
Enter number of GPUs [1]: 
External port for the API [5000]: 
Address for Triton [triton]: 
Port of Triton host [8001]: 
Where do you want to save your models [/Users/romain.rigaux/projects/fauxpilot/models]? 
Choose your backend:
[1] FasterTransformer backend (faster, but limited models)
[2] Python backend (slower, but more models, and allows loading with int8)
Enter your choice [1]: 2
Models available:
[1] codegen-350M-mono (1GB total VRAM required; Python-only)
[2] codegen-350M-multi (1GB total VRAM required; multi-language)
[3] codegen-2B-mono (4GB total VRAM required; Python-only)
[4] codegen-2B-multi (4GB total VRAM required; multi-language)
Enter your choice [4]: 1
Do you want to share your huggingface cache between host and docker container? y/n [n]: 
Do you want to use int8? y/n [y]: 
Config written to /Users/romain.rigaux/projects/fauxpilot/models/py-Salesforce-codegen-350M-mono/py-model/config.pbtxt
[+] Building 2.1s (10/10) FINISHED                                                                                                                                                
 => [internal] load build definition from Dockerfile                                                                                                                         0.0s
 => => transferring dockerfile: 32B                                                                                                                                          0.0s
 => [internal] load .dockerignore                                                                                                                                            0.0s
 => => transferring context: 35B                                                                                                                                             0.0s
 => [internal] load metadata for docker.io/library/python:3.10-slim-buster                                                                                                   2.0s
 => [internal] load build context                                                                                                                                            0.0s
 => => transferring context: 1.15kB                                                                                                                                          0.0s
 => [1/5] FROM docker.io/library/python:3.10-slim-buster@sha256:8c2ff857fff9df7905b299647176e16c2a606ff65fa479ba9cad61acbee3123c                                             0.0s
 => CACHED [2/5] WORKDIR /python-docker                                                                                                                                      0.0s
 => CACHED [3/5] COPY copilot_proxy/requirements.txt requirements.txt                                                                                                        0.0s
[+] Building 2.3s (7/7) FINISHED                                                                                                                                                  
 => [internal] load build definition from Dockerfile                                                                                                                         0.0s
 => => transferring dockerfile: 32B                                                                                                                                          0.0s
 => [internal] load .dockerignore                                                                                                                                            0.0s
 => => transferring context: 35B                                                                                                                                             0.0s
 => [internal] load metadata for docker.io/moyix/triton_with_ft:22.09                                                                                                        2.1s
 => [1/3] FROM docker.io/moyix/triton_with_ft:22.09@sha256:5a15c1f29c6b018967b49c588eb0ea67acbf897abb7f26e509ec21844574c9b1                                                  0.0s
 => CACHED [2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116                                    0.0s
 => CACHED [3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate                                                              0.0s
 => exporting to image                                                                                                                                                       0.0s
 => => exporting layers                                                                                                                                                      0.0s
 => => writing image sha256:1d22eab54aab4755ffffeb7627dcb8041ebc2be321cb3865d574ec9fb346321b                                                                                 0.0s
 => => naming to docker.io/library/fauxpilot-triton                                                                                                                          0.0s
Config complete, do you want to run FauxPilot? [y/n] 
[+] Running 2/2
 ⠿ Container fauxpilot-copilot_proxy-1  Recreated                                                                                                                            0.4s
 ⠿ Container fauxpilot-triton-1         Recreated                                                                                                                            0.1s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
[+] Running 1/0
 ⠿ Container fauxpilot-copilot_proxy-1  Running                                                                                                                              0.0s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown


romainr commented on April 25, 2024

About MPS:

NotImplementedError: The operator 'aten::cumsum.out' is not currently implemented for the MPS device. 
If you want this op to be added in priority during the prototype phase of this feature, please comment on 
https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable 
`PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be 
slower than running natively on MPS.
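If anyone wants to try the suggested workaround, the variable has to be set before PyTorch initializes; a sketch (not fauxpilot code):

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must come before importing torch

import torch
# Ops missing on MPS (like aten::cumsum.out above) now fall back to the CPU
# instead of raising NotImplementedError, at the cost of speed.

Equivalently, from the shell (script name is just a placeholder): PYTORCH_ENABLE_MPS_FALLBACK=1 python your_script.py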


GergelyH commented on April 25, 2024

Any updates as of now?


qidian99 commented on April 25, 2024

Any updates as of now?

