Comments (12)
Yep, unfortunately the FasterTransformer code is very tied to NVIDIA cards (perhaps unsurprising, since FasterTransformer is made by... NVIDIA).
However, there have been some really exciting recent improvements in low-latency inference via DeepSpeed and INT8 quantization that might allow us to replace the FT backend with something that works on a wider variety of hardware (with lower memory usage, too) without sacrificing performance:
https://huggingface.co/blog/bloom-inference-pytorch-scripts
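For reference, here's a minimal sketch of that int8 path, assuming the transformers + bitsandbytes + accelerate stack (the same packages the Triton image below installs); the codegen-350M-mono checkpoint is just an example:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a CodeGen checkpoint with its weights quantized to int8 via
# bitsandbytes. Note this still needs a CUDA-capable GPU; device_map="auto"
# lets accelerate decide where to place each layer.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-350M-mono",
    device_map="auto",
    load_in_8bit=True,
)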
For what it's worth, PyTorch supports an mps backend that you can query for and select, which drastically improves performance on Apple silicon. For most things it's as simple as setting torch.device("mps").
https://pytorch.org/docs/stable/notes/mps.html
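A minimal sketch of the availability check from those docs:

import torch

# Use Apple's Metal (MPS) backend when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(4, 4, device=device)  # tensor created on the selected device
y = (x @ x).cpu()                     # compute on-device, copy back to CPU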
Looks like nvidia-docker still isn't fully supported on Apple M1 (see NVIDIA/nvidia-docker#101 and nathanwbrei/phasm#8 (comment)). I don't have a Mac with me at the moment, so unfortunately I can't try this out myself.
Any progress on this? Or other alternatives?
Does the new Apple hardware include NVIDIA graphics cards? If not, this repo will not work for you. For more information, see issue #4
Does the new Apple hardware include NVIDIA graphics cards? If not, this repo will not work for you. For more information, see issue #4
Did you manage to get it running? I tried but never succeeded.
Does the new Apple hardware include NVIDIA graphics cards? If not, this repo will not work for you. For more information, see issue #4
Unfortunately, the new MacBooks with Apple silicon do not contain NVIDIA cards. Maybe we could consider using the Neural Engine on the M1 chips.
Now that we have a Python backend, would you be able to get this working?
On a MacBook Pro (Ventura 13.1) with the Python backend:
Choose your backend:
[1] FasterTransformer backend (faster, but limited models)
[2] Python backend (slower, but more models, and allows loading with int8)
Enter your choice [1]: 2
Models available:
[1] codegen-350M-mono (1GB total VRAM required; Python-only)
[2] codegen-350M-multi (1GB total VRAM required; multi-language)
[3] codegen-2B-mono (4GB total VRAM required; Python-only)
[4] codegen-2B-multi (4GB total VRAM required; multi-language)
Enter your choice [4]: 1
it still seems to fail with an error about libnvidia-ml.so.1:
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
Full trace:
./setup.sh Node 16.13.1
Checking for curl ...
/usr/bin/curl
Checking for zstd ...
/usr/local/bin/zstd
Checking for docker ...
/usr/local/bin/docker
Enter number of GPUs [1]:
External port for the API [5000]:
Address for Triton [triton]:
Port of Triton host [8001]:
Where do you want to save your models [/Users/romain.rigaux/projects/fauxpilot/models]?
Choose your backend:
[1] FasterTransformer backend (faster, but limited models)
[2] Python backend (slower, but more models, and allows loading with int8)
Enter your choice [1]: 2
Models available:
[1] codegen-350M-mono (1GB total VRAM required; Python-only)
[2] codegen-350M-multi (1GB total VRAM required; multi-language)
[3] codegen-2B-mono (4GB total VRAM required; Python-only)
[4] codegen-2B-multi (4GB total VRAM required; multi-language)
Enter your choice [4]: 1
Do you want to share your huggingface cache between host and docker container? y/n [n]:
Do you want to use int8? y/n [y]:
Config written to /Users/romain.rigaux/projects/fauxpilot/models/py-Salesforce-codegen-350M-mono/py-model/config.pbtxt
[+] Building 0.0s (0/0)
[+] Building 0.1s (2/3)
=> [internal] load build definition from Dockerfile 0.0s
[+] Building 0.3s (2/3)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 32B 0.0s
[+] Building 2.1s (10/10) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 35B 0.0s
=> [internal] load metadata for docker.io/library/python:3.10-slim-buster 2.0s
=> [internal] load build context 0.0s
=> => transferring context: 1.15kB 0.0s
=> [1/5] FROM docker.io/library/python:3.10-slim-buster@sha256:8c2ff857fff9df7905b299647176e16c2a606ff65fa479ba9cad61acbee3123c 0.0s
=> CACHED [2/5] WORKDIR /python-docker 0.0s
=> CACHED [3/5] COPY copilot_proxy/requirements.txt requirements.txt 0.0s
[+] Building 2.3s (7/7) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 35B 0.0s
=> [internal] load metadata for docker.io/moyix/triton_with_ft:22.09 2.1s
=> [1/3] FROM docker.io/moyix/triton_with_ft:22.09@sha256:5a15c1f29c6b018967b49c588eb0ea67acbf897abb7f26e509ec21844574c9b1 0.0s
=> CACHED [2/3] RUN python3 -m pip install --disable-pip-version-check -U torch --extra-index-url https://download.pytorch.org/whl/cu116 0.0s
=> CACHED [3/3] RUN python3 -m pip install --disable-pip-version-check -U transformers bitsandbytes accelerate 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:1d22eab54aab4755ffffeb7627dcb8041ebc2be321cb3865d574ec9fb346321b 0.0s
=> => naming to docker.io/library/fauxpilot-triton 0.0s
Config complete, do you want to run FauxPilot? [y/n]
[+] Running 2/2
⠿ Container fauxpilot-copilot_proxy-1 Recreated 0.4s
⠿ Container fauxpilot-triton-1 Recreated 0.1s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
[+] Running 1/0
⠿ Container fauxpilot-copilot_proxy-1 Running 0.0s
Attaching to fauxpilot-copilot_proxy-1, fauxpilot-triton-1
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
About MPS:
NotImplementedError: The operator 'aten::cumsum.out' is not currently implemented for the MPS device.
If you want this op to be added in priority during the prototype phase of this feature, please comment on
https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable
`PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be
slower than running natively on MPS.
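For anyone else hitting this, a minimal sketch of the workaround the error message suggests; the variable has to be set before torch is imported:

import os

# Let ops missing on MPS (like aten::cumsum.out) fall back to the CPU.
# This must be set before the first `import torch`.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch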
Any updates as of now?
Related Issues (20)
- Maybe add windows/etc installer all-in-one in this project's 'releases'.
- 400 Bad Request when file has around 100 lines of code HOT 3
- C# support! HOT 2
- Hello all. The comments above have been very helpful in setting up the Copilot extension. I managed to get it to work with my instance and figured I would combine the steps I used (this is for Windows. Linux installation is similar, just different locations):
- It was working fine before... HOT 1
- Support for AMD GPUs HOT 1
- Triton doesnt exist anymore I think? HOT 3
- K8s deployment (via helm chart) HOT 2
- Caught signal 11 (Segmentation fault: address not mapped to object at address (nil)) HOT 1
- why my response are all !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! HOT 3
- Can I merge images of triton and client into one?eg fastertransformer_backend get content_fetch <fastertransformer&client>in CMakeLists ? HOT 1
- help me HOT 1
- What is the comparison of these model in huggingface? HOT 2
- Python Backend: "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0" HOT 2
- [promptlib] proxy {"cause":{}} HOT 1
- ollama HOT 2
- Company Proxy HOT 1
- is documentation outdated?
- Jetbrains Support
- RTX 4060 Unsupported Message