Comments (7)
@nv-kmcgill53 @mc-nv do you know if it is possible to compile the ARM version of Triton (i.e. for Raspberry Pi, Jetson, ...) on an x86 machine?
from server.
It should be possible to compile on x86, using emulation, to obtain arm64 binaries as long as all necessary dependencies are satisfied.
Tip
You can try using the Docker QEMU emulator for this purpose.
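The QEMU setup this tip refers to can be sketched as follows (a minimal sketch; the binfmt image and Ubuntu tag are assumptions, not from this thread, and a running Docker daemon is required):

```shell
# Register QEMU handlers so the x86 host can run arm64 containers.
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Verify emulation works: under QEMU this should report "aarch64".
docker run --rm --platform linux/arm64 ubuntu:22.04 uname -m
```

Once binfmt is registered, any `docker run --platform linux/arm64 ...` container can serve as the emulated build environment.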
In the build.py script there is an option for platform architecture aarch64, but the script must run on the target platform.
You can change the script if some checks are blocking you from moving forward with the build.
compiling on the Jetson (natively) ends up saturating the RAM and rebooting the board
We usually build it on ARM machines. If you have an Apple Silicon machine, maybe you can try it there. Otherwise, the Docker QEMU emulator seems to be the way to go.
Thank you for your answers, @kthui and @mc-nv.
In the build.py script there is an option for platform architecture aarch64, but the script must run on the target platform.
However, compiling on the Jetson (natively) ends up saturating the RAM and rebooting the board. I believe this is still under development, which is why this section here about compiling for Jetson devices is empty.
If you have any suggestions simpler than Docker QEMU, which is not guaranteed to work, I would be happy to hear them.
Just to give you an update.
The Docker QEMU emulator was my only option, and it worked only for CPU after installing all the needed dependencies; it took a long time to compile even using 24 CPUs.
However, the backends failed. The problem is that the build.py script does not provide the arm64 backend, so it failed with the error collect2: error: ld returned 1 exit status,
which I believe is for x86 and not for the arm64/aarch64 architecture.
So now I am trying to find a way to compile the backends for the arm64/aarch64 architecture.
If you have any suggestions, I will be happy. : )
I cannot compile Triton with the --enable-gpu option in Docker QEMU, because there is no CUDA for arm64.
Below is my error:
CMake Error at /usr/share/cmake-3.27/Modules/FindCUDA.cmake:883 (message):
Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
CMakeLists.txt:6 (find_package)
Below is my command to compile in Docker QEMU. (I am compiling without Docker, as Docker cannot run inside Docker with a different architecture, so I cannot forward the docker.sock.)
./build.py --target-platform linux --target-machine aarch64 -j 12 --enable-logging --enable-stats --enable-metrics --enable-gpu-metrics --enable-cpu-metrics --enable-tracing --enable-nvtx --enable-gpu --enable-mali-gpu --endpoint grpc -v --no-container-build --build-dir /home/qemu/build_triton_all/triton_onnxruntime/server/build --backend onnxruntime
Below is my Docker QEMU environment:
Linux 2c1b2a62f0a8 6.5.0-18-generic #18~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 7 11:40:03 UTC 2 aarch64 aarch64 aarch64 GNU/Linux
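For what it's worth, CMake's FindCUDA module can also discover the toolkit by locating nvcc on PATH, so if an aarch64 CUDA toolkit does exist inside the emulated environment, a sketch like this might clear the error before re-running build.py (the install path is an assumption, not from this thread):

```shell
# Assumption: an aarch64 CUDA toolkit is installed at /usr/local/cuda
# inside the emulated environment; adjust the path to your install.
export PATH=/usr/local/cuda/bin:$PATH

# FindCUDA derives CUDA_TOOLKIT_ROOT_DIR from the nvcc it finds on PATH,
# so this should resolve before re-running build.py:
which nvcc
```

Alternatively, passing -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda to the generated CMake invocation does the same thing explicitly.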
Another question: could you please confirm that the latest version of Triton Inference Server cannot run on the Nvidia Jetson Nano?
Thank you
Hello @kthui @mc-nv @dyastremsky
Actually, you cannot compile full Triton and its backends with Docker QEMU.
Nor can you compile it on target hardware like a Raspberry Pi or Jetson Nano device.
The problem is that compiling Triton alone needs a lot of resources (RAM, CPU cores/threads), and then for the backends you need Docker to compile them; see here: https://github.com/triton-inference-server/onnxruntime_backend/blob/0825c357a226c9e4657a24895302557a211b13d8/CMakeLists.txt#L320
So you need to forward docker.sock into Docker QEMU, but this will not work because docker.sock belongs to my x86 host machine. Therefore, you cannot fully compile Triton.
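A possible alternative to forwarding docker.sock: with QEMU binfmt registered on the host, docker buildx can produce arm64 images directly from the x86 host, so no docker-in-docker is needed. A generic sketch (the builder name, image tag, and Dockerfile are illustrative, not the actual one generated by the backend's CMake):

```shell
# Create and select a buildx builder on the x86 host.
docker buildx create --use --name arm64builder

# Cross-build an arm64 image on the host; QEMU handles the emulation
# transparently during the build. --load puts the result into the local
# image store.
docker buildx build --platform linux/arm64 -t mybackend:arm64 --load .
```

Because the build runs against the host's own Docker daemon, the docker.sock architecture mismatch described above does not arise.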
Could you please confirm that? @kthui @dyastremsky @mc-nv
Thank you : )