Comments (6)
I won't be able to help you with Modal, but I'll just say that our goal is ultimately to have custom CUDA kernels that outperform cuDNN. So you should, ideally, be able to just run our code without external dependencies.
One can dream. :)
from llm.c.
@vyom1611 Hi, I tried to run the demo

GPU_MEM=80 modal run benchmark_on_modal.py \
--compile-command "nvcc -O3 --use_fast_math attention_forward.cu -o attention_forward -lcublas" \
--run-command "./attention_forward 1"

but it failed. Could you please give me some suggestions? Did I miss a step? Thank you very much!
Hi, try running the compile command with the -lcublasLt option added.
@vyom1611 Thank you very much! It worked after adding the -lcublasLt option.
nvcc -O3 --use_fast_math attention_forward.cu -o attention_forward -lcublas -lcublasLt
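For anyone else hitting this, the full invocation with both link flags (assuming the benchmark_on_modal.py script and kernel variant 1 from the thread above, plus a configured Modal account) would be:

```shell
# Compile with both cuBLAS libraries linked, then run attention kernel 1.
GPU_MEM=80 modal run benchmark_on_modal.py \
  --compile-command "nvcc -O3 --use_fast_math attention_forward.cu -o attention_forward -lcublas -lcublasLt" \
  --run-command "./attention_forward 1"
```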
Have you considered adding support for "ncu"? I tried, but encountered an error and no profile file was generated.
Using ncu is awkward, because it profiles kernels very deeply. There is a Linux kernel paranoid level (kernel.perf_event_paranoid); if it is set too high, nsys cannot profile the CPU and OS parts during profiling. And for ncu, it seems impossible to fix this on Modal containers: ERR_NVGPUCTRPERM The user running <tool_name/application_name> does not have permission to access NVIDIA GPU Performance Counters on the target device.
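As a side note on the nsys half of this: the paranoid level can at least be inspected without root, so you can confirm whether it is the blocker. A sketch, assuming a Linux host; the exact threshold nsys needs may vary by version:

```shell
# Print the current perf_event_paranoid level; nsys generally needs a
# low value (e.g. <= 2) to sample CPU/OS activity alongside the GPU.
cat /proc/sys/kernel/perf_event_paranoid
# Lowering it requires root, which Modal containers do not provide:
#   sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
```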
To fix it, you have to enable access permanently:
- To allow access for any user: create a file with the .conf extension containing options nvidia NVreg_RestrictProfilingToAdminUsers=0 in /etc/modprobe.d.
- To restrict access to admin users (CAP_SYS_ADMIN capability set): create a file with the .conf extension containing options nvidia NVreg_RestrictProfilingToAdminUsers=1 in /etc/modprobe.d.
This seems impossible on Modal containers, since you need root privileges to create .conf files in /etc/modprobe.d. And even if you managed to change it, you would have to unload the module with rmmod nvidia and run update-initramfs -u, which again needs sudo access, and then reboot the system. So currently we cannot run ncu on Modal.
This is the link for reference: https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters
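For completeness, on bare metal (where root is available) the steps above boil down to something like this sketch; the file name nvidia-profiling.conf is my own choice, only the .conf extension and the option line matter:

```shell
# Permanent fix on a machine where you DO have root (not a Modal container):
# allow any user to access the GPU performance counters.
echo 'options nvidia NVreg_RestrictProfilingToAdminUsers=0' | \
  sudo tee /etc/modprobe.d/nvidia-profiling.conf
sudo update-initramfs -u   # rebuild the initramfs so the option persists
sudo reboot                # reload the nvidia module with the new option
```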
Got it! Thank you! It is a pity, since Modal offers some free A100 quota :(