Comments (9)
Given the effort that you put into creating a minimal reproducible example, I would suggest that you contact support for this system. Generally, on a large system, the support team would be thrilled to see a ~10 line reproducer. Please let us know what they say (or if that path is not fruitful).
from cuquantum.
Hi @danlkv ... you're using Polaris, right? I believe this system runs SS11 with a special CUDA-aware MPICH. Referencing this.
I'd make sure that the correct MPI implementation is being loaded by mpi4py.
I'm making a few assumptions about which compute resource you're using, but keep us looped in with what their support teams say.
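One way to check which MPI library mpi4py has loaded is to print the library version string at runtime. The following is a minimal sketch, assuming mpi4py is importable in the job environment. `MPI.Get_library_version()` is mpi4py's own call; the `classify_mpi` helper and the substrings it matches are hypothetical and only reflect how common implementations tend to identify themselves.

```python
def classify_mpi(version_string):
    """Guess the MPI implementation from an MPI library version string.

    Hypothetical helper: the substrings below are assumptions about how
    common implementations name themselves, not an official mapping.
    """
    text = version_string.lower()
    if "cray mpich" in text:
        return "Cray MPICH"
    if "open mpi" in text:
        return "Open MPI"
    if "mvapich" in text:
        return "MVAPICH"
    if "mpich" in text:
        return "MPICH"
    return "unknown"


def report_loaded_mpi():
    """Return (implementation, raw version string) for the MPI used by mpi4py.

    Returns None when mpi4py cannot be imported in this environment.
    """
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    raw = MPI.Get_library_version()
    return classify_mpi(raw), raw
```

Calling `report_loaded_mpi()` under the same launcher and module environment as the failing job shows what the job actually sees. Whether that library is CUDA-aware is a separate question that the version string alone may not answer.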
from cuquantum.
Thanks for getting back to me. Yes, I'm using a single node of Polaris for these tests. I did verify that the data after allgather is correct. Does `ipcOpenMemHandle` use MPI? I thought this was a custom NVIDIA P2P communication protocol.
from cuquantum.
@danlkv No, ipcOpenMemHandle does not rely on MPI. Can you build and run the standard CUDA IPC sample on Polaris? https://github.com/NVIDIA/cuda-samples/tree/master/Samples/0_Introduction/simpleIPC
from cuquantum.
Thanks @DmitryLyakh
@danlkv there are two things happening:
1. You're trying to create IPC handles.
2. You're passing the IPC handles to MPI.
(2) would only work if the MPI implementation loaded by mpi4py is CUDA-aware (unless data is being moved to the host prior to the actual allgather collective).
Regarding (1), you'd need to confirm that all GPUs are actually peer accessible, and that there isn't an issue with your cupy installation.
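The peer-access check can be sketched as a small table of which device can reach which. This is an illustration under assumptions: `peer_access_matrix` and `blocked_pairs` are hypothetical helpers, and the query function is passed in so the logic can be exercised without a GPU. With cupy installed, `cupy.cuda.runtime.getDeviceCount` and `cupy.cuda.runtime.deviceCanAccessPeer` can supply the real values.

```python
def peer_access_matrix(n_devices, can_access_peer):
    """Build an n x n table of peer accessibility.

    can_access_peer(i, j) should return a truthy value when device i can
    access memory on device j. The diagonal is marked True by convention
    and the query function is never called for i == j.
    """
    return [
        [i == j or bool(can_access_peer(i, j)) for j in range(n_devices)]
        for i in range(n_devices)
    ]


def blocked_pairs(matrix):
    """List the (i, j) pairs where device i cannot access device j."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, ok in enumerate(row)
        if not ok
    ]


def query_with_cupy():
    """Run the check against the visible GPUs (requires cupy and a CUDA device)."""
    from cupy.cuda import runtime

    n = runtime.getDeviceCount()
    return blocked_pairs(peer_access_matrix(n, runtime.deviceCanAccessPeer))
```

An empty list from `query_with_cupy()` would suggest that every visible GPU pair reports peer access. A non-empty list points at the pairs worth raising with the system's support team.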
from cuquantum.
@DmitryLyakh I can build the sample using cudatoolkit 11.8, but not with earlier versions, which fail with:
nvcc fatal : Unsupported gpu architecture 'compute_89'
I had not used CUDA 11.8 before, since I'm not sure it is compatible with the device (the driver's CUDA version is 11.4). I checked the compute capability with the `deviceQuery` example compiled with CUDA 11.8:
-> % ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA A100 80GB PCIe"
CUDA Driver Version / Runtime Version 11.4 / 11.8
CUDA Capability Major/Minor version number: 8.0
Total amount of global memory: 80995 MBytes (84929216512 bytes)
[...]
Does 8.0 mean that it will not be compatible with 8.9? Should I try to run everything with 11.8, or try to change the Makefile so it uses a lower compute capability?
P.S. The samples' `make` doesn't look in `PATH` for `nvcc`, so I had to export `CUDA_PATH` to make it work.
from cuquantum.
If you run on an A100 GPU, you only need compute capability 8.0, and you can use CUDA 11.8. You do not need compute capability 8.9 in this case (you can just remove those flags). I would recommend trying with CUDA 11.8 regardless.
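For an A100, whose compute capability is 8.0 according to the `deviceQuery` output earlier in the thread, the build only needs to target `sm_80`. As a small sketch, the hypothetical helper below builds the matching nvcc `-gencode` flags from a list of (major, minor) capabilities. How those flags are fed into the samples' Makefile is left open here, since that depends on the Makefile in use.

```python
def gencode_flags(compute_capabilities):
    """Build nvcc -gencode flags for the given (major, minor) capabilities.

    Hypothetical helper for illustration; it only formats strings and does
    not check whether the installed toolkit supports each architecture.
    """
    flags = []
    for major, minor in compute_capabilities:
        arch = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
    return flags
```

Passing `[(8, 0)]` yields a single flag for the A100, which avoids asking an older toolkit for `compute_89`.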
from cuquantum.
Update: the problem just magically disappeared and everything now works on all CUDA versions. I now use 11.4 and have checked it on up to 64 nodes. I guess there was some problem on the ALCF side.
from cuquantum.
> Update: the problem just magically disappeared and everything now works on all CUDA versions. I now use 11.4 and have checked it on up to 64 nodes. I guess there was some problem on the ALCF side.
Thanks for keeping us posted.
from cuquantum.