nvidia / cuquantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples

Home Page: https://docs.nvidia.com/cuda/cuquantum/

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 19.41%, Cython 13.89%, Jupyter Notebook 66.52%, C 0.17%, Shell 0.01%
Topics: nvidia, cuda, quantum-computing, cuquantum, custatevec, cutensornet

cuquantum's Introduction

Welcome to the cuQuantum repository!

This public repository contains a few sets of files related to the NVIDIA cuQuantum SDK:

  • benchmarks: NVIDIA cuQuantum Performance Benchmark Suite (v0.3.0); see its README for details.
  • extra: Files to help utilize the cuQuantum SDK and the cuQuantum Appliance container.
  • python: The open-sourced cuQuantum Python project.
    • Available for download on:
      • conda-forge:
        • cuquantum
          • custatevec
          • cutensornet
        • cuquantum-python
      • PyPI:
        • cuquantum
          • cuquantum-cu11
            • custatevec-cu11
            • cutensornet-cu11
          • cuquantum-cu12
            • custatevec-cu12
            • cutensornet-cu12
        • cuquantum-python
          • cuquantum-python-cu11
          • cuquantum-python-cu12
  • samples: All C/C++ sample codes for the cuQuantum SDK.

Installation

Instructions for building and installing these files are given both in the subfolders and in the cuQuantum documentation.

License

All files hosted in this repository are subject to the BSD-3-Clause license.

Citing cuQuantum

This repository is uploaded to Zenodo automatically. Click the DOI badge on the repository page to see citation formats.

cuquantum's People

Contributors

ahehn-nv, cliffburdick, hhbayraktar, leofang, mtjrider, sam-stanwyck, tlubowe, yangcal


cuquantum's Issues

Disable slicing fails

Hi, when I want to see the contraction cost without slicing (even though the network cannot fit on a single GPU), I am told to use:

optimizer_options = configuration.OptimizerOptions(samples=64, threads=8, slicing=SlicerOptions(disable_slicing=True))

This results in:

File "cuquantum/cutensornet/cutensornet.pyx", line 1329, in cuquantum.cutensornet.cutensornet.contraction_optimize
  File "cuquantum/cutensornet/cutensornet.pyx", line 240, in cuquantum.cutensornet.cutensornet.check_status
cuquantum.cutensornet.cutensornet.cuTensorNetError: CUTENSORNET_STATUS_ALL_HYPER_SAMPLES_FAILED

It works if I remove the slicing argument, but then the network gets sliced, of course.
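For reference, a minimal self-contained sketch of the configuration I'm using (the einsum expression and operand shapes below are placeholders, not my actual network; import paths for the options classes are assumed from the snippet above):

import numpy as np
from cuquantum import contract_path
from cuquantum.cutensornet.configuration import OptimizerOptions, SlicerOptions

# Placeholder operands; the real network is far too large for one GPU
operands = [np.random.rand(4, 4) for _ in range(3)]

# Disable slicing so the optimizer reports the unsliced contraction cost
opts = OptimizerOptions(samples=64, threads=8,
                        slicing=SlicerOptions(disable_slicing=True))
path, info = contract_path('ab,bc,cd->ad', *operands, optimize=opts)
print(info.opt_cost)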

GPU issue with Qiskit Aer method, Docker cuquantum-appliance:23.10

Hi,
I'm facing an issue with my Aer statevector simulation.

I'm using the Docker image nvcr.io/nvidia/cuquantum-appliance:23.10
and running this notebook: https://github.com/qiskit-community/qiskit-community-tutorials/blob/master/aer/qv_cuStateVec.ipynb

If I use sim = AerSimulator(method='statevector', device='GPU'), I can see the GPUs being utilized.
On the other hand, if I use sim = Aer.get_backend('statevector_simulator', device='GPU'), it runs on the CPU, NOT the GPUs.

Any idea why that is?
@sam-stanwyck
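For completeness, this is the pattern I expect to work (the qiskit_aer import path may vary by version; treat this as a sketch):

# Device selection is an AerSimulator option; the legacy
# Aer.get_backend('statevector_simulator') path does not appear to honor it.
from qiskit_aer import AerSimulator  # older builds: qiskit.providers.aer

sim = AerSimulator(method='statevector', device='GPU')
# Equivalently, set the option after construction:
sim2 = AerSimulator(method='statevector')
sim2.set_options(device='GPU')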

Can't run PennyLane benchmarks in the 23.10 cuQuantum Appliance

The issue was present in the 23.06 container as well.

Example to reproduce error:

cuquantum-benchmarks circuit --frontend pennylane --backend pennylane-lightning-gpu --benchmark qaoa --nqubits 16

Source of error: line 46 of
https://github.com/NVIDIA/cuQuantum/blob/main/benchmarks/cuquantum_benchmarks/backends/backend_pny.py

Full trace:

Traceback (most recent call last):
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 46, in find_version
    import pennylane_lightning_gpu
ModuleNotFoundError: No module named 'pennylane_lightning_gpu'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cuquantum/conda/envs/cuquantum-23.10/bin/cuquantum-benchmarks", line 8, in <module>
    sys.exit(run())
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run.py", line 335, in run
    runner.run()
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 90, in run
    self._run()
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 267, in _run
    backend = createBackend(
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/__init__.py", line 34, in createBackend
    return backends[backend_name](ngpus, ncpu_threads, precision, *args, **kwargs)
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 40, in __init__
    self.version = self.find_version(identifier) 
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 48, in find_version
    raise RuntimeError("PennyLane-Lightning-GPU plugin is not installed") from e
RuntimeError: PennyLane-Lightning-GPU plugin is not installed

A similar error occurs when trying to run the kokkos backend.

The libraries are present in conda list but cannot be imported in a script (or in interactive Python).

conda list
pennylane                 0.35.1                   pypi_0    pypi
pennylane-lightning       0.35.1                   pypi_0    pypi
pennylane-lightning-gpu   0.35.1                   pypi_0    pypi

Here is the Dockerfile I used to build the image:

FROM  nvcr.io/nvidia/cuquantum-appliance:23.10
RUN git clone https://github.com/NVIDIA/cuQuantum.git \
&& cd cuQuantum/benchmarks \
&& pip install .[all]

(I also hit the same error via the docker commit route.)
I also tried rebuilding cuquantum-benchmarks to fix the problem; this didn't help.

Suspected solution:

The import statements may be wrong. From a brief look at the PennyLane documentation, I found a different import method.

import pennylane as qml
dev = qml.device("lightning.gpu", wires=2)

This at least throws me a CUDA error:

/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py:72: UserWarning: libcudart.so.12: cannot open shared object file: No such file or directory
  warn(str(e), UserWarning)
/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py:1014: UserWarning: 
                "Pre-compiled binaries for lightning.gpu are not available. Falling back to "
                "using the Python-based default.qubit implementation. To manually compile from "
                "source, follow the instructions at "
                "https://pennylane-lightning.readthedocs.io/en/latest/installation.html.",
            
  warn(

This is being solved here:
https://discuss.pennylane.ai/t/pennylane-lightning-gpu-0-35-on-cuquantum-appliance-23-10/4393
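A possible fix sketch for find_version that queries distribution metadata instead of importing the module (function name and fallback are illustrative, not the benchmark's actual code):

from importlib.metadata import version, PackageNotFoundError

def find_lightning_gpu_version():
    # Works regardless of whether the plugin ships as a standalone module
    # or inside the pennylane_lightning namespace package
    try:
        return version("pennylane-lightning-gpu")
    except PackageNotFoundError as e:
        raise RuntimeError("PennyLane-Lightning-GPU plugin is not installed") from e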

Related bug:

I also noticed that the CPU backend does not work in the 23.10 container either (it ran for me in the 23.06 container):

cuquantum-benchmarks circuit --frontend pennylane --backend pennylane --benchmark qaoa --nqubits 16
2024-04-05 13:04:25,345 INFO     * Running qaoa with 1 CPU threads, and 16 qubits [pennylane-v0.35.1 | pennylane-v0.35.1]:
Traceback (most recent call last):
  File "/home/cuquantum/conda/envs/cuquantum-23.10/bin/cuquantum-benchmarks", line 8, in <module>
    sys.exit(run())
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run.py", line 335, in run
    runner.run()
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 90, in run
    self._run()
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 301, in _run
    preprocess_data = backend.preprocess_circuit(
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 114, in preprocess_circuit
    self.circuit = self._make_qnode(circuit, nshots, **kwargs)
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 101, in _make_qnode
    dev = pennylane.device("default.qubit", wires=self.nqubits, shots=nshots, c_dtype=self.dtype)
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/pennylane/__init__.py", line 378, in device
    dev = plugin_device_class(*args, **options)
TypeError: DefaultQubit.__init__() got an unexpected keyword argument 'c_dtype'
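Judging from the trace, PennyLane 0.35's new default.qubit device dropped the c_dtype constructor argument. A hedged sketch of what should still work, assuming the legacy device alias is registered in this PennyLane version:

import numpy as np
import pennylane

# The legacy statevector device still accepts c_dtype; the new DefaultQubit
# behind the plain "default.qubit" alias in 0.35 does not.
dev = pennylane.device("default.qubit.legacy", wires=16, shots=100,
                       c_dtype=np.complex64)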

`CircuitToEinsum` fails for some qiskit `QuantumCircuit`

Information

Python 3.10.12 (Google Colab)
qiskit 0.45.0
qiskit-terra 0.45.0
cuquantum-python 23.10.0
cuquantum-python-cu11 23.10.0

What is the current behavior?

Recently, qiskit 0.45.0 was released. According to its release note:

Starting in this release, all unparametrized gates in the Qiskit standard circuit library are now singletons.

This changed the type of CXGate from <class 'qiskit.circuit.library.standard_gates.x.CXGate'> to <class '_SingletonCXGate'>.

So, get_decomposed_gates in circuit_parser_utils_qiskit.py fails when CXGate is used, since the condition at #L42 is now False: if 'standard_gate' in str(type(operation)) or isinstance(operation, UnitaryGate):

Steps to reproduce the problem

Executing the snippet below leads to AttributeError: 'NoneType' object has no attribute 'qubits' at #L56.

from qiskit import QuantumCircuit
from cuquantum import CircuitToEinsum


qc = QuantumCircuit(2)
qc.cx(0, 1)
converter = CircuitToEinsum(qc)
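A sketch of a check that would survive the singleton subclasses (helper name is illustrative; it relies on _SingletonCXGate still subclassing CXGate, so the MRO contains a class from the standard_gates module):

from qiskit.extensions import UnitaryGate  # import as in circuit_parser_utils_qiskit.py

def is_standard_or_unitary(operation):
    # Walk the MRO instead of matching str(type(operation)), since the
    # singleton wrapper class itself no longer lives in standard_gates
    return (any('standard_gates' in cls.__module__ for cls in type(operation).__mro__)
            or isinstance(operation, UnitaryGate))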

Jupyter notebooks in the NVIDIA cuQuantum Appliance

I pulled the image from here and started a container.

In the following directory: root@c976cba7949e:/workspace/examples# I created a test.ipynb file.

Multiple kernels are available to me here, as shown in the screenshot below:

(Screenshot: list of available Jupyter kernels, taken 2023-01-23.)

Why is the container shipped with multiple versions of python?

When I try to run the code snippet from here, with the conda env shown in the screenshot above, it asks me to install ipykernel, which I do as shown below:


conda install -n base ipykernel --update-deps --force-reinstall
Collecting package metadata (current_repodata.json): done
Solving environment: \ 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/linux-64::sqlite==3.35.3=h74cdb3f_0
  - conda-forge/linux-64::cutensornet==2.0.0=mpi_openmpi_h80f0c51_0
  - conda-forge/linux-64::cupy==11.3.0=py38h405e1b6_1
  - conda-forge/linux-64::numpy==1.23.5=py38h7042d01_0
  - conda-forge/linux-64::custatevec==1.2.0=h0800d71_0
  - conda-forge/linux-64::cutensor==1.6.1.5=h12f7317_0
  - conda-forge/linux-64::cuquantum==22.11.0.13=h2b087ed_0
  - conda-forge/linux-64::fastrlock==0.8=py38hfa26641_3
  - conda-forge/linux-64::cuquantum-python==22.11.0=py38hca921df_0
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: - 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/linux-64::cutensornet==2.0.0=mpi_openmpi_h80f0c51_0
  - conda-forge/linux-64::cupy==11.3.0=py38h405e1b6_1
  - conda-forge/linux-64::numpy==1.23.5=py38h7042d01_0
  - conda-forge/linux-64::custatevec==1.2.0=h0800d71_0
  - conda-forge/linux-64::cutensor==1.6.1.5=h12f7317_0
  - conda-forge/linux-64::cuquantum==22.11.0.13=h2b087ed_0
  - conda-forge/linux-64::fastrlock==0.8=py38hfa26641_3
  - conda-forge/linux-64::cuquantum-python==22.11.0=py38hca921df_0
Solving environment: done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - _libgcc_mutex
    - _openmp_mutex
    - asttokens
    - backcall
    - backports
    - backports.functools_lru_cache
    - bzip2
    - ca-certificates
    - comm
    - debugpy
    - decorator
    - entrypoints
    - executing
    - ipykernel
    - ipython
    - jedi
    - jupyter_client
    - jupyter_core
    - ld_impl_linux-64
    - libffi
    - libgcc-ng
    - libgomp
    - libnsl
    - libsodium
    - libsqlite
    - libstdcxx-ng
    - libuuid
    - libzlib
    - matplotlib-inline
    - ncurses
    - nest-asyncio
    - openssl
    - packaging
    - parso
    - pexpect
    - pickleshare
    - pip
    - platformdirs
    - prompt-toolkit
    - psutil
    - ptyprocess
    - pure_eval
    - pygments
    - python-dateutil
    - python=3.8
    - python_abi
    - pyzmq
    - readline
    - setuptools
    - six
    - stack_data
    - tk
    - tornado
    - traitlets
    - typing-extensions
    - typing_extensions
    - wcwidth
    - wheel
    - xz
    - zeromq


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |            2_gnu          23 KB  conda-forge
    asttokens-2.2.1            |     pyhd8ed1ab_0          27 KB  conda-forge
    backcall-0.2.0             |     pyh9f0ad1d_0          13 KB  conda-forge
    backports-1.0              |     pyhd8ed1ab_3           6 KB  conda-forge
    backports.functools_lru_cache-1.6.4|     pyhd8ed1ab_0           9 KB  conda-forge
    brotlipy-0.7.0             |py38h0a891b7_1005         342 KB  conda-forge
    bzip2-1.0.8                |       h7f98852_4         484 KB  conda-forge
    ca-certificates-2022.12.7  |       ha878542_0         143 KB  conda-forge
    certifi-2022.12.7          |     pyhd8ed1ab_0         147 KB  conda-forge
    cffi-1.15.1                |   py38h4a40e3a_3         230 KB  conda-forge
    charset-normalizer-2.1.1   |     pyhd8ed1ab_0          36 KB  conda-forge
    colorama-0.4.6             |     pyhd8ed1ab_0          25 KB  conda-forge
    comm-0.1.2                 |     pyhd8ed1ab_0          11 KB  conda-forge
    conda-22.11.1              |   py38h578d9bd_1         905 KB  conda-forge
    conda-package-handling-2.0.2|     pyh38be061_0         247 KB  conda-forge
    conda-package-streaming-0.7.0|     pyhd8ed1ab_1          17 KB  conda-forge
    cryptography-39.0.0        |   py38h3d167d9_0         1.4 MB  conda-forge
    cudatoolkit-11.8.0         |      h37601d7_11       635.9 MB  conda-forge
    debugpy-1.6.5              |   py38h8dc9893_0         1.8 MB  conda-forge
    decorator-5.1.1            |     pyhd8ed1ab_0          12 KB  conda-forge
    entrypoints-0.4            |     pyhd8ed1ab_0           9 KB  conda-forge
    executing-1.2.0            |     pyhd8ed1ab_0          24 KB  conda-forge
    idna-3.4                   |     pyhd8ed1ab_0          55 KB  conda-forge
    ipykernel-6.20.2           |     pyh210e3f2_0         108 KB  conda-forge
    ipython-8.8.0              |     pyh41d4057_0         555 KB  conda-forge
    jedi-0.18.2                |     pyhd8ed1ab_0         786 KB  conda-forge
    jupyter_client-7.4.9       |     pyhd8ed1ab_0          97 KB  conda-forge
    jupyter_core-5.1.3         |   py38h578d9bd_0          87 KB  conda-forge
    ld_impl_linux-64-2.39      |       hcc3a1bd_1         675 KB  conda-forge
    libffi-3.4.2               |       h7f98852_5          57 KB  conda-forge
    libgcc-ng-12.2.0           |      h65d4601_19         931 KB  conda-forge
    libgomp-12.2.0             |      h65d4601_19         455 KB  conda-forge
    libnsl-2.0.0               |       h7f98852_0          31 KB  conda-forge
    libsodium-1.0.18           |       h36c2ea0_1         366 KB  conda-forge
    libsqlite-3.40.0           |       h753d276_0         791 KB  conda-forge
    libstdcxx-ng-12.2.0        |      h46fd767_19         4.3 MB  conda-forge
    libuuid-2.32.1             |    h7f98852_1000          28 KB  conda-forge
    libzlib-1.2.13             |       h166bdaf_4          64 KB  conda-forge
    matplotlib-inline-0.1.6    |     pyhd8ed1ab_0          12 KB  conda-forge
    mpi-1.0                    |          openmpi           4 KB  conda-forge
    ncurses-6.3                |       h27087fc_1        1002 KB  conda-forge
    nest-asyncio-1.5.6         |     pyhd8ed1ab_0          10 KB  conda-forge
    openmpi-4.1.4              |     ha1ae619_102         3.6 MB  conda-forge
    openssl-3.0.7              |       h0b41bf4_1         2.5 MB  conda-forge
    packaging-23.0             |     pyhd8ed1ab_0          40 KB  conda-forge
    parso-0.8.3                |     pyhd8ed1ab_0          69 KB  conda-forge
    pexpect-4.8.0              |     pyh1a96a4e_2          48 KB  conda-forge
    pickleshare-0.7.5          |          py_1003           9 KB  conda-forge
    pip-22.3.1                 |     pyhd8ed1ab_0         1.5 MB  conda-forge
    platformdirs-2.6.2         |     pyhd8ed1ab_0          17 KB  conda-forge
    pluggy-1.0.0               |     pyhd8ed1ab_5          16 KB  conda-forge
    prompt-toolkit-3.0.36      |     pyha770c72_0         265 KB  conda-forge
    psutil-5.9.4               |   py38h0a891b7_0         348 KB  conda-forge
    ptyprocess-0.7.0           |     pyhd3deb0d_0          16 KB  conda-forge
    pure_eval-0.2.2            |     pyhd8ed1ab_0          14 KB  conda-forge
    pycosat-0.6.4              |   py38h0a891b7_1         108 KB  conda-forge
    pycparser-2.21             |     pyhd8ed1ab_0         100 KB  conda-forge
    pygments-2.14.0            |     pyhd8ed1ab_0         805 KB  conda-forge
    pyopenssl-23.0.0           |     pyhd8ed1ab_0         124 KB  conda-forge
    pysocks-1.7.1              |     pyha2e5f31_6          19 KB  conda-forge
    python-3.8.15              |h4a9ceb5_0_cpython        19.9 MB  conda-forge
    python-dateutil-2.8.2      |     pyhd8ed1ab_0         240 KB  conda-forge
    python_abi-3.8             |           3_cp38           6 KB  conda-forge
    pyzmq-25.0.0               |   py38he24dcef_0         431 KB  conda-forge
    readline-8.1.2             |       h0f457ee_0         291 KB  conda-forge
    requests-2.28.2            |     pyhd8ed1ab_0          55 KB  conda-forge
    ruamel.yaml-0.17.21        |   py38h0a891b7_2         172 KB  conda-forge
    ruamel.yaml.clib-0.2.7     |   py38h1de0b5d_1         143 KB  conda-forge
    setuptools-66.1.1          |     pyhd8ed1ab_0         630 KB  conda-forge
    six-1.16.0                 |     pyh6c4a22f_0          14 KB  conda-forge
    stack_data-0.6.2           |     pyhd8ed1ab_0          26 KB  conda-forge
    tk-8.6.12                  |       h27826a3_0         3.3 MB  conda-forge
    tornado-6.2                |   py38h0a891b7_1         654 KB  conda-forge
    tqdm-4.64.1                |     pyhd8ed1ab_0          82 KB  conda-forge
    traitlets-5.8.1            |     pyhd8ed1ab_0          96 KB  conda-forge
    typing-extensions-4.4.0    |       hd8ed1ab_0           8 KB  conda-forge
    typing_extensions-4.4.0    |     pyha770c72_0          29 KB  conda-forge
    urllib3-1.26.14            |     pyhd8ed1ab_0         110 KB  conda-forge
    wcwidth-0.2.6              |     pyhd8ed1ab_0          28 KB  conda-forge
    wheel-0.38.4               |     pyhd8ed1ab_0          32 KB  conda-forge
    xz-5.2.6                   |       h166bdaf_0         409 KB  conda-forge
    zeromq-4.3.4               |       h9c3ff4c_1         351 KB  conda-forge
    zstandard-0.19.0           |   py38h0a891b7_0         671 KB  conda-forge
    ------------------------------------------------------------
                                           Total:       689.1 MB

The following NEW packages will be INSTALLED:

  asttokens          conda-forge/noarch::asttokens-2.2.1-pyhd8ed1ab_0 None
  backcall           conda-forge/noarch::backcall-0.2.0-pyh9f0ad1d_0 None
  backports          conda-forge/noarch::backports-1.0-pyhd8ed1ab_3 None
  backports.functoo~ conda-forge/noarch::backports.functools_lru_cache-1.6.4-pyhd8ed1ab_0 None
  charset-normalizer conda-forge/noarch::charset-normalizer-2.1.1-pyhd8ed1ab_0 None
  colorama           conda-forge/noarch::colorama-0.4.6-pyhd8ed1ab_0 None
  comm               conda-forge/noarch::comm-0.1.2-pyhd8ed1ab_0 None
  conda-package-str~ conda-forge/noarch::conda-package-streaming-0.7.0-pyhd8ed1ab_1 None
  cudatoolkit        conda-forge/linux-64::cudatoolkit-11.8.0-h37601d7_11 None
  debugpy            conda-forge/linux-64::debugpy-1.6.5-py38h8dc9893_0 None
  decorator          conda-forge/noarch::decorator-5.1.1-pyhd8ed1ab_0 None
  entrypoints        conda-forge/noarch::entrypoints-0.4-pyhd8ed1ab_0 None
  executing          conda-forge/noarch::executing-1.2.0-pyhd8ed1ab_0 None
  ipykernel          conda-forge/noarch::ipykernel-6.20.2-pyh210e3f2_0 None
  ipython            conda-forge/noarch::ipython-8.8.0-pyh41d4057_0 None
  jedi               conda-forge/noarch::jedi-0.18.2-pyhd8ed1ab_0 None
  jupyter_client     conda-forge/noarch::jupyter_client-7.4.9-pyhd8ed1ab_0 None
  jupyter_core       conda-forge/linux-64::jupyter_core-5.1.3-py38h578d9bd_0 None
  libsodium          conda-forge/linux-64::libsodium-1.0.18-h36c2ea0_1 None
  matplotlib-inline  conda-forge/noarch::matplotlib-inline-0.1.6-pyhd8ed1ab_0 None
  mpi                conda-forge/linux-64::mpi-1.0-openmpi None
  nest-asyncio       conda-forge/noarch::nest-asyncio-1.5.6-pyhd8ed1ab_0 None
  openmpi            conda-forge/linux-64::openmpi-4.1.4-ha1ae619_102 None
  packaging          conda-forge/noarch::packaging-23.0-pyhd8ed1ab_0 None
  parso              conda-forge/noarch::parso-0.8.3-pyhd8ed1ab_0 None
  pexpect            conda-forge/noarch::pexpect-4.8.0-pyh1a96a4e_2 None
  pickleshare        conda-forge/noarch::pickleshare-0.7.5-py_1003 None
  platformdirs       conda-forge/noarch::platformdirs-2.6.2-pyhd8ed1ab_0 None
  pluggy             conda-forge/noarch::pluggy-1.0.0-pyhd8ed1ab_5 None
  prompt-toolkit     conda-forge/noarch::prompt-toolkit-3.0.36-pyha770c72_0 None
  psutil             conda-forge/linux-64::psutil-5.9.4-py38h0a891b7_0 None
  ptyprocess         conda-forge/noarch::ptyprocess-0.7.0-pyhd3deb0d_0 None
  pure_eval          conda-forge/noarch::pure_eval-0.2.2-pyhd8ed1ab_0 None
  pygments           conda-forge/noarch::pygments-2.14.0-pyhd8ed1ab_0 None
  python-dateutil    conda-forge/noarch::python-dateutil-2.8.2-pyhd8ed1ab_0 None
  pyzmq              conda-forge/linux-64::pyzmq-25.0.0-py38he24dcef_0 None
  ruamel.yaml        conda-forge/linux-64::ruamel.yaml-0.17.21-py38h0a891b7_2 None
  ruamel.yaml.clib   conda-forge/linux-64::ruamel.yaml.clib-0.2.7-py38h1de0b5d_1 None
  stack_data         conda-forge/noarch::stack_data-0.6.2-pyhd8ed1ab_0 None
  tornado            conda-forge/linux-64::tornado-6.2-py38h0a891b7_1 None
  traitlets          conda-forge/noarch::traitlets-5.8.1-pyhd8ed1ab_0 None
  typing-extensions  conda-forge/noarch::typing-extensions-4.4.0-hd8ed1ab_0 None
  typing_extensions  conda-forge/noarch::typing_extensions-4.4.0-pyha770c72_0 None
  wcwidth            conda-forge/noarch::wcwidth-0.2.6-pyhd8ed1ab_0 None
  zeromq             conda-forge/linux-64::zeromq-4.3.4-h9c3ff4c_1 None
  zstandard          conda-forge/linux-64::zstandard-0.19.0-py38h0a891b7_0 None

The following packages will be UPDATED:

  _openmp_mutex                                   4.5-1_gnu --> 4.5-2_gnu None
  brotlipy                          0.7.0-py38h497a2fe_1001 --> 0.7.0-py38h0a891b7_1005 None
  ca-certificates                      2022.9.24-ha878542_0 --> 2022.12.7-ha878542_0 None
  certifi                            2022.9.24-pyhd8ed1ab_0 --> 2022.12.7-pyhd8ed1ab_0 None
  cffi                                1.15.1-py38h4a40e3a_2 --> 1.15.1-py38h4a40e3a_3 None
  conda                               22.9.0-py38h578d9bd_2 --> 22.11.1-py38h578d9bd_1 None
  conda-package-han~ conda-forge/linux-64::conda-package-h~ --> conda-forge/noarch::conda-package-handling-2.0.2-pyh38be061_0 None
  cryptography                         3.4.7-py38ha5dfef3_0 --> 39.0.0-py38h3d167d9_0 None
  idna                                    2.10-pyh9f0ad1d_0 --> 3.4-pyhd8ed1ab_0 None
  libstdcxx-ng                            9.3.0-h6de172a_18 --> 12.2.0-h46fd767_19 None
  openssl                                 1.1.1s-h0b41bf4_1 --> 3.0.7-h0b41bf4_1 None
  pip                                   21.2.4-pyhd8ed1ab_0 --> 22.3.1-pyhd8ed1ab_0 None
  pycosat                           0.6.3-py38h497a2fe_1006 --> 0.6.4-py38h0a891b7_1 None
  pycparser                               2.20-pyh9f0ad1d_2 --> 2.21-pyhd8ed1ab_0 None
  pyopenssl                             20.0.1-pyhd8ed1ab_0 --> 23.0.0-pyhd8ed1ab_0 None
  pysocks            conda-forge/linux-64::pysocks-1.7.1-p~ --> conda-forge/noarch::pysocks-1.7.1-pyha2e5f31_6 None
  python_abi                                     3.8-1_cp38 --> 3.8-3_cp38 None
  requests                              2.25.1-pyhd3deb0d_0 --> 2.28.2-pyhd8ed1ab_0 None
  setuptools         conda-forge/linux-64::setuptools-49.6~ --> conda-forge/noarch::setuptools-66.1.1-pyhd8ed1ab_0 None
  six                                   1.15.0-pyh9f0ad1d_0 --> 1.16.0-pyh6c4a22f_0 None
  tqdm                                  4.59.0-pyhd8ed1ab_0 --> 4.64.1-pyhd8ed1ab_0 None
  urllib3                               1.26.4-pyhd8ed1ab_0 --> 1.26.14-pyhd8ed1ab_0 None
  wheel                                 0.36.2-pyhd3deb0d_0 --> 0.38.4-pyhd8ed1ab_0 None

The following packages will be DOWNGRADED:

  python                          3.8.15-h257c98d_0_cpython --> 3.8.15-h4a9ceb5_0_cpython None


Proceed ([y]/n)? y


Downloading and Extracting Packages
(per-package download progress bars omitted; all packages reached 100%)
Preparing transaction: done
Verifying transaction: done
Executing transaction: By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

For Linux 64, Open MPI is built with CUDA awareness but this support is disabled by default.
To enable it, please set the environment variable OMPI_MCA_opal_cuda_support=true before
launching your MPI processes. Equivalently, you can set the MCA parameter in the command line:
mpiexec --mca opal_cuda_support 1 ...
 
In addition, the UCX support is also built but disabled by default.
To enable it, first install UCX (conda install -c conda-forge ucx). Then, set the environment
variables OMPI_MCA_pml="ucx" OMPI_MCA_osc="ucx" before launching your MPI processes.
Equivalently, you can set the MCA parameters in the command line:
mpiexec --mca pml ucx --mca osc ucx ...
Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via UCX.
Please consult UCX's documentation for detail.
 

done
Retrieving notices: ...working... done

I believe this changes the path for mpirun, which causes issues when executing the code.

root@c976cba7949e:/workspace/examples# which mpirun
/opt/conda/bin/mpirun

How do I circumvent this so that the appliance supports .ipynb files?

Thanks

Compiling cuStateVec with CMake

Hi there,

I am incorporating cuStateVec into a C++ project as a dynamic library using CMake, and although it works, I'm apprehensive that I'm not following best practice. For context, I've installed cuquantum-cuda-11 via apt-get on Ubuntu 20.04 as per these instructions, which created (among many others) the relevant files:

  • /usr/include/custatevec.h, which #include's existing files in /usr/local/cuda/include/ (such as library_types.h)
  • /usr/lib/x86_64-linux-gnu/libcuquantum/11/libcustatevec.so (and .a, etc)

I have an existing CMake project which uses CUDA (without cuStateVec). Since it intends to support (very) old CMake versions, the root CMakeLists.txt does not declare CUDA as a language (i.e. via project(myProject LANGUAGES CUDA)), but instead does an old-fashioned:

find_package(CUDA REQUIRED)
cuda_add_library(myProject mySourceFiles)
target_link_libraries(myProject ${CUDA_LIBRARIES})

Already inadvisable of course; please forgive me, I'm stuck with it at the moment!

I now wish to adapt this to include cuStateVec. I expected it to be as simple as replacing the final line above with:

find_library(CUQUANTUM_LIBRARIES custatevec)
target_link_libraries(myProject ${CUDA_LIBRARIES} ${CUQUANTUM_LIBRARIES})

Alas, compiling yields the error below,

/usr/include/custatevec.h:134:10: fatal error: library_types.h: No such file or directory
  134 | #include <library_types.h>

indicating that CMake has not added the directory /usr/local/cuda/include/ to the include path.
I can hackily remedy this by adding:

target_include_directories(myProject PUBLIC "/usr/local/cuda/include")

and everything compiles fine, but I doubt this really addresses the issue.

So: is there an example of compiling cuStateVec with CMake, or additional CMakeLists.txt files I've overlooked?
It's possible the issue really lies with my understanding of CMake rather than with cuStateVec's build, but I'm surprised I didn't have to point other CUDA libraries to the headers in /usr/local/cuda/include.

cuQuantum Python v22.03.0: `ModuleNotFoundError: No module named 'typing_extensions'`

Workaround: please install typing_extensions via pip or conda:

pip install typing_extensions

or

conda install -c conda-forge typing_extensions

Symptom:

leof:~$ python
Python 3.9.10 | packaged by conda-forge | (main, Feb  1 2022, 21:24:37) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cuquantum
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/leof/miniforge3/envs/ppqwqq/lib/python3.9/site-packages/cuquantum/__init__.py", line 2, in <module>
    from cuquantum import cutensornet
  File "/home/leof/miniforge3/envs/ppqwqq/lib/python3.9/site-packages/cuquantum/cutensornet/__init__.py", line 2, in <module>
    from cuquantum.cutensornet.configuration import *
  File "/home/leof/miniforge3/envs/ppqwqq/lib/python3.9/site-packages/cuquantum/cutensornet/configuration.py", line 19, in <module>
    from .memory import BaseCUDAMemoryManager
  File "/home/leof/miniforge3/envs/ppqwqq/lib/python3.9/site-packages/cuquantum/cutensornet/memory.py", line 7, in <module>
    from typing_extensions import Protocol, runtime_checkable
ModuleNotFoundError: No module named 'typing_extensions'
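Since Protocol and runtime_checkable are available from the standard typing module on Python 3.8+, another option would be a guarded import in memory.py; a sketch (not the shipped code):

try:
    # Python >= 3.8 ships these in the standard library
    from typing import Protocol, runtime_checkable
except ImportError:
    from typing_extensions import Protocol, runtime_checkable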

Strange behavior with diagonal gates.

Hi,

I constructed a simple 6-qubit circuit with a brickwork pattern and generated the corresponding expression for calculating an amplitude. The alternating two-qubit gates are generally not diagonal, but some can be decomposed into local single-qubit gates followed by a diagonal gate. Diagonal decomposition, with the introduction of hyperedges, should reduce the computational cost. Although the example below should not show a very noticeable cost reduction, we are still seeing very unexpected behavior: the contraction path has a very large number of open indices in intermediate tensors, even though the graph is almost a ring graph, which has treewidth 2.
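To make the decomposition concrete, here is a small NumPy-only check (illustrative, not part of my repro below) that the rank-2 diagonal form of CZ reproduces the rank-4 gate's action, which is what justifies reusing wire indices as hyperedges:

import numpy as np

# Full CZ as a rank-4 tensor: element [a, b, c, d] = <ab|CZ|cd>
cz_full = np.diag([1, 1, 1, -1]).astype(complex).reshape(2, 2, 2, 2)

# Diagonal-only form: a rank-2 tensor of phases shared by input/output wires
cz_diag = np.array([[1, 1], [1, -1]], dtype=complex)

# Applying the full gate equals elementwise multiplication by the phases
state = np.random.rand(2, 2) + 1j * np.random.rand(2, 2)
assert np.allclose(np.einsum('abcd,cd->ab', cz_full, state), cz_diag * state)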

Code:

import numpy as np
from cuquantum import contract_path

# Filler for operands, values don't matter
value = np.zeros(2, dtype=complex)
cz = np.zeros([2, 2], dtype=complex)
single_qubit_gate = np.zeros([2, 2], dtype=complex)
two_qubit_gate = np.zeros([2, 2, 2, 2], dtype=complex)

# Regular no diagonal decomposition
exp_str = 'a,b,c,d,e,f,ag,bh,ci,dj,ek,fl,ghmn,ijop,klqr,mrsx,notu,pqvw,sy,tz,uA,vB,wC,xD,y,z,A,B,C,D->'
operands = [value] * 6 + [single_qubit_gate] * 6 + [two_qubit_gate] * 6 + [single_qubit_gate] * 6 + [value] * 6
path, info = contract_path(exp_str, *operands)
cost = info.opt_cost
largest_intermediate = info.largest_intermediate
intermediate_modes = info.intermediate_modes
print(f'No diagonal gate: cost {cost}, largest_intermediate {largest_intermediate}.')
print('Intermediate modes: ', intermediate_modes)

# Diagonal decomposition
exp_str = 'a,b,c,d,e,f,ag,bh,ci,dj,ek,fl,gh,ij,kl,gm,hn,io,jp,kq,lr,mr,no,pq,ms,nt,ou,pv,qw,rx,s,t,u,v,w,x'
operands = [value] * 6 + [single_qubit_gate] * 6 + [cz] * 3 + [single_qubit_gate] * 6 + [cz] * 3 + [single_qubit_gate] * 6 + [value] * 6
path, info = contract_path(exp_str, *operands)
cost = info.opt_cost
largest_intermediate = info.largest_intermediate
intermediate_modes = info.intermediate_modes
print(f'Diagonal gate: cost {cost}, largest_intermediate {largest_intermediate}.')
print('Intermediate modes: ', intermediate_modes)

Output:

No diagonal gate: cost 724.0, largest_intermediate 16.0.
Intermediate modes: ('jopc', 'jop', 'g', 'hmn', 'h', 'j', 'elqr', 'l', 'ymrx', 'znou', 'u', 'Bpqw', 'w', 'x', 'lqr', 'qr', 'pqw', 'pq', 'op', 'mn', 'nou', 'no', 'pn', 'mp', 'qm', 'rm', 'yx', 'x', '')
Diagonal gate: cost 5072.0, largest_intermediate 256.0.
Intermediate modes: ('i', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 'ji', 'hg', 'nh', 'pj', 'lk', 'rl', 'gm', 'on', 'io', 'qp', 'kq', 'mr', 'jimr', 'hglk', 'gmon', 'rlqp', 'nhjimr', 'pjhglk', 'kqgmon', 'iorlqp', 'nimrpglk', 'kgmnirlp', '')

distributed_reset_configuration failed: python: distributed_interfaces/cutensornet_distributed_interface_mpi.c:44: unpackMpiCommunicator: Assertion `sizeof(MPI_Comm) == comm->commSize' failed.

My setup is as follows.

Hardware: INSPUR NF5488M5 (V100 version)
environments:
Ubuntu 22.04.1 LTS
Python 3.9.15
Nvidia driver: 525.60.13
cuda_12.0.r12.0
mpich-4.0.3
mpi4py 3.1.4
cuquantum 22.11.0

When I run /cuQuantum/python/samples/cutensornet/tensornet_example_mpi.py, I get the following output. It works.

*** Printing is done only from the root process to prevent jumbled messages ***
The number of processes is 1
cuTensorNet-vers: 20000
===== root process device info ======
GPU-name: Tesla V100-SXM3-32GB
GPU-clock: 1597000
GPU-memoryClock: 958000
GPU-nSM: 80
GPU-major: 7
GPU-minor: 0
========================
Include headers and define data types.
Define network, modes, and extents.
Initialize the cuTensorNet library and create a network descriptor.
Process 0 has the path with the lowest FLOP count 4299161600.0.
Find an optimized contraction path with cuTensorNet optimizer.
Allocate workspace.
Create a contraction plan for cuTENSOR and optionally auto-tune it.
Contract the network, each slice uses the same contraction plan.
Check cuTensorNet result against that of cupy.einsum().
num_slices: 1
0.8309440016746521 ms / slice
5173.82831013358 GFLOPS/s
Free resource and exit.

But when I run /cuQuantum/python/samples/cutensornet/tensornet_example_mpi_auto.py, I get the following error.

*** Printing is done only from the root process to prevent jumbled messages ***
The number of processes is 1
cuTensorNet-vers: 20000
===== root process device info ======
GPU-name: Tesla V100-SXM3-32GB
GPU-clock: 1597000
GPU-memoryClock: 958000
GPU-nSM: 80
GPU-major: 7
GPU-minor: 0
========================
Include headers and define data types.
Define network, modes, and extents.
Initialize the cuTensorNet library and create a network descriptor.
python: distributed_interfaces/cutensornet_distributed_interface_mpi.c:44: unpackMpiCommunicator: Assertion `sizeof(MPI_Comm) == comm->commSize' failed.
[suneo:06467] *** Process received signal ***
[suneo:06467] Signal: Aborted (6)
[suneo:06467] Signal code:  (-6)
[suneo:06467] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f55bbd22520]
[suneo:06467] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f55bbd76a7c]
[suneo:06467] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f55bbd22476]
[suneo:06467] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f55bbd087f3]
[suneo:06467] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7f55bbd0871b]
[suneo:06467] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f55bbd19e96]
[suneo:06467] [ 6] /home/tsujino/anaconda3/envs/cu/lib/libcutensornet_distributed_interface_mpi.so(+0x123c)[0x7f553de1223c]
[suneo:06467] [ 7] /home/tsujino/anaconda3/envs/cu/lib/libcutensornet_distributed_interface_mpi.so(cutensornetMpiCommRank+0x23)[0x7f553de122ae]
[suneo:06467] [ 8] /home/tsujino/anaconda3/envs/cu/lib/python3.9/site-packages/cuquantum/cutensornet/../../../../libcutensornet.so.2(+0x105462)[0x7f554c705462]
[suneo:06467] [ 9] /home/tsujino/anaconda3/envs/cu/lib/python3.9/site-packages/cuquantum/cutensornet/../../../../libcutensornet.so.2(+0x1056bd)[0x7f554c7056bd]
[suneo:06467] [10] /home/tsujino/anaconda3/envs/cu/lib/python3.9/site-packages/cuquantum/cutensornet/../../../../libcutensornet.so.2(+0x1058ed)[0x7f554c7058ed]
[suneo:06467] [11] /home/tsujino/anaconda3/envs/cu/lib/python3.9/site-packages/cuquantum/cutensornet/../../../../libcutensornet.so.2(cutensornetDistributedResetConfiguration+0xd3)[0x7f554c703633]
[suneo:06467] [12] /home/tsujino/anaconda3/envs/cu/lib/python3.9/site-packages/cuquantum/cutensornet/cutensornet.cpython-39-x86_64-linux-gnu.so(+0x26063)[0x7f554e65c063]
[suneo:06467] [13] python[0x507457]
[suneo:06467] [14] python(_PyObject_MakeTpCall+0x2ec)[0x4f068c]
[suneo:06467] [15] python(_PyEval_EvalFrameDefault+0x525b)[0x4ec9fb]
[suneo:06467] [16] python[0x4e689a]
[suneo:06467] [17] python(_PyEval_EvalCodeWithName+0x47)[0x4e6527]
[suneo:06467] [18] python(PyEval_EvalCodeEx+0x39)[0x4e64d9]
[suneo:06467] [19] python(PyEval_EvalCode+0x1b)[0x59329b]
[suneo:06467] [20] python[0x5c0ad7]
[suneo:06467] [21] python[0x5bcb00]
[suneo:06467] [22] python[0x4566f4]
[suneo:06467] [23] python(PyRun_SimpleFileExFlags+0x1a2)[0x5b67e2]
[suneo:06467] [24] python(Py_RunMain+0x37e)[0x5b3d5e]
[suneo:06467] [25] python(Py_BytesMain+0x39)[0x587349]
[suneo:06467] [26] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f55bbd09d90]
[suneo:06467] [27] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f55bbd09e40]
[suneo:06467] [28] python[0x5871fe]
[suneo:06467] *** End of error message ***
Aborted (core dumped)

I have tried other samples and they work.
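My own guess at the cause (not confirmed): the assertion compares sizeof(MPI_Comm) between the MPI that built libcutensornet_distributed_interface_mpi.so and the MPI that mpi4py loaded, so a vendor mismatch (e.g. an Open MPI-built interface library running against MPICH) would trip it. A quick diagnostic:

from mpi4py import MPI
import os

print(MPI.get_vendor())  # (vendor, version) of the MPI backing mpi4py
# Path of the distributed interface library cuTensorNet is told to load
print(os.environ.get("CUTENSORNET_COMM_LIB"))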

Discussion: Change to link statically to cudart?

I've been meaning to raise this discussion but didn't get time to do so until now; sorry for the delay.

I would like to gauge the team's and users' interest in changing the way CuPy links to cudart (the CUDA Runtime). Currently, CuPy links to cudart dynamically, likely under the assumption that there is one unique libcudart.so throughout the system (say, from the system package manager). With module-based HPC clusters or Conda, however, that is often not the case. Furthermore, not all Python libraries link dynamically: PyTorch, for example, has linked to cudart statically for a very long time, and the same goes for CUDA Python (which re-implements cudart for a different technical reason). This raises a semantic question: when I query the cudart version, am I guaranteed that it is the same version seen by all DSOs loaded in a user process?

The answer is no. Even CUDA libraries (cuBLAS & co.) link to cudart statically, not dynamically (see this CUDA Q&A). This is done so that CUDA minor version compatibility (MVC) can fully kick in: it eliminates the need to compare the build-time and run-time versions of cudart, and ensures that as long as the user-mode driver (UMD, i.e. libcuda.so) is newer than the minimal version required by the CUDA major version, the generated binary works regardless of the installed CUDA Toolkit (CTK) version. Therefore, at least modules that can work without any CUDA libraries (like cupy.cuda.cub) would automatically gain forward (and certain backward) compatibility just by depending on the UMD.

Also, there is no size concern with linking cudart statically; it is very small.

A significant obstacle to making this change in CuPy is that in many places CuPy calls runtime.runtimeGetVersion(), assuming this is the canonical way to check the current CTK version. That may no longer be the case (probably it never was?), so this is not an easy change. Also, CuPy still has to link to all other CUDA libraries dynamically; even the "core" part of CuPy (kernel compilation) requires at least NVRTC, not just the UMD. As a result, unlike some libraries that would see an immediate benefit (by depending only on the UMD), I understand this change could be less attractive to CuPy, but at least it would then be a discussed choice.
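To make the semantic question concrete, these are the two queries involved (both public CuPy APIs):

import cupy

# With dynamic linking this reports whichever libcudart.so got loaded;
# with static linking it would report the toolkit CuPy was built against
print(cupy.cuda.runtime.runtimeGetVersion())

# The user-mode driver (libcuda.so) version, which is what MVC keys on
print(cupy.cuda.runtime.driverGetVersion())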

Why is QSimOptions throwing an error when use_sampler and disable_gpu are used?

TypeError                                 Traceback (most recent call last)
Cell In [9], line 2
      1 ngpus = 1
----> 2 qsim_options = qsimcirq.QSimOptions(
      3     max_fused_gate_size = 2
      4     , cpu_threads = 1
      5     , gpu_mode = ngpus
      6     , use_sampler = True
      7     , disable_gpu = False
      8 )
      9 qsim_simulator = qsimcirq.QSimSimulator(qsim_options)

TypeError: __init__() got an unexpected keyword argument 'use_sampler'

It works fine when these two options are removed. I am using Windows WSL to run it locally with a conda installation.
The documentation clearly lists these options. Link to QSimOptions
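A quick way to check whether the installed qsimcirq actually exposes these options (a mismatch with the documented signature would point to a version difference):

import inspect
import qsimcirq

print(qsimcirq.__version__)
# If 'use_sampler' and 'disable_gpu' are absent from this signature, the
# installed build predates the documented options
print(inspect.signature(qsimcirq.QSimOptions.__init__))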

Demo of setting a basic memory handler

Hi there,

I intend to bind cuStateVec to a mempool to avoid explicitly managing workspaces, as described in this doc. I am doing this for code hygiene (of a "draft" cuQuantum implementation) rather than for performance, so I seek something simple. That doc shows the boilerplate necessary for using a custom memory pool in a very general way, but I wonder whether this can be reduced/simplified when using a device's default pool.

Here is how the doc might suggest using the default pool:

// mem pool alloc & dealloc callbacks for the mem handler
int my_alloc(void* ctx, void** ptr, size_t size, cudaStream_t stream) {
    cudaMemPool_t pool = *reinterpret_cast<cudaMemPool_t*>(ctx);
    return cudaMallocFromPoolAsync(ptr, size, pool, stream);
}
int my_dealloc(void* ctx, void* ptr, size_t size, cudaStream_t stream) {
    return cudaFreeAsync(ptr, stream);
}

// ... then, at setup time:

// get the default mem pool (assuming single GPU)
int device = 0;
cudaMemPool_t memPool;
cudaDeviceGetMemPool(&memPool, device);

// optionally tweak it here (affecting the existing pool), e.g.
cuuint64_t threshold = 16 * (1ULL << 10);
cudaMemPoolSetAttribute(memPool, cudaMemPoolAttrReleaseThreshold, &threshold);

// create a mem handler around the mem pool
custatevecDeviceMemHandler_t handler;
handler.ctx = reinterpret_cast<void*>(&memPool);
handler.device_alloc = my_alloc;
handler.device_free = my_dealloc;

// set the mem handler to auto-manage workspaces (handle is a custatevecHandle_t)
custatevecSetDeviceMemHandler(handle, &handler);

This seems unnecessarily tedious to me, given I'm not really specifying any custom behavior; just mapping device_alloc to cudaMallocFromPoolAsync and device_free to cudaFreeAsync.
I sort of imagine that this setup (using the default pool) is what should happen if one calls custatevecApplyMatrix (for example) with extraWorkspace=nullptr, without having previously called custatevecSetDeviceMemHandler().

Is above the right way to use the default mem pool? Or is there a reason I should avoid using an existing pool?
If this is all fine, and I understand correctly that this will be a common use case (especially for new cuQuantum users not intimately familiar with CUDA), could there be a bespoke function to avoid this boilerplate?
E.g.

custatevecSetDeviceMemHandlerToDefaultMemPool();

(and of course such a function would be expected to error if the user's device does not support stream-ordered memory)

In any case, it might be helpful to mention in the cuStateVec doc that the default mem pool (rather than a custom one) can be used.

compile tensornet_example.cu

Hi,
I'm trying to follow the instructions on how to compile tensornet_example.cu:
https://github.com/NVIDIA/cuQuantum/tree/main/samples/cutensornet
I think it is inconsistent, because it says:

export CUTENSORNET_ROOT=<path_to_custatevec_root>

but custatevec is not part of CUTENSORNET; it is part of CUQUANTUM.
I'm referring to this pair:

https://developer.download.nvidia.com/compute/cuquantum/redist/cuquantum/linux-x86_64/cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz

https://developer.download.nvidia.com/compute/cutensor/redist/libcutensor/linux-x86_64/libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz
Can you please clarify how to configure the Makefile to work with these 2 libs and compile tensornet_example.cu ?
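For concreteness, this is roughly the invocation I would expect to work; the extraction paths and the cuTENSOR lib/11 subdirectory are my assumptions:

export CUTENSORNET_ROOT=/opt/cuquantum-linux-x86_64-0.1.0.30-archive
export CUTENSOR_ROOT=/opt/libcutensor-linux-x86_64-1.4.0.6-archive
nvcc tensornet_example.cu \
    -I${CUTENSORNET_ROOT}/include -I${CUTENSOR_ROOT}/include \
    -L${CUTENSORNET_ROOT}/lib -L${CUTENSOR_ROOT}/lib/11 \
    -lcutensornet -lcutensor \
    -o tensornet_example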
Thanks
Jan Balewski, NERSC

Dockerfile install

Hi,

I am trying to build an image with cuquantum and the code samples installed. Here is what I have so far, compiled from the README here and in the documentation:

FROM nvcr.io/nvidia/pytorch:22.01-py3

# Get cuquantum
ENV CUQUANTUM_ROOT=/opt/cuquantum0.1.0.30
ARG TARFILE=cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz
RUN wget -O /tmp/${TARFILE} \
    https://developer.download.nvidia.com/compute/cuquantum/redist/linux-x86_64/${TARFILE} && \
    mkdir -p ${CUQUANTUM_ROOT} && \
    tar xvf /tmp/${TARFILE} -C ${CUQUANTUM_ROOT} --strip-components=1 && \
    # lib64/ is missing; symlink lib/ to lib64/
    ln -s ${CUQUANTUM_ROOT}/lib ${CUQUANTUM_ROOT}/lib64 && \
    rm /tmp/${TARFILE}
ENV LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${LD_LIBRARY_PATH}

# Install cuquantum python bindings, remove previous cupy version
# TODO verify
RUN pip uninstall -y cupy-cuda115 && \
    conda install -c conda-forge cuquantum-python


ENV CUSTATEVEC_ROOT=${CUQUANTUM_ROOT}
ENV CUTENSORNET_ROOT=${CUQUANTUM_ROOT}
ENV PATH=/usr/local/cuda/bin/:${PATH}

# Get samples repo
ARG TARFILE=v0.1.0.0.tar.gz
RUN wget -O /tmp/${TARFILE} https://github.com/NVIDIA/cuQuantum/archive/refs/tags/${TARFILE} && \
    mkdir -p ${CUSTATEVEC_ROOT}/code_samples && \
    tar xvf /tmp/${TARFILE} -C ${CUSTATEVEC_ROOT}/code_samples --strip-components=1 && \
    rm /tmp/${TARFILE}

The image ships cupy-cuda115; the conda install of cuquantum-python installs another version of cupy as a dependency, so I uninstall the old one (imports will complain if both are available). make all builds successfully (though the lib64 -> lib symlink is needed for it to work), but I am unable to run the Python samples without hitting import errors.

I am running on an Intel-chip Mac, just trying to clear up the import errors before we run this on a cloud instance with an NVIDIA GPU attached.

Before posting any stack traces, am I on the right track here? Maybe I should use a different base image that has an equivalent version of cupy. I'm also not sure whether the CUDA version is compatible.

I am happy to submit a PR with the working Dockerfile once we figure this all out :)

Autograd rules for Pytorch inputs

As someone who benefits a lot from having high-performance tensor network methods, a huge thanks for putting this excellent library together (and for creating Python bindings for greater ease of use)!

I'm attempting to use these tools (in particular, cuquantum.contract) to accelerate GPU tensor network contraction with PyTorch, in a setting where automatic differentiation is used to optimize model parameters. Everything works great in the forward pass, but as soon as I try to compute gradients, I get the error RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

I'm including a minimum working example below for reproducing this error, but in general I wanted to ask whether implementing autograd rules for PyTorch inputs is something that is planned for cuQuantum. I understand this isn't a trivial thing to add, but for anyone using these tools for ML (and also for many physics users), being able to efficiently contract and backpropagate would be a huge bonus for the library.

And of course, if all of this functionality is available already and I'm just doing something wrong, that would be wonderful news 😁

MWE:

import torch
from cuquantum import contract

# Dummy class whose parameters will get trained
class Contractor(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = torch.nn.Parameter(torch.ones(2, 3))

    def forward(self, x):
        return contract("ab,ba->", self.param, x)

# Initialization of model and evaluation on input
model = Contractor()
data = torch.ones(3, 2)
loss = model(data)

# This is where things break
loss.backward()
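For what it's worth, the workaround I can imagine is wrapping the contraction in a custom torch.autograd.Function with a hand-written backward. This is a minimal sketch for the specific expression "ab,ba->" above only (where contract("ab,ba->", A, B) equals trace(A @ B), so dL/dA = B^T and dL/dB = A^T, each scaled by the upstream gradient), not a general solution:

import torch
from cuquantum import contract

class TraceProduct(torch.autograd.Function):
    # differentiable wrapper for contract("ab,ba->", a, b) == trace(a @ b)
    @staticmethod
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        # detach so cuquantum sees plain tensors without autograd metadata
        return contract("ab,ba->", a.detach(), b.detach())

    @staticmethod
    def backward(ctx, grad_out):
        a, b = ctx.saved_tensors
        # d trace(a @ b) / d a = b^T, and symmetrically for b
        return grad_out * b.t(), grad_out * a.t()

The forward method in the Contractor module above would then become return TraceProduct.apply(self.param, x).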

`state_compute()` leading to kernel dying.

Hi,

I was trying to use the high-level state API to compute a quantum state. I used the same code as in https://github.com/NVIDIA/cuQuantum/blob/main/python/samples/cutensornet/high_level/expectation_example.py, but instead of building an operator and computing an expectation value, I tried to compute the state. Every time I run the following script, the kernel dies (on Perlmutter, with a setup that works for other cuTN tasks):

import cupy as cp
import numpy as np

import cuquantum
from cuquantum import cutensornet as cutn

dev = cp.cuda.Device()  # get current device

num_qubits = 16
dim = 2
qubits_dims = (dim, ) * num_qubits

handle = cutn.create()
stream = cp.cuda.Stream()
data_type = cuquantum.cudaDataType.CUDA_C_64F

# Define quantum gate tensors on device
gate_h = 2**-0.5 * cp.asarray([[1,1], [1,-1]], dtype='complex128', order='F')
gate_h_strides = 0

gate_cx = cp.asarray([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, 1, 0]], dtype='complex128').reshape(2,2,2,2, order='F')
gate_cx_strides = 0

free_mem = dev.mem_info[0]
scratch_size = free_mem // 2
scratch_space = cp.cuda.alloc(scratch_size)

# Create the initial quantum state
quantum_state = cutn.create_state(handle, cutn.StatePurity.PURE, num_qubits, qubits_dims, data_type)
print("Created the initial quantum state")

# Construct the quantum circuit state with gate application
tensor_id = cutn.state_apply_tensor(
        handle, quantum_state, 1, (0, ), 
        gate_h.data.ptr, gate_h_strides, 1, 0, 1)

for i in range(1, num_qubits):
    tensor_id = cutn.state_apply_tensor(
        handle, quantum_state, 2, (i-1, i),  # target on i-1 while control on i
        gate_cx.data.ptr, gate_cx_strides, 1, 0, 1)
print("Quantum gates applied")

# Configure the quantum circuit expectation value computation
num_hyper_samples_dtype = cutn.state_get_attribute_dtype(cutn.ExpectationAttribute.OPT_NUM_HYPER_SAMPLES)
num_hyper_samples = np.asarray(8, dtype=num_hyper_samples_dtype)
cutn.state_configure(handle, quantum_state, 
cutn.StateAttribute.NUM_HYPER_SAMPLES, 
num_hyper_samples.ctypes.data, num_hyper_samples.dtype.itemsize)

# Prepare the computation of the specified quantum circuit expectation value
work_desc = cutn.create_workspace_descriptor(handle)
cutn.state_prepare(handle, quantum_state, scratch_size, work_desc, stream.ptr)
print("Prepare the computation of the specified quantum circuit expectation value")

workspace_size_d = cutn.workspace_get_memory_size(handle, 
    work_desc, cutn.WorksizePref.RECOMMENDED, cutn.Memspace.DEVICE, cutn.WorkspaceKind.SCRATCH)

if workspace_size_d <= scratch_size:
    cutn.workspace_set_memory(handle, work_desc, cutn.Memspace.DEVICE, cutn.WorkspaceKind.SCRATCH, scratch_space.ptr, workspace_size_d)
else:
    print("Error:Insufficient workspace size on Device")
    cutn.destroy_workspace_descriptor(work_desc)
    cutn.destroy_state(quantum_state)
    cutn.destroy(handle)
    del scratch
    print("Free resource and exit.")

state_vector = np.empty(pow(16, 2), dtype="complex128")
cutn.state_compute(
            handle,
            quantum_state,
            work_desc,
            state_vector.ctypes.data,
            stream.ptr,
        )

The only two steps that are different from the example above are the last two lines. Is something wrong with the state_vector allocation?

If this is helpful, here is the logger output I am getting before the crash:

[2024-02-22 08:59:35][cuTensorNet][362076][Api][cutensornetGetOutputStateDetails] handle=0X55CD4EE7E3C0 tensorNetworkState=0X55CD4FFF5C00 numTensorsOut=0X7FFEFCE2B3FC numModesOut=0X0 extentsOut=0X0 stridesOut=0X0
[2024-02-22 08:59:35][cuTensorNet][362076][Api][cutensornetGetOutputStateDetails] handle=0X55CD4EE7E3C0 tensorNetworkState=0X55CD4FFF5C00 numTensorsOut=0X7FFEFCE2B3FC numModesOut=0X55CD4E1F0070 extentsOut=0X0 stridesOut=0X0
[2024-02-22 08:59:35][cuTensorNet][362076][Api][cutensornetStateCompute] handle=0X55CD4EE7E3C0 tensorNetworkState=0X55CD4FFF5C00 workDesc=0X55CD4D9EF040, extentsOut=0X55CD5004E2C0 stridesOut=0X55CD5004E2E0 stateTensorsOut=0X55CD50A3D510 cudaStream=0X55CD4E2F2AE0
[2024-02-22 08:59:35][cuTensorNet][362076][Api][cutensornetContractSlices] handle=0X55CD4EE7E3C0 plan=0X55CD4EACE980 rawDataIn=0X55CD5026A9A0 rawDataOut=0X2000 accumulateOutput=0 workDesc=0X55CD4D9EF040 sliceGroup=0X0 stream=0X55CD4E2F2AE0
[2024-02-22 08:59:35][cuTensorNet][362076][Trace][cutensornetContractSlices] Provided scratchWorkspace=0X7F6766000000 scratchWorkspaceSize=17875456 cacheWorkspace=0X0 cacheWorkspaceSize=0

Could it be cacheWorkspaceSize=0?

Many thanks!

CircuitToEinsum for QFT: batched_amplitudes slower than qsim full statevector simulation

I was trying to reproduce the statement in https://developer.nvidia.com/blog/nvidia-announces-cuquantum-beta-availability-record-quantum-benchmark-and-quantum-container/, in particular

Quantum Fourier Transform – accelerated from 29 mins down to 19 secs

I'm running the following minimal reproducer on Python 3.8, cuQuantum 22.11, NVIDIA A100 40 GB (on a GCP instance)

import time

import cirq
import qsimcirq
import cupy
from cuquantum import contract
from cuquantum import CircuitToEinsum

simulator = qsimcirq.QSimSimulator()

# See https://quantumai.google/cirq/experiments/textbook_algorithms
def make_qft(qubits):
    """Generator for the QFT on a list of qubits."""
    qreg = list(qubits)
    while len(qreg) > 0:
        q_head = qreg.pop(0)
        yield cirq.H(q_head)
        for i, qubit in enumerate(qreg):
            yield (cirq.CZ ** (1 / 2 ** (i + 1)))(qubit, q_head)


def simulate_and_measure(nqubits):
    qubits = cirq.LineQubit.range(nqubits)
    qft = cirq.Circuit(make_qft(qubits))

    myconverter = CircuitToEinsum(qft, backend=cupy)

    tic = time.time()
    simulator.simulate(qft)
    elapsed_qsim = time.time() - tic
    out = {"qsim": elapsed_qsim}

    # CUDA expectation
    pauli_string = {qubits[0]: 'Z'}
    expression, operands = myconverter.expectation(pauli_string, lightcone=True)
    tic = time.time()
    contract(expression, *operands)
    elapsed = time.time() - tic
    out["cu_expectation"] = elapsed

    # CUDA Batched amplitudes
    # Fix everything but last qubit
    fixed_states = "0" * (nqubits - 1)
    fixed_index = tuple(map(int, fixed_states))
    num_fixed = len(fixed_states)
    fixed = dict(zip(myconverter.qubits[:num_fixed], fixed_states))
    expression, operands = myconverter.batched_amplitudes(fixed)
    tic = time.time()
    contract(expression, *operands)
    elapsed = time.time() - tic
    out["cu_batched"] = elapsed

    return out

for i in [10, 15, 20, 25, 30]:
    print(i, simulate_and_measure(i))

Output (the numbers are elapsed times in seconds; 10, 15, ... are the number of qubits in the QFT):

10 {'qsim': 0.9677999019622803, 'cu_expectation': 0.29337143898010254, 'cu_batched': 0.07590365409851074}
15 {'qsim': 0.023270368576049805, 'cu_expectation': 0.019628524780273438, 'cu_batched': 0.3687710762023926}
20 {'qsim': 0.03504538536071777, 'cu_expectation': 0.023822784423828125, 'cu_batched': 0.9347813129425049}
25 {'qsim': 0.14235782623291016, 'cu_expectation': 0.02486586570739746, 'cu_batched': 2.39030122756958}
30 {'qsim': 3.4044816493988037, 'cu_expectation': 0.028923749923706055, 'cu_batched': 4.6819908618927}
35 {'cu_expectation': 1.0615959167480469, 'cu_batched': 10.964831829071045}
40 {'cu_expectation': 0.03381609916687012, 'cu_batched': 82.43729209899902}

I wasn't able to go to 35 qubits with qsim, because I got CUDA OOM (hence the missing qsim entries in the last two rows). The much-reduced memory usage alone is sufficient to prefer cuQuantum for this use case.

But I was hoping that batched_amplitudes would be faster than a full statevector simulation, because some qubits are fixed. That doesn't seem to be the case. I have also tried reduced_density_matrix (not shown, to keep the code snippet short). The only one that is consistently fast is expectation. I wonder if I am doing something wrong?

Calling `cutn.distributed_reset_configuration()` with MPICH might fail with `CUTENSORNET_STATUS_DISTRIBUTED_FAILURE`

MPICH users running this sample might see the following error:

$ mpiexec -n 2 python example22_mpi_auto.py
Traceback (most recent call last):
  File "/home/leof/dev/cuquantum/python/samples/cutensornet/coarse/example22_mpi_auto.py", line 60, in <module>
    cutn.distributed_reset_configuration(
  File "cuquantum/cutensornet/cutensornet.pyx", line 2306, in cuquantum.cutensornet.cutensornet.distributed_reset_configuration
  File "cuquantum/cutensornet/cutensornet.pyx", line 2328, in cuquantum.cutensornet.cutensornet.distributed_reset_configuration
  File "cuquantum/cutensornet/cutensornet.pyx", line 229, in cuquantum.cutensornet.cutensornet.check_status
cuquantum.cutensornet.cutensornet.cuTensorNetError: CUTENSORNET_STATUS_DISTRIBUTED_FAILURE
Traceback (most recent call last):
  File "/home/leof/dev/cuquantum/python/samples/cutensornet/coarse/example22_mpi_auto.py", line 60, in <module>
    cutn.distributed_reset_configuration(
  File "cuquantum/cutensornet/cutensornet.pyx", line 2306, in cuquantum.cutensornet.cutensornet.distributed_reset_configuration
  File "cuquantum/cutensornet/cutensornet.pyx", line 2328, in cuquantum.cutensornet.cutensornet.distributed_reset_configuration
  File "cuquantum/cutensornet/cutensornet.pyx", line 229, in cuquantum.cutensornet.cutensornet.check_status
cuquantum.cutensornet.cutensornet.cuTensorNetError: CUTENSORNET_STATUS_DISTRIBUTED_FAILURE

This is a known issue for the automatic MPI support using cuQuantum Python 22.11 / cuTensorNet 2.0.0 + mpi4py + MPICH.

The reason is that Python by default dynamically loads shared libraries in the private mode (see, e.g., the documentation for ctypes.DEFAULT_MODE), which breaks the assumption of libcutensornet_distributed_interface_mpi.so (whose path is set via $CUTENSORNET_COMM_LIB) that the MPI symbols are loaded into the public scope.

Open MPI is immune to this problem because mpi4py had to "break" this assumption due to a few old Open MPI issues.

There are multiple workarounds that users can choose:

  1. Load the MPI symbols via LD_PRELOAD, e.g., mpiexec -n 2 -env LD_PRELOAD=$MPI_HOME/lib/libmpi.so python example22_mpi_auto.py
  2. Change Python's default loading mode to public (global) before any other imports:

import os, sys
sys.setdlopenflags(os.RTLD_LAZY | os.RTLD_GLOBAL)
import ...

  3. If compiling libcutensornet_distributed_interface_mpi.so manually, link the MPI library to it via -lmpi

In a future release, we will add a fix to work around this limitation. See also #30 for discussion.

Sudo permission issue for cuquantum-appliance:23.10 container

Hi All,

I am trying to use cuquantum-appliance:23.10 with shifter on NERSC Perlmutter system.
I am facing the following sudo permission issue with this container:

namehta4@perlmutter:login36:~> salloc -N 1 -G 4 -C gpu -t 120 -c 64 -A nstaff -q interactive --image=nvcr.io/nvidia/cuquantum-appliance:23.10
salloc: Granted job allocation 22896843
salloc: Waiting for resource configuration
salloc: Nodes nid200432 are ready for job
namehta4@nid200432:~> shifter /bin/bash
(base) namehta4@nid200432:~$ cd /home/cuquantum/
bash: cd: /home/cuquantum/: Permission denied
(base) namehta4@nid200432:~$ sudo cd /home/cuquantum
sudo: The "no new privileges" flag is set, which prevents sudo from running as root.
sudo: If sudo is running in a container, you may need to adjust the container configuration to disable the flag.

As far as I know, this is a new issue, as the behavior differs from the previous image (23.03):

namehta4@perlmutter:login36:~> salloc -N 1 -G 4 -C gpu -t 120 -c 64 -A nstaff -q interactive --image=nvcr.io/nvidia/cuquantum-appliance:23.03
salloc: Pending job allocation 22896859
salloc: job 22896859 queued and waiting for resources
salloc: job 22896859 has been allocated resources
salloc: Granted job allocation 22896859
salloc: Waiting for resource configuration
salloc: Nodes nid200436 are ready for job
namehta4@nid200436:~> shifter /bin/bash
(base) namehta4@nid200436:~$ cd /home/cuquantum/
(base) namehta4@nid200436:/home/cuquantum$ ls
LICENSE  conda	examples

May I please use your help in resolving this issue?

Thank you!
Neil Mehta

noisy circuit simulation using cuquantum

Hi, I am trying to simulate a 30-qubit noisy circuit using the NVIDIA cuQuantum Appliance - nvcr.io/nvidia/cuquantum-appliance:22.11.

I encounter the error: "CUDA error: an illegal memory access was encountered vector_mgpu.h 129"

Does cuQuantum support noisy circuit simulations?

Multithreaded cutn optimization issue

I have the following code for contraction optimization using multiple threads. I set the number of samples to 64. What I observe is that with 1 thread the optimization takes 219 seconds, while with 64 threads it takes 89 seconds; the resulting path quality is not very different. I expected a much faster time to solution with 64 threads, and I can see that the CPU utilization indeed goes above 6000% for a substantial amount of time.

The machine has a single AMD Zen 3 (Milan) 32-core / 64-thread CPU and an A100 GPU. Even if I set the number of threads to 32, the CPU utilization is above 3100%, and the time for 64 samples is 65 seconds with similar path quality. For 8 threads, the time is 47 seconds.

Versions:

cuquantum-python-cu11     23.3.0                   pypi_0    pypi
custatevec-cu11           1.5.0                    pypi_0    pypi
cutensor-cu11             1.7.0                    pypi_0    pypi
cutensornet-cu11          2.3.0                    pypi_0    pypi

Code:

import cuquantum.cutensornet as cutn
from cuquantum.cutensornet import configuration
from cuquantum import Network
import numpy as np
import time

expression = 'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,À,Á,Â,Ã,yÜ,rÕ,ÜÕ,Jç,mÐ,çÐ,Iæ,Pí,æí,qÔ,vÙ,ÔÙ,Mê,lÏ,êÏ,Àø,Rï,øï,Ãû,Hå,ûå,Tñ,AÞ,ñÞ,uØ,oÒ,ØÒ,jÍ,Lé,Íé,cÆ,Bß,Æß,zÝ,kÎ,ÝÎ,Áù,xÛ,ùÛ,Wô,iÌ,ôÌ,nÑ,Gä,Ñä,Eâ,fÉ,âÉ,Uò,wÚ,òÚ,eÈ,sÖ,ÈÖ,t×,Xõ,×õ,Z÷,pÓ,÷Ó,Yö,Kè,öè,aÄ,hË,ÄË,dÇ,bÅ,ÇÅ,Oì,Fã,ìã,Sð,Âú,ðú,gÊ,Cà,Êà,Dá,Vó,áó,Qî,Në,îë,éď,øĆ,ďĆ,Éě,êĄ,ěĄ,ßđ,ùĔ,đĔ,ÇĨ,ÑĘ,ĨĘ,ÈĞ,Åĩ,Ğĩ,Ùă,ôĖ,ăĖ,ûĈ,ðĬ,ĈĬ,úĭ,æĀ,ĭĀ,èĥ,õġ,ĥġ,ãī,åĉ,īĉ,Òč,âĚ,čĚ,Ìė,Ûĕ,ėĕ,Ëħ,ØČ,ħČ,ïć,óı,ćı,Ðÿ,äę,ÿę,Öğ,íā,ğā,Óģ,Îē,ģē,Üü,ñĊ,üĊ,ìĪ,ÆĐ,ĪĐ,Úĝ,Õý,ĝý,ÍĎ,áİ,Ďİ,çþ,ÄĦ,þĦ,Þċ,ëij,ċij,ÊĮ,àį,Įį,×Ġ,÷Ģ,ĠĢ,Ïą,îIJ,ąIJ,öĤ,ÝĒ,ĤĒ,ÔĂ,òĜ,ĂĜ,Ĉŀ,ÿŐ,ŀŐ,Ēũ,ĀŃ,ũŃ,Ĝū,ğŒ,ūŒ,ýś,ēŕ,śŕ,đĸ,ĔĹ,ĸĹ,ĮŢ,Ěʼn,Ţʼn,Ċŗ,ĂŪ,ŗŪ,ĎŜ,Đř,Ŝř,ĘĻ,ăľ,Ļľ,ıŏ,ĝŚ,ŏŚ,ęő,īņ,őņ,þŞ,Ĩĺ,Şĺ,ĉŇ,ėŊ,ŇŊ,İŝ,ćŎ,ŝŎ,ĭł,ĬŁ,łŁ,ěĶ,ċŠ,ĶŠ,ĕŋ,ĥń,ŋń,üŖ,Ħş,Ŗş,čň,ĤŨ,ňŨ,ġŅ,ďĴ,ŅĴ,ĩĽ,IJŧ,Ľŧ,Ćĵ,ąŦ,ĵŦ,ħŌ,Ģť,Ōť,āœ,ĖĿ,œĿ,ĪŘ,ģŔ,ŘŔ,Ğļ,ijš,ļš,Čō,ĠŤ,ōŤ,Ąķ,įţ,ķţ,ŘƜ,Ŋƅ,Ɯƅ,Ńů,şƏ,ůƏ,ŦƗ,ľŽ,ƗŽ,Ŝź,ŀŬ,źŬ,ŠƋ,ļƞ,Ƌƞ,ŗŸ,ūŰ,ŸŰ,ŇƄ,ŎƇ,ƄƇ,śŲ,ţƣ,Ųƣ,ņƁ,Ŀƛ,Ɓƛ,Ťơ,ŋƌ,ơƌ,ĴƓ,ōƠ,ƓƠ,œƚ,Śſ,ƚſ,ŖƎ,Ĺŵ,Ǝŵ,ʼnŷ,ĵƖ,ŷƖ,ńƍ,šƟ,ƍƟ,ĸŴ,łƈ,Ŵƈ,ŪŹ,ŔƝ,ŹƝ,ŁƉ,řŻ,ƉŻ,ŕų,ĶƊ,ųƊ,ŝƆ,őƀ,Ɔƀ,ĺƃ,ķƢ,ƃƢ,ũŮ,Œű,Ůű,Ļż,ŞƂ,żƂ,ŧƕ,ňƐ,ƕƐ,Őŭ,ŢŶ,ŭŶ,ŏž,ťƙ,žƙ,ŌƘ,Ņƒ,Ƙƒ,ŨƑ,ĽƔ,ƑƔ,ŹDŽ,Ɵǁ,DŽǁ,ƈǃ,ƝDž,ǃDž,ƛƵ,ųLj,ƵLj,Ŵǂ,ƀNj,ǂNj,ŲƲ,ŷƾ,Ʋƾ,ƇƱ,ƎƼ,ƱƼ,Ɗlj,űǏ,ljǏ,ƏƧ,ƣƳ,ƧƳ,Ơƹ,ƅƥ,ƹƥ,Ɩƿ,ƓƸ,ƿƸ,ŻLJ,ƙǗ,LJǗ,žǖ,ƌƷ,ǖƷ,ƐǓ,ŰƯ,ǓƯ,ƔǛ,Ůǎ,Ǜǎ,Ƒǚ,Ɨƨ,ǚƨ,Ƅư,ŸƮ,ưƮ,ơƶ,ƞƭ,ƶƭ,ŵƽ,ƍǀ,ƽǀ,Ŭƫ,żǐ,ƫǐ,ŶǕ,źƪ,Ǖƪ,ƚƺ,ƃnj,ƺnj,ŭǔ,ƢǍ,ǔǍ,ƒǙ,Ɓƴ,Ǚƴ,ƋƬ,Ƙǘ,Ƭǘ,ŽƩ,ſƻ,Ʃƻ,ƕǒ,ƆNJ,ǒNJ,ƂǑ,Ɖdž,Ǒdž,ůƦ,ƜƤ,ƦƤ,ƴȉ,ƿǮ,ȉǮ,Ǎȇ,ǒȎ,ȇȎ,ƶǼ,ƫȀ,ǼȀ,ǑȐ,Ǐǩ,Ȑǩ,ǗDZ,ƪȃ,DZȃ,ƵǠ,džȑ,Ǡȑ,ƹǬ,Ljǡ,Ǭǡ,ƨǹ,ƾǥ,ǹǥ,ƲǤ,ǚǸ,ǤǸ,ǛǶ,Ƹǯ,Ƕǯ,Ƥȓ,ǐȁ,ȓȁ,ǘȋ,ƺȄ,ȋȄ,Ƽǧ,Ƴǫ,ǧǫ,ưǺ,njȅ,Ǻȅ,ƬȊ,ǖDz,ȊDz,ǀǿ,ƧǪ,ǿǪ,Ưǵ,ljǨ,ǵǨ,Ʈǻ,ǕȂ,ǻȂ,ǔȆ,ǓǴ,ȆǴ,ƩȌ,ǁǝ,Ȍǝ,ƽǾ,ƻȍ,Ǿȍ,DŽǜ,Džǟ,ǜǟ,ǎǷ,ƱǦ,ǷǦ,ǂǢ,ǙȈ,ǢȈ,NJȏ,Ʒdz,ȏdz,ƭǽ,ǃǞ,ǽǞ,ƥǭ,Njǣ,ǭǣ,LJǰ,ƦȒ,ǰȒ,ǫȭ,Ǫȳ,ȭȳ,ǥȣ,ȓȨ,ȣȨ,ǼȘ,ǽɆ,ȘɆ,ǧȬ,ǩț,Ȭț,Ǩȵ,ǤȤ,ȵȤ,ȅȯ,ǣɉ,ȯɉ,DZȜ,ȍȽ,ȜȽ,ȇȖ,Ǣɂ,Ȗɂ,ǵȴ,Ǹȥ,ȴȥ,ǬȠ,ǰɊ,ȠɊ,ǺȮ,ȉȔ,ȮȔ,ǡȡ,dzɅ,ȡɅ,ȋȪ,Ǿȼ,Ȫȼ,ȃȝ,ǟȿ,ȝȿ,Ȏȗ,ȌȺ,ȗȺ,ǶȦ,ǠȞ,ȦȞ,ȁȩ,ǹȢ,ȩȢ,Ȃȷ,Ȇȸ,ȷȸ,ȏɄ,ǯȧ,Ʉȧ,ȊȰ,ǻȶ,Ȱȶ,ȑȟ,Ǯȕ,ȟȕ,Ȓɋ,ǝȻ,ɋȻ,ȐȚ,Ǵȹ,Țȹ,Dzȱ,ȈɃ,ȱɃ,ǦɁ,ǭɈ,ɁɈ,Ƿɀ,Ȁș,ɀș,Ǟɇ,ǿȲ,ɇȲ,Ȅȫ,ǜȾ,ȫȾ,ȗɨ,ȹɹ,ɨɹ,ȣɎ,Ȟɫ,Ɏɫ,Ȧɪ,Ʌɣ,ɪɣ,Ɉɽ,ȯɖ,ɽɖ,Ȯɠ,ȷɮ,ɠɮ,Țɸ,Ȱɲ,ɸɲ,Ȭɒ,Ȼɷ,ɒɷ,ȳɍ,Ɇɑ,ɍɑ,Ƞɞ,ȥɝ,ɞɝ,ȭɌ,ɂɛ,Ɍɛ,Ⱦʃ,ȡɢ,ʃɢ,Ȩɏ,Ɂɼ,ɏɼ,Ȗɚ,ȴɜ,ɚɜ,ȧɱ,Ⱥɩ,ɱɩ,Ȝɘ,ɉɗ,ɘɗ,ȕɵ,ȸɯ,ɵɯ,Ȳʁ,Ʉɰ,ʁɰ,ɇʀ,Ȥɕ,ʀɕ,ȟɴ,ȫʂ,ɴʂ,ȶɳ,ȝɦ,ɳɦ,ɀɾ,Ȕɡ,ɾɡ,Ƚə,țɓ,əɓ,ȿɧ,Ȣɭ,ɧɭ,Șɐ,șɿ,ɐɿ,ȱɺ,ȵɔ,ɺɔ,Ɋɟ,ȼɥ,ɟɥ,Ȫɤ,ȩɬ,ɤɬ,ɋɶ,Ƀɻ,ɶɻ,ɾʬ,ɿʳ,ʬʳ,ɯʣ,ɟʶ,ʣʶ,ɧʰ,ɷʑ,ʰʑ,ɱʞ,ɗʡ,ʞʡ,ɦʫ,ɲʏ,ʫʏ,ɰʥ,ɑʓ,ʥʓ,ɖʋ,ɓʯ,ʋʯ,ɨʄ,ɣʉ,ʄʉ,ɽʊ,ɐʲ,ʊʲ,ɍʒ,ʁʤ,ʒʤ,Ɏʆ,ɢʙ,ʆʙ,əʮ,ɥʷ,ʮʷ,ɬʹ,ɞʔ,ʹʔ,ɤʸ,ɻʻ,ʸʻ,ʃʘ,ɵʢ,ʘʢ,ɴʨ,ɭʱ,ʨʱ,ɛʗ,ɮʍ,ʗʍ,ɏʚ,ɕʧ,ʚʧ,ɸʎ,ɚʜ,ʎʜ,ɒʐ,ɝʕ,ʐʕ,Ɍʖ,ɪʈ,ʖʈ,ɩʟ,ɹʅ,ʟʅ,ɫʇ,ɶʺ,ʇʺ,ɡʭ,ʂʩ,ʭʩ,ɔʵ,ɜʝ,ʵʝ,ɳʪ,ɺʴ,ʪʴ,ɼʛ,ɠʌ,ʛʌ,ɘʠ,ʀʦ,ʠʦ,ʒˎ,ʈ˥,ʶʿ,ʋˈ,ʓˇ,ʵˬ,ʍ˝,ʸ˖,ʊˌ,ʫ˄,ʙˑ,ʱ˛,ʣʾ,ʹ˔,ʅ˧,ʰˀ,ʘ˘,ʧ˟,ʯˉ,ʲˍ,ʔ˕,ʟ˦,ʮ˒,ʉˋ,ʐˢ,ʬʼ,ʏ˅,ʗ˜,ʖˤ,ʥˆ,ʆː,ʭ˪,ʡ˃,ʌ˱,ʺ˩,ʎˠ,ʤˏ,ʞ˂,ʜˡ,ʑˁ,ʚ˞,ʩ˫,ʄˊ,ʛ˰,ʨ˚,ʪˮ,ʝ˭,ʴ˯,ʇ˨,ʻ˗,ʕˣ,ʠ˲,ʢ˙,ʷ˓,ʦ˳,ʳʽ,ˎ,˥,ʿ,ˈ,ˇ,ˬ,˝,˖,ˌ,˄,ˑ,˛,ʾ,˔,˧,ˀ,˘,˟,ˉ,ˍ,˕,˦,˒,ˋ,ˢ,ʼ,˅,˜,ˤ,ˆ,ː,˪,˃,˱,˩,ˠ,ˏ,˂,ˡ,ˁ,˞,˫,ˊ,˰,˚,ˮ,˭,˯,˨,˗,ˣ,˲,˙,˓,˳,ʽ->'

tensor_list = expression.split(',')
tensor_list[-1] = tensor_list[-1][:-2]
operands = [np.zeros([2] * len(tensor)) for tensor in tensor_list]

threads = 1 # This is the only difference when I change the number of threads
network = Network(expression, *operands)
network.optimizer_config_ptr = cutn.create_contraction_optimizer_config(network.handle)
network._set_opt_config_option('SIMPLIFICATION_DISABLE_DR', cutn.ContractionOptimizerConfigAttribute.SIMPLIFICATION_DISABLE_DR, 1)
optimizer_options = configuration.OptimizerOptions(samples=64, threads=threads)
print('start')
start = time.time()
path, info = network.contract_path(optimize=optimizer_options)
print(f'Time: {time.time() - start}. Cost: {info.opt_cost}')

Wrong sign in a single-gate-circuit statevector?

Hi,

I stumbled upon a strange case of the State object returning a seemingly wrong statevector on compute, for a two-qubit circuit with a single (single-qubit) gate. The gate in question is a parameterised Ry gate (such as described here), and I am providing a specific unitary for it anyway, so the nature of the gate should not matter.

First, I calculate the statevector manually as follows:

import cupy as cp
import numpy as np

gate_ry = np.asarray([[ 0.33873792+0.j, -0.94088077+0.j],
                      [ 0.94088077+0.j,  0.33873792+0.j]], dtype='complex128')
I = np.array([[1. +0.j, 0. +0.j],[0. +0.j, 1. +0.j]])
mat = np.kron(gate_ry, I)  # So I put the Ry gate on the first qubit
sv_ini = np.asarray([1, 0, 0, 0])
mat @ sv_ini

and get the following result:

array([0.33873792+0.j, 0.        +0.j, 0.94088077+0.j, 0.        +0.j])

(I get the same result if I use cupy instead of numpy, and also with the pytket package).

Now, I try running the following script (as per your example, also discussed with you previously in another discussion):

import cupy as cp
import numpy as np

import cuquantum
from cuquantum import cutensornet as cutn

dev = cp.cuda.Device()  # get current device
props = cp.cuda.runtime.getDeviceProperties(dev.id)

num_qubits = 2
dim = 2
qubits_dims = (dim, ) * num_qubits

handle = cutn.create()
stream = cp.cuda.Stream()
data_type = cuquantum.cudaDataType.CUDA_C_64F

gate_ry = cp.asarray([[ 0.33873792+0.j, -0.94088077+0.j],
                      [ 0.94088077+0.j,  0.33873792+0.j]], dtype='complex128', order='F')
gate_ry_strides = 0

free_mem = dev.mem_info[0]
scratch_size = free_mem // 2
scratch_space = cp.cuda.alloc(scratch_size)

# Create the initial quantum state
quantum_state = cutn.create_state(handle, cutn.StatePurity.PURE, num_qubits, qubits_dims, data_type)
print("Created the initial quantum state")

# Construct the quantum circuit state with gate application
tensor_id = cutn.state_apply_tensor(  # here I am also applying the gate to the first (zeroth) qubit
        handle, quantum_state, 1, (0, ), 
        gate_ry.data.ptr, gate_ry_strides, 1, 0, 1)

# Configure the quantum circuit expectation value computation
num_hyper_samples_dtype = cutn.state_get_attribute_dtype(cutn.ExpectationAttribute.OPT_NUM_HYPER_SAMPLES)
num_hyper_samples = np.asarray(8, dtype=num_hyper_samples_dtype)
cutn.state_configure(handle, quantum_state, 
cutn.StateAttribute.NUM_HYPER_SAMPLES, 
num_hyper_samples.ctypes.data, num_hyper_samples.dtype.itemsize)

# Prepare the computation of the specified quantum circuit expectation value
work_desc = cutn.create_workspace_descriptor(handle)
cutn.state_prepare(handle, quantum_state, scratch_size, work_desc, stream.ptr)
print("Prepare the computation of the specified quantum circuit expectation value")

workspace_size_d = cutn.workspace_get_memory_size(handle, 
    work_desc, cutn.WorksizePref.RECOMMENDED, cutn.Memspace.DEVICE, cutn.WorkspaceKind.SCRATCH)

if workspace_size_d <= scratch_size:
    cutn.workspace_set_memory(handle, work_desc, cutn.Memspace.DEVICE, cutn.WorkspaceKind.SCRATCH, scratch_space.ptr, workspace_size_d)
else:
    print("Error:Insufficient workspace size on Device")
    cutn.destroy_workspace_descriptor(work_desc)
    cutn.destroy_state(quantum_state)
    cutn.destroy(handle)
    del scratch
    print("Free resource and exit.")

sv = cp.empty((2,) * 2, dtype="complex128", order="F")
statevec = cutn.state_compute(
            handle,
            quantum_state,
            work_desc,
            (sv.data.ptr, ), # note this should be a sequence and this sequence is on host
            stream.ptr,
)

sv.flatten()

...  # destroy stuff here

and I am getting the following vector:

array([ 0.33873792+0.j,  0.        +0.j, -0.94088077+0.j,  0.        +0.j])

As you can see, the sign of the third element differs from the reference result. It puzzles me a lot, because the same script (and a more structured API interface that I've built) gives correct results for other types of gates.

Am I missing something, or is there a strange bug in State?

Many thanks!
Iakov

Releasing `qsim_mgpu` source on GitHub instead of only binaries in the Docker container

Dear cuQuantum developers,

After long struggles trying to get qsim to compile with multi-GPU support, I found out that this cannot be accomplished with stock qsim. IMO this wasn't very clear from the documentation.

Ultimately, I found out by running the NVIDIA cuQuantum Appliance Docker container and checking the git diff --no-index between the original qsim_simulator.py and the ~/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/qsimcirq/qsim_circuit.py file in the Docker container.

It turns out that this changed version uses a module qsim_mgpu which appears to be unavailable outside the said Docker container.

Unfortunately, my current software stack cannot integrate Docker containers (we need full control over the entire environment), yet the multi-GPU support offered by qsim_mgpu is essential for our work. We would greatly benefit from the ability to compile qsim_mgpu independently.

Could you please consider releasing the modified qsim code/fork with qsim_mgpu for standalone use? This would be immensely beneficial for us and potentially for others in the community facing similar challenges.

If there are reasons for keeping this code exclusive to the Docker environment, understanding them could help us explore alternative solutions.

Tagging the core maintainers: @leofang @ahehn-nv @mtjrider @Takuma-Yamaguchi

[Question] Issues building cuquantum-python from source

I have the cuquantum-23.06 appliance container running with a mounted volume containing my project directories, and I'm trying to build cuquantum-python from source (specifically to use cutensornet), but I run into errors when following the given instructions.

  1. First I pull the container with my mount docker run --gpus all -it --rm -v /home/ubuntu:/home/ubuntu -e HOME=/home/ubuntu nvcr.io/nvidia/cuquantum-appliance:23.06
  2. Clone the repo: git clone https://github.com/NVIDIA/cuQuantum
  3. Set CUDA_PATH: export CUDA_PATH=/usr/local/cuda
  4. cd into the package-level directory: cd cuQuantum/python
  5. Run pip install: pip install -e .

But it returns this error saying the linker couldn't find cutensornet or custatevec:

    g++ -pthread -B /home/cuquantum/conda/envs/cuquantum-23.06/compiler_compat -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/cuquantum/conda/envs/cuquantum-23.06/lib -Wl,-rpath-link,/home/cuquantum/conda/envs/cuquantum-23.06/lib -L/home/cuquantum/conda/envs/cuquantum-23.06/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/cuquantum/conda/envs/cuquantum-23.06/lib -Wl,-rpath-link,/home/cuquantum/conda/envs/cuquantum-23.06/lib -L/home/cuquantum/conda/envs/cuquantum-23.06/lib build/temp.linux-x86_64-cpython-39/cuquantum/cutensornet/cutensornet.o -L/tmp/pip-build-env-h7eauy9b/normal/lib/python3.9/site-packages/cuquantum/lib -L/tmp/pip-build-env-h7eauy9b/normal/lib/python3.9/site-packages/cuquantum/lib64 -lcutensornet -o build/lib.linux-x86_64-cpython-39/cuquantum/cutensornet/cutensornet.cpython-39-x86_64-linux-gnu.so
    /home/cuquantum/conda/envs/cuquantum-23.06/compiler_compat/ld: cannot find -lcutensornet: No such file or directory
    collect2: error: ld returned 1 exit status
    g++ -pthread -B /home/cuquantum/conda/envs/cuquantum-23.06/compiler_compat -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/cuquantum/conda/envs/cuquantum-23.06/lib -Wl,-rpath-link,/home/cuquantum/conda/envs/cuquantum-23.06/lib -L/home/cuquantum/conda/envs/cuquantum-23.06/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/cuquantum/conda/envs/cuquantum-23.06/lib -Wl,-rpath-link,/home/cuquantum/conda/envs/cuquantum-23.06/lib -L/home/cuquantum/conda/envs/cuquantum-23.06/lib build/temp.linux-x86_64-cpython-39/cuquantum/custatevec/custatevec.o -L/tmp/pip-build-env-h7eauy9b/normal/lib/python3.9/site-packages/cuquantum/lib -L/tmp/pip-build-env-h7eauy9b/normal/lib/python3.9/site-packages/cuquantum/lib64 -lcustatevec -o build/lib.linux-x86_64-cpython-39/cuquantum/custatevec/custatevec.cpython-39-x86_64-linux-gnu.so
    /home/cuquantum/conda/envs/cuquantum-23.06/compiler_compat/ld: cannot find -lcustatevec: No such file or directory
    collect2: error: ld returned 1 exit status
    error: command '/usr/bin/g++' failed with exit code 1
    [end of output]

Shouldn't they be built and installed in the process of running this command?
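My guess (an assumption on my part) is that the linker needs to be pointed at the appliance's conda environment, where the libraries actually live, e.g.:

export CUQUANTUM_ROOT=/home/cuquantum/conda/envs/cuquantum-23.06
export CUTENSOR_ROOT=/home/cuquantum/conda/envs/cuquantum-23.06
pip install -e . -v

but I would have expected the build to pick this up automatically inside the appliance.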

Distributed MPI simulation: cudaErrorInvalidResourceHandle

I'm following the example for distributed multi-GPU simulation using MPI: https://github.com/NVIDIA/cuQuantum/blob/main/python/samples/custatevec/distributed_index_bit_swap_mpi.py

When I run it with mpiexec -n 2 python distributed_index_bit_swap_mpi.py I get the following stacktrace:

  File "...distributed_index_bit_swap_mpi.py", line 266, in <module>
    run_distributed_index_bit_swaps(
  File "...distributed_index_bit_swap_mpi.py", line 166, in run_distributed_index_bit_swaps
    d_sub_sv_p2p = cp.cuda.runtime.ipcOpenMemHandle(dst_mem_handle)
  File "cupy_backends/cuda/api/runtime.pyx", line 456, in cupy_backends.cuda.api.runtime.ipcOpenMemHandle
  File "cupy_backends/cuda/api/runtime.pyx", line 462, in cupy_backends.cuda.api.runtime.ipcOpenMemHandle
  File "cupy_backends/cuda/api/runtime.pyx", line 143, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidResourceHandle: invalid resource handle
x3005c0s37b0n0.hsn.cm.polaris.alcf.anl.gov: rank 1 exited with code 1

I also created a minimal reproducible example that gives the same error:

import cupy as cp
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
cp.cuda.runtime.setDevice(rank)
X = cp.zeros(100)
ipc_mem_handle = cp.cuda.runtime.ipcGetMemHandle(X.data.ptr)
# This line will also raise:
#local_open_handle = cp.cuda.runtime.ipcOpenMemHandle(ipc_mem_handle)
ipc_mem_handles = MPI.COMM_WORLD.allgather(ipc_mem_handle)
other = (rank + 1) % MPI.COMM_WORLD.Get_size()
remote_handle = ipc_mem_handles[other]
remote_open_handle = cp.cuda.runtime.ipcOpenMemHandle(remote_handle)

Notably, calling ipcOpenMemHandle on the local handle raises cudaErrorDeviceUninitialized: invalid device context, similar to this issue.

My config is:

In [1]: import cupy as cp; cp.show_config()
OS                           : Linux-5.3.18-150300.59.115-default-x86_64-with-glibc2.31
Python Version               : 3.10.9
CuPy Version                 : 11.5.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.23.5
SciPy Version                : 1.11.1
Cython Build Version         : 0.29.32
Cython Runtime Version       : 0.29.33
CUDA Root                    : /soft/compilers/cudatoolkit/cuda-11.4.4
nvcc PATH                    : /soft/compilers/cudatoolkit/cuda-11.4.4/bin/nvcc
CUDA Build Version           : 11080
CUDA Driver Version          : 11040
CUDA Runtime Version         : 11040
cuBLAS Version               : (available)
cuFFT Version                : 10502
cuRAND Version               : 10205
cuSOLVER Version             : (11, 2, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 4)
Thrust Version               : 101501
CUB Build Version            : 101501
Jitify Build Version         : 4a37de0
cuDNN Build Version          : 8700
cuDNN Version                : 8600
NCCL Build Version           : 21602
NCCL Runtime Version         : 21602
cuTENSOR Version             : 10700
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA A100-SXM4-40GB
Device 0 Compute Capability  : 80
Device 0 PCI Bus ID          : 0000:07:00.0
Device 1 Name                : NVIDIA A100-SXM4-40GB
Device 1 Compute Capability  : 80
Device 1 PCI Bus ID          : 0000:46:00.0
Device 2 Name                : NVIDIA A100-SXM4-40GB
Device 2 Compute Capability  : 80
Device 2 PCI Bus ID          : 0000:85:00.0
Device 3 Name                : NVIDIA A100-SXM4-40GB
Device 3 Compute Capability  : 80
Device 3 PCI Bus ID          : 0000:C7:00.0

Appreciate any help!

Is speed up using multiple GPUs on qsim-mgpu available?

Hi, I'm using the NVIDIA Docker container 23.03 cuQuantum Appliance and trying to see the multi-GPU speed-up.
I used cirq as the frontend and qsim-mgpu as the backend, and tested pretty much all of the benchmarks provided in the GitHub repo with qubit counts ranging from 10 to 30+.
However, I see performance degradation with multiple GPUs compared to using only a single GPU.

I assume this is due to data communication time among the GPUs, but I would like to see the performance improvement stated in the NVIDIA cuStateVec blog.

Can anyone help?

cuQuantum MPS Simulator vs Qiskit Aer

I was playing with the MPS simulator from https://github.com/NVIDIA/cuQuantum/blob/main/python/samples/cutensornet/tn_algorithms/mps_algorithms.ipynb, specifically trying to test its performance when reducing the bond dimension while simulating some Trotterized time evolution circuits. Following that notebook, I transform my qiskit circuit to tensor form with CircuitToEinsum, then apply the gates with the apply_gate function (modifying the exact_gate_algorithm as exact_gate_algorithm = {'qr_method': False, 'svd_method':{'partition': 'UV', 'abs_cutoff':1e-12, 'max_extent': bondDim}}, with bondDim being the parameter that controls the bond dimension), and finally compute some observable with mps_helper.contract_expectation.

What I'm seeing is that, compared with the MPS simulator from Qiskit Aer at the same bond dimension (controlled as AerEstimator(run_options= {"method": "matrix_product_state", "shots": None, "matrix_product_state_max_bond_dimension": bondDim}, approximation=True)), the cuQuantum result is further away from the "true" value than the qiskit-aer one. For circuits with around 100 qubits and depth between 100 and 700, with bondDim=50, the qiskit results already look like they have converged, but not the cuQuantum ones.

Am I doing something wrong on the cuQuantum side, or can the results from the two methods not be compared? (I am using cuquantum-appliance:23.10)
Thank you!

Pytorch and cuQuantum

Hi,

This lib looks exciting!

Just wondering whether it would be possible to integrate cuQuantum into a standard PyTorch program to build hybrid classical/quantum models that can use both CUDA and cuQuantum acceleration seamlessly in one program, e.g., via the Python cuStateVec API?

Looking at the Python API samples, it seems they mostly demonstrate standalone usage. Any pointers or comments much appreciated.
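For instance, I would hope something like this minimal sketch works; this is an untested assumption on my part, relying on cuquantum.contract accepting PyTorch GPU tensors directly:

import torch
from cuquantum import contract

# ordinary PyTorch CUDA tensors as operands
a = torch.rand(2, 3, device="cuda")
b = torch.rand(3, 4, device="cuda")

# contract on the GPU; the result comes back as a torch tensor on the
# same device, ready for further PyTorch ops in the same program
c = contract("ab,bc->ac", a, b)
print(c.shape, c.device)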

Thanks!

[Performance] cuTN circuit2einsum slower than opt_einsum

Discussed in #53

Originally posted by rht May 12, 2023
I was benchmarking cuTensorNet and opt_einsum on QFT and QAOA circuits, and I found the former to be consistently slower than the latter.
I used the same code in #23 for cuTensorNet, and for opt_einsum:

bitstring = "0" * len(qubits)
# https://optimized-einsum.readthedocs.io/en/stable/autosummary/opt_einsum.contract_path.html#opt_einsum.contract_path
expression, operands = myconverter.amplitude(bitstring=bitstring)
tic = time.time()
path, path_info = oe.contract_path(expression, *operands)
elapsed1 = time.time() - tic
print("Elapsed opt_einsum path finding", elapsed1)
tic = time.time()
output = oe.contract(expression, *operands, optimize=path)
elapsed2 = time.time() - tic
print("Elapsed opt_einsum contract", elapsed2)

Plot for QFT:
[benchmark plot comparing cuTensorNet and opt_einsum]

I think it has something to do with cuTensorNet's path finding being closer to the global optimum. Is there a way to tweak the hyperoptimizer to either stop early or accept a larger error tolerance, so as to minimize the overall time?
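A sketch of what I have in mind, assuming OptimizerOptions is the right knob: cap the number of hyperoptimizer samples to trade path quality for path-finding time:

from cuquantum import contract, OptimizerOptions

# fewer hyperoptimizer samples means faster path finding, possibly a worse path
opts = OptimizerOptions(samples=4)
output = contract(expression, *operands, optimize=opts)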

Using cuQuantum Appliance 23.03 with Apptainer/Singularity

I see. I am using the cuquantum-appliance-23.03 and this is my workflow to recreate this issue:

$ apptainer pull docker://nvcr.io/nvidia/cuquantum-appliance:23.03

Then create a file ghz.py as in the documentation, run the container and execute the file

$ apptainer shell --nv -B $PWD cuquantum-appliance_23.03.sif
$ python ghz.py

Traceback (most recent call last):
  File "ghz.py", line 2, in <module>
    from cusvaer.backends import StatevectorSimulator
  File "/scratch/jookare/ghz.py", line 2, in <module>
    from cusvaer.backends import StatevectorSimulator
ModuleNotFoundError: No module named 'cusvaer.backends'; 'cusvaer' is not a package

Originally posted by @Jookare in #50 (reply in thread)

pip module search for libcublas.so.11

Environment:

  • Ubuntu 20.04 LTS
  • Python 3.8

Symptom:

>>> import cuquantum
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/cuquantum/__init__.py", line 1, in <module>
    from cuquantum import custatevec
  File "/usr/local/lib/python3.8/dist-packages/cuquantum/custatevec/__init__.py", line 1, in <module>
    from cuquantum.custatevec.custatevec import *
ImportError: libcublas.so.11: cannot open shared object file: No such file or directory

I guess that pyculib should be converted to pip, too.

Build issues with top-level cutensornet.h symlink

When installing the latest version of cuquantum via apt-get, /usr/include/cutensornet.h is a symlink to /usr/include/libcuquantum/12/cutensornet.h (I'm using CUDA 12).

However, this header refers to <cutensornet/types.h> in the cutensornet sub-directory, which is not symlinked into /usr/include/. This creates difficulties when setting up CMake: the build will fail if /usr/include/ is detected as the cutensornet include directory, for example.
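A minimal sketch of a CMake-side workaround, with the path assumed from the symlink target above: point the target at the versioned include directory so that <cutensornet/types.h> resolves:

# in CMakeLists.txt (myapp is a placeholder target name)
target_include_directories(myapp PRIVATE /usr/include/libcuquantum/12)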

[Question/Issue] How to install cuquantum on WSL2 Ubuntu-20.04?

I'm unsure whether this is an issue on my side or simply missing compatibility.
**Problem:** When installing custatevec on Ubuntu 20.04 (on WSL2), it fails on the last of the following commands:

$ wget https://developer.download.nvidia.com/compute/cuquantum/22.07.1/local_installers/cuquantum-local-repo-ubuntu2004-22.07.1_1.0-1_amd64.deb
$ sudo dpkg -i cuquantum-local-repo-ubuntu2004-22.07.1_1.0-1_amd64.deb
$ sudo cp /var/cuquantum-local-repo-ubuntu2004-22.07.1/cuquantum-*-keyring.gpg /usr/share/keyrings/
$ sudo apt-get update
$ sudo apt-get -y install cuquantum cuquantum-dev cuquantum-doc

with the following cuBLAS error:

$ sudo apt-get -y install cuquantum cuquantum-dev cuquantum-doc
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuquantum : Depends: libcublaslt.so.11 but it is not installable or
                      libcublas-11-0 but it is not installable or
                      libcublas-11-1 but it is not installable or
                      libcublas-11-2 but it is not installable or
                      libcublas-11-3 but it is not installable or
                      libcublas-11-4 but it is not installable or
                      libcublas-11-5 but it is not installable or
                      libcublas-11-6 but it is not installable
             Depends: libcutensor1 but it is not installable
E: Unable to correct problems, you have held broken packages.

Any idea what may be causing this? I looked into installing cuBLAS directly, but that seemed too invasive for such a simple issue. I already have custatevec working on an Ubuntu GCP instance, so I'm wondering if this is due to an incompatibility with Windows Subsystem for Linux 2.

Not able to install in M1

I am trying to install cuquantum on an M1 machine. I get this:

conda install -c conda-forge cuquantum-python

Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - cuquantum-python

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-arm64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-arm64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-arm64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Is M1 installation supported?

3XTF32 issue with the most recent cuquantum

When I create a Network using

Network(eq, *operands, options={'compute_type': cuquantum.cutensornet.ComputeType.COMPUTE_3XTF32})

it gives me the following error:

ValueError: <ComputeType.COMPUTE_3XTF32: 8192> is not a valid ComputeType

This is not a problem with TF32, however. Is this because there is no 3XTF32 entry in cuQuantum/python/cuquantum/_utils.pyx?
I have

cuda-version              11.8                 h70ddcb2_3    conda-forge
cudatoolkit               11.8.0              h4ba93d1_13    conda-forge
cupy                      13.0.0          py311h878bca4_3    conda-forge
cupy-core                 13.0.0          py311heecd119_3    conda-forge
cuquantum-python          24.03.0         py311h8bf0e4b_3    conda-forge
custatevec                1.6.0                h56904bc_3    conda-forge
cutensor                  2.0.1.2              hcdd5f01_0    conda-forge
cutensornet               2.4.0           nompi_h56904bc_103    conda-forge

cirq + custatevec on multiple GPUs

Hi,

I am using cuQuantum Appliance 22.07-Cirq to experiment with the cirq + custatevec simulator.
I am able to run up to 32-qubit simulations with a single NVIDIA A100 40GB GPU, as expected.

However, I am having trouble getting it to run on multiple GPUs. I am using QSimOptions.gpu_mode = 2 to achieve this, as explained in the cuQuantum docs, but I only see one GPU being used through the nvidia-smi command, and I run out of memory at 33 qubits.

Here is a minimal reproducer:

import cirq
import qsimcirq

def load_test(num_gpus = 1, depth = 4, num_qubits = 30):
	circuit = cirq.testing.random_circuit(
		qubits = num_qubits,
		n_moments = depth,
		op_density = 1.0,
		random_state = 1)
	num_gates = len(list(circuit.all_operations()))
	options = {"gpu_mode": num_gpus, "n_subsvs": num_gpus}
	qsim_simulator = qsimcirq.QSimSimulator(options)
	result = qsim_simulator.simulate(circuit)
	print (f"DONE with qubits: {num_qubits} \t gates: {num_gates} \t depth: {depth} \t")

load_test(num_gpus = 2, depth = 2, num_qubits = 30) # uses  ~8GB memory on 1 gpu
load_test(num_gpus = 2, depth = 2, num_qubits = 32) # uses ~32GB memory on 1 gpu
load_test(num_gpus = 2, depth = 2, num_qubits = 33) # CUDA error: out of memory vector_mgpu.h 116

I have 2 NVIDIA A100 GPUs on my machine, and here is the output of nvidia-smi for the 32-qubit case:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    82W / 400W |  33561MiB / 40960MiB |     23%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   43C    P0    75W / 400W |      2MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     26614      C   python                          33559MiB |
+-----------------------------------------------------------------------------+

Am I missing something?

[Feature] [Unprioritized] Rust Language Support

More and more quantum frameworks and libraries are being written in Rust, for example Qiskit and tket. While integrations are possible through C or Python bindings, it would be great to have a Rust crate.

The output of apply_matrix_batched in cuQuantum Python is same as input

This is a very cool library! I've just tried the batched_gate_application.py sample (without changing anything in the sample code), but I get ValueError: does not match result. When I checked the input and output state vectors, they are exactly the same. I then tried other functions, including apply_gate_matrix and compute_expectations_on_pauli_basis, and they behave the same way: the output state vector equals the input one. I've also tried reinstalling cuQuantum Python, reinstalling the CUDA toolkit, and creating a new Conda environment, but none of it worked. Any help will be appreciated, thanks!

`cudaq` never giving correct result for `maxcut` QAOA problem.

I tried solving a simple $6$-node Max-Cut problem using both qiskit and cudaq. While qiskit usually solves it correctly, cudaq never arrives at the right answer.

How to Recreate?

Step 1: Make a MaxCut Graph Problem

import networkx as nx

from qiskit_optimization.applications import Maxcut

seed = 1
num_nodes = 6


G = nx.random_regular_graph(d=3, n=num_nodes, seed=seed)
nx.draw(G, with_labels=True, pos=nx.spring_layout(G, seed=seed))

maxcut = Maxcut(G)
problem = maxcut.to_quadratic_program()
print(problem.prettyprint())

[graph drawing output]
This is the Quadratic Program for it:

Problem name: Max-cut

Maximize
  -2*x_0*x_1 - 2*x_0*x_3 - 2*x_0*x_4 - 2*x_1*x_2 - 2*x_1*x_5 - 2*x_2*x_3
  - 2*x_2*x_4 - 2*x_3*x_5 - 2*x_4*x_5 + 3*x_0 + 3*x_1 + 3*x_2 + 3*x_3 + 3*x_4
  + 3*x_5

Subject to
  No constraints

  Binary variables (6)
    x_0 x_1 x_2 x_3 x_4 x_5

Step 2: Making a Hamiltonian

qubitOp, offset = problem.to_ising()
print("Offset:", offset)
print("Ising Hamiltonian:")
print(str(qubitOp))

The resulting Hamiltonian is

Offset: -4.5
Ising Hamiltonian:
SparsePauliOp(['IIIIZZ', 'IIZIIZ', 'IZIIIZ', 'IIIZZI', 'ZIIIZI', 'IIZZII', 'IZIZII', 'ZIZIII', 'ZZIIII'],
              coeffs=[0.5+0.j, 0.5+0.j, 0.5+0.j, 0.5+0.j, 0.5+0.j, 0.5+0.j, 0.5+0.j, 0.5+0.j,
 0.5+0.j])

Step 3: Make a QAOA ansatz and solve:

# assumed imports and definitions, added so the snippet runs standalone
import numpy as np
from scipy.optimize import minimize
from qiskit.circuit.library import QAOAAnsatz
from qiskit.primitives import Estimator

hamiltonian = qubitOp
estimator = Estimator()

# QAOA ansatz circuit
ansatz = QAOAAnsatz(hamiltonian, reps=3)
def cost_func(params, ansatz, hamiltonian, estimator):
    """Return estimate of energy from estimator

    Parameters:
        params (ndarray): Array of ansatz parameters
        ansatz (QuantumCircuit): Parameterized ansatz circuit
        hamiltonian (SparsePauliOp): Operator representation of Hamiltonian
        estimator (Estimator): Estimator primitive instance

    Returns:
        float: Energy estimate
    """
    cost = estimator.run(ansatz, hamiltonian, parameter_values=params).result().values[0]
    return cost

x1 = np.random.uniform(- np.pi / 8.0, np.pi/ 8.0 ,ansatz.num_parameters)
res = minimize(cost_func, x1, args=(ansatz, hamiltonian, estimator), method="COBYLA")

The result it gives is:
011010, which exactly matches the one I get via brute force.

But the moment I make use of cudaq and use the code:

import cudaq
from cudaq import spin
import matplotlib.pyplot as plt
import numpy as np
import time
# Here we build up a kernel for QAOA with `p` layers, with each layer
# containing the alternating set of unitaries corresponding to the problem
# and the mixer Hamiltonians. The algorithm leverages the VQE algorithm
# to compute the Max-Cut of a rectangular graph illustrated below.

#       v0  0---------------------0 v1
#           |                     |
#           |                     |
#           |                     |
#           |                     |
#       v3  0---------------------0 v2
# The Max-Cut for this problem is 0101 or 1010.

# The problem Hamiltonian
#hamiltonian = 0.5 * spin.z(0) * spin.z(1) + 0.5 * spin.z(1) * spin.z(2) + 0.5 * spin.z(0) * spin.z(5) + 0.5 * spin.z(2) * spin.z(3) + 0.5 * spin.z(3) * spin.z(4) + 0.5 * spin.z(4) * spin.z(5) - 3.0
# 10 Node Max Cut
# Set the target to our density matrix simulator.
#cudaq.set_target('density-matrix-cpu')



hamiltonian =  0.5 * spin.z(0) * spin.z(1) + 0.5 * spin.z(0) * spin.z(2) + 0.5 * spin.z(1) * spin.z(3) + 0.5 * spin.z(2) * spin.z(3) + 0.5 * spin.z(0) * spin.z(4) + 0.5 * spin.z(3) * spin.z(4) + 0.5 * spin.z(1) * spin.z(5)  + 0.5 * spin.z(2) * spin.z(5) + 0.5 * spin.z(4) * spin.z(5) - 4.5 
# Problem parameters.
qubit_count: int = 6
layer_count: int = 3
parameter_count: int = 2 * layer_count


def kernel_qaoa() -> cudaq.Kernel:
    """QAOA ansatz for Max-Cut"""
    kernel, thetas = cudaq.make_kernel(list)
    qvec = kernel.qalloc(qubit_count)

    # Create superposition
    kernel.h(qvec)

    # Loop over the layers
    for i in range(layer_count):
        # Loop over the qubits
        # Problem unitary
        for j in range(qubit_count):
            kernel.cx(qvec[j], qvec[(j + 1) % qubit_count])
            kernel.rz(2.0 * thetas[i], qvec[(j + 1) % qubit_count])
            kernel.cx(qvec[j], qvec[(j + 1) % qubit_count])

        # Mixer unitary
        for j in range(qubit_count):
            kernel.rx(2.0 * thetas[i + layer_count], qvec[j])

    return kernel


# Specify the optimizer and its initial parameters. Make it repeatable.
cudaq.set_random_seed(2)
optimizer = cudaq.optimizers.COBYLA()
optimizer.max_iterations = 1000
np.random.seed(20)
optimizer.initial_parameters = np.random.uniform(- np.pi / 8.0, np.pi / 8.0, parameter_count)
#optimizer.initial_parameters = 2.0 * np.pi * np.random.rand(parameter_count)
print("Initial parameters = ", optimizer.initial_parameters)

# Pass the kernel, spin operator, and optimizer to `cudaq.vqe`.
tic = time.time()
optimal_expectation, optimal_parameters = cudaq.vqe(
    kernel=kernel_qaoa(),
    spin_operator=hamiltonian,
    optimizer=optimizer,
    parameter_count=parameter_count)

# Print the optimized value and its parameters
print("Optimal value = ", optimal_expectation)
print("Optimal parameters = ", optimal_parameters)

# Sample the circuit using the optimized parameters
counts = cudaq.sample(kernel_qaoa(), optimal_parameters)
toc = time.time()
print("Time taken = ", toc-tic)
ny_dict = dict(sorted(counts.items(),key=lambda item: item[1], reverse=True))
print(dict(sorted(counts.items(),key=lambda item: item[1], reverse=True)))
#counts.dump()

# plot the histogram of ny_dict, for only the first 10 elements
# Extract first 10 key-value pairs
first_10_items = list(ny_dict.items())[:10]
x_values = [item[0] for item in first_10_items]
y_values = [item[1] for item in first_10_items]

# Plot the data
plt.figure(figsize=(10, 6))
plt.bar(x_values, y_values, color='skyblue')
plt.xlabel('Keys')
plt.ylabel('Values')
plt.title('Plot of First 10 Key-Value Pairs')
plt.xticks(rotation=90)  # Rotate x-axis labels for better readability
plt.show()

I never get the correct answer; I always get 010101. Now, all the parameters used in both qiskit and cudaq are the same, the optimizer is the same, the circuit is effectively the same, and so is the Hamiltonian. So why does one give the correct result, while cudaq never does?

The same happens when I scale the problem to more qubits. Why is this happening? Are there other parameters that I need to optimize, or hyperparameters to set beforehand?

How to run a browser for jupyter notebook in cuQuantum docker environment ?

Nvidia cuQuantum
Following the well-designed documentation, I was able to get this to run successfully. I am using the terminal to write my quantum experiments. I want to know how to run a Jupyter notebook inside the Docker container. After installing Jupyter in Docker and running it, it gives a link that doesn't open in a browser on my Linux system. I then installed Firefox inside the cuQuantum Docker container, but it doesn't launch; it gives a "display not found" error. How can I run my experiments in Jupyter?
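For reference, the pattern I would expect to work (an assumption on my part) is to publish the notebook port when starting the container and open the URL in the host browser, rather than launching a browser inside the container:

# on the host: publish the notebook port
docker run --gpus all -it --rm -p 8888:8888 nvcr.io/nvidia/cuquantum-appliance:23.03

# inside the container: bind to all interfaces and skip the in-container browser
jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser
# then open the printed http://127.0.0.1:8888/?token=... URL in the host browser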

CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 198

I am trying to compile the examples in the /samples/cutensornet directory. The compilation succeeds; however, the examples fail at runtime with the error in the subject.

According to https://github.com/NVIDIA/cuQuantum/blob/main/samples/cutensornet/README.md and https://docs.nvidia.com/cuda/cuquantum/cutensornet/index.html as of today, I must have

I actually have

  • cuTensorNet v1.1.0
  • CUDA Version: 11.7 with Driver Version: 515.48.07
  • cuTENSOR v1.6.0.3 (from libcutensor-linux-x86_64-1.6.0.3-archive)

So I meet all the requirements but still get that error. The only thing I can think of is that the libcutensor libraries are split into 3 directories, for 10.2, 11, and 11.0, so perhaps CUDA v11.7 is too new? Does anybody have any insight?
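To rule out loading the wrong build at run time, this is the kind of check I would try; the archive path is an assumption:

# make sure the CUDA 11 build of cuTENSOR is the one the loader finds
export LD_LIBRARY_PATH=/path/to/libcutensor-linux-x86_64-1.6.0.3-archive/lib/11:$LD_LIBRARY_PATH
ldd ./tensornet_example | grep -i cutensor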
