gpue-group / gpue

This project is a fork of mlxd/gpue.


GPU Gross-Pitaevskii Equation numerical solver for Bose-Einstein condensates.

Home Page: https://gpue-group.github.io/

License: Other

Languages: Makefile 0.35%, Shell 0.57%, C 3.69%, C++ 15.08%, MATLAB 6.16%, Python 16.42%, Cuda 55.06%, CMake 0.26%, TeX 1.58%, Dockerfile 0.13%, Julia 0.72%

Topics: cuda, fft, gpu, numerical-calculations, physics, quantum-mechanics, research, simulation, superfluid

gpue's People

Contributors

benchislett, leios, mlxd, peterwittek


gpue's Issues

Fix unit tests + a list of other issues

Grid test:

  • move grid generation into a separate function
  • loop through and check that all elements == gid (see the sketch at the end of this issue)

parSum test:

  • Fix comments to say 64 x 64, not 8 x 8

General:

  • Also return and check errors from all mallocs

Evolution_test:

  • Simple harmonic trap returns energy of wavefunction

Missing tests:

  • Evolution test

Non-unit tests:

  • Remove 3D vortex tracking (for JOSS)
  • Add more math functionality to dynamic parsing
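
As a sketch of the grid test described above (illustrative only, assuming hypothetical names fill_gid and check_grid rather than GPUE's actual test code):

__global__ void fill_gid(int *out, int n){
    // Each thread writes its own global id.
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n){
        out[gid] = gid;
    }
}

// Host-side check after copying the array back:
// every element should equal its own index.
bool check_grid(const int *host_data, int n){
    for (int i = 0; i < n; i++){
        if (host_data[i] != i){
            return false;
        }
    }
    return true;
}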

Parser inconsistent with `-r` flag

The -r flag (and similar flags) should always be processed after the data directory has been set, regardless of where the flag appears in the command-line arguments.
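
One way to guarantee this ordering is a two-pass parse: scan once for the data directory, then process everything else. A minimal sketch, assuming a hypothetical -d flag for the data directory (not necessarily GPUE's actual option name):

#include <cstring>
#include <string>

void parse_args(int argc, char **argv){
    std::string data_dir = "data/";  // default

    // Pass 1: locate the data directory, wherever it appears.
    for (int i = 1; i < argc - 1; i++){
        if (strcmp(argv[i], "-d") == 0){
            data_dir = argv[i + 1];
        }
    }

    // Pass 2: all other flags, which may now rely on data_dir.
    for (int i = 1; i < argc; i++){
        if (strcmp(argv[i], "-r") == 0){
            // e.g. load the previous run from data_dir here
        }
    }
}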

Energy calculation uses too much GPU memory

In the process of rewriting the energy calculation, we found that it needs a bit more memory than previously thought. As such, if we are running a memory-intensive task, the energy calculation will not work correctly. We need to find a way to rewrite it to minimize the memory footprint; one option is sketched below.
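
One possible direction, assuming the energy is a sum of independent terms: compute each term into a single reusable scratch buffer and reduce it immediately, so the peak footprint is one extra array rather than one per term. The kernel name here is illustrative, not GPUE's actual API:

#include <thrust/device_ptr.h>
#include <thrust/reduce.h>

// Illustrative term kernel: potential energy density |psi|^2 * V.
__global__ void potential_term(const double *psi2, const double *V,
                               double *scratch, int n){
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n){
        scratch[gid] = psi2[gid] * V[gid];
    }
}

// Reduce the scratch buffer on the device; the same buffer can then
// be overwritten by the next energy term.
double reduce_scratch(double *scratch, int n){
    thrust::device_ptr<double> p(scratch);
    return thrust::reduce(p, p + n, 0.0);
}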

CS: norm host implementation

Change the host norm implementation to use Thrust and split the vector into chunks small enough to fit on the GPU.
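
A minimal sketch of that approach, assuming the data lives in host memory and that chunk elements at a time fit on the device (names illustrative):

#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <algorithm>
#include <vector>

// Sum of squares of a host vector, processed in GPU-sized chunks;
// the final norm is the square root of the returned value.
double norm_sq_chunked(const std::vector<double> &host, size_t chunk){
    double total = 0.0;
    for (size_t start = 0; start < host.size(); start += chunk){
        size_t len = std::min(chunk, host.size() - start);
        thrust::device_vector<double> d(host.begin() + start,
                                        host.begin() + start + len);
        total += thrust::transform_reduce(d.begin(), d.end(),
                                          thrust::square<double>(),
                                          0.0, thrust::plus<double>());
    }
    return total;
}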

Overhaul of plot.py

The plot.py file is horribly documented and unclear. It needs a general rework. In addition, we need to make something similar for 3D. Ideally, we somehow incorporate gen_data.py into this script.

Non-GPUE dependencies included in repo

Hi all, I have noticed some external source files from other repositories included in some of the branches. I would appreciate it if these were not checked in, and were instead only pulled down during the initial clone/build step.

While licensing may be compatible, it is an area that can cause headaches later, and so not including these deps here would be preferred.

I am currently working on the CMake build, through which many of these deps can be pulled in as required once it is fully set up. I would also like these deps to be scrubbed from the history when this is completed (e.g. catch2, etc.). Thanks.

Large number of stale branches

Hi all, I am hoping to prune many of the unneeded branches in the repo. Most of these have either had their features merged into other branches or have gone stale. As such, to tidy things up, it would be nice if any branch that is not being used or needed were deleted.

I have tagged everybody with contributor access on this issue just to get your input. If there are no responses, I will assume (after a reasonable amount of time) that your associated branches are not needed. Let me know what is worth keeping, and I will track it on the project board. Thanks!

Create GPU-based curl function

Determining the magnetic field takes a lot of time because it is CPU-only. As such, a GPU kernel would be nice, along with an additional derivative function (pseudocode):

__global__ void derive(const double *data, double *out, int stride,
                       int n, double dx){
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid + stride < n){
        out[gid] = (data[gid + stride] - data[gid]) / dx;
    }
    else if (gid < n){
        out[gid] = data[gid] / dx;
    }
}

Here, a stride of 1 gives the x derivative, a stride of xDim the y derivative, and a stride of xDim*yDim the z derivative.

We can then use this to create our curl function, so we might want to make a separate __device__ version as well.
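
As a sketch (illustrative only): once the derivative kernel above has produced the six needed derivative arrays, the curl B = curl(A) is a pointwise combination. Here dAz_dy denotes the derivative of the z component of A along y, and so on:

__global__ void curl(const double *dAz_dy, const double *dAy_dz,
                     const double *dAx_dz, const double *dAz_dx,
                     const double *dAy_dx, const double *dAx_dy,
                     double *Bx, double *By, double *Bz, int n){
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n){
        // (curl A)_x = dAz/dy - dAy/dz, and cyclic permutations
        Bx[gid] = dAz_dy[gid] - dAy_dz[gid];
        By[gid] = dAx_dz[gid] - dAz_dx[gid];
        Bz[gid] = dAy_dx[gid] - dAx_dy[gid];
    }
}

Alternatively, the derivative could be made a __device__ function and called six times from inside this kernel, trading memory for recomputation.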

Python scripts inconsistent / directory expectations

The Python visualization scripts need to be updated and modernized. In particular, gen_data.py needs to be made consistent in terms of where files are read from within the GPUE directory.

For the most part, all Python scripts should be made to run from the directory they are in, not from the GPUE root directory.

CS: deprecate format and swap

Deprecate format and swap in CCompressor and move them to CWaveFunction when the storage format is changed to double2.

Convergence criteria for imaginary time propagation

Add a feature to the ground state calculation that stops the imaginary time propagation when the wave function converges (for instance, stopping the calculation when the maximum difference of the wave function, or the energy difference, between the previous step and the current step is less than a certain value).

As Leios pointed out, this would require running the energy calculation and watching it until it levels out. It may be better to calculate the energy only every 100 loops (or something like that), since the energy calculation is quite time consuming as far as I know; see the sketch below.
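
A minimal host-side sketch of such a criterion; evolve_one_step() and compute_energy() are placeholders, not GPUE functions, and the energy is only checked every check_interval iterations to amortize its cost:

#include <cmath>

void imag_time_until_converged(double tol, int check_interval,
                               int max_iter){
    double prev_energy = 0.0;
    bool have_prev = false;
    for (int i = 0; i < max_iter; i++){
        // evolve_one_step();  // one imaginary-time step (placeholder)
        if (i % check_interval != 0){
            continue;
        }
        double energy = 0.0;   // = compute_energy(); (placeholder)
        if (have_prev && std::fabs(energy - prev_energy) < tol){
            break;             // converged: energy has leveled out
        }
        prev_energy = energy;
        have_prev = true;
    }
}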

Thanks!

Meeting notes to be implemented

Focus on:

  • Usability - Cfg file
  • Vortex tracking: 2D/3D
  • Unit tests
  • Paper (JOSS)
  • GPUE-Guide: Doxygen

Pros of GPUE:

  • Performance
  • Dynamic fields
  • Lattice engineering

Difficulties with the build process

Hi,

I'm trying to build GPUE with the instructions provided, but I get the problems shown below. I have tried using the standard makefile, but I get the same issues, and I cannot fix them by adding the -std=gnu++0x argument.

I do have other gcc compilers available, but this is the standard RHEL6 one, which is the system where I have access to a GPU card; if I go above gcc 5 there is an error.

Hope this helps

Mark

git clone https://github.com/GPUE-group/GPUE.git
Initialized empty Git repository in /dls/tmp/ssg37927/test/GPUE/.git/
remote: Enumerating objects: 66, done.
remote: Counting objects: 100% (66/66), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 2705 (delta 33), reused 45 (delta 20), pack-reused 2639
Receiving objects: 100% (2705/2705), 3.83 MiB | 650 KiB/s, done.
Resolving deltas: 100% (2057/2057), done.

[ssg37927@cs04r-sc-com14-09 test]$ module load cuda/8.0

[ssg37927@cs04r-sc-com14-09 test]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

[ssg37927@cs04r-sc-com14-09 test]$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[ssg37927@cs04r-sc-com14-09 test]$ module load cmake

[ssg37927@cs04r-sc-com14-09 test]$ cmake --version
cmake version 3.12.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).

[ssg37927@cs04r-sc-com14-09 test]$ cd GPUE/

[ssg37927@cs04r-sc-com14-09 GPUE]$ cmake .
-- The CXX compiler identification is GNU 4.4.7
-- The CUDA compiler identification is NVIDIA 8.0.61
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /dls_sw/apps/cuda/8.0/bin/nvcc
-- Check for working CUDA compiler: /dls_sw/apps/cuda/8.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /dls/tmp/ssg37927/test/GPUE

[ssg37927@cs04r-sc-com14-09 GPUE]$ make
Scanning dependencies of target unitTest
[ 4%] Building CUDA object CMakeFiles/unitTest.dir/src/unit_test.cu.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/unordered_map:35,
from /dls/tmp/ssg37927/test/GPUE/src/../include/ds.h:24,
from /dls/tmp/ssg37927/test/GPUE/src/unit_test.cu:1:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/c++0x_warning.h:31:2: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
make[2]: *** [CMakeFiles/unitTest.dir/src/unit_test.cu.o] Error 1
make[1]: *** [CMakeFiles/unitTest.dir/all] Error 2
make: *** [all] Error 2

[ssg37927@cs04r-sc-com14-09 GPUE]$ make
[ 4%] Building CUDA object CMakeFiles/unitTest.dir/src/unit_test.cu.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/unordered_map:35,
from /dls/tmp/ssg37927/test/GPUE/src/../include/ds.h:24,
from /dls/tmp/ssg37927/test/GPUE/src/unit_test.cu:1:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/c++0x_warning.h:31:2: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
make[2]: *** [CMakeFiles/unitTest.dir/src/unit_test.cu.o] Error 1
make[1]: *** [CMakeFiles/unitTest.dir/all] Error 2
make: *** [all] Error 2

3D angular momentum operator incorrect

It seems like the Ax and Ay arrays are somehow mixed up in GPUE, and the current master branch does not evolve correctly with rotation.

The following apply_gauge(...) function works:

// 3D
void apply_gauge(Grid &par, double2 *wfc, double2 *Ax, double2 *Ay,
                 double2 *Az, double renorm_factor_x,
                 double renorm_factor_y, double renorm_factor_z, bool flip,
                 cufftHandle plan_1d, cufftHandle plan_dim2,
                 cufftHandle plan_dim3, double dx, double dy, double dz,
                 double time, int yDim, int size){

    dim3 grid = par.grid;
    dim3 threads = par.threads;

    if (flip){

        // 1d forward / mult by Ay
        cufftHandleError( cufftExecZ2Z(plan_1d, wfc, wfc, CUFFT_FORWARD) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_z, wfc);
        cudaCheckError();
        if(par.bval("Ay_time")){
            EqnNode_gpu* Ay_eqn = par.astval("Ay");
            int e_num = par.ival("Ay_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Ay_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Ay, wfc);
            cudaCheckError();
        }
        cufftHandleError( cufftExecZ2Z(plan_1d, wfc, wfc, CUFFT_INVERSE) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_z, wfc);
        cudaCheckError();

        // loop to multiply by Ax
        for (int i = 0; i < yDim; i++){
            cufftHandleError( cufftExecZ2Z(plan_dim2,  &wfc[i*size],
                                           &wfc[i*size], CUFFT_FORWARD) );
        }

        scalarMult<<<grid,threads>>>(wfc, renorm_factor_y, wfc);
        cudaCheckError();
        if(par.bval("Ax_time")){
            EqnNode_gpu* Ax_eqn = par.astval("Ax");
            int e_num = par.ival("Ax_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Ax_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Ax, wfc);
            cudaCheckError();
        }

        for (int i = 0; i < yDim; i++){
            //size = xDim * zDim;
            cufftHandleError( cufftExecZ2Z(plan_dim2, &wfc[i*size],
                                           &wfc[i*size], CUFFT_INVERSE) );
        }
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_y, wfc);
        cudaCheckError();

        // 1D FFT to Az
        cufftHandleError( cufftExecZ2Z(plan_dim3, wfc, wfc, CUFFT_FORWARD) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_x, wfc);
        cudaCheckError();

        if(par.bval("Az_time")){
            EqnNode_gpu* Az_eqn = par.astval("Az");
            int e_num = par.ival("Az_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Az_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Az, wfc);
            cudaCheckError();
        }

        cufftHandleError( cufftExecZ2Z(plan_dim3, wfc, wfc, CUFFT_INVERSE) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_x, wfc);
        cudaCheckError();

    }
    else{

        // 1D FFT to Az
        cufftHandleError( cufftExecZ2Z(plan_dim3, wfc, wfc, CUFFT_FORWARD) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_z, wfc);
        cudaCheckError();

        if(par.bval("Az_time")){
            EqnNode_gpu* Az_eqn = par.astval("Az");
            int e_num = par.ival("Az_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Az_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Az, wfc);
            cudaCheckError();
        }

        cufftHandleError( cufftExecZ2Z(plan_dim3, wfc, wfc, CUFFT_INVERSE) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_z, wfc);
        cudaCheckError();

        // loop to multiply by Ax
        for (int i = 0; i < yDim; i++){
            cufftHandleError( cufftExecZ2Z(plan_dim2,  &wfc[i*size],
                                           &wfc[i*size], CUFFT_FORWARD) );
        }

        scalarMult<<<grid,threads>>>(wfc, renorm_factor_x, wfc);
        cudaCheckError();

        if(par.bval("Ax_time")){
            EqnNode_gpu* Ax_eqn = par.astval("Ax");
            int e_num = par.ival("Ax_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Ax_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Ax, wfc);
            cudaCheckError();
        }

        for (int i = 0; i < yDim; i++){
            //size = xDim * zDim;
            cufftHandleError( cufftExecZ2Z(plan_dim2, &wfc[i*size],
                                           &wfc[i*size], CUFFT_INVERSE) );
        }
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_x, wfc);
        cudaCheckError();

        // 1d forward / mult by Ay
        cufftHandleError( cufftExecZ2Z(plan_1d, wfc, wfc, CUFFT_FORWARD) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_y, wfc);
        cudaCheckError();

        if(par.bval("Ay_time")){
            EqnNode_gpu* Ay_eqn = par.astval("Ay");
            int e_num = par.ival("Ay_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Ay_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Ay, wfc);
            cudaCheckError();
        }
        cufftHandleError( cufftExecZ2Z(plan_1d, wfc, wfc, CUFFT_INVERSE) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_y, wfc);
        cudaCheckError();

    }

}

Note that the dim2 plan is for some reason for the Ax operator instead of Ay. I am looking into this now.

These changes are on the 3d_rot_fix branch.

Unnecessary renormalization after FFTs for performance

We only need to renormalize when outputting, not every timestep. Try to do this and see if we can get away without renormalization, even for the angular momentum steps.

Here, we are just talking about the FFT renormalization, not the renormalization for imaginary time, which is still necessary.
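
A lower-risk variant of the same idea is to fuse the renormalization into a multiplication that already happens, rather than launching scalarMult separately after every transform. A sketch of a fused complex-multiply-and-scale kernel, modeled loosely on the cMult and scalarMult calls in the apply_gauge listing above (illustrative, not existing GPUE code):

__global__ void cMultScaled(const double2 *in, const double2 *op,
                            double2 *out, double scale, int n){
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n){
        double2 a = in[gid];
        double2 b = op[gid];
        // complex multiply, with the FFT 1/N factor folded in
        out[gid].x = scale * (a.x * b.x - a.y * b.y);
        out[gid].y = scale * (a.x * b.y + a.y * b.x);
    }
}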

Mismatch between input and output wavefunctions even when they are at the same time step

The bug can be replicated as follows:

  1. Perform a ground state and real time evolution, with more than one time step in the real time evolution. Retrieve the wfc_ev_0 and wfc_evi_0 wavefunctions and create a new data.h5 file from them.
  2. Now perform only the real time evolution using the new data.h5 file. Retrieve the 'new' wfc_ev_0 and wfc_evi_0 wavefunctions.
  3. Compare the density profiles of these two wavefunctions: they differ. There should be no difference, since they should be the same wavefunctions, both at time step 0.

Loading of external potentials from file

It would be nice if one could load the external potential from a file. This would allow for the inclusion of any potential, not only potentials that have simple analytic expressions.
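
A minimal sketch of what such a loader could look like, assuming a raw binary file of xDim*yDim*zDim doubles; the file format and function name are illustrative, not an agreed interface:

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read n_points doubles from a raw binary file and upload to the GPU.
// Returns a device pointer, or nullptr if the file is missing/short.
double *load_potential(const char *path, size_t n_points){
    std::vector<double> host(n_points);
    FILE *f = fopen(path, "rb");
    if (!f){
        return nullptr;
    }
    size_t got = fread(host.data(), sizeof(double), n_points, f);
    fclose(f);
    if (got != n_points){
        return nullptr;  // size mismatch with the simulation grid
    }
    double *dev = nullptr;
    cudaMalloc(&dev, n_points * sizeof(double));
    cudaMemcpy(dev, host.data(), n_points * sizeof(double),
               cudaMemcpyHostToDevice);
    return dev;
}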
