gpue-group / gpue

This project is a fork of mlxd/gpue.


GPU Gross-Pitaevskii Equation numerical solver for Bose-Einstein condensates.

Home Page: https://gpue-group.github.io/

License: Other

Languages: Makefile 0.35%, Shell 0.57%, C 3.69%, C++ 15.08%, MATLAB 6.16%, Python 16.42%, Cuda 55.06%, CMake 0.26%, TeX 1.58%, Dockerfile 0.13%, Julia 0.72%

Topics: cuda, fft, gpu, numerical-calculations, physics, quantum-mechanics, research, simulation, superfluid

gpue's People

Contributors

benchislett, leios, mlxd, peterwittek


gpue's Issues

Fix unit tests + a list of other issues

Grid test:

  • move grid generation into a separate function
  • loop through and check that all elements == gid (see the sketch at the end of this issue)

parSum test:

  • Fix comments to say 64 x 64, not 8 x 8

General:

  • Also return and check errors from all mallocs

Evolution_test:

  • Simple harmonic trap returns energy of wavefunction

Missing tests:

  • Evolution test

Non-unit tests:

  • Remove 3D vortex tracking (for JOSS)
  • Add more math functionality to dynamic parsing
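
As a sketch of the grid test described above (illustrative only, assuming hypothetical names fill_gid and check_grid rather than GPUE's actual test code):

__global__ void fill_gid(int *out, int n){
    // Each thread writes its own global id.
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n){
        out[gid] = gid;
    }
}

// Host-side check after copying the array back:
// every element should equal its own index.
bool check_grid(const int *host_data, int n){
    for (int i = 0; i < n; i++){
        if (host_data[i] != i){
            return false;
        }
    }
    return true;
}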

Parser inconsistent with `-r` flag

The -r flag (and similar flags) should always be processed after the data directory has been set, regardless of where the flag appears in the command-line arguments.
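
One way to guarantee this ordering is a two-pass parse: scan once for the data directory, then process everything else. A minimal sketch, assuming a hypothetical -d flag for the data directory (not necessarily GPUE's actual option name):

#include <cstring>
#include <string>

void parse_args(int argc, char **argv){
    std::string data_dir = "data/";  // default

    // Pass 1: locate the data directory, wherever it appears.
    for (int i = 1; i < argc - 1; i++){
        if (strcmp(argv[i], "-d") == 0){
            data_dir = argv[i + 1];
        }
    }

    // Pass 2: all other flags, which may now rely on data_dir.
    for (int i = 1; i < argc; i++){
        if (strcmp(argv[i], "-r") == 0){
            // e.g. load the previous run from data_dir here
        }
    }
}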

Energy calculation uses too much GPU memory

In the process of rewriting the energy calculation, we found that it needs a bit more memory than previously thought. As such, if we are running a memory-intensive task, the energy calculation will not work correctly. We need to find a way to rewrite it to minimize the memory footprint; one option is sketched below.
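
One possible direction, assuming the energy is a sum of independent terms: compute each term into a single reusable scratch buffer and reduce it immediately, so the peak footprint is one extra array rather than one per term. The kernel name here is illustrative, not GPUE's actual API:

#include <thrust/device_ptr.h>
#include <thrust/reduce.h>

// Illustrative term kernel: potential energy density |psi|^2 * V.
__global__ void potential_term(const double *psi2, const double *V,
                               double *scratch, int n){
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n){
        scratch[gid] = psi2[gid] * V[gid];
    }
}

// Reduce the scratch buffer on the device; the same buffer can then
// be overwritten by the next energy term.
double reduce_scratch(double *scratch, int n){
    thrust::device_ptr<double> p(scratch);
    return thrust::reduce(p, p + n, 0.0);
}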

CS: norm host implementation

Change the host norm implementation to use Thrust and split the vector into chunks small enough to fit on the GPU.
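
A minimal sketch of that approach, assuming the data lives in host memory and that chunk elements at a time fit on the device (names illustrative):

#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <algorithm>
#include <vector>

// Sum of squares of a host vector, processed in GPU-sized chunks;
// the final norm is the square root of the returned value.
double norm_sq_chunked(const std::vector<double> &host, size_t chunk){
    double total = 0.0;
    for (size_t start = 0; start < host.size(); start += chunk){
        size_t len = std::min(chunk, host.size() - start);
        thrust::device_vector<double> d(host.begin() + start,
                                        host.begin() + start + len);
        total += thrust::transform_reduce(d.begin(), d.end(),
                                          thrust::square<double>(),
                                          0.0, thrust::plus<double>());
    }
    return total;
}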

Overhaul of plot.py

The plot.py file is horribly documented and unclear. It needs a general rework. In addition, we need to make something similar for 3D. Ideally, we somehow incorporate gen_data.py into this script.

Non-GPUE dependencies included in repo

Hi all, I have noticed some external source files from other repositories included in some of the branches. I would appreciate it if these were not checked in, and were instead only pulled down during the initial clone/build step.

While licensing may be compatible, it is an area that can cause headaches later, and so not including these deps here would be preferred.

I am currently working on the CMake build, through which many of these deps can be pulled in as required once it is fully set up. I would also like these deps to be scrubbed from the history when this is completed (e.g. catch2, etc.). Thanks.

Large number of stale branches

Hi all, I am hoping to prune many of the unneeded branches in the repo. Most of these have either had their features merged into other branches or have gone stale. As such, to tidy things up, it would be nice if any branch that is not being used or needed were deleted.

I have tagged everybody with contributor access on this issue just to get your input. If there are no responses, I will assume (after a reasonable amount of time) that your associated branches are not needed. Let me know what is worth keeping, and I will track it on the project board. Thanks!

Create GPU-based curl function

Determining the magnetic field takes a lot of time because it is CPU-only. As such, a GPU kernel would be nice, along with an additional derivative function (pseudocode):

__global__ void derive(const double *data, double *out, int stride,
                       int n, double dx){
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid + stride < n){
        out[gid] = (data[gid + stride] - data[gid]) / dx;
    }
    else if (gid < n){
        out[gid] = data[gid] / dx;
    }
}

Here, a stride of 1 gives the x derivative, a stride of xDim the y derivative, and a stride of xDim*yDim the z derivative.

We can then use this to create our curl function, so we might want to make a separate __device__ version as well.
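
As a sketch (illustrative only): once the derivative kernel above has produced the six needed derivative arrays, the curl B = curl(A) is a pointwise combination. Here dAz_dy denotes the derivative of the z component of A along y, and so on:

__global__ void curl(const double *dAz_dy, const double *dAy_dz,
                     const double *dAx_dz, const double *dAz_dx,
                     const double *dAy_dx, const double *dAx_dy,
                     double *Bx, double *By, double *Bz, int n){
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n){
        // (curl A)_x = dAz/dy - dAy/dz, and cyclic permutations
        Bx[gid] = dAz_dy[gid] - dAy_dz[gid];
        By[gid] = dAx_dz[gid] - dAz_dx[gid];
        Bz[gid] = dAy_dx[gid] - dAx_dy[gid];
    }
}

Alternatively, the derivative could be made a __device__ function and called six times from inside this kernel, trading memory for recomputation.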

Python scripts inconsistent / directory expectations

The Python visualization scripts need to be updated and modernized. In particular, gen_data.py needs to be made consistent in terms of where files are read from within the GPUE directory.

For the most part, all Python scripts should be made to run from the directory they are in, not from the GPUE root directory.

CS: deprecate format and swap

Deprecate format and swap in CCompressor and move them to CWaveFunction when the storage format is changed to double2.

Convergence criteria for imaginary time propagation

Add a feature to the ground state calculation that stops the imaginary time propagation when the wave function converges (for instance, stopping the calculation when the maximum difference of the wave function, or the energy difference, between the previous step and the current step is less than a certain value).

As Leios pointed out, this would require running the energy calculation and watching it until it levels out. It may be better to calculate the energy only every 100 loops (or something like that), since the energy calculation is quite time consuming as far as I know; see the sketch below.
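
A minimal host-side sketch of such a criterion; evolve_one_step() and compute_energy() are placeholders, not GPUE functions, and the energy is only checked every check_interval iterations to amortize its cost:

#include <cmath>

void imag_time_until_converged(double tol, int check_interval,
                               int max_iter){
    double prev_energy = 0.0;
    bool have_prev = false;
    for (int i = 0; i < max_iter; i++){
        // evolve_one_step();  // one imaginary-time step (placeholder)
        if (i % check_interval != 0){
            continue;
        }
        double energy = 0.0;   // = compute_energy(); (placeholder)
        if (have_prev && std::fabs(energy - prev_energy) < tol){
            break;             // converged: energy has leveled out
        }
        prev_energy = energy;
        have_prev = true;
    }
}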

Thanks!

Meeting notes to be implemented

Focus on:

  • Usability - Cfg file
  • Vortex tracking: 2D/3D
  • Unit tests
  • Paper (JOSS)
  • GPUE-Guide: Doxygen

Pros of GPUE:

  • Performance
  • Dynamic fields
  • Lattice engineering

Difficulties with the build process

Hi,

I'm trying to build GPUE with the instructions provided, but I get the problems shown below. I have tried using the standard makefile, but I get the same issues, and I cannot fix them by adding the -std=gnu++0x argument.

I do have other gcc compilers available, but this is the standard RHEL6 one, which is the system where I have access to a GPU card; if I go above gcc 5 there is an error.

Hope this helps

Mark

git clone https://github.com/GPUE-group/GPUE.git
Initialized empty Git repository in /dls/tmp/ssg37927/test/GPUE/.git/
remote: Enumerating objects: 66, done.
remote: Counting objects: 100% (66/66), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 2705 (delta 33), reused 45 (delta 20), pack-reused 2639
Receiving objects: 100% (2705/2705), 3.83 MiB | 650 KiB/s, done.
Resolving deltas: 100% (2057/2057), done.

[ssg37927@cs04r-sc-com14-09 test]$ module load cuda/8.0

[ssg37927@cs04r-sc-com14-09 test]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

[ssg37927@cs04r-sc-com14-09 test]$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[ssg37927@cs04r-sc-com14-09 test]$ module load cmake

[ssg37927@cs04r-sc-com14-09 test]$ cmake --version
cmake version 3.12.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).

[ssg37927@cs04r-sc-com14-09 test]$ cd GPUE/

[ssg37927@cs04r-sc-com14-09 GPUE]$ cmake .
-- The CXX compiler identification is GNU 4.4.7
-- The CUDA compiler identification is NVIDIA 8.0.61
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /dls_sw/apps/cuda/8.0/bin/nvcc
-- Check for working CUDA compiler: /dls_sw/apps/cuda/8.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /dls/tmp/ssg37927/test/GPUE

[ssg37927@cs04r-sc-com14-09 GPUE]$ make
Scanning dependencies of target unitTest
[ 4%] Building CUDA object CMakeFiles/unitTest.dir/src/unit_test.cu.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/unordered_map:35,
from /dls/tmp/ssg37927/test/GPUE/src/../include/ds.h:24,
from /dls/tmp/ssg37927/test/GPUE/src/unit_test.cu:1:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/c++0x_warning.h:31:2: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
make[2]: *** [CMakeFiles/unitTest.dir/src/unit_test.cu.o] Error 1
make[1]: *** [CMakeFiles/unitTest.dir/all] Error 2
make: *** [all] Error 2

[ssg37927@cs04r-sc-com14-09 GPUE]$ make
[ 4%] Building CUDA object CMakeFiles/unitTest.dir/src/unit_test.cu.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/unordered_map:35,
from /dls/tmp/ssg37927/test/GPUE/src/../include/ds.h:24,
from /dls/tmp/ssg37927/test/GPUE/src/unit_test.cu:1:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/c++0x_warning.h:31:2: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
make[2]: *** [CMakeFiles/unitTest.dir/src/unit_test.cu.o] Error 1
make[1]: *** [CMakeFiles/unitTest.dir/all] Error 2
make: *** [all] Error 2

3D angular momentum operator incorrect

It seems like the Ax and Ay arrays are somehow mixed up in GPUE, and the current master branch does not evolve correctly with rotation.

The following apply_gauge(...) function works:

// 3D
void apply_gauge(Grid &par, double2 *wfc, double2 *Ax, double2 *Ay,
                 double2 *Az, double renorm_factor_x,
                 double renorm_factor_y, double renorm_factor_z, bool flip,
                 cufftHandle plan_1d, cufftHandle plan_dim2,
                 cufftHandle plan_dim3, double dx, double dy, double dz,
                 double time, int yDim, int size){

    dim3 grid = par.grid;
    dim3 threads = par.threads;

    if (flip){

        // 1d forward / mult by Ay
        cufftHandleError( cufftExecZ2Z(plan_1d, wfc, wfc, CUFFT_FORWARD) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_z, wfc);
        cudaCheckError();
        if(par.bval("Ay_time")){
            EqnNode_gpu* Ay_eqn = par.astval("Ay");
            int e_num = par.ival("Ay_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Ay_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Ay, wfc);
            cudaCheckError();
        }
        cufftHandleError( cufftExecZ2Z(plan_1d, wfc, wfc, CUFFT_INVERSE) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_z, wfc);
        cudaCheckError();

        // loop to multiply by Ax
        for (int i = 0; i < yDim; i++){
            cufftHandleError( cufftExecZ2Z(plan_dim2,  &wfc[i*size],
                                           &wfc[i*size], CUFFT_FORWARD) );
        }

        scalarMult<<<grid,threads>>>(wfc, renorm_factor_y, wfc);
        cudaCheckError();
        if(par.bval("Ax_time")){
            EqnNode_gpu* Ax_eqn = par.astval("Ax");
            int e_num = par.ival("Ax_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Ax_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Ax, wfc);
            cudaCheckError();
        }

        for (int i = 0; i < yDim; i++){
            //size = xDim * zDim;
            cufftHandleError( cufftExecZ2Z(plan_dim2, &wfc[i*size],
                                           &wfc[i*size], CUFFT_INVERSE) );
        }
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_y, wfc);
        cudaCheckError();

        // 1D FFT to Az
        cufftHandleError( cufftExecZ2Z(plan_dim3, wfc, wfc, CUFFT_FORWARD) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_x, wfc);
        cudaCheckError();

        if(par.bval("Az_time")){
            EqnNode_gpu* Az_eqn = par.astval("Az");
            int e_num = par.ival("Az_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Az_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Az, wfc);
            cudaCheckError();
        }

        cufftHandleError( cufftExecZ2Z(plan_dim3, wfc, wfc, CUFFT_INVERSE) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_x, wfc);
        cudaCheckError();

    }
    else{

        // 1D FFT to Az
        cufftHandleError( cufftExecZ2Z(plan_dim3, wfc, wfc, CUFFT_FORWARD) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_z, wfc);
        cudaCheckError();

        if(par.bval("Az_time")){
            EqnNode_gpu* Az_eqn = par.astval("Az");
            int e_num = par.ival("Az_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Az_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Az, wfc);
            cudaCheckError();
        }

        cufftHandleError( cufftExecZ2Z(plan_dim3, wfc, wfc, CUFFT_INVERSE) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_z, wfc);
        cudaCheckError();

        // loop to multiply by Ax
        for (int i = 0; i < yDim; i++){
            cufftHandleError( cufftExecZ2Z(plan_dim2,  &wfc[i*size],
                                           &wfc[i*size], CUFFT_FORWARD) );
        }

        scalarMult<<<grid,threads>>>(wfc, renorm_factor_x, wfc);
        cudaCheckError();

        if(par.bval("Ax_time")){
            EqnNode_gpu* Ax_eqn = par.astval("Ax");
            int e_num = par.ival("Ax_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Ax_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Ax, wfc);
            cudaCheckError();
        }

        for (int i = 0; i < yDim; i++){
            //size = xDim * zDim;
            cufftHandleError( cufftExecZ2Z(plan_dim2, &wfc[i*size],
                                           &wfc[i*size], CUFFT_INVERSE) );
        }
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_x, wfc);
        cudaCheckError();

        // 1d forward / mult by Ay
        cufftHandleError( cufftExecZ2Z(plan_1d, wfc, wfc, CUFFT_FORWARD) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_y, wfc);
        cudaCheckError();

        if(par.bval("Ay_time")){
            EqnNode_gpu* Ay_eqn = par.astval("Ay");
            int e_num = par.ival("Ay_num");
            ast_cmult<<<grid,threads>>>(wfc, wfc, Ay_eqn, dx, dy, dz,
                                        time, e_num);
            cudaCheckError();
        }
        else{
            cMult<<<grid,threads>>>(wfc, (cufftDoubleComplex*) Ay, wfc);
            cudaCheckError();
        }
        cufftHandleError( cufftExecZ2Z(plan_1d, wfc, wfc, CUFFT_INVERSE) );
        scalarMult<<<grid,threads>>>(wfc, renorm_factor_y, wfc);
        cudaCheckError();

    }

}

Note that the dim2 plan is for some reason for the Ax operator instead of Ay. I am looking into this now.

These changes are on the 3d_rot_fix branch.

Unnecessary renormalization after FFTs for performance

We only need to renormalize when outputting, not every timestep. Try to do this and see if we can get away without renormalization, even for the angular momentum steps.

Here, we are just talking about the FFT renormalization, not the renormalization for imaginary time, which is still necessary.
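
A lower-risk variant of the same idea is to fuse the renormalization into a multiplication that already happens, rather than launching scalarMult separately after every transform. A sketch of a fused complex-multiply-and-scale kernel, modeled loosely on the cMult and scalarMult calls in the apply_gauge listing above (illustrative, not existing GPUE code):

__global__ void cMultScaled(const double2 *in, const double2 *op,
                            double2 *out, double scale, int n){
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n){
        double2 a = in[gid];
        double2 b = op[gid];
        // complex multiply, with the FFT 1/N factor folded in
        out[gid].x = scale * (a.x * b.x - a.y * b.y);
        out[gid].y = scale * (a.x * b.y + a.y * b.x);
    }
}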

Mismatch between input and output wavefunctions even when they are at the same time step

The bug can be replicated as follows:

  1. Perform a ground state and real time evolution, with more than one time step in the real time evolution. Retrieve the wfc_ev_0 and wfc_evi_0 wavefunctions and create a new data.h5 file from them.
  2. Now perform only the real time evolution using the new data.h5 file. Retrieve the 'new' wfc_ev_0 and wfc_evi_0 wavefunctions.
  3. Compare the density profiles of these two wavefunctions: they differ. There should be no difference, since they should be the same wavefunctions, both at time step 0.

Loading of external potentials from file

It would be nice if one could load the external potential from a file. This would allow for the inclusion of any potential, not only potentials that have simple analytic expressions.
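
A minimal sketch of what such a loader could look like, assuming a raw binary file of xDim*yDim*zDim doubles; the file format and function name are illustrative, not an agreed interface:

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read n_points doubles from a raw binary file and upload to the GPU.
// Returns a device pointer, or nullptr if the file is missing/short.
double *load_potential(const char *path, size_t n_points){
    std::vector<double> host(n_points);
    FILE *f = fopen(path, "rb");
    if (!f){
        return nullptr;
    }
    size_t got = fread(host.data(), sizeof(double), n_points, f);
    fclose(f);
    if (got != n_points){
        return nullptr;  // size mismatch with the simulation grid
    }
    double *dev = nullptr;
    cudaMalloc(&dev, n_points * sizeof(double));
    cudaMemcpy(dev, host.data(), n_points * sizeof(double),
               cudaMemcpyHostToDevice);
    return dev;
}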
