salmon-tddft / salmon2

SALMON 2.0.0 Development Repository

License: Apache License 2.0

CMake 2.48% Python 5.55% Fortran 88.92% C 1.07% Shell 1.36% Cuda 0.63%

salmon2's Introduction

SALMON: Scalable Ab-initio Light-Matter simulator for Optics and Nanoscience

SALMON is open-source software based on first-principles time-dependent density functional theory. It describes optical responses and electron dynamics in matter induced by light electromagnetic fields.

SALMON has been tested and optimized to run on the following supercomputer platforms:

K-computer (It is scheduled to be permanently shut down at the end of August, 2019)
Fujitsu FX100 supercomputer system
Fujitsu supercomputer (Fugaku, FX1000, FX700) with A64FX processor
Linux PC Cluster with Intel Xeon Phi (Knights Landing architecture)
Linux PC Cluster with x86-64 CPU

For more information, please visit our website.

http://salmon-tddft.jp/

License

SALMON is available under Apache License version 2.0.

Copyright 2017-2023 SALMON developers

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

salmon2's Issues

GCC 4.8.5 fails test 131.

With GCC, the current develop-2.0.0 branch fails test 131. The failure is caused by PR #463.
However, I do not understand the cause of the bug.

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ mpirun -V
Intel(R) MPI Library for Linux* OS, Version 2019 Update 5 Build 20190806 (id: 7e5a4f84c)
Copyright 2003-2019, Intel Corporation.
$ CC=mpicc FC=mpif90 ${SALMON_SRC}/configure.py --enable-mpi && make -j8
$ OMP_SCHEDULE=static ctest -R 131 -V
...
60: ############################################################
60: # Verification start
60: # Checking the existance of outputfile
60: # Checking calculated result
60: Mismatch |1.723080e+226 - 2.712660e-01| > 4.000000e-05)
60: ############################################################
...
$ git revert -m 1 21e4988d
$ make -j8
$ OMP_SCHEDULE=static ctest -R 131 -V
...
60: ############################################################
60: # Verification start
60: # Checking the existance of outputfile
60: # Checking calculated result
60: # Verification end
60: ############################################################
...
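
The verification step in the log above reports a failure in the form |calculated - reference| > tolerance. A minimal sketch of such a check is given below. It is not the actual SALMON test script: the function name `check_value` and the extra finiteness test are assumptions made for illustration. A value as large as 1.72e+226 is far outside any physical range, which is often a symptom of reading uninitialized memory, although the report does not confirm that.

```python
import math


def check_value(calculated, reference, tolerance):
    """Compare a calculated value with a reference value.

    Hypothetical helper, not part of SALMON. Returns (ok, message).
    """
    # NaN or Infinity can never match a reference value.
    if not math.isfinite(calculated):
        return False, "non-finite result: %r" % calculated
    if abs(calculated - reference) > tolerance:
        # Same layout as the mismatch line in the log above.
        return False, "Mismatch |%e - %e| > %e" % (calculated, reference, tolerance)
    return True, "ok"


# Values taken from the failing run of test 131 shown above.
ok, message = check_value(1.723080e+226, 2.712660e-01, 4.0e-05)
print(ok, message)
```

The sketch flags non-finite values separately, so that a NaN result cannot slip through a plain comparison.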

GS calculation with symmetry

I found a bug in the symmetry calculation.
testsuites/111_bulk_Si_gs_dp with the sym.dat file of the diamond structure crashes at the end of the calculation, and the data_for_restart directory is then empty.
I will try to fix it.

Calculations sometimes give different (wrong) results (small but not negligible)

I (and some others) occasionally run into a problem where a calculation gives different results even though the code has not changed.
For example, today I experienced the following:
I got the correct results (matching the reference values) when I ran the calculation on my own account. I then pushed the code to GitHub and opened a pull request. The Jenkins check failed in the C2H2 GS calculation because of slightly different results (it did not converge). This was weird, because I had only changed some input keywords. I pushed the same code again, and it then passed the Jenkins check...
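
One well-known way for unchanged code to produce slightly different numbers is a change in the order of floating-point additions, for example when a parallel reduction combines partial sums in a different order from run to run. Whether that is the cause here is not established; the sketch below only illustrates the effect. The helper `sum_in_order` is made up for this example.

```python
def sum_in_order(values):
    """Add the values strictly left to right (illustrative helper)."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [0.1, 0.2, 0.3]

# Same numbers, two different summation orders.
forward = sum_in_order(values)
backward = sum_in_order(list(reversed(values)))

print(forward == backward)
print(abs(forward - backward))
```

The difference is at the level of rounding error, but an iterative procedure such as an SCF loop can amplify it when convergence is marginal.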

FFTE gives wrong total energy

For example, the calculation of testsuites/123_periodic_H2O_gs_dp with yn_ffte='y' gives NaN for the total energy and the force.

Compiler option "-check all" causes wrong GS calculation.

I realized that if we add the ifort compiler option "-check all", the ground-state calculation goes wrong: the total energy becomes obviously wrong and the SCF does not converge (I saw this using exercise-01: C2H2_gs.inp).

(From some checks so far, I suspect an inconsistent treatment of arrays, such as an out-of-bounds array access, but I am not sure.)

"Infinity" and abnormally large values are printed in the output file of GS calculation

Abnormally large values (e.g. 1.797693134862316E+308) and "Infinity" are printed near the end of the output file of the ground-state calculation. Here is an example obtained with exercise-01 (i.e. using C2H2_gs.inp):

band information-----------------------------------------
Bottom of VB -0.676690444783346
Top of VB -0.269748315275347
Bottom of CB 1.797693134862316E+308
Top of CB -1.797693134862316E+308
Fundamental gap 1.797693134862316E+308
BG between same k-point 1.797693134862316E+308
Physicaly upper bound of CB for DOS -1.797693134862316E+308
Physicaly upper bound of CB for eps(omega) -1.797693134862316E+308

Bottom of VB[eV] -18.4138297072665
Top of VB[eV] -7.34028325594664
Bottom of CB[eV] Infinity
Top of CB[eV] -Infinity
Fundamental gap[eV] Infinity
BG between same k-point[eV] Infinity
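
The value 1.797693134862316E+308 is the largest finite double-precision number. The sign pattern in the output (positive for "Bottom of CB", negative for "Top of CB") is what a min/max search leaves behind when it starts from such sentinels and then finds no conduction-band state to compare against. This is an interpretation of the output, not something the report confirms. The sketch below reproduces the pattern under that assumption; `band_edges` and `HARTREE_TO_EV` are hypothetical names, and the conversion factor is approximate.

```python
import math
import sys

# Approximate Hartree-to-eV factor, consistent with the VB lines above.
HARTREE_TO_EV = 27.2114


def band_edges(energies):
    """Return (bottom, top) of a list of energies using sentinel start values."""
    bottom = sys.float_info.max    # largest finite double
    top = -sys.float_info.max
    for energy in energies:
        bottom = min(bottom, energy)
        top = max(top, energy)
    return bottom, top


# Assumed situation: no unoccupied (conduction-band) states were computed.
cb_bottom, cb_top = band_edges([])
print(cb_bottom, cb_top)

# The unit conversion overflows, which would explain the "Infinity" entries.
print(cb_bottom * HARTREE_TO_EV, cb_top * HARTREE_TO_EV)
print(math.isinf(cb_bottom * HARTREE_TO_EV))
```

If this reading is right, the output is harmless for a run without conduction bands, and the printing routine could skip these lines when the list of states is empty.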

add topics

I suggest adding topics such as density-functional-theory, dft, optics in the About section.

dft_k_expand option does not work on Fugaku

The "dft_k_expand" option, which converts restart data with k-points into gamma-point data, does not work on Fugaku. The conversion job itself (calculation with theory=dft_k_expand) reports no errors. The error appears when the converted restart data (sliced format) is read in the next calculation, as shown below:

*** Error in free(): invalid next size (normal): 0x000000000b5c5fe0 ***
======= Backtrace: =========
/opt/FJSVxos/mmm/lib64/libmpg.so.1(+0x8044)[0x400000e98044]
/opt/FJSVxos/mmm/lib64/libmpg.so.1(+0x849c)[0x400000e9849c]
/opt/FJSVxos/mmm/lib64/libmpg.so.1(+0xa5dc)[0x400000e9a5dc]
/opt/FJSVxtclanga/tcsds-1.2.27b/lib64/libfj90i.so.1(jwe_xdal+0xe8)[0x400000903720]
/home/hp200181/u00579/salmon2/SALMON2_20201019/build/salmon[0x830250]
......
......
.....

This calculation runs correctly on OFP without errors (even with strict debug options).

(*) The above calculation is the GS of the expanded supercell system. In that case, if_real_orbital=.false. has to be forced in the code (because the converted wavefunction is complex even though it is a gamma-point system).

Orbital parallelization

The calculation with the input file testsuites/115_bulk_Si_gs_dp_op/inputfile, but with nscf=120 instead of nscf=2, does not converge.
In addition, the RT calculation with orbital parallelization for periodic systems has a related bug.

Segmentation fault if we specify 68 cores (omp_thread) on OFP

I found by chance that a segmentation fault occurs when running a ground-state calculation with 68 cores (= omp_thread) on OFP, but it does not occur with other core counts, e.g. 64 and 254. The input is H2O with periodic boundary conditions and "domain_parallel=y" (i.e. using the GCEED code) on 1 node. The problem was introduced by the recent merge of 0479bc9. I am just leaving a note of the problem here. We probably do not usually specify 68 cores, so this is not an urgent issue.

Tests 101-131 failed with `--disable-mpi`

I found that many test cases crash when running the OpenMP-only version (built with --disable-mpi).

Test environment: Intel compiler 2019u5 on Xeon Gold 6242.

$ CFLAGS="-traceback" FFLAGS="-fpe0 -fpe-all=0 -check all,noarg_temp_created -traceback" ~/salmon2/configure.py --arch=intel-avx512 --debug --disable-mpi && make -j8

~~1. testcase 101~~

~~11: ============init_ps==============~~
~~11: forrtl: severe (174): SIGSEGV, segmentation fault occurred~~
~~11: Image PC Routine Line Source~~
~~11: salmon.cpu 000000000123E833 Unknown Unknown Unknown~~
~~11: libpthread-2.17.s 00007F80499865F0 Unknown Unknown Unknown~~
~~11: salmon.cpu 0000000000D88C6F prep_pp_sub_mp_ca 907 prep_pp.f90~~
~~11: salmon.cpu 0000000000D5BDEF prep_pp_sub_mp_in 112 prep_pp.f90~~
~~11: salmon.cpu 0000000000952DB4 main_dft_ 130 main_dft.f90~~
~~11: salmon.cpu 0000000000407DF3 MAIN__ 37 main.f90~~
~~11: salmon.cpu 0000000000407BE2 Unknown Unknown Unknown~~
~~11: libc-2.17.so 00007F80492C9505 __libc_start_main Unknown Unknown~~
~~11: salmon.cpu 0000000000407AE9 Unknown Unknown Unknown~~

2. testcase 104, 105, 116, 117, 121, 131

20:   Libxc: [disabled]
20:  inumcpu_check error!
20:  number of cpu is not correct!

3. testcase 122

50:  ============init_ps==============
50:  restart/Si_gs.bin
50:
50: forrtl: No such file or directory
50: forrtl: severe (29): file not found, unit 96, file /fs01/homes/hirokawa/salmon2-build/single/testsuites/122_bulk_Si_rt_response_temperature_dp/restart/Si_gs.bin

~~4. testcase 111, 112, 113 (known error)~~

~~27: ############################################################~~
~~27: # Verification start~~
~~27: # Checking the existance of outputfile~~
~~27: # Checking calculated result~~
~~27: Result eigen energy for io=1, ik=2 = -1.708282e-01 (Reference = -1.716932e-01)~~
~~27: Mismatch |-1.708282e-01 - -1.716932e-01| > 4.000000e-05)~~
~~3/3 Test #27: verify_111_bulk_Si_gs_dp .........***Failed 0.03 sec~~
~~...~~
~~30: ############################################################~~
~~30: # Verification start~~
~~30: # Checking the existance of outputfile~~
~~30: # Checking calculated result~~
~~30: -0.00030664957~~
~~30: Result Current = -3.066496e-04 (Reference = -2.970908e-04)~~
~~30: Mismatch |-3.066496e-04 - -2.970908e-04| > 1.000000e-08)~~
~~3/3 Test #30: verify_112_bulk_Si_rt_response_dp ...***Failed 0.01 sec~~
~~...~~
~~33: ############################################################~~
~~33: # Verification start~~
~~33: # Checking the existance of outputfile~~
~~33: # Checking calculated result~~
~~33: 0.00064904193~~
~~33: Result Current = 6.490419e-04 (Reference = 6.571660e-04)~~
~~33: Mismatch |6.490419e-04 - 6.571660e-04| > 1.000000e-08)~~
~~3/3 Test #33: verify_113_bulk_Si_rt_pulse_dp ...***Failed 0.03 sec~~

io_gs_wfn_k.f90: `read_write_gs_wfn_k(iflag_write)` crashed

read_write_gs_wfn_k(iflag_write) crashed when finalizing the ARTED GS part. Test case 001 displayed the following error:

...
 BG between same k-point -2.116046729082610E-004
 Physicaly upper bound of CB for DOS  0.304261584666184     
 Physicaly upper bound of CB for eps(omega)  0.606363677678584     
 -----------------------------------------------
 -----------------------------------------------

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 27132 RUNNING AT skl01
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

Geometry optimization does not work

Currently, geometry optimization does not work correctly. In the first optimization cycle, the calculation (SCF, converged electronic state, force) is correct, but in the second and later cycles the results are wrong. I also found that the results after the second cycle depend on the number of cores used, so this is probably an OpenMP bug. I could not find any error with the -check all option of ifort. Once, when I used a different version of the Intel compiler (the latest, ver. 2019) with the -check all option, I got the correct result, but this may have been by chance. I think we need to fix this kind of bug in the near future.

GCC 4.8.5 + MPI: tests 203 and 204 crashed.

I found a SEGV when running tests 203 and 204 with GCC 4.8.5 + Intel MPI 2019u5.

68: Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
68:
68: Backtrace for this error:
68: #0  0x2AC1DD678697
68: #1  0x2AC1DD678CDE
68: #2  0x2AC1DFAF63AF
68: #3  0x2AC1DFC16E16
68: #4  0x2AC1DD747640
68: #5  0x2AC1DD741801
68: #6  0x4DFFD7 in __checkpoint_restart_sub_MOD_write_wavefunction at checkpoint_restart.f90:426 (discriminator 1)
68: #7  0x4E0F1D in __checkpoint_restart_sub_MOD_write_bin at checkpoint_restart.f90:243
68: #8  0x4E1CAC in __checkpoint_restart_sub_MOD_checkpoint_rt at checkpoint_restart.f90:153
68: #9  0x4C25B1 in main_tddft_ at main_tddft.f90:527
68: #10  0x4067A5 in MAIN__ at main.f90:42
68: #11  0x2AC1DFAE2504

https://github.com/SALMON-TDDFT/salmon2/blob/c66fdc303597413403a97d1b119b8e089d710839/src/io/checkpoint_restart.f90#L426-L428

Fluctuation of results caused by new checkpoint/restart

After the new checkpoint/restart was implemented, the fluctuation of results got worse. The initial density values for RT calculations of C2H2 changed with each Jenkins check. They were 10.00000001, 10.00000004, etc., and sometimes less than 10. Because of this, the final results changed and the Jenkins check did not pass.
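
The fluctuation described above can be quantified with a small helper that compares the values reported by different runs. This is only a sketch: the function names, the expected value of 10, and the tolerance are assumptions chosen for illustration, and the two input numbers are the ones quoted in the report.

```python
def spread(values):
    """Largest difference between any two reported values."""
    return max(values) - min(values)


def deviating(values, expected, tolerance):
    """Values that differ from the expected one by more than the tolerance."""
    return [value for value in values if abs(value - expected) > tolerance]


# Initial-density values quoted in the report for two Jenkins runs.
reported = [10.00000001, 10.00000004]

# Hypothetical expectation and tolerance, for illustration only.
EXPECTED = 10.0
TOLERANCE = 1.0e-9

print(spread(reported))
print(deviating(reported, EXPECTED, TOLERANCE))
```

A spread of this size is many orders of magnitude above double-precision rounding for a value near 10, which suggests the restart data itself differs between runs rather than only the order of arithmetic.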