grimme-lab / mctc-lib

Modular computation tool chain library

Home Page: https://grimme-lab.github.io/mctc-lib

License: Apache License 2.0

Fortran 95.41% Meson 1.97% CMake 2.39% Python 0.21% C 0.01%
computational-chemistry

mctc-lib's People

Contributors

ajmay81, albkat, awvwgk, bugfixe, e-kwsm, kjelljorner, loriab, mtolston, yurivict

mctc-lib's Issues

Support more general reader

Currently, reading an input file requires providing a formatted unit. However, none of the readers reads directly from the unit; instead, each one fetches a new line, tokenizes it, and parses it. A general reader that can provide a line from any input source (formatted unit, unformatted stream, string, ...) would provide more flexibility.

The possibility to read from a file or formatted unit via the generic interface should be retained (preferably under the same symbol). Internally, however, these routines would create a temporary reader and pass it to the actual reader implementation.
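The idea can be illustrated with a minimal sketch in Python rather than Fortran. The names `LineReader`, `from_string`, `from_file`, `next_line` and `read_xyz` are hypothetical and are not part of mctc-lib; the point is only that the parser talks to a line source and never to a file or unit directly.

```python
import io
from typing import Iterable, Iterator, List, Optional, Tuple


class LineReader:
    """Hypothetical line source: hands out one line at a time, None at end of input."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines: Iterator[str] = iter(lines)

    @classmethod
    def from_string(cls, text: str) -> "LineReader":
        # Temporary reader over an in-memory string.
        return cls(text.splitlines())

    @classmethod
    def from_file(cls, handle: io.TextIOBase) -> "LineReader":
        # An open text file iterates line by line; strip the line terminator.
        return cls(line.rstrip("\n") for line in handle)

    def next_line(self) -> Optional[str]:
        return next(self._lines, None)


def read_xyz(reader: LineReader) -> List[Tuple[str, float, float, float]]:
    """Toy xyz parser (no error handling) that only ever sees the reader."""
    natoms = int(reader.next_line())
    reader.next_line()  # skip the comment line
    atoms = []
    for _ in range(natoms):
        symbol, x, y, z = reader.next_line().split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms
```

A convenience entry point that accepts a file would simply build `LineReader.from_file(handle)` and forward it to `read_xyz`, which mirrors the "temporary reader" approach described above.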

Switch to V3K connection table format for 1000+ atoms or bonds

Currently, only the V2K connection table format is supported for writing molfiles or SDFs. Writing fails for 1000+ atoms or bonds due to the limited width of the fields. We should be more flexible and switch to the V3K format if we exceed the limit (or maybe just default to the new format).
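The proposed switch could look roughly like the following sketch, written in Python for illustration. The function name `select_ctab_version` is hypothetical; the limit of 999 follows from the three-character count fields of the V2000 counts line.

```python
V2000_MAX_COUNT = 999  # a three-character count field cannot hold a larger number


def select_ctab_version(n_atoms: int, n_bonds: int) -> str:
    """Pick the connection table version a writer could use for the given sizes."""
    if n_atoms > V2000_MAX_COUNT or n_bonds > V2000_MAX_COUNT:
        return "V3000"
    return "V2000"
```

Defaulting to the new format, the alternative mentioned above, would amount to returning "V3000" unconditionally.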

failed to read VASP: no atoms or `Number of atom types mismatches the number of counts`

mctc-lib cc30341 compiled by GCC 10.2.1 on Debian bullseye.

Prepare the following POSCAR:

POSCAR
3.0
1.0  0.0  0.0
0.0  1.0  0.0
0.0  0.0  1.0
 S
 1
direct
0.0  0.0  0.0

mctc-convert converts the file successfully:

$ mctc-convert POSCAR -o xyz -
1
poscar
S            0.00000000000000        0.00000000000000        0.00000000000000

However, if the indentation of the ion species and ion count lines is removed, i.e. they are changed to

S
1

mctc-convert reads no atoms:

$ mctc-convert POSCAR -o xyz -
0
poscar

Also, the error `Number of atom types mismatches the number of counts` is issued if the lines are changed to any of the following:

Si
1

 S
1

S
 1

The error does not occur if the above patterns are indented.

If the number of ions is increased to ten:

  • good: S\n10, Si\n10, Si\n10, Si\n 10
  • bad: S\n10, S\n 10
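For comparison, here is a minimal Python sketch of a whitespace-insensitive reading of the two lines. It is not the mctc-lib implementation and makes no claim about the cause of the failures above; the function name `parse_species_and_counts` is hypothetical.

```python
from typing import List, Tuple


def parse_species_and_counts(species_line: str, counts_line: str) -> List[Tuple[str, int]]:
    """Split both lines on arbitrary whitespace, so indentation cannot matter."""
    symbols = species_line.split()
    counts = [int(token) for token in counts_line.split()]
    if len(symbols) != len(counts):
        raise ValueError("Number of atom types mismatches the number of counts")
    return list(zip(symbols, counts))
```

With this approach every indented and unindented variant listed above yields the same species and count.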

Cannot read Maestro SDF format

The problem

SDFs and molfiles generated by the Maestro suite don't follow the connection table specification. The main difference is that they use fewer entries for each record in the V2K format (6 columns instead of the required 12 for coordinates and 3 columns instead of the required 4 for bonds).

This results in the following error when reading a Maestro generated SDF or molfile:

Error: Cannot read coordinates from connection table
 --> aspirin3d_maestro.mol:5:52-54
  |
5 |     1.2333    0.5540    0.7792 O   0  0  0  0  0  0
  |                                                    ^^^ unexpected value
  |

Note that the error message could be clearer by stating that more values are expected.


The solution

The best fix would be to accept the Maestro format extension as a valid connection table format. If the columns are exhausted, we simply assume zeros were provided rather than raising a syntax error.

The actual implementation is present in src/mctc/io/read/ctfile.f90:

Entries for the coordinates are read at (here no entry is really required if we assume zero)

do i = 1, 11
   if (stat == 0) then
      token = token_type(34 + i*3, 36 + i*3)
      call read_token(line, token, list12(i+1), stat)
   end if
end do

Entries for the bonds are read at (here we need the first three entries, atom indices + bond order)

do i = 1, 7
   if (stat == 0) then
      token = token_type(i*3 - 2, i*3)
      call read_token(line, token, list7(i), stat)
   end if
end do

The entries, if present, should still be parsed to avoid overlooking malformatted input files.
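The intended behaviour can be sketched in Python as follows; this is an illustration under stated assumptions, not the Fortran implementation. The helper name `read_fixed_int_fields` is hypothetical, and treating a blank field as zero is an assumption of this sketch. The column layout follows the loop above, where the first of the eleven fields starts at column 37.

```python
from typing import List


def read_fixed_int_fields(line: str, first_col: int, n_fields: int, width: int = 3) -> List[int]:
    """Read n_fields fixed-width integers starting at the 1-based column first_col.

    Fields beyond the end of the line (or blank fields) count as zero,
    while fields that are present are still parsed and validated.
    """
    values = []
    for i in range(n_fields):
        start = first_col - 1 + i * width  # 0-based start of this field
        field = line[start:start + width]
        if field.strip() == "":
            values.append(0)  # assumption: missing or blank entry means zero
        else:
            values.append(int(field))  # raises ValueError for malformed input
    return values
```

Called as `read_fixed_int_fields(line, 37, 11)` on a Maestro atom line that stops at column 51, this returns the five values that are present followed by six zeros, whereas a non-numeric entry in any present field still raises an error.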


Reproducer for regression testing

valid aspirin3d.mol
2244
  -OEChem-02042111203D

 21 21  0     0  0  0  0  0  0999 V2000
    1.2333    0.5540    0.7792 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6952   -2.7148   -0.7502 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.7958   -2.1843    0.8685 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.7813    0.8105   -1.4821 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0857    0.6088    0.4403 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7927   -0.5515    0.1244 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7288    1.8464    0.4133 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1426   -0.4741   -0.2184 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.0787    1.9238    0.0706 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7855    0.7636   -0.2453 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1409   -1.8536    0.1477 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1094    0.6715   -0.3113 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.5305    0.5996    0.1635 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1851    2.7545    0.6593 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7247   -1.3605   -0.4564 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5797    2.8872    0.0506 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.8374    0.8238   -0.5090 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7290    1.4184    0.8593 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2045    0.6969   -0.6924 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7105   -0.3659    0.6426 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2555   -3.5916   -0.7337 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  5  1  0  0  0  0
  1 12  1  0  0  0  0
  2 11  1  0  0  0  0
  2 21  1  0  0  0  0
  3 11  2  0  0  0  0
  4 12  2  0  0  0  0
  5  6  1  0  0  0  0
  5  7  2  0  0  0  0
  6  8  2  0  0  0  0
  6 11  1  0  0  0  0
  7  9  1  0  0  0  0
  7 14  1  0  0  0  0
  8 10  1  0  0  0  0
  8 15  1  0  0  0  0
  9 10  2  0  0  0  0
  9 16  1  0  0  0  0
 10 17  1  0  0  0  0
 12 13  1  0  0  0  0
 13 18  1  0  0  0  0
 13 19  1  0  0  0  0
 13 20  1  0  0  0  0
M  END

“incorrect” aspirin3d_maestro.mol
2244
                    3D
 Schrodinger Suite 2022-1.
 21 21  0  0  1  0            999 V2000
    1.2333    0.5540    0.7792 O   0  0  0  0  0  0
   -0.6952   -2.7148   -0.7502 O   0  0  0  0  0  0
    0.7958   -2.1843    0.8685 O   0  0  0  0  0  0
    1.7813    0.8105   -1.4821 O   0  0  0  0  0  0
   -0.0857    0.6088    0.4403 C   0  0  0  0  0  0
   -0.7927   -0.5515    0.1244 C   0  0  0  0  0  0
   -0.7288    1.8464    0.4133 C   0  0  0  0  0  0
   -2.1426   -0.4741   -0.2184 C   0  0  0  0  0  0
   -2.0787    1.9238    0.0706 C   0  0  0  0  0  0
   -2.7855    0.7636   -0.2453 C   0  0  0  0  0  0
   -0.1409   -1.8536    0.1477 C   0  0  0  0  0  0
    2.1094    0.6715   -0.3113 C   0  0  0  0  0  0
    3.5305    0.5996    0.1635 C   0  0  0  0  0  0
   -0.1851    2.7545    0.6593 H   0  0  0  0  0  0
   -2.7247   -1.3605   -0.4564 H   0  0  0  0  0  0
   -2.5797    2.8872    0.0506 H   0  0  0  0  0  0
   -3.8374    0.8238   -0.5090 H   0  0  0  0  0  0
    3.7290    1.4184    0.8593 H   0  0  0  0  0  0
    4.2045    0.6969   -0.6924 H   0  0  0  0  0  0
    3.7105   -0.3659    0.6426 H   0  0  0  0  0  0
   -0.2555   -3.5916   -0.7337 H   0  0  0  0  0  0
  1  5  1  0  0  0
  1 12  1  0  0  0
  2 11  1  0  0  0
  2 21  1  0  0  0
  3 11  2  0  0  0
  4 12  2  0  0  0
  5  6  1  0  0  0
  5  7  2  0  0  0
  6  8  2  0  0  0
  6 11  1  0  0  0
  7  9  1  0  0  0
  7 14  1  0  0  0
  8 10  1  0  0  0
  8 15  1  0  0  0
  9 10  2  0  0  0
  9 16  1  0  0  0
 10 17  1  0  0  0
 12 13  1  0  0  0
 13 18  1  0  0  0
 13 19  1  0  0  0
 13 20  1  0  0  0
M  END

OpenMP is not effective

OpenMP support can be configured in Meson by

option(
  'openmp',
  type: 'boolean',
  value: false,
  yield: true,
  description: 'use OpenMP parallelisation',
)

or in CMake by

option(WITH_OpenMP "Enable support for shared memory parallelisation with OpenMP" TRUE)

(the default values differ).

OpenMP flags, however, are not added even if the variable is set to true.

CMake requires something like

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -55,2 +55,10 @@ if(WITH_JSON)
 endif()
+if(WITH_OpenMP)
+  find_package(OpenMP REQUIRED)
+  target_link_libraries(
+    "${PROJECT_NAME}-lib"
+    PRIVATE
+    "OpenMP::OpenMP_Fortran"
+  )
+endif()
 set_target_properties(

The feature is used in

!$omp parallel do shared(testsuite, unit) reduction(+:stat) if(parallelize)
do ii = 1, size(testsuite)
   !$omp critical(mctc_env_testsuite)
   write(unit, '(1x, 3(1x, a), 1x, "(", i0, "/", i0, ")")') &
      & "Starting", testsuite(ii)%name, "...", ii, size(testsuite)
   !$omp end critical(mctc_env_testsuite)
   call run_unittest(testsuite(ii), unit, stat)
end do

and in

!$omp critical(mctc_env_testsuite)
if (allocated(error) .neqv. test%should_fail) then
   if (test%should_fail) then
      write(unit, fmt) indent, test%name, "[UNEXPECTED PASS]"
   else
      write(unit, fmt) indent, test%name, "[FAILED]"
   end if
   stat = stat + 1
else
   if (test%should_fail) then
      write(unit, fmt) indent, test%name, "[EXPECTED FAIL]"
   else
      write(unit, fmt) indent, test%name, "[PASSED]"
   end if
end if
if (allocated(error)) then
   write(unit, fmt) "Message:", error%message
end if
!$omp end critical(mctc_env_testsuite)

Description

I'd like to add "Modular computation tool chain library" to the Fortran Code on GitHub list under Computational Chemistry but am unsure how to describe it in 1 or 2 sentences.

Support ctfiles with V3000 formatted connection table

Molfiles and SDFs can specify the connection table in V3000 format; currently, only V2000 is supported.

Example:

Compound 11
     RDKit          3D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 17 16 0 0 0
M  V30 BEGIN ATOM
M  V30 1 O -2.821131 -0.276238 -0.753131 0
M  V30 2 C -2.076407 0.000289 0.175864 0
M  V30 3 O -2.469860 0.872693 1.126516 0
M  V30 4 C -0.648307 -0.508439 0.384207 0 CFG=1
M  V30 5 N -0.553725 -1.908221 -0.092137 0
M  V30 6 C 0.306640 0.448659 -0.352110 0
M  V30 7 C 1.764852 0.167437 -0.097096 0
M  V30 8 C 2.575104 0.984442 0.587951 0
M  V30 9 H -3.391314 1.091514 0.873612 0
M  V30 10 H -0.438887 -0.513473 1.460318 0
M  V30 11 H 0.421206 -2.197466 -0.111774 0
M  V30 12 H -0.893195 -1.946576 -1.055443 0
M  V30 13 H 0.073377 1.483235 -0.066299 0
M  V30 14 H 0.129860 0.396798 -1.434760 0
M  V30 15 H 2.179323 -0.745345 -0.519251 0
M  V30 16 H 2.219589 1.914253 1.021797 0
M  V30 17 H 3.622875 0.736438 0.730780 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3
M  V30 3 1 4 2
M  V30 4 1 4 5
M  V30 5 1 4 6
M  V30 6 1 6 7
M  V30 7 2 7 8 CFG=2
M  V30 8 1 3 9
M  V30 9 1 4 10 CFG=1
M  V30 10 1 5 11
M  V30 11 1 5 12
M  V30 12 1 6 13
M  V30 13 1 6 14
M  V30 14 1 7 15
M  V30 15 1 8 16
M  V30 16 1 8 17
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STERAC1 ATOMS=(1 4)
M  V30 END COLLECTION
M  V30 END CTAB
M  END
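To show what reading such a file involves, here is a minimal Python sketch of a V3000 reader. It is an illustration only, not a proposal for the Fortran implementation: the function name `parse_v3000` is hypothetical, continuation lines (ending in `-`) and blocks other than ATOM and BOND are ignored, and optional `KEY=VALUE` properties such as `CFG=1` are skipped. `SAMPLE` is a trimmed excerpt of the example above with the counts adjusted to match.

```python
from typing import List, Tuple

SAMPLE = """\
Compound 11
     RDKit          3D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 2 1 0 0 0
M  V30 BEGIN ATOM
M  V30 1 O -2.821131 -0.276238 -0.753131 0
M  V30 2 C -2.076407 0.000289 0.175864 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 END BOND
M  V30 END CTAB
M  END
"""


def parse_v3000(text: str) -> Tuple[List[tuple], List[tuple]]:
    """Collect (symbol, x, y, z) atoms and (atom1, atom2, order) bonds."""
    atoms, bonds, block = [], [], None
    for raw in text.splitlines():
        if not raw.startswith("M  V30 "):
            continue  # header lines, "M  END", anything outside V30 records
        fields = raw[7:].split()
        if not fields:
            continue
        if fields[0] == "BEGIN":
            block = fields[1]
        elif fields[0] == "END":
            block = None
        elif block == "ATOM":
            # layout: index type x y z aamap [KEY=VALUE ...]
            atoms.append((fields[1], float(fields[2]), float(fields[3]), float(fields[4])))
        elif block == "BOND":
            # layout: index type atom1 atom2 [KEY=VALUE ...]
            bonds.append((int(fields[2]), int(fields[3]), int(fields[1])))
    return atoms, bonds
```

Unlike V2000, the records are whitespace-separated rather than fixed-width, so the parser splits on whitespace and tracks only which block it is currently in.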

Validate mol/sdf in V3K version

The mol/SDF output produced by the writer in V3K format does not appear to be completely correct: the OpenFF parser has problems recognizing it.
