grimme-lab / mctc-lib
Modular computation tool chain library
Home Page: https://grimme-lab.github.io/mctc-lib
License: Apache License 2.0
Currently, reading an input file requires providing a formatted unit. However, none of the readers reads directly from the unit; rather, they fetch a new line, then tokenize and parse it. A general reader that can provide a line from any input source (formatted unit, unformatted stream, string, ...) would provide more flexibility.
The possibility to read from a file or formatted unit via the generic interface should be retained (preferably under the same symbol); however, these entry points would create a temporary reader internally and pass it to the actual reader implementation.
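The idea of a source-independent line reader can be sketched in a few lines. This is only an illustration in Python, not the mctc-lib (Fortran) implementation; the names `LineReader`, `from_string`, and `next_line` are hypothetical.

```python
import io


class LineReader:
    """Hands out one line at a time from any text stream (hypothetical sketch)."""

    def __init__(self, stream):
        self._stream = stream
        self.lnum = 0  # number of lines handed out so far, useful for error reports

    @classmethod
    def from_string(cls, text):
        # A string is wrapped in an in-memory stream, so parsers never see the difference.
        return cls(io.StringIO(text))

    def next_line(self):
        """Return the next line without its newline, or None at end of input."""
        line = self._stream.readline()
        if line == "":
            return None
        self.lnum += 1
        return line.rstrip("\n")


def read_atom_count(reader):
    """Example parser that only depends on the reader, not on the input source."""
    line = reader.next_line()
    if line is None:
        raise ValueError("unexpected end of input")
    return int(line.split()[0])
```

A parser written against `next_line` works unchanged for an open file (`LineReader(open(path))`) and for a string (`LineReader.from_string(text)`), which is the flexibility the issue asks for.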
Currently only the V2K connection table format is supported for writing molfiles or SDFs, which will fail for 1000+ atoms or bonds due to the limited width of the fields. We should be more flexible and switch to the V3K format if we exceed the limit (or maybe just default to the new format).
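A minimal sketch of the proposed fallback, written in Python for illustration only. It assumes the limit is the three-character count field of the V2000 counts line (at most 999); the function name `select_ctab_version` is hypothetical and not part of mctc-lib.

```python
# The V2000 counts line stores the number of atoms and bonds in
# three-character fields, so 999 is the largest representable value.
V2000_MAX_COUNT = 999


def select_ctab_version(natoms, nbonds):
    """Pick the connection table version that can hold the given counts."""
    if natoms > V2000_MAX_COUNT or nbonds > V2000_MAX_COUNT:
        return "V3000"
    return "V2000"
```

Defaulting to the new format instead would amount to always returning `"V3000"`, at the cost of compatibility with readers that only understand V2000.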
The JSON frontend for TOML Fortran (https://github.com/toml-f/jonquil/) allows for easier reporting of errors in a style similar to the one used by the other mctc-lib IO implementations. Also, it natively supports inclusion with meson and cmake.
The error message is slightly malformed for empty files.
Example:
Error: coordinates not present, cannot work without coordinates
--> coord:0
|
0 |
|^ unexpected end of input
|
mctc-lib cc30341 compiled by GCC 10.2.1 on Debian bullseye.
Prepare the following POSCAR:
POSCAR
3.0
1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0
S
1
direct
0.0 0.0 0.0
mctc-convert converts the file successfully:
$ mctc-convert POSCAR -o xyz -
1
poscar
S 0.00000000000000 0.00000000000000 0.00000000000000
However, if the indentation of the ion species and numbers is removed, i.e. changed to
S
1
mctc-convert reads nothing:
$ mctc-convert POSCAR -o xyz -
0
poscar
Also, the error "Number of atom types mismatches the number of counts" is issued if changed to:
Si
1
S
1
S
1
The error does not occur if the above patterns are indented.
If the number of ions is increased to ten,
S\n10
, Si\n10
, Si\n10
, Si\n 10
S\n10
, S\n 10
The problem
SDF and molfiles generated by the Maestro suite don't follow the connection table specification. The main difference is that they use fewer entries for each record in the V2K format (6 columns instead of the required 12 for coordinates and 3 columns instead of the required 4 for bonds).
This results in the following error when reading a Maestro generated SDF or molfile:
Error: Cannot read coordinates from connection table
--> aspirin3d_maestro.mol:5:52-54
|
5 | 1.2333 0.5540 0.7792 O 0 0 0 0 0 0
| ^^^ unexpected value
|
Note that the error message here could be clearer by stating that more values are expected.
The solution
The best fix would be to accept the format extension used by Maestro as a valid connection table format. If the columns are exhausted, we simply assume zeros were provided and do not raise a syntax error.
The actual implementation is present in src/mctc/io/read/ctfile.f90:
Entries for the coordinates are read at (here no entry is really required if we assume zero)
mctc-lib/src/mctc/io/read/ctfile.f90
Lines 199 to 204 in f02b590
Entries for the bonds are read at (here we need the first three entries, atom indices + bond order)
mctc-lib/src/mctc/io/read/ctfile.f90
Lines 227 to 232 in f02b590
The entries, if present, should still be parsed to avoid overlooking malformed input files.
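The lenient reading described here could look roughly like the following Python sketch. It is not the code in ctfile.f90; it assumes the fixed-width V2000 atom line layout (three 10-column coordinates, one blank, a 3-column symbol, a 2-column mass difference, then eleven 3-column integer fields), and the names `parse_v2000_atom_line` and `make_atom_line` are hypothetical.

```python
# Widths of the 12 integer fields following the atom symbol (assumed layout):
# mass difference (2 columns), then 11 fields of 3 columns each.
ATOM_FIELD_WIDTHS = [2] + [3] * 11


def parse_v2000_atom_line(line):
    """Parse one atom line; missing trailing fields count as zero."""
    x, y, z = (float(line[start:start + 10]) for start in (0, 10, 20))
    symbol = line[31:34].strip()
    fields = []
    pos = 34
    for width in ATOM_FIELD_WIDTHS:
        token = line[pos:pos + width].strip()
        pos += width
        # Exhausted columns are treated as zero, but anything that is
        # present must still be a valid integer (int raises ValueError).
        fields.append(int(token) if token else 0)
    return symbol, (x, y, z), fields


def make_atom_line(x, y, z, symbol, nfields):
    """Build a test line with the given number of zero-valued integer fields."""
    head = f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}"
    return head + f"{0:2d}" + "  0" * (nfields - 1)
```

With this approach a 6-field line as written by Maestro and a full 12-field line parse to the same result, while a non-numeric entry in any present field is still rejected. The same pattern would apply to bond lines, where the first three entries remain mandatory.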
Reproducer for regression testing
aspirin3d.mol
2244
-OEChem-02042111203D
21 21 0 0 0 0 0 0 0999 V2000
1.2333 0.5540 0.7792 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.6952 -2.7148 -0.7502 O 0 0 0 0 0 0 0 0 0 0 0 0
0.7958 -2.1843 0.8685 O 0 0 0 0 0 0 0 0 0 0 0 0
1.7813 0.8105 -1.4821 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.0857 0.6088 0.4403 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7927 -0.5515 0.1244 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7288 1.8464 0.4133 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.1426 -0.4741 -0.2184 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.0787 1.9238 0.0706 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.7855 0.7636 -0.2453 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.1409 -1.8536 0.1477 C 0 0 0 0 0 0 0 0 0 0 0 0
2.1094 0.6715 -0.3113 C 0 0 0 0 0 0 0 0 0 0 0 0
3.5305 0.5996 0.1635 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.1851 2.7545 0.6593 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.7247 -1.3605 -0.4564 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.5797 2.8872 0.0506 H 0 0 0 0 0 0 0 0 0 0 0 0
-3.8374 0.8238 -0.5090 H 0 0 0 0 0 0 0 0 0 0 0 0
3.7290 1.4184 0.8593 H 0 0 0 0 0 0 0 0 0 0 0 0
4.2045 0.6969 -0.6924 H 0 0 0 0 0 0 0 0 0 0 0 0
3.7105 -0.3659 0.6426 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.2555 -3.5916 -0.7337 H 0 0 0 0 0 0 0 0 0 0 0 0
1 5 1 0 0 0 0
1 12 1 0 0 0 0
2 11 1 0 0 0 0
2 21 1 0 0 0 0
3 11 2 0 0 0 0
4 12 2 0 0 0 0
5 6 1 0 0 0 0
5 7 2 0 0 0 0
6 8 2 0 0 0 0
6 11 1 0 0 0 0
7 9 1 0 0 0 0
7 14 1 0 0 0 0
8 10 1 0 0 0 0
8 15 1 0 0 0 0
9 10 2 0 0 0 0
9 16 1 0 0 0 0
10 17 1 0 0 0 0
12 13 1 0 0 0 0
13 18 1 0 0 0 0
13 19 1 0 0 0 0
13 20 1 0 0 0 0
M END
aspirin3d_maestro.mol
2244
3D
Schrodinger Suite 2022-1.
21 21 0 0 1 0 999 V2000
1.2333 0.5540 0.7792 O 0 0 0 0 0 0
-0.6952 -2.7148 -0.7502 O 0 0 0 0 0 0
0.7958 -2.1843 0.8685 O 0 0 0 0 0 0
1.7813 0.8105 -1.4821 O 0 0 0 0 0 0
-0.0857 0.6088 0.4403 C 0 0 0 0 0 0
-0.7927 -0.5515 0.1244 C 0 0 0 0 0 0
-0.7288 1.8464 0.4133 C 0 0 0 0 0 0
-2.1426 -0.4741 -0.2184 C 0 0 0 0 0 0
-2.0787 1.9238 0.0706 C 0 0 0 0 0 0
-2.7855 0.7636 -0.2453 C 0 0 0 0 0 0
-0.1409 -1.8536 0.1477 C 0 0 0 0 0 0
2.1094 0.6715 -0.3113 C 0 0 0 0 0 0
3.5305 0.5996 0.1635 C 0 0 0 0 0 0
-0.1851 2.7545 0.6593 H 0 0 0 0 0 0
-2.7247 -1.3605 -0.4564 H 0 0 0 0 0 0
-2.5797 2.8872 0.0506 H 0 0 0 0 0 0
-3.8374 0.8238 -0.5090 H 0 0 0 0 0 0
3.7290 1.4184 0.8593 H 0 0 0 0 0 0
4.2045 0.6969 -0.6924 H 0 0 0 0 0 0
3.7105 -0.3659 0.6426 H 0 0 0 0 0 0
-0.2555 -3.5916 -0.7337 H 0 0 0 0 0 0
1 5 1 0 0 0
1 12 1 0 0 0
2 11 1 0 0 0
2 21 1 0 0 0
3 11 2 0 0 0
4 12 2 0 0 0
5 6 1 0 0 0
5 7 2 0 0 0
6 8 2 0 0 0
6 11 1 0 0 0
7 9 1 0 0 0
7 14 1 0 0 0
8 10 1 0 0 0
8 15 1 0 0 0
9 10 2 0 0 0
9 16 1 0 0 0
10 17 1 0 0 0
12 13 1 0 0 0
13 18 1 0 0 0
13 19 1 0 0 0
13 20 1 0 0 0
M END
Inputs like QCSchema or SDF allow the user to associate arbitrary data with their geometry input. When round-tripping structures, this information should be preserved.
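For SDF, preserving such data could mean keeping the data items that follow the connection table and writing them back unchanged. The Python sketch below illustrates this under simplifying assumptions (data headers of the form `> <NAME>`, values terminated by a blank line, record terminated by `$$$$`); the function names are hypothetical and no mctc-lib API is implied.

```python
def read_sdf_data(lines):
    """Collect SDF data items into a dict mapping field name to text value."""
    data = {}
    key = None
    for line in lines:
        if line.startswith("$$$$"):
            break  # end of this SDF record
        if line.startswith(">"):
            start = line.find("<")
            end = line.find(">", start + 1) if start >= 0 else -1
            if start < 0 or end < 0:
                key = None  # header without a field name, skipped in this sketch
                continue
            key = line[start + 1:end]
            data[key] = []
        elif key is not None:
            if line.strip() == "":
                key = None  # a blank line terminates the current item
            else:
                data[key].append(line)
    return {name: "\n".join(value) for name, value in data.items()}


def write_sdf_data(data):
    """Emit the data items and the record terminator as a list of lines."""
    out = []
    for name, value in data.items():
        out.append(f"> <{name}>")
        out.extend(value.split("\n"))
        out.append("")
    out.append("$$$$")
    return out
```

Storing the items as an opaque name-to-text mapping next to the geometry keeps the round trip lossless without the library having to interpret the user's data.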
OpenMP effectiveness can be configured by
Lines 15 to 21 in cc30341
mctc-lib/config/CMakeLists.txt
Line 17 in cc30341
(the default values differ)
OpenMP flags, however, are not added even if the variable is set to true.
CMake requires something like
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -55,2 +55,10 @@ if(WITH_JSON)
 endif()
+if(WITH_OpenMP)
+  find_package(OpenMP REQUIRED)
+  target_link_libraries(
+    "${PROJECT_NAME}-lib"
+    PRIVATE
+    "OpenMP::OpenMP_Fortran"
+  )
+endif()
 set_target_properties(
The feature is used in
mctc-lib/src/mctc/env/testing.f90
Lines 207 to 214 in cc30341
mctc-lib/src/mctc/env/testing.f90
Lines 269 to 287 in cc30341
I'd like to add "Modular computation tool chain library" to the Fortran Code on GitHub list under Computational Chemistry but am unsure how to describe it in 1 or 2 sentences.
Molfiles and SDFs can specify the connection table in V3000 format; currently we only support V2000.
Example:
Compound 11
RDKit 3D
0 0 0 0 0 0 0 0 0 0999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 17 16 0 0 0
M V30 BEGIN ATOM
M V30 1 O -2.821131 -0.276238 -0.753131 0
M V30 2 C -2.076407 0.000289 0.175864 0
M V30 3 O -2.469860 0.872693 1.126516 0
M V30 4 C -0.648307 -0.508439 0.384207 0 CFG=1
M V30 5 N -0.553725 -1.908221 -0.092137 0
M V30 6 C 0.306640 0.448659 -0.352110 0
M V30 7 C 1.764852 0.167437 -0.097096 0
M V30 8 C 2.575104 0.984442 0.587951 0
M V30 9 H -3.391314 1.091514 0.873612 0
M V30 10 H -0.438887 -0.513473 1.460318 0
M V30 11 H 0.421206 -2.197466 -0.111774 0
M V30 12 H -0.893195 -1.946576 -1.055443 0
M V30 13 H 0.073377 1.483235 -0.066299 0
M V30 14 H 0.129860 0.396798 -1.434760 0
M V30 15 H 2.179323 -0.745345 -0.519251 0
M V30 16 H 2.219589 1.914253 1.021797 0
M V30 17 H 3.622875 0.736438 0.730780 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 1 2 1 2
M V30 2 1 2 3
M V30 3 1 4 2
M V30 4 1 4 5
M V30 5 1 4 6
M V30 6 1 6 7
M V30 7 2 7 8 CFG=2
M V30 8 1 3 9
M V30 9 1 4 10 CFG=1
M V30 10 1 5 11
M V30 11 1 5 12
M V30 12 1 6 13
M V30 13 1 6 14
M V30 14 1 7 15
M V30 15 1 8 16
M V30 16 1 8 17
M V30 END BOND
M V30 BEGIN COLLECTION
M V30 MDLV30/STERAC1 ATOMS=(1 4)
M V30 END COLLECTION
M V30 END CTAB
M END
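Reading the atom block of such a file can be sketched as follows. This Python example is illustrative only and not the mctc-lib reader; it tokenizes on whitespace, ignores the optional `KEY=value` properties, and does not handle V3000 line continuation. The name `read_v3000_atoms` is hypothetical.

```python
def read_v3000_atoms(lines):
    """Return (symbol, x, y, z) tuples from the ATOM block of a V3000 CTAB."""
    atoms = []
    in_atom_block = False
    for line in lines:
        tokens = line.split()
        if tokens[:2] != ["M", "V30"]:
            continue  # header lines, legacy counts line, "M  END", ...
        body = tokens[2:]
        if body == ["BEGIN", "ATOM"]:
            in_atom_block = True
        elif body == ["END", "ATOM"]:
            in_atom_block = False
        elif in_atom_block:
            # Layout: index, type, x, y, z, atom-atom mapping, optional KEY=value
            symbol = body[1]
            x, y, z = (float(value) for value in body[2:5])
            atoms.append((symbol, x, y, z))
    return atoms
```

Applied to the example above, the function would skip the legacy counts line (which carries zeros in V3000 files) and pick up the 17 atoms between `BEGIN ATOM` and `END ATOM`.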
The mol/SDF output produced by the writer in V3K format does not seem to be completely correct: the OpenFF parser has problems recognizing it.