ornl / hydragnn
Distributed PyTorch implementation of multi-headed graph convolutional neural networks
License: BSD 3-Clause "New" or "Revised" License
Currently, dataset loading supports raw data files, e.g., the LSMS format. For every run, it reads the raw data files, converts them to a serialized format, and generates *.pkl files. This process can be time consuming and sometimes unnecessary. We should provide an option to load the *.pkl files directly.
Metallic bonds and covalent bonds require undirected graphs because the electrons are shared between atoms and there are no exclusive owners.
However, ionic bonds are created between atom pairs where there is a clear donor and a clear receiver of the electron. Therefore, we can inject physics information into the adjacency matrix by transforming the graph from undirected (as the current implementation performs) to directed.
Currently, the undirected graph relies on a routine inside the "GCNN/data_utils/helper_functions.py" file:
remove_collinear_candidate
This function makes sure that if A is a neighbour of B, then B is mutually a neighbour of A. This routine guarantees that the connectivity between atoms stays local (the adjacency is only locally dense, not globally dense), preventing the connectivity between atoms from "exploding" and turning the adjacency matrix into something globally dense.
I think that for the directed graph we can avoid calling this function, because A can be a neighbor of B without B being a neighbor of A.
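A minimal sketch of the difference, assuming a plain Python neighbor-list representation (the function name and data layout are illustrative, not HydraGNN's actual API): the undirected case forces mutual neighbors, while the directed case keeps the donor-to-receiver edges exactly as given.

```python
# Illustrative only: enforcing mutual (undirected) neighbors, as the
# helper routine does, vs. keeping a directed neighbor list as-is.
def symmetrize(neighbors):
    """If A lists B as a neighbor, make sure B also lists A."""
    sym = {a: set(ns) for a, ns in neighbors.items()}
    for a, ns in neighbors.items():
        for b in ns:
            sym.setdefault(b, set()).add(a)
    return {a: sorted(ns) for a, ns in sym.items()}

# Directed edges, e.g., ionic bonds with a clear donor -> receiver direction:
directed = {0: [1], 1: [], 2: [0]}
# Undirected edges, e.g., metallic/covalent bonds with shared electrons:
undirected = symmetrize(directed)
# undirected == {0: [1, 2], 1: [0], 2: [0]}
```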
After I trained a model, the log directory has been created with the following contents:
(open-ce-1.4.0-py38-0) bash-4.4$ ls -l logs/PNAStack-r-7-mnnn-5-ncl-6-hd-5-ne-2-lr-0.001-bs-64-data-FePt_32atoms-node_ft-0-task_weights-1.0-1.0-1.0-/
total 2561
-rw------- 1 sacer sacer 575017 Oct 28 12:34 PNAStack-r-7-mnnn-5-ncl-6-hd-5-ne-2-lr-0.001-bs-64-data-FePt_32atoms-node_ft-0-task_weights-1.0-1.0-1.0-.pk
-rw------- 1 sacer sacer 155644 Oct 28 12:34 charge_density.png
-rw------- 1 sacer sacer 137315 Oct 28 12:34 charge_density_-001.png
-rw------- 1 sacer sacer 182418 Oct 28 12:34 charge_density_error_hist1d.png
-rw------- 1 sacer sacer 153256 Oct 28 12:34 charge_density_error_hist1d_-001.png
-rw------- 1 sacer sacer 145924 Oct 28 12:34 charge_density_scatter_condm_err.png
-rw------- 1 sacer sacer 4130 Oct 28 12:34 config.json
-rw------- 1 sacer sacer 674 Oct 28 12:34 events.out.tfevents.1635438862.h50n07.1263424.0
-rw------- 1 sacer sacer 38818 Oct 28 12:34 free_energy.png
-rw------- 1 sacer sacer 34485 Oct 28 12:34 free_energy_-001.png
-rw------- 1 sacer sacer 61311 Oct 28 12:34 free_energy_scatter_condm_err.png
-rw------- 1 sacer sacer 943 Oct 28 12:34 history_loss.pckl
-rw------- 1 sacer sacer 178170 Oct 28 12:34 history_loss.png
-rw------- 1 sacer sacer 157442 Oct 28 12:34 magnetic_moment.png
-rw------- 1 sacer sacer 167461 Oct 28 12:34 magnetic_moment_-001.png
-rw------- 1 sacer sacer 171006 Oct 28 12:34 magnetic_moment_error_hist1d.png
-rw------- 1 sacer sacer 175293 Oct 28 12:34 magnetic_moment_error_hist1d_-001.png
-rw------- 1 sacer sacer 141539 Oct 28 12:34 magnetic_moment_scatter_condm_err.png
Here, I used ./examples/configuration.json as is, except for setting "num_epoch": 2 to finish early. Then I changed example.py as follows and ran it:
import hydragnn
hydragnn.run_prediction("./examples/configuration.json")
I got the following error:
Traceback (most recent call last):
File "example.py", line 3, in <module>
hydragnn.run_prediction("./examples/configuration.json")
File "/gpfs/alpine/stf008/scratch/sacer/allGNN/HydraGNN/hydragnn/run_prediction.py", line 48, in run_prediction
output_type = config["NeuralNetwork"]["Variables_of_interest"]["type"]
TypeError: string indices must be integers
Run command on Summit:
jsrun -n24 -a1 -g1 -c7 -r6 -b rs --smpiargs="off" python example.py
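The traceback suggests run_prediction received the configuration file path itself where a parsed config dict was expected: indexing a plain string with a string key raises exactly this TypeError. A minimal stand-alone reproduction (illustrative only, not HydraGNN code):

```python
# Indexing a str with a str key reproduces the reported error
# (Python 3.8 wording: "string indices must be integers").
config = "./examples/configuration.json"  # a str, not a parsed JSON dict
try:
    config["NeuralNetwork"]
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```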
This paper describes a very elegant way to improve the performance of localized (short-range) message passing neural networks (MPNNs) by including global attention mechanisms to model long-range interactions through hierarchical clustering of nodes:
https://arxiv.org/pdf/2009.03717.pdf
Looking at the original implementation of HC-GNN
https://github.com/zhiqiangzhongddu/HC-GNN/blob/master/model.py
it seems like the inclusion of hierarchical MPNNs can be easily templated over the underlying localized MPNN.
Since we are already templating HydraGNN with respect to MPNNs, including hierarchical MPNNs as an additional level may be very doable and not too difficult to perform.
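The hierarchical idea can be sketched as pooling node features into cluster super-nodes, exchanging information at the coarse level, and broadcasting the result back to the nodes. A toy, pure-Python illustration (not HC-GNN's actual implementation; mean pooling and the add-back step are assumptions made for clarity):

```python
def hierarchical_pass(features, cluster_of):
    """One coarse round: pool node features per cluster, broadcast back."""
    sums, counts = {}, {}
    for node, f in features.items():
        c = cluster_of[node]
        sums[c] = sums.get(c, 0.0) + f
        counts[c] = counts.get(c, 0) + 1
    coarse = {c: sums[c] / counts[c] for c in sums}  # cluster super-nodes
    # each node receives its cluster's pooled (long-range) message
    return {node: f + coarse[cluster_of[node]] for node, f in features.items()}

out = hierarchical_pass({0: 1.0, 1: 3.0, 2: 10.0}, {0: "a", 1: "a", 2: "b"})
# out == {0: 3.0, 1: 5.0, 2: 20.0}
```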
When I run the Ising model and create data using the current set-up
number_atoms_per_dimension = 5
configurational_histogram_cutoff = 1000
the following line crashes:
HydraGNN/hydragnn/preprocess/serialized_dataset_loader.py", line 88, in load_serialized_data
data.to(device)
This happens because the total volume of the dataset is too large and we map all the data at once on the GPU.
File "/root/HydraGNN/hydragnn/preprocess/serialized_dataset_loader.py", line 88, in load_serialized_data
data.to(device)
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/data.py", line 216, in to
return self.apply(
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/data.py", line 204, in apply
store.apply(func, *args)
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/storage.py", line 146, in apply
self[key] = recursive_apply(value, func)
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/storage.py", line 495, in recursive_apply
return func(data)
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/data.py", line 217, in <lambda>
lambda x: x.to(device=device, non_blocking=non_blocking), *args)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 6; 31.75 GiB total capacity; 30.10 GiB already allocated; 2.25 MiB free; 30.48 GiB reserved in total by PyTorch)
I greatly appreciate the effort you all made to ensure that the data is mapped to the GPU only once, but if the data is too big, that is not possible. We may need to keep the data on the CPU and move it to the GPU only when strictly needed for the current batch.
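The pattern could look like the following pure-Python stand-in (illustrative; to_device and the loop stand in for batch.to(device) and the actual training loop, and are not HydraGNN code):

```python
# Keep the full dataset in host (CPU) memory and move only the current
# batch to the device right before it is needed.
dataset = [[1, 2], [3, 4], [5, 6]]       # stays on the "CPU"

def to_device(batch):
    return list(batch)                   # stand-in for batch.to(device)

processed = []
for batch in dataset:                    # one batch at a time
    gpu_batch = to_device(batch)         # transferred only when needed
    processed.append(sum(gpu_batch))     # stand-in for model(gpu_batch)
# processed == [3, 7, 11]
```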
Base class to instantiate pyg.nn.models.JumpingKnowledge. The JumpingKnowledge base class includes a reset_parameters() to reset skip connection choices.
Depends on edge lengths.
Update black settings
Compare CI JSON entries (run directly in CI) to example JSON
Set output paths in the config file - could replace with one path or multiple (serialized, log, etc.)
File "/home/HydraGNN/hydragnn/utils/adiosdataset.py", line 330, in init
with ad2.open(self.filename, "r", self.comm) as f:
AttributeError: module 'adios2' has no attribute 'open'
Thanks for your comment.
There are many transformations already provided by torch_geometric:
https://pytorch-geometric.readthedocs.io/en/latest/modules/transforms.html#torch_geometric.transforms.RadiusGraph
For example, torch_geometric.transforms.RadiusGraph creates edges, based on node positions pos, to all points within a given distance.
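A pure-Python stand-in for what the transform does (illustrative; the real RadiusGraph operates on a torch_geometric Data object's pos attribute):

```python
import math

def radius_graph(pos, r):
    """Connect every ordered pair of points within Euclidean distance r."""
    return [(i, j)
            for i, p in enumerate(pos)
            for j, q in enumerate(pos)
            if i != j and math.dist(p, q) <= r]

edges = radius_graph([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], r=1.5)
# edges == [(0, 1), (1, 0)]  -- the point at x=5.0 is beyond the cutoff
```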
Should add flexibility for multiple formats in the same folder and for subdirectories
One thing observed in the PR is that something needs to be fixed about using profiling in CPU mode for md17 and qm9 in the CI tests (the return code -11 suggests the subprocess was killed by SIGSEGV).
_____________________________ pytest_examples[qm9] _____________________________
Traceback (most recent call last):
File "/home/runner/work/HydraGNN/HydraGNN/tests/test_examples.py", line 26, in pytest_examples
assert return_code == 0
AssertionError: assert -11 == 0
----------------------------- Captured stderr call -----------------------------
Downloading https://data.pyg.org/datasets/qm9_v3.zip
Extracting dataset/qm9/raw/qm9_v3.zip
Processing...
Using a pre-processed version of the dataset. Please install 'rdkit' to alternatively process the raw data.
Done!
0: Using CPU
0: Using CPU
0%| | 0/11 [00:00<?, ?it/s]
36%|███▋ | 4/11 [00:00<00:00, 32.53it/s]ERROR:2022-05-12 13:46:01 3754:3754 CudaDeviceProperties.cpp:26] cudaGetDeviceCount failed with code 35
____________________________ pytest_examples[md17] _____________________________
Traceback (most recent call last):
File "/home/runner/work/HydraGNN/HydraGNN/tests/test_examples.py", line 26, in pytest_examples
assert return_code == 0
AssertionError: assert -11 == 0
----------------------------- Captured stderr call -----------------------------
We discussed for quite a while the idea of including support for the Atomistic Line Graph Neural Network (ALIGNN) model described in the following paper:
https://arxiv.org/abs/2106.01829
There are built-in PyTorch Geometric capabilities that would make that easier to implement. For instance, the construction of the line graph is already supported:
https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/transforms/line_graph.html#LineGraph
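For reference, the line-graph construction itself is simple: each edge of the original graph becomes a node, and two such nodes are connected when the underlying edges share an endpoint. A pure-Python sketch (illustrative, not the PyG transform):

```python
def line_graph(edges):
    """Edges sharing an endpoint become adjacent nodes in the line graph."""
    return [(i, j)
            for i, (a, b) in enumerate(edges)
            for j, (c, d) in enumerate(edges)
            if i != j and {a, b} & {c, d}]

lg = line_graph([(0, 1), (1, 2), (2, 3)])  # a path graph 0-1-2-3
# lg == [(0, 1), (1, 0), (1, 2), (2, 1)]
```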
For each element of the periodic table, provide the following with
https://gist.github.com/GoodmanSciences/c2dd862cd38f21b0ad36b8f96b4bf1ee
I noticed that CONTRIBUTING.md mentions that contributors must run black and pytest before contributing. Having pre-commit would make that process much easier and enforce that developers follow the coding conventions.
I'd be more than happy to take this up.
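A minimal sketch of what the .pre-commit-config.yaml could look like (the hook revision below is a placeholder to be pinned to the version used in CI):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0   # placeholder revision
    hooks:
      - id: black
```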
As the name suggests, skip connections in deep architectures bypass some of the neural network layers and feed the output of one layer as the input to later layers. They are a standard module and provide an alternative path for the gradient during backpropagation.
Skip connections were originally created to tackle various difficulties in various architectures and were introduced even before residual networks. In the case of residual networks (ResNets), skip connections were used to solve degradation problems (e.g., vanishing gradients), while in the case of dense networks (DenseNets), they ensured feature reusability.
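In its simplest residual form, a skip connection just adds the layer's input back to its output. A toy sketch (layer is a placeholder for any differentiable transformation):

```python
def residual_block(x, layer):
    """Identity shortcut: the layer's output plus the unchanged input."""
    return x + layer(x)

out = residual_block(2.0, lambda x: 3 * x)
# out == 8.0  (layer output 6.0 plus the shortcut input 2.0)
```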
Most of the options are there to display rather than save figures, but a few changes are still needed throughout the viz class
I want to use the HydraGNN multi-head attention implementation in my research on predicting the reproducibility of scholarly articles. I appreciate the contributions of this project in the GNN space.
I checked good-first-issue and found no open issues. I previously commented on #164 for a potential contribution. Does this repository support external contributions/collaboration?
import torch
from torch.nn import Embedding, Linear, ModuleList, ReLU, Sequential
from torch_geometric.nn import BatchNorm, PNAConv

class Net(torch.nn.Module):
    def __init__(self, deg):  # deg: in-degree histogram tensor of the training data
        super(Net, self).__init__()
        self.node_emb = Embedding(21, 21)
        self.edge_emb = Embedding(4, 4)

        aggregators = ['mean', 'min', 'max', 'std']
        scalers = ['identity', 'amplification', 'attenuation']

        self.convs = ModuleList()
        self.batch_norms = ModuleList()
        for _ in range(2):
            conv = PNAConv(in_channels=21, out_channels=21,
                           aggregators=aggregators, scalers=scalers, deg=deg,
                           edge_dim=4, towers=1, pre_layers=1, post_layers=1,
                           divide_input=False)
            self.convs.append(conv)
            self.batch_norms.append(BatchNorm(21))

        self.mlp = Sequential(Linear(21, 21), ReLU(), Linear(21, 21), ReLU(),
                              Linear(21, 1))
GitHub recently released a new feature where repository owners can add a CITATION.cff file, making it easy for others to cite the repository.
Currently, the information in the README isn't very helpful, as it doesn't provide a BibTeX reference. Adding a CITATION.cff would make the attribution process very easy.
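A minimal CITATION.cff sketch (all field values below are placeholders, not the actual HydraGNN metadata):

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "HydraGNN"
authors:
  - family-names: "Doe"    # placeholder
    given-names: "Jane"    # placeholder
```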
Check graph/node scalar/vector data loading works as intended - create synthetic torch_geometric data and verify the values returned
Some functions in visualizer.py have not been used for a while and are also likely to break the code since they are outdated. We need to either delete these functions, since they are unused, or update them to be consistent with the rest of the code.
Also, create plots only on rank 0.
When following the Predicting Chemical and Material Molecular Properties With HydraGNN tutorial, there is an instruction to use pip install -e ., which will fail when using a Python version >= 3.8. I was able to fix this on my machine by simply removing the pickle5 dependency from the install_requires list in setup.py; the built-in pickle module should support all the needed functionality in later Python versions.
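The dependency could also be made conditional rather than removed outright; the standard-library pickle gained protocol 5 in Python 3.8, which is what pickle5 backports. A small check (the pickle5 branch is only exercised on older interpreters):

```python
import sys

if sys.version_info >= (3, 8):
    import pickle                  # protocol 5 is built in from 3.8 onward
else:
    import pickle5 as pickle       # backport needed on Python < 3.8

print(pickle.HIGHEST_PROTOCOL)     # 5 or higher on Python 3.8+
```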
docstring-style documentation for all models and utils
ISSUE_TEMPLATE, and guidelines for creating PRs, for contributors
README to provide a higher-level idea, description, and an example script to get started, and to point to the docs site for elaborate usage/explanations of HydraGNN
HPC component of running HydraGNN on exascale compute clusters like Frontier, etc.
pdoc3 or readthedocs to ensure the website documentation is automatically pushed to a connected web endpoint

https://github.com/ORNL/HydraGNN/blob/main/hydragnn/utils/adiosdataset.py#L10C1-L13C9
An ImportError caused by a missing ADIOS installation is not handled. This leads to errors with the ADIOS datasets further in the code.
https://github.com/ORNL/HydraGNN/blob/main/hydragnn/utils/adiosdataset.py#L300
NameError: name 'ad2' is not defined
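A possible guard, following the module alias from the traceback (the wrapper function and error message are illustrative, not HydraGNN's actual fix):

```python
try:
    import adios2 as ad2
except ImportError:
    ad2 = None                     # ADIOS support becomes optional

def open_adios(filename, comm):
    """Fail with a clear message instead of a NameError later in the code."""
    if ad2 is None:
        raise RuntimeError("adios2 is required for ADIOS datasets; "
                           "install it or choose another dataset format")
    return ad2.open(filename, "r", comm)
```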
This is mostly motivated by the use case of running one instance of HydraGNN per GPU on a node. I was only able to do so with the mpi backend, iterating through the LSB_HOSTS list.
@jychoi-hpc if there's a better way, please let me know; otherwise, I'll plan to add something like what I describe
Can you please comment on what "within HydraGNN" refers to? Is it the json file? Can you please describe in detail the steps to run an example using the FePt.zip dataset?
Set the path to the selected dataset within HydraGNN and run
Are the ADIOSData files required if users are not on Summit?
Thanks
This would enhance the expressive power of the neural network, as input features would be immediately projected into a high-dimensional space before being used for training.
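A toy illustration of the idea (pure Python; in practice this would be a learned linear layer or embedding, and the weights below are arbitrary):

```python
def project(features, weights):
    """Lift an input vector into a higher-dimensional space:
    one output value per weight row."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

# 2-dimensional input lifted to 3 dimensions before any training happens:
hidden = project([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# hidden == [1.0, 2.0, 3.0]
```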
Set as None or empty list in update_config