ornl / hydragnn
Distributed PyTorch implementation of multi-headed graph convolutional neural networks
License: BSD 3-Clause "New" or "Revised" License
Currently, dataset loading supports raw data files, e.g., the LSMS format. For every run, it reads the raw data files, converts them to a serialized format, and generates *.pkl files. This process can be time consuming and sometimes unnecessary. We should provide an option to load the *.pkl files directly.
Metallic bonds and covalent bonds require undirected graphs because the electrons are shared between atoms and there are no exclusive owners.
However, ionic bonds are created between atom pairs where there is a clear donor and a clear receiver of the electron. Therefore, we can inject physics information into the adjacency matrix by transforming the graph from undirected (as the current implementation performs) to directed.
Currently, the undirected graph relies on a routine inside the "GCNN/data_utils/helper_functions.py" file:
remove_collinear_candidate
This function makes sure that if A is a neighbour of B, then B is mutually a neighbour of A. This routine guarantees that the connectivity between atoms stays local (the adjacency is only locally dense, not globally dense), preventing the connectivity between atoms from "exploding" and turning the adjacency matrix into something globally dense.
I think that for the directed graph we can avoid calling this function, because A can be a neighbor of B without B being a neighbor of A.
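A minimal sketch of the difference, assuming a plain Python neighbor-list representation (the function name and data layout are illustrative, not HydraGNN's actual API): the undirected case forces mutual neighbors, while the directed case keeps the donor-to-receiver edges exactly as given.

```python
# Illustrative only: enforcing mutual (undirected) neighbors, as the
# helper routine does, vs. keeping a directed neighbor list as-is.
def symmetrize(neighbors):
    """If A lists B as a neighbor, make sure B also lists A."""
    sym = {a: set(ns) for a, ns in neighbors.items()}
    for a, ns in neighbors.items():
        for b in ns:
            sym.setdefault(b, set()).add(a)
    return {a: sorted(ns) for a, ns in sym.items()}

# Directed edges, e.g., ionic bonds with a clear donor -> receiver direction:
directed = {0: [1], 1: [], 2: [0]}
# Undirected edges, e.g., metallic/covalent bonds with shared electrons:
undirected = symmetrize(directed)
# undirected == {0: [1, 2], 1: [0], 2: [0]}
```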
After I trained a model, the log directory has been created with the following contents:
(open-ce-1.4.0-py38-0) bash-4.4$ ls -l logs/PNAStack-r-7-mnnn-5-ncl-6-hd-5-ne-2-lr-0.001-bs-64-data-FePt_32atoms-node_ft-0-task_weights-1.0-1.0-1.0-/
total 2561
-rw------- 1 sacer sacer 575017 Oct 28 12:34 PNAStack-r-7-mnnn-5-ncl-6-hd-5-ne-2-lr-0.001-bs-64-data-FePt_32atoms-node_ft-0-task_weights-1.0-1.0-1.0-.pk
-rw------- 1 sacer sacer 155644 Oct 28 12:34 charge_density.png
-rw------- 1 sacer sacer 137315 Oct 28 12:34 charge_density_-001.png
-rw------- 1 sacer sacer 182418 Oct 28 12:34 charge_density_error_hist1d.png
-rw------- 1 sacer sacer 153256 Oct 28 12:34 charge_density_error_hist1d_-001.png
-rw------- 1 sacer sacer 145924 Oct 28 12:34 charge_density_scatter_condm_err.png
-rw------- 1 sacer sacer 4130 Oct 28 12:34 config.json
-rw------- 1 sacer sacer 674 Oct 28 12:34 events.out.tfevents.1635438862.h50n07.1263424.0
-rw------- 1 sacer sacer 38818 Oct 28 12:34 free_energy.png
-rw------- 1 sacer sacer 34485 Oct 28 12:34 free_energy_-001.png
-rw------- 1 sacer sacer 61311 Oct 28 12:34 free_energy_scatter_condm_err.png
-rw------- 1 sacer sacer 943 Oct 28 12:34 history_loss.pckl
-rw------- 1 sacer sacer 178170 Oct 28 12:34 history_loss.png
-rw------- 1 sacer sacer 157442 Oct 28 12:34 magnetic_moment.png
-rw------- 1 sacer sacer 167461 Oct 28 12:34 magnetic_moment_-001.png
-rw------- 1 sacer sacer 171006 Oct 28 12:34 magnetic_moment_error_hist1d.png
-rw------- 1 sacer sacer 175293 Oct 28 12:34 magnetic_moment_error_hist1d_-001.png
-rw------- 1 sacer sacer 141539 Oct 28 12:34 magnetic_moment_scatter_condm_err.png
Here, I used ./examples/configuration.json as is, except for setting "num_epoch": 2 to finish early. Then I changed example.py as follows and ran it:
import hydragnn
hydragnn.run_prediction("./examples/configuration.json")
I got the following error:
Traceback (most recent call last):
File "example.py", line 3, in <module>
hydragnn.run_prediction("./examples/configuration.json")
File "/gpfs/alpine/stf008/scratch/sacer/allGNN/HydraGNN/hydragnn/run_prediction.py", line 48, in run_prediction
output_type = config["NeuralNetwork"]["Variables_of_interest"]["type"]
TypeError: string indices must be integers
Run command on Summit:
jsrun -n24 -a1 -g1 -c7 -r6 -b rs --smpiargs="off" python example.py
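The traceback suggests run_prediction received the configuration file path itself where a parsed config dict was expected: indexing a plain string with a string key raises exactly this TypeError. A minimal stand-alone reproduction (illustrative only, not HydraGNN code):

```python
# Indexing a str with a str key reproduces the reported error
# (Python 3.8 wording: "string indices must be integers").
config = "./examples/configuration.json"  # a str, not a parsed JSON dict
try:
    config["NeuralNetwork"]
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```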
This paper describes a very elegant way to improve the performance of localized (short-range) message passing neural networks (MPNNs) by including global attention mechanisms to model long-range interactions through hierarchical clustering of nodes:
https://arxiv.org/pdf/2009.03717.pdf
Looking at the original implementation of HC-GNN
https://github.com/zhiqiangzhongddu/HC-GNN/blob/master/model.py
it seems like the inclusion of hierarchical MPNNs can be easily templated over the underlying localized MPNN.
Since we are already templating HydraGNN with respect to MPNNs, including hierarchical MPNNs as an additional level may be very doable and not too difficult to perform.
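The hierarchical idea can be sketched as pooling node features into cluster super-nodes, exchanging information at the coarse level, and broadcasting the result back to the nodes. A toy, pure-Python illustration (not HC-GNN's actual implementation; mean pooling and the add-back step are assumptions made for clarity):

```python
def hierarchical_pass(features, cluster_of):
    """One coarse round: pool node features per cluster, broadcast back."""
    sums, counts = {}, {}
    for node, f in features.items():
        c = cluster_of[node]
        sums[c] = sums.get(c, 0.0) + f
        counts[c] = counts.get(c, 0) + 1
    coarse = {c: sums[c] / counts[c] for c in sums}  # cluster super-nodes
    # each node receives its cluster's pooled (long-range) message
    return {node: f + coarse[cluster_of[node]] for node, f in features.items()}

out = hierarchical_pass({0: 1.0, 1: 3.0, 2: 10.0}, {0: "a", 1: "a", 2: "b"})
# out == {0: 3.0, 1: 5.0, 2: 20.0}
```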
When I run the Ising model and create data using the current set-up
number_atoms_per_dimension = 5
configurational_histogram_cutoff = 1000
the following line crashes:
HydraGNN/hydragnn/preprocess/serialized_dataset_loader.py", line 88, in load_serialized_data
data.to(device)
This happens because the total volume of the dataset is too large and we map all the data at once on the GPU.
File "/root/HydraGNN/hydragnn/preprocess/serialized_dataset_loader.py", line 88, in load_serialized_data
data.to(device)
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/data.py", line 216, in to
return self.apply(
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/data.py", line 204, in apply
store.apply(func, *args)
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/storage.py", line 146, in apply
self[key] = recursive_apply(value, func)
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/storage.py", line 495, in recursive_apply
return func(data)
File "/opt/conda/lib/python3.8/site-packages/torch_geometric/data/data.py", line 217, in <lambda>
lambda x: x.to(device=device, non_blocking=non_blocking), *args)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 6; 31.75 GiB total capacity; 30.10 GiB already allocated; 2.25 MiB free; 30.48 GiB reserved in total by PyTorch)
I greatly appreciate the effort you all made to ensure that the data is mapped to the GPU only once, but if the data is too big, that is not possible. We may need to keep the data on the CPU and move it to the GPU only when strictly needed for the current batch.
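The pattern could look like the following pure-Python stand-in (illustrative; to_device and the loop stand in for batch.to(device) and the actual training loop, and are not HydraGNN code):

```python
# Keep the full dataset in host (CPU) memory and move only the current
# batch to the device right before it is needed.
dataset = [[1, 2], [3, 4], [5, 6]]       # stays on the "CPU"

def to_device(batch):
    return list(batch)                   # stand-in for batch.to(device)

processed = []
for batch in dataset:                    # one batch at a time
    gpu_batch = to_device(batch)         # transferred only when needed
    processed.append(sum(gpu_batch))     # stand-in for model(gpu_batch)
# processed == [3, 7, 11]
```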
Base class to instantiate pyg.nn.models.JumpingKnowledge. The JumpingKnowledge base class includes a reset_parameters() to reset skip connection choices.
Depends on edge lengths.
Update black settings
Compare CI JSON entries (run directly in CI) to example JSON
Set output paths in the config file - could replace with one path or multiple (serialized, log, etc.)
File "/home/HydraGNN/hydragnn/utils/adiosdataset.py", line 330, in init
with ad2.open(self.filename, "r", self.comm) as f:
AttributeError: module 'adios2' has no attribute 'open'
Thanks for your comment.
There are many transformations already provided by torch_geometric:
https://pytorch-geometric.readthedocs.io/en/latest/modules/transforms.html#torch_geometric.transforms.RadiusGraph
For example, torch_geometric.transforms.RadiusGraph creates edges, based on node positions pos, to all points within a given distance.
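A pure-Python stand-in for what the transform does (illustrative; the real RadiusGraph operates on a torch_geometric Data object's pos attribute):

```python
import math

def radius_graph(pos, r):
    """Connect every ordered pair of points within Euclidean distance r."""
    return [(i, j)
            for i, p in enumerate(pos)
            for j, q in enumerate(pos)
            if i != j and math.dist(p, q) <= r]

edges = radius_graph([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], r=1.5)
# edges == [(0, 1), (1, 0)]  -- the point at x=5.0 is beyond the cutoff
```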
Should add flexibility for multiple formats in the same folder and for subdirectories
One thing observed in the PR is that something needs to be fixed about using profiling in CPU mode for md17 and qm9 in the CI tests (the return code -11 suggests the subprocess was killed by SIGSEGV).
_____________________________ pytest_examples[qm9] _____________________________
Traceback (most recent call last):
File "/home/runner/work/HydraGNN/HydraGNN/tests/test_examples.py", line 26, in pytest_examples
assert return_code == 0
AssertionError: assert -11 == 0
----------------------------- Captured stderr call -----------------------------
Downloading https://data.pyg.org/datasets/qm9_v3.zip
Extracting dataset/qm9/raw/qm9_v3.zip
Processing...
Using a pre-processed version of the dataset. Please install 'rdkit' to alternatively process the raw data.
Done!
0: Using CPU
0: Using CPU
0%| | 0/11 [00:00<?, ?it/s]
36%|███▋ | 4/11 [00:00<00:00, 32.53it/s]ERROR:2022-05-12 13:46:01 3754:3754 CudaDeviceProperties.cpp:26] cudaGetDeviceCount failed with code 35
____________________________ pytest_examples[md17] _____________________________
Traceback (most recent call last):
File "/home/runner/work/HydraGNN/HydraGNN/tests/test_examples.py", line 26, in pytest_examples
assert return_code == 0
AssertionError: assert -11 == 0
----------------------------- Captured stderr call -----------------------------
We discussed for quite a while the idea of including support for the Atomistic Line Graph Neural Network (ALIGNN) model described in the following paper:
https://arxiv.org/abs/2106.01829
There are built-in PyTorch Geometric capabilities that would make that easier to implement. For instance, the construction of the line graph is already supported:
https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/transforms/line_graph.html#LineGraph
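For reference, the line-graph construction itself is simple: each edge of the original graph becomes a node, and two such nodes are connected when the underlying edges share an endpoint. A pure-Python sketch (illustrative, not the PyG transform):

```python
def line_graph(edges):
    """Edges sharing an endpoint become adjacent nodes in the line graph."""
    return [(i, j)
            for i, (a, b) in enumerate(edges)
            for j, (c, d) in enumerate(edges)
            if i != j and {a, b} & {c, d}]

lg = line_graph([(0, 1), (1, 2), (2, 3)])  # a path graph 0-1-2-3
# lg == [(0, 1), (1, 0), (1, 2), (2, 1)]
```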
For each element of the periodic table, provide the following with
https://gist.github.com/GoodmanSciences/c2dd862cd38f21b0ad36b8f96b4bf1ee
I noticed that CONTRIBUTING.md mentions that contributors must run black and pytest before contributing. Having pre-commit would make that process much easier and enforce that developers follow the coding conventions.
I'd be more than happy to take this up.
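A minimal sketch of what the .pre-commit-config.yaml could look like (the hook revision below is a placeholder to be pinned to the version used in CI):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0   # placeholder revision
    hooks:
      - id: black
```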
As the name suggests, skip connections in deep architectures bypass some of the neural network layers and feed the output of one layer as the input to later layers. They are a standard module and provide an alternative path for the gradient during backpropagation.
Skip connections were originally created to tackle various difficulties in various architectures and were introduced even before residual networks. In the case of residual networks (ResNets), skip connections were used to solve degradation problems (e.g., vanishing gradients), while in the case of dense networks (DenseNets), they ensured feature reusability.
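In its simplest residual form, a skip connection just adds the layer's input back to its output. A toy sketch (layer is a placeholder for any differentiable transformation):

```python
def residual_block(x, layer):
    """Identity shortcut: the layer's output plus the unchanged input."""
    return x + layer(x)

out = residual_block(2.0, lambda x: 3 * x)
# out == 8.0  (layer output 6.0 plus the shortcut input 2.0)
```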
Most of the options are there to display rather than save figures, but a few changes are still needed throughout the viz class
I want to use the HydraGNN multi-head attention implementation in my research on predicting the reproducibility of scholarly articles. I appreciate the contributions of this project in the GNN space.
I checked good-first-issue and found no open issues. I previously commented on #164 for a potential contribution. Does this repository support external contributions/collaboration?
import torch
from torch.nn import Embedding, Linear, ModuleList, ReLU, Sequential
from torch_geometric.nn import BatchNorm, PNAConv

class Net(torch.nn.Module):
    def __init__(self, deg):  # deg: in-degree histogram tensor of the training data
        super(Net, self).__init__()
        self.node_emb = Embedding(21, 21)
        self.edge_emb = Embedding(4, 4)

        aggregators = ['mean', 'min', 'max', 'std']
        scalers = ['identity', 'amplification', 'attenuation']

        self.convs = ModuleList()
        self.batch_norms = ModuleList()
        for _ in range(2):
            conv = PNAConv(in_channels=21, out_channels=21,
                           aggregators=aggregators, scalers=scalers, deg=deg,
                           edge_dim=4, towers=1, pre_layers=1, post_layers=1,
                           divide_input=False)
            self.convs.append(conv)
            self.batch_norms.append(BatchNorm(21))

        self.mlp = Sequential(Linear(21, 21), ReLU(), Linear(21, 21), ReLU(),
                              Linear(21, 1))
GitHub recently released a new feature where repository owners can add a CITATION.cff file, making it easy for others to cite the repository.
Currently, the information in the README isn't very helpful, as it doesn't provide a BibTeX reference. Adding a CITATION.cff would make the attribution process very easy.
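A minimal CITATION.cff sketch (all field values below are placeholders, not the actual HydraGNN metadata):

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "HydraGNN"
authors:
  - family-names: "Doe"    # placeholder
    given-names: "Jane"    # placeholder
```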
Check graph/node scalar/vector data loading works as intended - create synthetic torch_geometric data and verify the values returned
Some functions in visualizer.py have not been used for a while and are also likely to break the code since they are outdated. We need to either delete these functions, since they are unused, or update them to be consistent with the rest of the code.
Also, create plots only on rank 0.
When following the Predicting Chemical and Material Molecular Properties With HydraGNN tutorial, there is an instruction to use pip install -e ., which will fail when using a Python version >= 3.8. I was able to fix this on my machine by simply removing the pickle5 dependency from the install_requires list in setup.py; the built-in pickle module should support all the needed functionality in later Python versions.
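The dependency could also be made conditional rather than removed outright; the standard-library pickle gained protocol 5 in Python 3.8, which is what pickle5 backports. A small check (the pickle5 branch is only exercised on older interpreters):

```python
import sys

if sys.version_info >= (3, 8):
    import pickle                  # protocol 5 is built in from 3.8 onward
else:
    import pickle5 as pickle       # backport needed on Python < 3.8

print(pickle.HIGHEST_PROTOCOL)     # 5 or higher on Python 3.8+
```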
docstring-style documentation for all models and utils
ISSUE_TEMPLATE, and guidelines for creating PRs, for contributors
README to provide a higher-level idea, description, and an example script to get started, and to point to the docs site for elaborate usage/explanations of HydraGNN
HPC component of running HydraGNN on exascale compute clusters like Frontier, etc.
pdoc3 or readthedocs to ensure the website documentation is automatically pushed to a connected web endpoint

https://github.com/ORNL/HydraGNN/blob/main/hydragnn/utils/adiosdataset.py#L10C1-L13C9
An ImportError caused by a missing ADIOS installation is not handled. This leads to errors with the ADIOS datasets further in the code.
https://github.com/ORNL/HydraGNN/blob/main/hydragnn/utils/adiosdataset.py#L300
NameError: name 'ad2' is not defined
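A possible guard, following the module alias from the traceback (the wrapper function and error message are illustrative, not HydraGNN's actual fix):

```python
try:
    import adios2 as ad2
except ImportError:
    ad2 = None                     # ADIOS support becomes optional

def open_adios(filename, comm):
    """Fail with a clear message instead of a NameError later in the code."""
    if ad2 is None:
        raise RuntimeError("adios2 is required for ADIOS datasets; "
                           "install it or choose another dataset format")
    return ad2.open(filename, "r", comm)
```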
This is mostly motivated by the use case of running one instance of HydraGNN per GPU on a node. I was only able to do so with the mpi backend, iterating through the LSB_HOSTS list.
@jychoi-hpc if there's a better way, please let me know; otherwise, I'll plan to add something like what I describe
Can you please comment on what "within HydraGNN" refers to? Is it the json file? Can you please describe in detail the steps to run an example using the FePt.zip dataset?
Set the path to the selected dataset within HydraGNN and run
Are the ADIOSData files required if users are not on Summit?
Thanks
This would enhance the expressive power of the neural network, as input features would be immediately projected into a high-dimensional space before being used for training.
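A toy illustration of the idea (pure Python; in practice this would be a learned linear layer or embedding, and the weights below are arbitrary):

```python
def project(features, weights):
    """Lift an input vector into a higher-dimensional space:
    one output value per weight row."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

# 2-dimensional input lifted to 3 dimensions before any training happens:
hidden = project([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# hidden == [1.0, 2.0, 3.0]
```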
Set as None or empty list in update_config