Hi @gshs12051,
Thanks for your interest in our work!
What is your PyTorch version? This looks like a known PyTorch bug that should be resolved by updating to the latest supported stable version (1.11).
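Checking whether an installed PyTorch meets the 1.11 minimum is easy to get wrong with plain string comparison, because the string "1.9" sorts after "1.11". A minimal sketch that compares versions as integer tuples instead; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of PyTorch or pair_allegro:

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.10.2+cu113' into (1, 10, 2).

    Only the leading numeric 'major.minor[.patch]' part is used; local build
    tags like '+cu113' are ignored, and a missing patch number becomes 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def meets_minimum(version: str, minimum: tuple = (1, 11)) -> bool:
    """Return True if `version` is at least `minimum`, compared numerically."""
    return parse_version(version)[: len(minimum)] >= minimum


# Usage (needs PyTorch installed):
#   import torch
#   print(torch.__version__, meets_minimum(torch.__version__))
```

Slicing to the length of `minimum` means a requirement of (1, 11) accepts any 1.11.x patch release.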
from pair_allegro.
Thanks. I was using PyTorch 1.10, and after updating to 1.11 the problem was solved.
I have two more questions. The first is about running MD with MPI: with the command below, LAMMPS does not proceed past this stage.
mpirun -np 8 lmp -sf omp -pk omp 4 -in in.lammps
run 10
No /omp style for force computation currently active
However, it works with mpirun -np 4 lmp -sf omp -pk omp 8 -in in.lammps, as shown below.
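Both launch lines use 32 threads in total (8 ranks x 4 threads versus 4 ranks x 8 threads); what changes is how the box is split into MPI subdomains. The sketch below approximates the heuristic the LAMMPS `processors` documentation describes for `processors * * *` (choose the grid that minimises the surface-to-volume ratio of each subdomain); it is not LAMMPS's actual code, and the function names are hypothetical. Two loud assumptions: the log reports only the volume (3471.2258 Å³), so the box is treated as cubic, and r_max = 4.0 Å is borrowed from the training config quoted later in this thread and may not match the deployed model. It only shows how each launch line partitions the box; it does not by itself explain the hang.

```python
def factor_triples(n):
    """Yield every (px, py, pz) of positive integers with px * py * pz == n."""
    for px in range(1, n + 1):
        if n % px:
            continue
        for py in range(1, n // px + 1):
            if (n // px) % py:
                continue
            yield px, py, n // (px * py)


def choose_grid(n_ranks, box):
    """Pick the processor grid whose subdomains have the smallest surface area."""
    lx, ly, lz = box

    def surface(grid):
        a, b, c = lx / grid[0], ly / grid[1], lz / grid[2]
        return 2.0 * (a * b + b * c + c * a)

    return min(factor_triples(n_ranks), key=surface)


def subdomain_lengths(n_ranks, box):
    """Return the chosen grid and the edge lengths of one subdomain."""
    grid = choose_grid(n_ranks, box)
    return grid, tuple(length / p for length, p in zip(box, grid))


# Assumption: cubic box recovered from the volume in the log (Angstrom^3).
edge = 3471.2258 ** (1.0 / 3.0)
box = (edge, edge, edge)
r_max = 4.0  # Angstrom; assumed equal to the training r_max

for ranks, threads in ((8, 4), (4, 8)):
    grid, lengths = subdomain_lengths(ranks, box)
    sizes = " x ".join(f"{length:.2f}" for length in lengths)
    print(f"-np {ranks} x omp {threads}: grid {grid}, subdomain {sizes} A, "
          f"smallest edge / r_max = {min(lengths) / r_max:.2f}")
```

Comparing the smallest subdomain edge with the cutoff is only a quick sanity check on how finely the box is being divided.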
I am wondering whether there is a specific limit on the MPI processor grid size. Also, the MD simulation sometimes ends with the error below:
Unit style : metal
Current step : 0
Time step : 0.0005
Per MPI rank memory allocation (min/avg/max) = 11.64 | 11.64 | 11.64 Mbytes
Step Temp TotEng PotEng Press Volume S/CPU CPULeft
0 1000 -502.17886 -517.56082 4733.1234 3471.2258 0 0
10 1037.6838 -502.17892 -518.14053 4911.4855 3471.2258 0.62056358 96670.196
20 1149.0517 -502.18269 -519.85735 5438.6038 3471.2258 3.3797967 57200.356
30 1366.1239 -502.20265 -523.21631 6466.0332 3471.2258 3.3869016 44029.363
40 1706.0198 -502.27363 -528.51555 8074.8025 3471.2258 3.3691646 37465.69
50 2092.37 -502.46846 -534.6532 9903.4456 3471.2258 0.80146 44927.751
60 2388.6437 -502.88855 -539.63056 11305.746 3471.2258 0.49348786 57677.206
70 2591.1369 -503.60771 -543.46446 12264.171 3471.2258 0.49194526 66832.571
80 2867.5918 -504.70262 -548.81179 13572.666 3471.2258 0.54046821 72327.097
90 3162.0488 -506.21135 -554.84985 14966.367 3471.2258 0.47370461 78332.381
100 3463.3768 -508.07882 -561.35234 16392.59 3471.2258 0.43357856 84302.633
110 3783.0973 -510.20537 -568.39681 17905.867 3471.2258 0.4624285 88399.774
120 4040.5194 -512.46371 -574.61481 19124.277 3471.2258 0.4751138 91522.343
130 3916.8145 -468.80556 -529.05384 18538.766 3471.2258 0.56733556 92585.622
140 4160.4834 -471.28922 -535.2856 19692.081 3471.2258 0.48887372 94704.054
150 5138.8348 -472.80773 -551.85307 24322.739 3471.2258 0.47169693 96834.505
160 5735.4544 -477.15941 -565.38192 27146.614 3471.2258 0.47489075 98642.676
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 39872 RUNNING AT n020
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
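One quick diagnostic on a log like this is to look at how the total energy and temperature evolve before the crash. The sketch below parses the thermo table above and reports the net change in TotEng and the largest jump between consecutive outputs. Whether TotEng should be conserved depends on the ensemble, which is an assumption here because the input file is not shown: in an NVE run it should stay nearly constant, while a thermostat would change it. The function names are hypothetical.

```python
# Thermo output copied from the log above (LAMMPS "metal" units: energies in eV).
THERMO_LOG = """\
Step Temp TotEng PotEng Press Volume S/CPU CPULeft
0 1000 -502.17886 -517.56082 4733.1234 3471.2258 0 0
10 1037.6838 -502.17892 -518.14053 4911.4855 3471.2258 0.62056358 96670.196
20 1149.0517 -502.18269 -519.85735 5438.6038 3471.2258 3.3797967 57200.356
30 1366.1239 -502.20265 -523.21631 6466.0332 3471.2258 3.3869016 44029.363
40 1706.0198 -502.27363 -528.51555 8074.8025 3471.2258 3.3691646 37465.69
50 2092.37 -502.46846 -534.6532 9903.4456 3471.2258 0.80146 44927.751
60 2388.6437 -502.88855 -539.63056 11305.746 3471.2258 0.49348786 57677.206
70 2591.1369 -503.60771 -543.46446 12264.171 3471.2258 0.49194526 66832.571
80 2867.5918 -504.70262 -548.81179 13572.666 3471.2258 0.54046821 72327.097
90 3162.0488 -506.21135 -554.84985 14966.367 3471.2258 0.47370461 78332.381
100 3463.3768 -508.07882 -561.35234 16392.59 3471.2258 0.43357856 84302.633
110 3783.0973 -510.20537 -568.39681 17905.867 3471.2258 0.4624285 88399.774
120 4040.5194 -512.46371 -574.61481 19124.277 3471.2258 0.4751138 91522.343
130 3916.8145 -468.80556 -529.05384 18538.766 3471.2258 0.56733556 92585.622
140 4160.4834 -471.28922 -535.2856 19692.081 3471.2258 0.48887372 94704.054
150 5138.8348 -472.80773 -551.85307 24322.739 3471.2258 0.47169693 96834.505
160 5735.4544 -477.15941 -565.38192 27146.614 3471.2258 0.47489075 98642.676
"""


def parse_thermo(text):
    """Parse a whitespace-separated LAMMPS thermo block into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


def energy_report(rows, column="TotEng"):
    """Return (net change, largest jump between outputs, step the jump ends at)."""
    values = [row[column] for row in rows]
    steps = [int(row["Step"]) for row in rows]
    drift = values[-1] - values[0]
    jumps = [(abs(b - a), s) for a, b, s in zip(values, values[1:], steps[1:])]
    biggest, at_step = max(jumps)
    return drift, biggest, at_step


rows = parse_thermo(THERMO_LOG)
drift, jump, step = energy_report(rows)
print(f"net TotEng change: {drift:+.5f} eV")
print(f"largest TotEng jump: {jump:.5f} eV, ending at step {step}")
print(f"temperature: {rows[0]['Temp']:.0f} K -> {rows[-1]['Temp']:.0f} K")
```

For this log the largest jump (about 43.7 eV) falls between steps 120 and 130, while the temperature climbs from 1000 K to above 5700 K before the abort.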
My next question is about training. I tried to use a training set containing multiple cell sizes (for example, some frames with 120 atoms and some with 60 atoms), and training ended with the error below.
instantiate NpzDataset
optional_args : key_mapping
optional_args : npz_fixed_field_keys
optional_args : root
optional_args : extra_fixed_fields <- dataset_extra_fixed_fields
optional_args : file_name <- dataset_file_name
...NpzDataset_param = dict(
... optional_args = {'key_mapping': {'z': 'atomic_numbers', 'E': 'total_energy', 'F': 'forces', 'R': 'pos'}, 'include_keys': [], 'npz_fixed_field_keys': ['atomic_numbers'], 'file_name': './train_set.npz', 'url': None, 'force_fixed_keys': [], 'extra_fixed_fields': {'r_max': 4.0}, 'include_frames': None, 'root': 'results/GeSe2'},
... positional_args = {'type_mapper': <nequip.data.transforms.TypeMapper object at 0x2b9f505d7490>})
Traceback (most recent call last):
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/auto_init.py", line 232, in instantiate
instance = builder(**positional_args, **final_optional_args)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 681, in __init__
super().__init__(
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 123, in __init__
super().__init__(root=root, transform=type_mapper)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/torch_geometric/dataset.py", line 90, in __init__
self._process()
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/torch_geometric/dataset.py", line 175, in _process
self.process()
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 269, in process
data_list = [
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/dataset.py", line 270, in <listcomp>
constructor(**{**{f: v[i] for f, v in fields.items()}, **fixed_fields})
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 326, in from_points
return cls(edge_index=edge_index, pos=torch.as_tensor(pos), **kwargs)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 221, in __init__
_process_dict(kwargs)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/AtomicData.py", line 163, in _process_dict
raise ValueError(
ValueError: atomic_numbers is a node field but has the wrong dimension torch.Size([72, 1])
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/gshs12051/anaconda3/envs/pytorch/bin/nequip-train", line 8, in <module>
sys.exit(main())
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/scripts/train.py", line 74, in main
trainer = fresh_start(config)
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/scripts/train.py", line 177, in fresh_start
dataset = dataset_from_config(config, prefix="dataset")
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/data/_build.py", line 78, in dataset_from_config
instance, _ = instantiate(
File "/home/gshs12051/anaconda3/envs/pytorch/lib/python3.8/site-packages/nequip/utils/auto_init.py", line 234, in instantiate
raise RuntimeError(
RuntimeError: Failed to build object with prefix `dataset` using builder `NpzDataset`
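The config printed above lists atomic_numbers under npz_fixed_field_keys, which in nequip's example configs marks fields that are repeated across all examples, i.e. one atomic_numbers array shared by every frame. With frames of 120 and 60 atoms that cannot hold, and frames of different sizes also cannot be stacked into one rectangular array per key. The sketch below is a hypothetical pre-flight check on per-frame data before writing an .npz; it does not call nequip, the frame layout (keys z, R, F, E, mirroring the key_mapping above) and helper names are assumptions, and the Ge/Se frames are synthetic placeholders. It does not prove this is the cause of the [72, 1] error; for varying atom counts a per-frame format such as extended XYZ read through nequip's ASE dataset may be an option, but check the nequip documentation for your version.

```python
import numpy as np


def check_frames(frames):
    """Report whether per-frame arrays could share one .npz layout."""
    natoms = []
    for i, frame in enumerate(frames):
        n = len(frame["z"])
        for key in ("R", "F"):
            shape = np.shape(frame[key])
            if shape != (n, 3):
                raise ValueError(f"frame {i}: {key} has shape {shape}, expected ({n}, 3)")
        natoms.append(n)
    stackable = len(set(natoms)) == 1
    # A "fixed" atomic_numbers field only makes sense if every frame has the
    # same species in the same order.
    z_fixed_ok = stackable and all(np.array_equal(f["z"], frames[0]["z"]) for f in frames)
    return {"natoms": sorted(set(natoms)), "stackable": stackable, "z_can_be_fixed": z_fixed_ok}


def make_frame(n_ge, n_se, seed=0):
    """Build a synthetic placeholder frame (random positions, zero forces)."""
    rng = np.random.default_rng(seed)
    n = n_ge + n_se
    z = np.array([32] * n_ge + [34] * n_se)  # Ge = 32, Se = 34
    return {"z": z, "R": rng.random((n, 3)), "F": np.zeros((n, 3)), "E": 0.0}


frames = [make_frame(40, 80), make_frame(20, 40, seed=1)]  # 120 and 60 atoms
print(check_frames(frames))

# np.stack requires identical shapes, so 120-atom and 60-atom frames cannot
# share one rectangular positions array.
try:
    np.stack([f["R"] for f in frames])
except ValueError as err:
    print("cannot stack:", err)
```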
Hi @gshs12051,
Great, glad it resolved your issue!
Could you please open a new issue on pair_allegro (this repo) for the MPI question, and a separate issue on the nequip repo for the training issue? This helps keep information searchable and organized for future users.
Thanks!