
test-data's Introduction

Test data for MALA

This repository contains data to test, develop and debug MALA and MALA-based runscripts. If you plan to do machine-learning tests ("Does this network implementation work? Is this new data loading strategy working?"), this is the right data to test with. It is NOT production-level data!

Be2

Contains DFT output from a Quantum ESPRESSO (QE) calculation for a beryllium cell with 2 atoms, along with the input scripts and pseudopotential needed to replicate this calculation. LDOS files are usually large, so this reduced example samples the LDOS coarsely in order to keep the storage size small. The energy grid for the LDOS has 11 entries, starting at -5 eV with a spacing of 2.5 eV. For the LDOS and the descriptors, 4 snapshots (0 to 3) are provided. In detail, the following data files can be found:
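As a small illustration of the energy grid described above, the 11 values can be reconstructed from the stated start and spacing. This is only a sketch: the variable names are made up for this example, and whether MALA treats these values as grid points or as bin edges is not stated here.

```python
import numpy as np

# Values taken from the description above.
n_points = 11      # number of LDOS energy grid entries
e_min = -5.0       # first grid value in eV
spacing = 2.5      # grid spacing in eV

# Hypothetical helper array: energy value for each LDOS channel.
energy_grid = e_min + spacing * np.arange(n_points)
print(energy_grid)
```

With these numbers the grid runs from -5 eV to 20 eV.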

| File Name | Description |
| --- | --- |
| recreate_data/ | Input scripts for QE |
| cubes/ | .cube files for the local density of states |
| Be.pbe-n-rrkjus_psl.1.0.0.UPF | Pseudopotential used for the QE calculation |
| Be_snapshot0.dens.npy | Electronic density numpy array (snapshot 0) |
| Be_snapshot.dens.h5 | Electronic density (HDF5 format, see details below) |
| Be_snapshot0.dos.npy | Density of states numpy array (snapshot 0) |
| Be_snapshot0-3.out | Output files of the QE calculation (snapshots 0 to 3) |
| Be_snapshot0-3.in.npy | Bispectrum descriptors numpy arrays (snapshots 0 to 3) |
| Be_snapshot0-3.out.npy | Local density of states numpy arrays (snapshots 0 to 3) |
| Be_snapshot0-3.in.h5 | Bispectrum descriptors (HDF5 format, snapshots 0 to 3) |
| Be_snapshot0-3.out.h5 | Local density of states (HDF5 format, snapshots 0 to 3) |

numpy format files

SNAP bispectrum descriptors of length 91 on an 18 x 18 x 27 real-space grid.

Note

In the last dimension of length 94, the first 3 entries are the grid coordinates / indices (an artifact of the SNAP vector generation). The actual features are snap_array[..., 3:].

>>> np.load('Be2/Be_snapshot1.in.npy').shape
(18, 18, 27, 94)
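The slicing from the note can be sketched as follows. To keep the example independent of the data files, a zero-filled array with the documented shape stands in for the loaded file; `grid_coords` and `features` are names chosen for this example.

```python
import numpy as np

# Stand-in for np.load('Be2/Be_snapshot1.in.npy'); same shape, dummy values.
snap_array = np.zeros((18, 18, 27, 94))

# First 3 entries of the last axis: grid coordinates / indices.
grid_coords = snap_array[..., :3]

# Remaining 91 entries: the actual bispectrum features.
features = snap_array[..., 3:]

print(grid_coords.shape)
print(features.shape)
```

Slicing like this returns views, so no data is copied until the result is modified or saved.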

LDOS (11 energy points) on an 18 x 18 x 27 real-space grid.

>>> np.load('Be2/Be_snapshot1.out.npy').shape
(18, 18, 27, 11)
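For quick machine-learning experiments outside of MALA, descriptors and LDOS can be paired per grid point by flattening the three spatial axes. The sketch below is a generic layout, not necessarily what MALA does internally; zero-filled arrays with the documented shapes stand in for the loaded files, and `X` and `y` are names chosen for this example.

```python
import numpy as np

# Stand-ins for the loaded .in.npy and .out.npy arrays of one snapshot.
snap_array = np.zeros((18, 18, 27, 94))
ldos_array = np.zeros((18, 18, 27, 11))

# Drop the 3 grid-index entries, then flatten the grid into a sample axis.
X = snap_array[..., 3:].reshape(-1, 91)   # inputs: one row per grid point
y = ldos_array.reshape(-1, 11)            # targets: LDOS at the same grid point

print(X.shape, y.shape)
```

Both arrays are flattened in the same (C) order, so row i of `X` and row i of `y` refer to the same grid point.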

Density of states (only provided for snapshot 0):

>>> np.load('Be2/Be_snapshot0.dos.npy').shape
(11,)

Density for snapshot 0 on an 18 x 18 x 27 real-space grid. The extra trailing dimension can be ignored, i.e. use d = np.load(...); d[..., -1] to reduce the shape to (18, 18, 27).

>>> np.load('Be2/Be_snapshot0.dens.npy').shape
(18, 18, 27, 1)
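Two equivalent ways to drop the trailing dimension are sketched below, using a zero-filled stand-in with the documented shape instead of the real file.

```python
import numpy as np

# Stand-in for np.load('Be2/Be_snapshot0.dens.npy').
d = np.zeros((18, 18, 27, 1))

# Indexing the last axis, as suggested above.
dens_indexed = d[..., -1]

# np.squeeze with an explicit axis raises an error if that axis is not of length 1.
dens_squeezed = np.squeeze(d, axis=-1)

print(dens_indexed.shape, dens_squeezed.shape)
```

Passing `axis=-1` to `np.squeeze` is a bit more defensive, since it fails loudly if the file ever contains more than one entry in the last dimension.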

openPMD-based files

MALA supports the openPMD format, so we also provide data in that format here.

$ h5ls -r Be_snapshot0.in.h5 | grep Dataset | sort -V
/data/0/meshes/Bispectrum/0  Dataset {18, 18, 27}
/data/0/meshes/Bispectrum/1  Dataset {18, 18, 27}
...
/data/0/meshes/Bispectrum/93 Dataset {18, 18, 27}

$ h5ls -r Be_snapshot0.out.h5 | grep Dataset | sort -V
/data/0/meshes/LDOS/0     Dataset {18, 18, 27}
/data/0/meshes/LDOS/1     Dataset {18, 18, 27}
...
/data/0/meshes/LDOS/10    Dataset {18, 18, 27}
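The per-component datasets listed above can be combined into a single array with a trailing component axis. The sketch below works on any mapping from component name to array; `stack_components` is a hypothetical helper, and a plain dict of dummy arrays stands in for the HDF5 group so the example runs without the data files. That the component index corresponds to the last axis of the .npy arrays is an assumption and should be checked against the data.

```python
import numpy as np


def stack_components(group):
    """Stack datasets named '0', '1', ... into one array, components last."""
    # Sort numerically: a plain string sort would put '10' before '2'.
    names = sorted(group.keys(), key=int)
    return np.stack([np.asarray(group[name]) for name in names], axis=-1)


# Stand-in for the group /data/0/meshes/LDOS with 11 components.
fake_ldos_group = {str(i): np.full((18, 18, 27), float(i)) for i in range(11)}

ldos = stack_components(fake_ldos_group)
print(ldos.shape)
```

With h5py installed, the same function should accept an open group, e.g. `stack_components(fh["/data/0/meshes/LDOS"])` for a file handle `fh`, because h5py groups provide `keys()` and item access.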

For the density, the snapshot number 0 is encoded in the name /data/0.

$ h5ls -r Be_snapshot.dens.h5 | grep Dataset
/data/0/meshes/Density/0 Dataset {18, 18, 27}

To understand the naming scheme, we can use openPMD's introspection tool:

$ openpmd-ls Be_snapshot.dens.h5
openPMD series: Be_snapshot.dens
openPMD standard: 1.1.0
openPMD extensions: 0

data author: ...
data created: 2023-05-23 15:37:18 +0200
data backend: HDF5
generating machine: unknown
generating software: MALA (version: 1.1.0)
generating software dependencies: unknown

number of iterations: 1 (groupBased)
  all iterations: 0

number of meshes: 1
  all meshes:
    Density

number of particle species: 0

So /data/0/ is the openPMD iteration counter, which we use to number snapshots. Density/0 is one grid / array / Dataset (in HDF5 terms) / mesh (in openPMD terms) of shape 18 x 18 x 27. Multiple snapshots in one file would be named

/data/0/meshes/Density/0     Dataset {18, 18, 27}
/data/1/meshes/Density/0     Dataset {18, 18, 27}
/data/2/meshes/Density/0     Dataset {18, 18, 27}
...
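The naming scheme can be made explicit with a small parser that splits a dataset path into snapshot number, mesh name and component index. `parse_dataset_path` and the regular expression are hypothetical helpers written for this example, based only on the paths shown above.

```python
import re

# Pattern for paths of the form /data/<iteration>/meshes/<mesh>/<component>.
PATH_RE = re.compile(r"^/data/(\d+)/meshes/([^/]+)/(\d+)$")


def parse_dataset_path(path):
    """Return (snapshot, mesh_name, component) for an openPMD-style dataset path."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a recognized dataset path: {path!r}")
    iteration, mesh, component = match.groups()
    return int(iteration), mesh, int(component)


print(parse_dataset_path("/data/2/meshes/Density/0"))  # (2, 'Density', 0)
```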

workflow_test/

Contains the saved parameters, network and input/output scaler for a run of MALA example 01. With these the correct loading of a checkpoint in MALA can be confirmed, i.e. the workflow can be checked.

test-data's Issues

Merge `densities_gp` branch?

Do we plan on merging densities_gp? There we document snap_array[..., 3:], i.e. how to get rid of the LAMMPS grid index. The Al36 part of the README as well as the Al36 data would need to be removed.

If no merge is planned, we should at least move the improved docs over.

Document h5 files

In the main branch, we have two density files:

  • Be_snapshot0.dens.npy with shape (18, 18, 27, 1)

  • Be_snapshot.dens.h5 with one dataset /data/0/meshes/Density/0 and shape (18, 18, 27)

    $ h5ls -r Be_snapshot.dens.h5
    /                        Group
    /data                    Group
    /data/0                  Group
    /data/0/meshes           Group
    /data/0/meshes/Density   Group
    /data/0/meshes/Density/0 Dataset {18, 18, 27}

The code below compares the data and finds that the arrays are the same:

import h5py
from icecream import ic
import numpy as np


# From https://github.com/elcorto/pwtools/blob/master/src/pwtools/io.py
def read_h5(fn):
    fh = h5py.File(fn, mode="r")
    dct = {}

    def get(name, obj):
        if isinstance(obj, h5py.Dataset):
            _name = name if name.startswith("/") else "/" + name
            val = obj[()]
            dct[_name] = obj.asstr()[()] if isinstance(val, bytes) else val

    fh.visititems(get)
    fh.close()
    return dct


d_h5 = read_h5("Be_snapshot.dens.h5")
ic(list(d_h5.keys()))
data_h5 = list(d_h5.values())[0]
ic(data_h5.shape)

data_np = np.load("Be_snapshot0.dens.npy")
ic(data_np.shape)

ic(data_h5.dtype)
ic(data_np.dtype)

# data_np[..., -1] : (18, 18, 27, 1) -> (18, 18, 27)
assert (data_h5 == data_np[..., -1]).all()

which prints

ic| list(d_h5.keys()): ['/data/0/meshes/Density/0']
ic| data_h5.shape: (18, 18, 27)
ic| data_np.shape: (18, 18, 27, 1)
ic| data_h5.dtype: dtype('float64')
ic| data_np.dtype: dtype('float64')

and the assert passes, so apart from the extra dimension in the .npy file, the data is equal.

Should Be_snapshot.dens.h5 be renamed to Be_snapshot0.dens.h5 then?

Repository is over its data quota

Was trying to update 25e1bdd to 7d093ff (tag: v1.7.0):

❯ git pull
Updating 25e1bdd..7d093ff
Updating files: 100% (23/23), done.
Downloading Be2/Be_snapshot.dens.h5 (273 KB)
Error downloading object: Be2/Be_snapshot.dens.h5 (f3734b4): Smudge error: Error downloading Be2/Be_snapshot.dens.h5 (f3734b4af05204cf0a5844ce098d2ecaca7ae0cb6f82d0e26ac3d0b9b7242d34): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to '/home/elcorto/soft/git/mala-project/test-data/.git/lfs/logs/20230714T125319.110317528.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: Be2/Be_snapshot.dens.h5: smudge filter lfs failed
