xca's Introduction

Crystallography Companion Agent (xca)

Overview

XCA is a pseudo-unsupervised learning approach for handling spherically integrated (powder) X-ray diffraction data. The approach depends on accurate data synthesis that encompasses the many aberrations that can affect a diffraction pattern. The magnitude of each aberration's influence should be informed by experience with the experimental design (e.g., well-prepared powders will experience significantly less texturing than epitaxially grown thin films). The dataset synthesis is accomplished using the cctbx, starting from .cif files of potential phases. From the synthetic datasets, an ensemble of feed-forward convolutional neural networks can be trained and subsequently used to predict phase existence in experimental data.

System requirements

Hardware requirements

The xca package requires only a standalone computer with enough RAM to support the in-memory operations. For advanced use, a CUDA-enabled GPU is recommended.

Software requirements

OS requirements

This package is supported for macOS and Linux. The package has been tested on the following systems:

  • macOS: Catalina (10.15.7)
  • Linux: Ubuntu 18.04

Python dependencies

Dataset generation in xca makes extensive use of the cctbx, which is currently best installed into a conda environment.

The machine learning depends on a scientific tensorflow stack:

tensorflow >= 2.1.0
# tensorflow-gpu will be installed if a gpu is available 
numpy
scikit-learn
scipy
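
Before running the examples, it can help to confirm that these packages are importable from the active environment. The following is a minimal sketch using only the standard library; the function name check_dependencies is an illustrative assumption and is not part of xca. It only tests whether each top-level module can be found and does not check versions.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above; note that scikit-learn
# is imported as "sklearn".
REQUIRED_MODULES = ("cctbx", "tensorflow", "numpy", "sklearn", "scipy")


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each top-level module name to True if it can be found for import."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:<12}{'ok' if found else 'MISSING'}")
```

Any module reported as MISSING would need to be installed into the conda environment before the synthesis or training scripts are run.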

Installation guide

Due to the current unavailability of the cctbx on PyPI, we recommend first setting up a conda environment for the cctbx. The remaining dependencies can be installed via pip.

conda create -n xca -c conda-forge cctbx-base python=3.7
conda activate xca
git clone https://github.com/maffettone/xca
cd xca
python -m pip install .

On some machines, a specific tensorflow version may be required depending on your CUDA version. In that case, the top line can be replaced with the following, substituting the appropriate version of tensorflow:

conda create --name xca -c conda-forge cctbx-base tensorflow-gpu=2.2

Getting started

A simple demonstration

A simple example of the full training pipeline is demonstrated in the simple_example.py script. Executing this will do the following in a tmp directory:

  1. Synthesize 100 example patterns for each phase of the three experimental systems presented in the paper below.
  2. Convert those patterns into a tfrecords object.
  3. Train an ensemble model, print the results, and save the full model.

cd xca/examples/arxiv200800283
python simple_example.py

This will take a few minutes to run for each example, approximately 10 minutes in total. The output will print the path to each cif file as it generates the relevant set of synthetic data. This includes 4 files for BaTiO, 5 files for ADTA, and 31 files for Ni-Co-Al. The output will then print the ensemble model summary. The time for each epoch will be displayed, with 8 epochs for each model training. Lastly, the results dictionary will be printed unformatted to show the loss, training, and validation metrics.

Details of generic synthesis and training can be found in example_synthesis.py and example_training.py.

Literature details

The application of this package is demonstrated in arXiv:2008.00283. To reproduce the models presented in this paper, the dataset synthesis should be scaled (use of multiprocessing is encouraged) to produce 50,000 patterns per phase using the same parameterization presented in example_synthesis.py.
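
As a rough illustration of how 50,000 patterns per phase might be divided among worker processes, the sketch below splits the total into near-equal chunks and hands each chunk to a process pool. The names split_counts, synthesize_chunk, and synthesize_parallel are hypothetical, and synthesize_chunk is a placeholder that does not call xca. The cycle_params signature quoted later on this page includes an n_jobs argument, which may already cover this; the sketch only shows the general idea.

```python
from concurrent.futures import ProcessPoolExecutor


def split_counts(n_total, n_workers):
    """Divide n_total patterns into n_workers near-equal chunk sizes."""
    base, extra = divmod(n_total, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def synthesize_chunk(n_patterns):
    """Hypothetical placeholder: generate n_patterns synthetic patterns.

    In practice this would call the synthesis routine shown in
    example_synthesis.py; here it only reports how many were requested.
    """
    return n_patterns


def synthesize_parallel(n_total, n_workers):
    """Run synthesize_chunk in separate processes and return the total count."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(synthesize_chunk, split_counts(n_total, n_workers)))


# Chunk sizes for one phase spread over 8 workers.
print(split_counts(50_000, 8))
```

When used in a script, synthesize_parallel should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.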

ABSTRACT: The discovery of new structural and functional materials is driven by phase identification, often using X-ray diffraction (XRD). Automation has accelerated the rate of XRD measurements, greatly outpacing XRD analysis techniques that remain manual, time consuming, error prone, and impossible to scale. With the advent of autonomous robotic scientists or self-driving labs, contemporary techniques prohibit the integration of XRD. Here, we describe a computer program for the autonomous characterization of XRD data, driven by artificial intelligence (AI), for the discovery of new materials. Starting from structural databases, we train an ensemble model using a physically accurate synthetic dataset, which output probabilistic classifications --- rather than absolutes --- to overcome the overconfidence in traditional neural networks. This AI agent behaves as a companion to the researcher, improving accuracy and offering unprecedented time savings, and is demonstrated on a diverse set of organic and inorganic materials challenges. This innovation is directly applicable to inverse design approaches, robotic discovery systems, and can be immediately considered for other forms of characterization such as spectroscopy and the pair distribution function.

Developer Instructions

The installation instructions above also apply to developers. In addition, we follow Black formatting and Flake8 style checking. As notebooks and tutorials are added, we will also use nbstripout to avoid committing notebook metadata and figures to the repository.

The following install script will build the necessary dependencies for the pre-commit hooks. For GPU compatibility, it is strongly suggested that you be explicit in your tensorflow versioning.

conda create -n xca -c conda-forge cctbx-base tensorflow-gpu=2.XXX 
conda activate xca
git clone https://github.com/maffettone/xca
cd xca
python -m pip install -e . -r requirements-dev.txt  
pre-commit install

xca's People

Contributors

maffettone, prinaldi3

xca's Issues

Redundant parameters in cctbx builder

Parameters that do not require further processing should be left to **kwargs. For instance, sample_height is redundant with offset_height in

for key in kwargs:
    parameters[key] = np.random.uniform(*kwargs[key])
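
A minimal sketch of the idea in this issue: if every plain range is passed through **kwargs, a parameter such as offset_height needs no dedicated argument. The function name sample_parameters, the second parameter name, and the range values are illustrative assumptions, and the standard-library random module stands in for np.random here.

```python
import random


def sample_parameters(rng=random, **kwargs):
    """Draw one uniform sample for every (low, high) range passed as a keyword."""
    parameters = {}
    for key, (low, high) in kwargs.items():
        parameters[key] = rng.uniform(low, high)
    return parameters


# offset_height is sampled like any other range; no special-case argument needed.
# "some_other_param" is a made-up name used only for illustration.
params = sample_parameters(offset_height=(-0.001, 0.001), some_other_param=(0.0, 0.1))
print(params)
```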

Refactoring of tf models

The code in tf_models has accumulated a lot of redundancy. A good refactor would make it clearer.

Path issues preventing use on windows

Running simple_example.py on Windows 10 runs for a bit, then crashes with the following output:


C:\Users\dolds\Anaconda3\envs\xca\lib\site-packages\xca\examples\arxiv200800283\cifs-BaTiO\cubic.cif
C:\Users\dolds\Anaconda3\envs\xca\lib\site-packages\xca\examples\arxiv200800283\cifs-BaTiO\ortho.cif
C:\Users\dolds\Anaconda3\envs\xca\lib\site-packages\xca\examples\arxiv200800283\cifs-BaTiO\rhomb.cif
C:\Users\dolds\Anaconda3\envs\xca\lib\site-packages\xca\examples\arxiv200800283\cifs-BaTiO\tetra.cif
Traceback (most recent call last):
  File ".\simple_example.py", line 19, in <module>
    main()
  File ".\simple_example.py", line 10, in main
    dir2TFR(f"tmp/{system}", f"tmp/{system}.tfrecords")
  File "C:\Users\dolds\Anaconda3\envs\xca\lib\site-packages\xca\ml\tf_data_proc.py", line 27, in dir2TFR
    label = str(fname).split('/')[-2]
IndexError: list index out of range
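
The traceback is consistent with the label being extracted by splitting on '/', which yields a single element when the path string uses backslashes. The sketch below reproduces that failure with pathlib's pure path classes (so it runs on any OS) and shows one portable alternative that takes the parent directory name instead. The function names and the example file path are assumptions made for illustration, not the actual layout or fix used in xca.

```python
from pathlib import PurePosixPath, PureWindowsPath


def label_from_split(fname):
    """Approach from the traceback: assumes '/' separates path components."""
    return str(fname).split('/')[-2]


def label_from_path(fname):
    """Portable alternative: use the parent directory name as the label."""
    return fname.parent.name


# Hypothetical file layout: tmp/<system>/<label>/<pattern file>
posix = PurePosixPath("tmp/BaTiO/cubic/0001.npy")
windows = PureWindowsPath(r"tmp\BaTiO\cubic\0001.npy")

print(label_from_split(posix))    # cubic
try:
    label_from_split(windows)
except IndexError as err:
    print("split('/') fails on a Windows-style path:", err)

print(label_from_path(posix))     # cubic
print(label_from_path(windows))   # cubic
```

With a concrete pathlib.Path object, the same `.parent.name` access works unchanged on both Windows and POSIX systems.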

Ensembling more complex than fusion

Current models perform ensembling through an averaging layer. The proposal is to train each network independently, with its own exposure to the data, and to average predictions only during testing.
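
To make the test-time averaging concrete, here is a small sketch in plain Python. It assumes each independently trained member returns a list of class probabilities for one pattern; the function name average_predictions and the numbers are hypothetical and do not come from xca.

```python
def average_predictions(member_probs):
    """Average class probabilities over ensemble members.

    member_probs: one probability list per ensemble member, all with the
    same number of classes.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [
        sum(probs[c] for probs in member_probs) / n_members
        for c in range(n_classes)
    ]


# Hypothetical softmax outputs of three independently trained networks
# for a single diffraction pattern and three candidate phases.
member_probs = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
]
ensemble = average_predictions(member_probs)
print(ensemble)
```

Because the members are only combined at prediction time, each one can be trained, saved, and replaced separately.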

Implementation of dense VAE from paper.

@prinaldi3 to take the first crack at this.
@maffettone To review and edit.
@lbanko to review final PR.

Potential approach

  1. Branch from #9
  2. Encoder: build_dense_encoder_model(), following lead from

    xca/xca/ml/tf_models.py

    Lines 18 to 30 in dfdad35

    def build_CNN_model(*,
                        data_shape,
                        filters,
                        kernel_sizes,
                        strides,
                        ReLU_alpha,
                        pool_sizes,
                        batchnorm,
                        n_classes,
                        dense_dims=(),
                        dense_dropout=0.,
                        **kwargs
                        ):

    Returns: Model(input_x, [z_mean, z_log_var], name="encoder")
  3. Decoder: build_dense_decoder_model()
    Returns: Model(z_in, x_dec, name="decoder")
  4. VAE class: VAE(tf.keras.Model)
    with methods:
      • __init__(encoder_model, decoder_model, kl_loss_weight)
      • encode(x) -> mean, log_var
      • reparameterize(mean, log_var) -> z_sample
      • decode(z) -> x_reconstruction
      • kl_loss(z_mean, z_log_var)
      • reconstruction_loss(x, x_reconstruction)
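
For orientation, the reparameterize and kl_loss methods listed above can be sketched with the standard VAE formulation: z = mean + exp(log_var / 2) * eps with eps drawn from a unit normal, and a KL term against a unit normal prior. This plain-Python version operates on lists rather than tensors and is an assumption about the intended math, not the implementation from the paper or the eventual PR.

```python
import math
import random


def reparameterize(mean, log_var, rng=random):
    """Sample z = mean + sigma * eps, with eps ~ N(0, 1) and sigma = exp(log_var / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def kl_loss(z_mean, z_log_var):
    """KL divergence from N(mean, sigma^2) to N(0, 1), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(z_mean, z_log_var))


# A latent code that already matches the prior has zero KL penalty.
print(kl_loss([0.0, 0.0], [0.0, 0.0]))
# Shifting one mean away from zero increases the penalty.
print(kl_loss([1.0, 0.0], [0.0, 0.0]))
```

In a tf.keras.Model subclass the same expressions would be written with tensor operations, and the KL term would be scaled by kl_loss_weight before being added to the reconstruction loss.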

Citation

Deep learning for visualization and novelty detection in large X-ray diffraction datasets
Paper in press at npj Computational Materials.
https://arxiv.org/abs/2104.04392

Parameter dict not updated, key error bkg_ea, bkg_eb

The parameter dict in cycle_params is not updated with the default params for bkg_ea and bkg_eb.

def cycle_params(n_profiles, output_path, input_params=None, shape_limit=0.,
                 march_range=(0., 1.), preferred_axes=None,
                 sample_height=None, noise_exp=None, n_jobs=1, **kwargs):
.
.
.
  for _ in range(n_profiles):
    for i in range(-2, 7):
.
.
.
              parameters['bkg_{}'.format(i)] = np.random.uniform(0, _default['bkg_{}'.format(i)])

Adding values for bkg_ea and bkg_eb to the input parameters dict avoids the issue.

xca error message.txt
