
pyqcd's Introduction

pyQCD

Master Travis CI Build Status
Development Travis CI Build Status

pyQCD is an extensible Python package for building and testing lattice field theory measurements on desktop and workstation computers.

Features

  • Native implementations of lattice QCD algorithms using C++11, exposed to Python using Cython with full NumPy interoperability.
  • Complete implementation of Wilson's formulation of lattice QCD by way of both fermionic and gluonic actions.
  • Rectangle-improved gauge actions (Symanzik and Iwasaki).
  • Conjugate gradient sparse matrix solver (with and without even-odd preconditioning).
  • Support for multiple lattice layouts (currently lexicographic and even-odd site orderings).
  • Functionality to reconfigure package for different precisions and numbers of colours.
  • Pseudo-heatbath gauge field update algorithm.
  • Shared-memory parallelism using OpenMP.

Installation

Building and testing have so far only been conducted on Debian-based Linux OSes, so your mileage may vary elsewhere. Please report any installation bugs using GitHub's issue tracker and I'll do my best to help you out. Build variables (flags, include search paths etc.) may be found in pyQCD/utils/build/__init__.py.

First you'll need a couple of things:

  • a C++ compiler supporting the C++11 standard and GCC-style build flags (e.g. GCC, Clang or MinGW);
  • the Eigen 3 linear algebra library, installed either via your package manager or by downloading the headers from the Eigen website.

Once you have these requirements, you're ready to install. Create a virtual environment if that's your thing, then run this command in the root of the pyQCD package tree:

pip install .

If you don't have pip installed, the following command should also work:

python setup.py install

If you want to run the unit tests, you'll need CMake version 2.8 or above. Run:

mkdir build && cd build
cmake .. && make run_tests
pyQCD/tests/run_tests

Basic tests are also available for the Python code using py.test. Simply run py.test in the package's source directory once you've installed the package.

Usage

Python code and C++ extensions are currently spread across four modules:

  • pyQCD.core: basic types, including colour vectors and matrices, their lattice equivalents and layout classes;
  • pyQCD.gauge: gauge actions and gauge field observables;
  • pyQCD.fermions: fermion actions;
  • pyQCD.algorithms: gauge field update algorithms and sparse matrix solvers.

Create a gauge field on a lattice with spatial extent L = 8 and temporal extent T = 16 and update it 100 times using Wilson's gauge action and the heatbath algorithm:

from pyQCD.algorithms import Heatbath
from pyQCD.gauge import WilsonGaugeAction, average_plaquette, cold_start

# Create a gauge field with all matrices set to the identity, laid out
# lexicographically
gauge_field = cold_start([16, 8, 8, 8])
# Create a Wilson gauge action instance with beta = 5.5
action = WilsonGaugeAction(5.5, gauge_field.layout)
# Create a heatbath update instance
heatbath_updater = Heatbath(gauge_field.layout, action)

# Update the gauge field 100 times
heatbath_updater.update(gauge_field, 100)
# Show link at (t, x, y, z) = (0, 0, 0, 0) and mu = 0
print(gauge_field[0, 0, 0, 0, 0])
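# Measure the average plaquette of the updated field (assuming, as a guess at
# the interface, that average_plaquette takes the gauge field as its only argument)
print(average_plaquette(gauge_field))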

Now use the resulting gauge field to solve the Dirac equation using Wilson's fermion action and an even-odd preconditioned conjugate gradient solver:

from pyQCD.algorithms import conjugate_gradient_eoprec
from pyQCD.core import EvenOddLayout, LatticeColourVector
from pyQCD.fermions import WilsonFermionAction

# Gauge field must be even-odd partitioned for even-odd solver to work
lexico_layout = gauge_field.layout
even_odd_layout = EvenOddLayout(lexico_layout.shape)
gauge_field.change_layout(even_odd_layout)

# Create the Wilson action with dimensionless mass of 0.4 and periodic
# boundary conditions
fermion_action = WilsonFermionAction(0.4, gauge_field, [0] * 4)

# Create a point source
rhs = LatticeColourVector(lexico_layout, 4)
rhs[0, 0, 0, 0, 0, 0] = 1.0
rhs.change_layout(even_odd_layout)

# Solve Dirac equation
sol, num_iterations, err = conjugate_gradient_eoprec(
    fermion_action, rhs, 1000, 1e-10)
print("Finsihed solving after {} iterations. Solution has error = {}"
      .format(num_iterations, err))

# Change the layout of sol back to lexicographic to enable NumPy functionality
sol.change_layout(lexico_layout)
# Print solution at (t, x, y, z) = (0, 0, 0, 0)
print(sol[0, 0, 0, 0])

Reconfiguring for Different Theories

By default pyQCD supports conventional lattice QCD with three colours. The source code can be reconfigured to use a different number of colours, like so:

python setup.py codegen --num-colours=[number of colours]

This will regenerate the Cython code that interfaces the underlying C++ code with Python. You'll then need to compile and install the extension modules again by rerunning pip install . or python setup.py install.

The codegen command currently also accepts --precision and --representation flags. Note that the package currently only supports QCD in the fundamental representation. Supported precision options are "double" (the default) and "float".
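For example, to regenerate the code for two colours in single precision (an illustrative invocation using only the flags documented above):

python setup.py codegen --num-colours=2 --precision=float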

pyqcd's People

Contributors

mspraggs, mstahlmann, sth

pyqcd's Issues

Bug in CUDA hopping kernel.

Results for kernels where the number of hops is greater than 1 don't match results from Chroma, either in the free theory or using a test configuration.

To reproduce, compile with CUDA support, then run:

import pyQCD
lat = pyQCD.Lattice(4, 8, 5.5, "wilson", 10)
lat.load_config("chroma_config.npy")
prop = lat.get_hamberwu_propagator(0.4)
corrs = pyQCD.compute_meson_corr256(prop, prop)
print(corrs[('g5', 'g5')])

The resulting pion correlator won't match the one Chroma produces when run with the same configuration (the tests data directory should also contain a file chroma_config.ildg). If Chroma is run with point sources and sinks and a mass of 0.4, the pion correlator in its hadspec output file doesn't match that produced by pyQCD. The discrepancy also appears if no config is loaded and a unit gauge is used.

The error is most likely to be in the file pyQCD/core/kernel/cuda/kernels.cu, though it could also stem from the way the CUDA code calculates the next-to-nearest neighbours.

C++ cout statements not treated the same as Python print statements

For example, if you run pyQCD in an IPython notebook and generate a propagator, the output produced by the C++ code is not displayed in the notebook.

The solution to this may be along the lines of ensuring that the cout statements are redirected to the same place as the Python stdout stream.

Some relevant links:
http://www.gossamer-threads.com/lists/python/python/1035754
http://stackoverflow.com/questions/1956407/how-to-redirect-stderr-in-python-via-python-c-api

Remove if statements for parallel code

There's some old code left that basically duplicates the contents of certain functions in order to turn multithreading on and off at runtime. Given the same effect can be achieved by setting OMP_NUM_THREADS=1 in the environment before invoking Python, this code now seems redundant.

Converters for importing UKHadron and Chroma data

Having some functions to import correlators from Chroma and UKHadron XML files and binaries would be really useful. I have already written code to import correlators from Chroma's mesonspec output, as well as the mesons and baryons from the hadspec output (these are in the TwoPoint object).

Functions are still needed to import the vector and axial currents from hadspec, as well as the correlators from UKHadron binaries. There is a C program doing the rounds on BG/Q by Andreas Juettner that reads these binaries. There may be some issues relating to endian-ness. Essentially the first few bytes of the file contain the number of momenta the correlator has been evaluated at (should be four bytes I assume). There is then a loop over momenta. At the start of each of these loops, ten integers are read. The sequence of integers has the following layout:

0 - Unknown
1 - Unknown
2 - Unknown
3 - Unknown
4 - x momentum
5 - y momentum
6 - z momentum
7 - First gamma matrix (from 0 to 15, following Chroma gamma matrix conventions)
8 - Second gamma matrix.
9 - Lattice temporal extent

Still within the momenta loop, there is a loop over the number of timeslices, which in turn contains a loop over the 256 possible gamma matrix combinations. In Python, reading this layout would look roughly like the following (the big-endian byte order and the placeholder filename are guesses that may need adjusting):

import struct

with open("ukhadron_correlators.bin", "rb") as binary_file:
    # First four bytes: the number of momenta
    num_mom, = struct.unpack(">i", binary_file.read(4))
    for mom in range(num_mom):
        # The ten integer parameters described above
        params = struct.unpack(">10i", binary_file.read(40))
        temporal_extent = params[9]
        for t in range(temporal_extent):
            # Loop over the 256 gamma matrix combinations
            for mu in range(16):
                for nu in range(16):
                    real, imag = struct.unpack(">2d", binary_file.read(16))

I can provide sample binary files if necessary to test this.

In the long run it would be cool to be able to import propagator data as well from various formats.

Improve efficiency of analysis code.

Some ideas for this:

  • Cache any jackknife or bootstrap copies in memory rather than on disk.
  • Change the storage of correlators in TwoPoint so that they're stored in a dict keyed by the list/tuple of descriptors - this avoids the need for eval and similar workarounds.
  • In jackknife and bootstrap, use measurement addition functions only as a fallback - try normal addition of measurements first.

Refactor CUDA code

The current implementation leaves a lot to be desired. In particular, the compute-propagator function performs a substantial amount of data transfer between the host and the device, especially when preconditioning. This should be reduced where possible.

Optimize Wilson-loop calculation

Currently the Wilson loop calculation within the lattice kernel is very inefficient, with products of links often being recomputed multiple times for loops of various sizes. There is plenty of scope for streamlining this calculation by reusing those products.

Make lattice module an optional feature

The lattice class carries a lot of dependencies in order to build and use it. Many users will only need the analysis tools, so it may be good to have lattice as an optional module that doesn't have to be built in order to install pyQCD.

Update and review tests

There are several new functions that don't have tests yet: for example the apply_hamberwu_dirac, apply_dwf_dirac and compute_hamberwu_propagator functions.

Also it'd be great to make use of the np.allclose function when comparing results, rather than doing this manually.
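A minimal sketch of what this would look like (the correlator values are made up purely for illustration):

import numpy as np

computed = np.array([1.00000000001, 0.5, 0.25])  # e.g. a freshly computed correlator
expected = np.array([1.0, 0.5, 0.25])            # reference values

# A single call replaces manual element-by-element comparisons
assert np.allclose(computed, expected, atol=1e-10)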

Add folding options to all functions that create or add a correlator.

The option to fold correlators on import (or indeed when using add_correlator) would be useful. This could be done by creating static functions to fold cosh- and sinh-shaped correlators. There could then be an argument to add_correlator to specify any folding or other actions to apply to the correlator before adding it to the TwoPoint object.
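A NumPy sketch of what such folding helpers might look like (the names and signatures are illustrative, not existing pyQCD API):

import numpy as np

def fold_cosh(corr):
    # Fold a cosh-shaped correlator: C(t) -> (C(t) + C(T - t)) / 2,
    # with corr a 1D array of length T and corr[0] left unchanged
    reflected = np.roll(corr[::-1], 1)  # [C(0), C(T-1), ..., C(1)]
    return 0.5 * (corr + reflected)

def fold_sinh(corr):
    # Fold a sinh-shaped correlator: C(t) -> (C(t) - C(T - t)) / 2
    reflected = np.roll(corr[::-1], 1)
    return 0.5 * (corr - reflected)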

Remove some default arguments for Lattice object

There are some arguments it just doesn't make sense to have as default, for example:

  • Lattice shape
  • Inverse coupling
  • Gauge action
  • Distance between measurements - this argument needs renaming too.

Create and expose functions to invert action on a source

Exposing a function to invert on a single source would create a flexible API for Python users, who could then create their own sources and propagator computation routines. For example, there's currently no Z2 wall source in pyQCD. The end user could create such a source and supply it to the inversion routine.
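As an illustration, a user-defined Z2 wall source might look something like the following sketch (the shape convention and any interface to the inversion routine are assumptions):

import numpy as np

def z2_wall_source(shape, timeslice):
    # Z2 noise (+/-1 entries) on a single timeslice, zero elsewhere;
    # shape might be e.g. (T, L, L, L, 4, 3) for site, spin and colour indices
    source = np.zeros(shape, dtype=complex)
    source[timeslice] = np.random.choice([-1.0, 1.0], size=shape[1:])
    return source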

Linear operator code generator

The linear operators/fermion actions are all quite similar in structure, so it isn't infeasible that new ones could be generated by specifying the structure of the Dirac operator and then splicing the generated code into a template in some way.

How cool would it be to have the code generator written in Python, with hooks to recompile the native part of pyQCD and reload it? Perhaps this is feasible, who knows.

Gauge fixing in Lattice object

Fixing to Coulomb gauge and Landau gauge would be really useful, particularly the latter for the case of computing the mean link for tadpole improvement.

Even-odd preconditioning in CUDA code

This will further improve the speed of inversion with GPUs. See the existing even-odd preconditioning in the CPU code for how this could be implemented.

Additional Dirac operators for coarse lattices

In order to get some meaningful physics from coarse lattices, the following should be implemented:

  • Wilson-like improved Dirac operators - needed for light quark physics
  • HQET Dirac operators - needed for heavy quarks

A good reference for these things would be "Redesigning Lattice QCD" by Lepage.

Move to "ask forgiveness" paradigm.

There are several places where if statements are used when it'd be faster to use try/except blocks, for example for type checking. Let's move to the latter to speed up the code.
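A small sketch of the pattern, using measurement addition as an example (the helper itself is hypothetical):

import numpy as np

def add_measurements(a, b):
    # "Ask forgiveness": try plain addition first and only fall back to an
    # explicit conversion if that fails
    try:
        return a + b
    except TypeError:
        return np.asarray(a) + np.asarray(b)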

Multi-threaded jackknife/bootstrap

These functions are embarrassingly parallel, so it should be straightforward to get a speed up in the analysis code using this.

EDIT: I think multiprocessing is the safest module to use for this, since it's a built-in package in recent versions of Python.
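A rough sketch of how this could look with multiprocessing (the measurement function must be picklable, i.e. defined at module level):

from multiprocessing import Pool

import numpy as np

def jackknife_samples(data):
    # Leave-one-out means of a set of measurements (first axis = configuration)
    return [np.delete(data, i, axis=0).mean(axis=0) for i in range(len(data))]

def parallel_jackknife(measure, data, processes=4):
    # Apply the measurement function to each jackknife copy in a pool of workers
    with Pool(processes) as pool:
        return pool.map(measure, jackknife_samples(data))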

Add benchmark script

This would facilitate testing of performance improvements and so on. Suggested actions to benchmark:

  • Jackknife/bootstrap
  • Gauge field updates
  • Propagator computation.

Baryon correlator functions in TwoPoint

The computation of meson correlators from propagators is currently supported, but it'd be good to have baryon correlator computations also.

Clover term

For improving fermion actions this would be really useful.

Refactor Simulation object

Currently the way this works is perhaps a bit cryptic. Would it be feasible to have some sort of decorator that you applied to a function, so that when you called it, it automatically did the measurement for all the specified configurations? Some method would be needed to chain together various functions so that they were all executed on the same configuration one after the other.
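One possible shape for this, as a toy sketch rather than a concrete proposal for the pyQCD API:

class Simulation:
    def __init__(self, configs):
        self.configs = configs
        self.funcs = []

    def measurement(self, func):
        # Decorator: register func to be run on every configuration
        self.funcs.append(func)
        return func

    def run(self):
        # Apply every registered measurement to each configuration in turn,
        # so chained measurements all see the same configuration
        return {f.__name__: [f(cfg) for cfg in self.configs] for f in self.funcs}

# Usage with stand-in "configurations"
sim = Simulation(configs=[0.51, 0.53, 0.52])

@sim.measurement
def identity(config):
    return config

print(sim.run())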

Linop-based Dirac operators

Currently when a propagator is computed, a sparse matrix is constructed either in Eigen or the CUSP library for CUDA. This uses additional memory. Given memory is precious in CUDA, this needs to be avoided where possible. Furthermore, the CG and BiCGSTAB routines in Eigen are single-threaded.

This can be overcome by using classes to specify the action of a (Dirac) linear operator on a fermion field. The function that applies this operator then reads from the gauge field when acting on the fermion field, so no additional matrix needs to be stored.

CG and BiCGSTAB solvers would need to be created from scratch to use these operators to compute propagators, but the advantage would be that the matrix-vector multiplies could be multithreaded.
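A bare-bones sketch of the idea (a toy operator, not the eventual pyQCD implementation): the operator exposes only its action on a vector, and the solver is written against that interface alone.

import numpy as np

class ScaledLaplacian:
    # Stand-in for a Dirac-like operator: only its action on a vector is
    # defined, so no explicit (sparse) matrix is ever stored
    def __init__(self, mass):
        self.mass = mass

    def apply(self, psi):
        # Hermitian, positive-definite toy operator on a periodic 1D lattice
        return (self.mass + 2.0) * psi - np.roll(psi, 1) - np.roll(psi, -1)

def conjugate_gradient(op, rhs, tol=1e-10, max_iter=1000):
    # CG written purely in terms of op.apply, so the matrix-vector product can
    # be multithreaded inside the operator without the solver knowing
    x = np.zeros_like(rhs)
    r = rhs - op.apply(x)
    p = r.copy()
    r_old = np.vdot(r, r).real
    for iteration in range(max_iter):
        Ap = op.apply(p)
        alpha = r_old / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        r_new = np.vdot(r, r).real
        if r_new ** 0.5 < tol:
            break
        p = r + (r_new / r_old) * p
        r_old = r_new
    return x, iteration

# Solve the toy operator against a point source
op = ScaledLaplacian(mass=0.1)
rhs = np.zeros(16)
rhs[0] = 1.0
solution, iterations = conjugate_gradient(op, rhs)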

Improve test rigor

There are many tests that aren't particularly rigorous; they just call the relevant functions without necessarily testing the output.

CUDA Support for Propagator Inversions

If possible it would be great to utilize the power afforded by NVIDIA GPUs to speed up propagator computations. In the past this was done using the CUSP sparse matrix library. This could still be used, but stencils for the various Dirac operators would need to be implemented instead of using sparse matrices, and some coherent way of passing the gauge field through to GPU RAM would also be needed.

Improve documentation

More documentation is needed, particularly for modules and the package overall. Class member variables should also be added to their respective class docstrings. More examples are needed in general, particularly complete working ones.

Add 5D propagator computation

The current propagator computation routine is not compatible with the new DWF action. New functions to generate 5D sources and invert on them, before projecting back to 4D, are essential if DWF propagators are to become a possibility.

Reduce memory usage in pyQCD.compute_meson_corr256

I parallelised this function recently to reduce computation time, but memory usage really takes a hit as a result, I think because the supplied Propagator objects are copied to the new processes.

As a temporary fix I've added a function argument so the user may specify how many processors may be used, with a default of 1, but it'd be better if this wasn't necessary.

Caching in CUDA code

The CUDA code is sub-optimal, and there's plenty of scope for caching the gauge links to speed up the matrix multiplication required for next-nearest neighbour hopping. This would also reduce non-local memory access patterns.

Some things to consider:

  • If the cached sub-lattice has side-length N, then there will need to be N^4 * 12 threads available, one for each site, spin and colour index in the spinor being computed. Hence N will need to be selected based on the maximum number of threads available for the current hardware.
  • Depending on the hop size, there will need to be a halo around the sub-lattice so that all relevant links are available to the threads that need them.
  • computeU will need to be modified to use the cached links rather than the links in global memory.

Allow multiple 4D action arguments in DWF class

Currently the DWF linear operator only supports 4D actions with one argument. It would be fairly easy to overcome this by having DWF take the parameters it needs from integerParams, floatParams and so on, then pass the remaining parameters onto the linear operator factory.

This would allow the parameters in the Hamber-Wu and Naik actions to be selected arbitrarily.

DWF even-odd preconditioning

This really needs to be implemented. DWF is expensive at the best of times, so any preconditioning that can be done should be done.
