
LSDensities: Lattice Spectral Densities

lsdensities is a Python library for the calculation of smeared spectral densities from lattice correlators.

Solutions can be obtained with the Hansen-Lupo-Tantalo (HLT) method, with Bayesian inference based on Gaussian processes, or with combinations of the two.

This library relies on mpmath for the high-precision arithmetic needed to solve the inverse problem.

Authors

Niccolò Forzano, Alessandro Lupo.

Installation

The package can be downloaded, built and installed with

pip install git+https://github.com/LupoA/lsdensities.git

Usage

Preliminary tests can be found in the tests folder and can be run with the pytest command.

Usage examples can be found in the examples folder.

The most basic workflow is illustrated in examples/runExact.py, which generates a high-precision correlator and computes the corresponding spectral density smeared with one of the available kernels.

A realistic example is shown in examples/runInverseProblem.py, where input data for the correlator needs to be provided.

The most complete set of utilities is provided by src/lsdensities/InverseProblemWrapper.py, which offers tools for estimating errors and treating the bias in both the HLT and the Bayesian frameworks.

Function call example:

from lsdensities.utils.rhoUtils import (
    init_precision,
    Inputs,
)
from mpmath import mp, mpf
from lsdensities.core import hlt_matrix
from lsdensities.transform import coefficients_ssd, get_ssd_scalar
from lsdensities.utils.rhoMath import gauss_fp

# compute the smeared spectral density at some energy,
# from a lattice correlator

init_precision(128)
parameters = Inputs()
parameters.time_extent = 32
parameters.kerneltype = 'FULLNORMGAUSS'  # kernel used to smear the spectral density
parameters.periodicity = 'EXP'  # EXP / COSH for open / periodic boundary conditions
parameters.sigma = 0.25  # smearing radius in given energy units
peak = 1  # energy level in the correlator
energy = 0.5  # energy at which the smeared spectral density
              # is evaluated, in given energy units
parameters.assign_values()  # assigns internal variables
                            # based on given inputs
                            # such as tmax = number of data points,
                            # which is inferred from time_extent and periodicity,
                            # if not specified

lattice_correlator = mp.matrix(parameters.tmax, 1)  #  vector; fill with lattice data
lattice_covariance = mp.matrix(parameters.tmax)     #  matrix; fill with data covariance

for t in range(parameters.tmax):    # mock data
    lattice_correlator[t] = mp.exp(-mpf(t + 1) * mpf(str(peak)))
    lattice_covariance[t,t] = lattice_correlator[t] * 0.02


regularising_parameter = mpf(str(1e-6))   # regularising parameter; must be tuned.
                                          # Automatic tuning is provided in InverseProblemWrapper.py.
                                          # This example has exact data, so the parameter
                                          # can be made as small as zero,
                                          # in which case the result becomes exact in
                                          # the limit of infinite tmax.

regularised_matrix = hlt_matrix(parameters.tmax, alpha=0) + (regularising_parameter * lattice_covariance)
matrix_inverse = regularised_matrix**(-1)

coeff = coefficients_ssd(matrix_inverse,  # linear coefficients
                         parameters,
                         energy,
                         alpha=0)

result = get_ssd_scalar(coeff,  # linear combination of data
                        lattice_correlator,
                        parameters)

true_value = gauss_fp(peak, energy, parameters.sigma, norm="full")

print("Result: ", float(result))   # reconstructed smeared spectral density at E = energy
print("Exact result:", true_value)  # exact smeared spectral density at E = energy

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Development requirements are listed in requirements.txt and can be installed with pip install -r requirements.txt.

References

For the main ideas: https://arxiv.org/pdf/1903.06476.pdf

For the Bayesian setup and the general treatment of the bias: https://arxiv.org/pdf/2311.18125.pdf

License

GPL

lsdensities's Issues

remove massNorm

The parameter massNorm will be removed from the Input class: it is unnecessary, potentially confusing, and adds no real benefit.

Returning true/false based on an `if`

This is a pedantic issue, but there is usually not a reason to write something like

if some_condition_that_is_boolean:
    return True
else:
    return False

The reason: some_condition_that_is_boolean is already True or False, so you can return it directly:

return some_condition_that_is_boolean

Bam, your function is 3 lines shorter so there are 3 fewer lines of code to need to understand.

More mpmath questions

@LupoA A couple of issues probably mostly demonstrating my ignorance about mpmath. Looking at https://github.com/LupoA/rhos/blob/feature/GP/rhos/core.py

  • Obviously (given your comment on line 50), the syntax using mp. operations for every arithmetic operation is very cumbersome. What's the reasoning behind using that syntax? Based on my skim-read of the documentation, if a and b are both mpf objects (which most of your code is careful to ensure), then a - b should give the same result as mp.fsub(a, b). Being able to use standard arithmetic notation would make this vastly more readable.
  • Is there a reason that on e.g. line 14, you use mp.fadd(mpf(i), mpf(j)) rather than mpf(i + j), given that i and j are both integers so addition is exact?
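The equivalences described in these bullets can be checked directly. A minimal sketch (the precision value is illustrative, chosen to mirror the README example above):

```python
from mpmath import mp, mpf

mp.prec = 128  # high working precision, as in the library

a, b = mpf("1.25"), mpf("3.5")

# Operator notation on two mpf objects rounds the same way as mp.fsub,
# so the more readable form gives an identical result
assert a - b == mp.fsub(a, b)

# For integers, addition is exact, so converting after adding
# is equivalent to converting first and using mp.fadd
i, j = 7, 11
assert mpf(i + j) == mp.fadd(mpf(i), mpf(j))
```

So `a - b` and `mpf(i + j)` should be safe drop-in replacements wherever both operands are already `mpf` objects or exact integers.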

Speed up Cauchy integration

As the title says, the Cauchy integration needs to become quicker in the functions ft_mp and A0 in 'core'.

question about pre-commit

The config file in the repository reads

default_language_version:
    python: python3.11

which did not work for me, until I changed locally to my python version

default_language_version:
    python: python3.10.12

this looks a bit user-dependent. Is it a bug?

ugly parser

  • Conversion from parsed arguments to input arguments should be done in the parser and not at the beginning of each example file
  • remove extra parser functions that are not needed

Use data formats for data

I've only taken a 10,000 foot view so far, but it looks as though some results are only output by the code as text in a log file.

Parsing data out of a free-form log file is annoying and error-prone; I'd recommend that any results a program needs to output are written to an appropriate data file format, such as CSV, JSON, or HDF5. (Probably not HDF5 for the sizes of data here.) They could additionally be output with logging.info in case anyone wants to keep an eye on a run while debugging.
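A sketch of the suggested pattern, writing machine-readable JSON alongside a human-readable log line (the field names and values are purely illustrative, not part of the library's output):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)

# Hypothetical results dictionary; keys are illustrative only
results = {"energy": 0.5, "sigma": 0.25, "rho_smeared": 0.3989}

# Machine-readable output: trivial to read back, no log parsing needed
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

# Human-readable summary for anyone watching the run
logging.info("rho(E=%g) = %g", results["energy"], results["rho_smeared"])

# Round-trip check: the file reproduces the results exactly
with open("results.json") as f:
    assert json.load(f) == results
```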

Type annotations

Currently some but not all definitions have type annotations, and some functions have some but not all arguments type annotated. It would be nice if type annotations were used consistently (and then could be checked with mypy both at commit time #25 and in continuous integration #26)
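For reference, a fully annotated function looks like this; the name and formula are illustrative stand-ins, not part of the lsdensities API:

```python
from math import exp, pi, sqrt


def gaussian_kernel(energy: float, peak: float, sigma: float) -> float:
    """Normalised Gaussian smearing kernel evaluated at `energy` (illustrative)."""
    return exp(-((energy - peak) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
```

Once every argument and return value is annotated like this, mypy can verify call sites across the whole package.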

Examples ≠ Tests

I'm assuming that the primary purpose of the repository is the library, with the exec directory giving example tools that can use it. In that case I would suggest renaming the exec directory to examples, and removing test from the filenames in there (otherwise pytest may complain).

I'd suggest adding them to [project.scripts] if you expect users to call them as programs routinely.

Add requirements.txt

pyproject.toml defines requirements for installing a package, and that is now done correctly.

requirements.txt is useful to let people know what other packages are needed to be able to contribute to development of a project. Currently this is pre-commit, but pytest is coming soon (#42).

Once that's created, there should be a mention in the README of it. (E.g. a short section on contributing/setting up a development environment.)

Create README

A README is essential for others to be able to make sense of and use your work. I'd suggest at least

  • Name of the package, authors
  • Brief description of what the thing does
  • Installation instructions
  • Usage instructions for any programs
  • Brief examples of how to use the API

(Some of these are not worth writing until any serious refactoring/restructuring is complete.)

Improve README

Some things would improve the README

  • Make HLT a link to where to read more about the method
  • Discuss usage in more detail
  • Mention examples

Add LICENSE

@LupoA Can you please decide what license you would like to distribute this software under? (Technically then @nickforce989 will need to agree to share his contributions under these terms.)

If you're not sure, I'd suggest reading http://choosealicense.com. If you're still not sure, I'd suggest GPL.

Don't commit `__pycache__`

__pycache__ is specific to your Python installation, and shouldn't be committed.

Any instances that are in the repo should be removed, and the name should be added to .gitignore.

Use pytest

Currently the tests directory contains programs with a main() function that outputs the result of a check; it would be better to use an automated testing tool like pytest for this instead—this will reduce the amount of code, make it more easily automated, and make it easier to read for others who are already familiar with pytest.
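As a sketch of the conversion: a script whose main() prints PASS/FAIL becomes a plain test_* function with a bare assert, which pytest discovers and runs automatically. The correlator() helper here is a mock stand-in (mirroring the exponential mock data in the README example), not a library function:

```python
from math import exp


def correlator(t: int, peak: float = 1.0) -> float:
    # mock exponentially decaying correlator, as in the README example
    return exp(-(t + 1) * peak)


# Before: def main(): print("PASS" if correlator(2) < correlator(1) else "FAIL")
# After: pytest finds this by its test_ prefix and reports failures itself
def test_correlator_decays():
    assert correlator(2) < correlator(1)
```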

Use Python package structure

It would be really good to restructure this into a Python package, such that anyone can install it with pip, and so that you don't have to mess around with adding directories to the PATH to be able to import things correctly.

Python provides a tutorial on packaging Python projects. (Note if you're skim-reading that YOUR_USERNAME_HERE is not an essential part of a package name, that's just so you can follow the tutorial without clashing with anyone else's tutorial package.)

Essential aspects of this:

  • Create a pyproject.toml specifying requirements and metadata
  • Use e.g. from .core to import from a module in the same directory
  • Make utils part of rhos. Is correlator used at all? If so, move it into rhos, otherwise remove it.
  • Consider whether rhos is the best name for the package; make the directory name match
  • Add an __init__.py file
  • Remove importall.py

Once this is done, #26 will probably be easier. (If #26 is already done, then this will break CI and it will need to be fixed.)

Don't force your matplotlib style on me

Users of the library may want to use their own Matplotlib styles, which may conflict with the ones you set.

If you want to provide a default style for the executables, then define a .mplstyle for this, and allow the user to override it with their own.

A good rule I think would be:

  • Anything in rhos must not modify plt.rc/plt.rcParams
  • Anything in examples could modify plt.rc, but really should be using plt.style.use instead

Sometimes there may be reasons to override the style, but this should not be done via plt.rc, and should only be when strictly necessary (e.g. setting Helvetica as the font in the legend would not be needed)

Add docstrings

Each function should have at least a short description of what it does—e.g. what it expects as input, what it will give as output.

Similarly classes and ideally modules should have descriptions that will form their documentation when interrogated with help().
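A sketch of the suggested shape, using a hypothetical function name (the docstring content, not the signature, is the point here):

```python
def smeared_density(energy: float, sigma: float) -> float:
    """Return the smeared spectral density at the given energy.

    Args:
        energy: energy at which the smeared density is evaluated.
        sigma: smearing radius, in the same energy units.

    Returns:
        The value of the smeared spectral density (illustrative stub).
    """
    raise NotImplementedError  # stub; a real function would compute here
```

With this in place, help(smeared_density) shows the description, arguments and return value without reading the source.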

Use continuous integration

Since there is a test suite available, it would be good to run this automatically to check that new commits haven't broken any existing behaviour.

There's lots of documentation around on how to do this; one possible starting point is the training that we deliver to AIMLAC students

You can also get this to add a badge to your README showing whether the tests currently pass, etc.

Once #25 is done, then this can also be verified in CI in case any contributor forgets to install the pre-commit hook on their own machine.

Don't use PDF for documentation

PDFs are awkward to diff, and awkward for others to update if they don't have the source (which isn't in the repository).

I'd suggest either using Markdown or reStructuredText for this. GitHub will now render equations in its Markdown, so I don't think there's a barrier to doing this.

I would suggest using a doc or documentation folder rather than documents.

Further down the line you may wish to generate a fuller set of documentation using something like Read the Docs; this also uses reStructuredText.

Reproducibly ergodic random seeds

random.seed(1994) is definitely reproducible, but will introduce correlations between otherwise independent analyses, and if applied to parallel code may introduce spurious statistics. This should be adjusted to still be reproducible, but also be more ergodic.
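One stdlib-only way to do this (numpy's SeedSequence.spawn is the more standard tool if numpy is in use): keep a single reproducible base seed, but derive an independent stream per analysis by hashing the base seed together with an analysis label. The labels below are hypothetical:

```python
import hashlib
import random

BASE_SEED = 1994  # still one reproducible base seed for the whole project


def rng_for(analysis_name: str) -> random.Random:
    """Derive an independent, reproducible random stream per analysis."""
    digest = hashlib.sha256(f"{BASE_SEED}:{analysis_name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


a = rng_for("correlator-fit")
b = rng_for("bootstrap")

# Distinct analyses get uncorrelated streams...
assert [a.random() for _ in range(3)] != [b.random() for _ in range(3)]

# ...but each stream is still fully reproducible
assert rng_for("correlator-fit").random() == rng_for("correlator-fit").random()
```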

Filename and structure improvements

There are some Python conventions that will make the code more idiomatic to use if you can follow them.

  • Python module and variable names usually use snake_case rather than PascalCase. (The latter is used for class names.) So things like InverseProblemWrapper.py -> inverse_problem_wrapper.py
  • Is _class needed in filenames? I would imagine that from .gphlt import HLTWGPrapper would be more fluent.
    • (Is that class name a typo? Should there not be a Wrapper in there?)
  • Does it make sense to use subdirectories to add more structure? (You may already have done this as part of #27)

Use a code style checker

In addition to autoformatting (see #25), ruff can also check for increasingly pedantic things that it can't automatically fix. I wouldn't suggest enabling everything, but would suggest we discuss what categories of warning are a good starting point.

For example, I would definitely want to try and get rid of any lines including import *, and more broadly any imports that aren't used.

Use logging

Python's built-in logging library gives some important advantages over using prints everywhere.

e.g.

  • You can control the log level so debug output can be hidden if it isn't wanted
  • You don't have to put a guard at the start of every log line to prepend it with the fact that it's a log message
  • You don't have to comment out logging messages that are too detailed; create them at a fine-detail logging level and once you're ready not to see them, increase the overall log level so they don't get printed.

I'd suggest that most things that are currently prints should be logging.info or logging.warning calls.

(If you need a timestamp, you can attach a logging.Formatter that includes asctime to get this automatically.)
