n3pdf / pdfflow

PDFflow is a parton distribution function interpolation library written in Python and based on the TensorFlow framework.

Home Page: https://pdfflow.readthedocs.io

License: Apache License 2.0

Python 96.89% CMake 0.92% Makefile 0.25% C 1.30% Fortran 0.64%

hep-ex hep-ph hep-th parton-distribution-functions pdf pdf-interpolation python tensorflow

pdfflow's Issues

Add a point about disabling logs in the docs

As per issue #33, mention in the docs that the logs can be disabled when using pdfflow.

import logging
logger_pdfflow = logging.getLogger('pdfflow')
logger_pdfflow.setLevel(logging.WARNING)

It might make sense to also have it as an environment variable, because when pdfflow is used from C, Fortran, etc., changing the log level won't be this easy.
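
A minimal sketch of the environment-variable idea follows. The variable name `PDFFLOW_LOG_LEVEL` and the helper `level_from_env` are assumptions made for illustration; nothing in this issue confirms how (or whether) pdfflow reads such a variable.

```python
import logging
import os

# Hypothetical variable name, used only for this sketch.
ENV_VAR = "PDFFLOW_LOG_LEVEL"

# Level names accepted from the environment (case-insensitive).
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def level_from_env(default=logging.INFO, environ=None):
    """Return the level requested through ENV_VAR.

    Falls back to `default` when the variable is unset or not a known name.
    """
    environ = os.environ if environ is None else environ
    name = environ.get(ENV_VAR, "").strip().upper()
    return LEVELS.get(name, default)


# Same logger as in the snippet above, configured from the environment.
logger_pdfflow = logging.getLogger("pdfflow")
logger_pdfflow.setLevel(level_from_env())
```

With something like this in place, a C or Fortran driver would only need the variable set in the shell before launching the program, with no Python-side change.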

Write documentation

We can use for instance sphinx https://www.sphinx-doc.org/en/master/

At the moment we are relying on LHAPDF for the whole server/set/organization management. I don't think we really want to reimplement the whole thing, but it would be nice to have at least some feature that is able to download PDFs to /usr/share/pdfflow or even /usr/share/LHAPDF so that the library can work stand-alone-y.

Issue running PDFflow on Mac

Originally from: NNPDF/nnpdf#2033

When running for instance: pytest -v test_hyperopt.py::test_restart_from_pickle in nnpdf.

(n3fit runs fine though, on the basic runcard)

I get the error:

[INFO]: All requirements processed and checked successfully. Executing actions.
[INFO] (pdfflow.pflow) Loading member 0 from NNPDF40_nnlo_as_01180
[INFO]: Loading member 0 from NNPDF40_nnlo_as_01180
[CRITICAL]: Bug in n3fit ocurred. Please report it.
Traceback (most recent call last):
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/n3fit/src/n3fit/scripts/n3fit_exec.py", line 332, in run
    super().run()
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/app.py", line 151, in run
    super().run()
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/reportengine/app.py", line 380, in run
    rb.execute_sequential()
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/reportengine/resourcebuilder.py", line 166, in execute_sequential
    result = self.get_result(callspec.function,
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/reportengine/resourcebuilder.py", line 175, in get_result
    fres =  function(**kwdict)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/covmats.py", line 253, in dataset_t0_predictions
    return central_predictions(dataset, t0set).to_numpy().reshape(-1)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/convolution.py", line 233, in central_predictions
    return _predictions(dataset, pdf, central_fk_predictions)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/convolution.py", line 166, in _predictions
    all_predictions.append(fkfunc(fk_w_cuts, pdf))
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/convolution.py", line 302, in central_fk_predictions
    return central_hadron_predictions(loaded_fk, pdf)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/convolution.py", line 412, in central_hadron_predictions
    return _gv_hadron_predictions(loaded_fk, gv)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/convolution.py", line 335, in _gv_hadron_predictions
    gv1 = gv1func(qmat=[Q], vmat=FK_FLAVOURS, xmat=xgrid).squeeze(-1)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/pdfbases.py", line 308, in central_grid_values
    return self.apply_grid_values(func, vmat, xmat, qmat)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/pdfbases.py", line 422, in apply_grid_values
    gv = func(flmat, xmat, qmat)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/gridvalues.py", line 114, in central_grid_values
    return _grid_values(pdf.load_t0(), flmat, xmat, qmat)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/gridvalues.py", line 57, in _grid_values
    return lpdf.grid_values(flmat, xmat, qmat)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/lhapdfset.py", line 116, in grid_values
    raw = np.array([member.xfxQ(flavors, xarr, qarr) for member in self.members]).swapaxes(1, 2)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/lhapdfset.py", line 116, in <listcomp>
    raw = np.array([member.xfxQ(flavors, xarr, qarr) for member in self.members]).swapaxes(1, 2)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/lhapdf_compatibility.py", line 84, in xfxQ
    ret_dict = self.xfxQ(b, c)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/lhapdf_compatibility.py", line 80, in xfxQ
    return self._xfxQ_all_pid(a, b)
  File "/Users/aronjansen/Dropbox/eScience/projects/protonStructure/nnpdfgit/nnpdf/validphys2/src/validphys/lhapdf_compatibility.py", line 66, in _xfxQ_all_pid
    res = self.pdf.py_xfxQ2_allpid(x, q**2).numpy()
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pdfflow/pflow.py", line 433, in py_xfxQ2_allpid
    return self.xfxQ2_allpid(a_x, a_q2)
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pdfflow/pflow.py", line 394, in xfxQ2_allpid
    return self.xfxQ2(pid, a_x, a_q2)
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pdfflow/pflow.py", line 350, in xfxQ2
    f_f = self._xfxQ2(pid_idx, a_x, a_q2)
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pdfflow/pflow.py", line 266, in _xfxQ2
    res += subgrid(shape, a_q2, pids=u, arr_x=a_x)
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pdfflow/subgrid.py", line 181, in __call__
    result = self.fn_interpolation(
  File "/Users/aronjansen/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pdfflow/functions.py", line 161, in first_subgrid
    if tf.size(f_idx) != 0:
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: Using a symbolic `tf.Tensor` as a Python `bool` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.

Benchmarks

Besides having a precision benchmark against LHAPDF, it would be interesting to also know how it compares in CPU time (and, of course, also GPU time).

Multimember execution is resource expensive

I'm not sure how to get around this: multi-member means, by nature, that many grids have to be loaded in memory at once (if we want to run the interpolation of all of them), and as a result the graph is humongous.

Maybe it actually makes sense to fall back to eager mode in this case, or to tell the user that's an option.
Admittedly, the usual scenario for an MC where vectorization is important is to ask for many values of x (and maybe q) at once. Asking for many members (order 100) for a few values of x and q is quick even sequentially, so run_eager seems a reasonable option in this case.

Add example folder

With several usage examples

Tabulation example
Python integration example
Example with VegasFlow

Dependency issue

It seems one of your dependencies has been updated in a non-backward-compatible way, namely cloudpickle, which is pulled in through tensorflow_probability.

Traceback

ImportError while importing test module '/home/alessandro/projects/N3PDF/pdfflow/src/pdfflow/tests/test_pflow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
src/pdfflow/tests/test_pflow.py:6: in <module>
    from pdfflow.pflow import mkPDF
src/pdfflow/pflow.py:18: in <module>
    from pdfflow.subgrid import Subgrid
src/pdfflow/subgrid.py:12: in <module>
    from pdfflow.functions import inner_subgrid
src/pdfflow/functions.py:32: in <module>
    from pdfflow.region_interpolator import interpolate
src/pdfflow/region_interpolator.py:7: in <module>
    from pdfflow.neighbour_knots import four_neighbour_knots
src/pdfflow/neighbour_knots.py:7: in <module>
    import tensorflow_probability as tfp
env/lib/python3.8/site-packages/tensorflow_probability/__init__.py:76: in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
env/lib/python3.8/site-packages/tensorflow_probability/python/__init__.py:23: in <module>
    from tensorflow_probability.python import distributions
env/lib/python3.8/site-packages/tensorflow_probability/python/distributions/__init__.py:88: in <module>
    from tensorflow_probability.python.distributions.pixel_cnn import PixelCNN
env/lib/python3.8/site-packages/tensorflow_probability/python/distributions/pixel_cnn.py:37: in <module>
    from tensorflow_probability.python.layers import weight_norm
env/lib/python3.8/site-packages/tensorflow_probability/python/layers/__init__.py:31: in <module>
    from tensorflow_probability.python.layers.distribution_layer import CategoricalMixtureOfOneHotCategorical
env/lib/python3.8/site-packages/tensorflow_probability/python/layers/distribution_layer.py:28: in <module>
    from cloudpickle.cloudpickle import CloudPickler
    ImportError: cannot import name 'CloudPickler' from 'cloudpickle.cloudpickle' (/home/alessandro/projects/N3PDF/pdfflow/env/lib/python3.8/site-packages/cloudpickle/cloudpickle.py)

I believe I have made all the required checks, and I really hope this is not my own fault, but there is still a chance that it is.

However, for the sake of reporting:

I freshly installed the bare tensorflow system-wide just before the installation (I had no previous installation at all)
everything else has been installed by setup.py into the environment

The minimal thing I would suggest is to specify in setup.py the version restrictions for the dependencies you are using. If you need any more info from my side, let me know.

Ensure code is tf.function-compilable

This is necessary for performance.

Write setuptools installer

Integrating pineappl

I have just finalized the python interface for pineappl:

https://github.com/N3PDF/pineappl/tree/master/wrappers/python

I think it would be interesting to implement the Drell-Yan photon-photon production:

https://github.com/N3PDF/pineappl/blob/master/examples/python-dy-aa/main.py

using vegasflow and then integrate the convolution using pdfflow.

Add badges, metadata, etc

Add badges to the readme.md for the tests, coverage (if added) and documentation.
Also add metadata to the repository, etc.

Switch on/off LogOutput && Multi-replicas computation

Apologies if I combine the following issues (requests) in one, but on the other hand, I do not think these require two separate issues.

It would be nice to have some kind of Verbose that switches on/off the log output here (for instance):

pdfflow/src/pdfflow/pflow.py

Line 168 in 96a6668

logger.info("loading %s", self.fname)

This would be helpful when loading multiple sets of replicas.
It would also be incredibly useful to have something like: (1) mkPDFs that loads all the PDF sets at once, and (2) xfxQ2s that computes a three-dimensional grid of (rep, pid, x)-points. This is because doing the following (see below) becomes expensive for a large number of replicas.

for rep in replicas:
    res = rep.xfxQ2(pids, xgrid, q2scale)
    ...

On my computer, I could not generate a grid of 1000 replicas, as the process gets terminated by a SIGKILL. I saw that there was some discussion in #19 but just wanted a separate issue 😅 .
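
One way to picture the requested (rep, pid, x) output is to evaluate the replicas in batches and hand each block to the caller, so that results for all 1000 replicas never have to be held at once. The sketch below is not pdfflow API: `iter_xfxq2_batches` and `FakeReplica` are hypothetical names, and the stand-in assumes each replica exposes an `xfxQ2(pids, xgrid, q2)` method returning an array of shape (pid, x).

```python
import numpy as np


class FakeReplica:
    """Stand-in for a loaded PDF member (hypothetical, for illustration)."""

    def __init__(self, scale):
        self.scale = scale

    def xfxQ2(self, pids, xgrid, q2):
        # Toy values of shape (n_pid, n_x); a real member would interpolate
        # and would actually depend on q2.
        pids = np.asarray(pids, dtype=float)[:, None]
        xgrid = np.asarray(xgrid, dtype=float)[None, :]
        return self.scale * np.ones_like(pids) * xgrid * (1.0 - xgrid)


def iter_xfxq2_batches(replicas, pids, xgrid, q2, batch_size=50):
    """Yield arrays of shape (batch, pid, x), one batch of replicas at a time."""
    for start in range(0, len(replicas), batch_size):
        batch = replicas[start:start + batch_size]
        yield np.stack([rep.xfxQ2(pids, xgrid, q2) for rep in batch])


replicas = [FakeReplica(scale) for scale in range(1, 8)]
blocks = list(iter_xfxq2_batches(replicas, [21, 1, 2], [0.1, 0.5], 10.0, batch_size=3))
full_grid = np.concatenate(blocks)  # shape (rep, pid, x)
```

This only bounds the memory used by the results. It does nothing about the memory needed to load the grids themselves, which is the part discussed in #19.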

Fix subgrid interpolation Q > mb

Add possibility to query (x,Q) arrays in the C/Fortran examples

As the title says.

Perform final optimization round

check the code is complete
check the code is optimized

Implement extrapolation behaviour

Roadmap for the paper

Following the development, here is my wish list for the paper:

create final performance benchmark plots
create accuracy benchmark plots for NNPDF and other PDF sets
prepare examples (singlet top, FK convolution)
finalize code and related tasks.

Enable continuous integration

Add documentation for alpha_s

Python 3.9 support

As it stands, PDFflow should work with Python 3.9 out of the box. The only thing stopping it from working is that TensorFlow (tensorflow/tensorflow#44485) won't provide official pip packages for Python 3.9 for a while.

The pip package works perfectly fine with Python 3.9, so if you have your own installation of TensorFlow working (for instance from your OS vendor) you can simply do

pip install pdfflow --no-deps

(see also N3PDF/vegasflow#62)

Add OpenMP example

Add an OpenMP version to the C/Fortran example.

Add unit testing and comparisons against LHAPDF

They should run after every commit (we can use github workflows for that)

Wrong Uncertainties on performance benchmark

Description

There could be an issue with the performance benchmark statistics. The problem lies in the following lines of code.

Code example

pdfflow/benchmarks/compare_performance_lhapdf.py

Lines 141 to 146 in b5f608c

    
avg_l = t_lha.mean(0)
avg_p0 = t_pdf0.mean(0)
avg_p1 = t_pdf1.mean(0)
std_l = t_lha.std(0)
std_p0 = t_pdf0.std(0)
std_p1 = t_pdf1.std(0)

Additional information

To give the uncertainties on these numbers in the plots, I forgot to divide by the square root of the number of experiments.

We ran just 10 experiments, so the bars should be smaller by a factor of about three (sqrt(10) ≈ 3.16). They were almost invisible anyway, so this would not be a big issue in the plots.

I propose to include the correction factor from now on.
Let me know what you think, thanks.
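
The correction could look like the sketch below, which divides the standard deviation by the square root of the number of experiments. The helper name `mean_and_standard_error` is hypothetical, and the timing arrays are assumed to have shape (experiments, points), as the `.mean(0)` and `.std(0)` calls suggest.

```python
import numpy as np


def mean_and_standard_error(times):
    """Return the per-point mean and the standard error of that mean.

    `times` is assumed to have shape (n_experiments, n_points).
    """
    times = np.asarray(times, dtype=float)
    n_experiments = times.shape[0]
    avg = times.mean(0)
    # std(0) measures the spread of single runs; dividing by sqrt(N)
    # turns it into the uncertainty on the average over N runs.
    err = times.std(0) / np.sqrt(n_experiments)
    return avg, err


# Toy timings: 2 experiments, 2 points.
avg_demo, err_demo = mean_and_standard_error([[1.0, 2.0], [3.0, 2.0]])
```

This keeps NumPy's default `ddof=0`, matching the existing `.std(0)` calls. With only 10 experiments, passing `ddof=1` for the unbiased sample estimate would be a defensible alternative.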
