pineko's Introduction

PINEKO

PINEKO is a Python module to produce fktables from interpolation grids and EKOs.

Installation

PINEKO is available via

  • PyPI:
pip install pineko

Development

If you want to install from source you can run

git clone [email protected]:N3PDF/pineko.git
cd pineko
poetry install

To set up poetry and other tools, see the Contribution Guidelines.

Documentation

  • The documentation is available here: Docs
  • To build the documentation from source run
cd docs
poetry run make html

Tests and benchmarks

  • To run the unit tests you can do
poetry run pytest

Contributing

  • Your feedback is welcome! If you want to report a (possible) bug or want to ask for a new feature, please raise an issue: GitHub issues
  • Please follow our Code of Conduct and read the Contribution Guidelines

pineko's People

Contributors

alecandido, andreab1997, cschwan, felixhekhorn, giacomomagni, pre-commit-ci[bot], roystegeman, scarlehoff, scarrazza, t7phy

pineko's Issues

Refactor configs

After a brief experience with the current configs:

  • drop non-local paths:
    paths.append(pathlib.Path.home())
    paths.append(pathlib.Path(appdirs.user_config_dir()))
    paths.append(pathlib.Path(appdirs.site_config_dir()))
  • include git-like resolution, i.e. check parent folders for a pineko.toml file
  • test paths semantics
    • absolute paths are kept as they are
    • relative paths are relative to pineko.toml

Essentially, the best way to achieve the last one should be something like:

pinekopath = pathlib.Path(...).absolute()  # directory containing pineko.toml

def absolute_or_pineko(path: pathlib.Path):
    # absolute paths are kept as they are,
    # relative paths are resolved with respect to pineko.toml
    if path.is_absolute():
        return path
    return pinekopath / path
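
For illustration, assuming pineko.toml sits in a hypothetical /project directory, the intended semantics would then be:

# with pinekopath = pathlib.Path("/project")
# absolute_or_pineko(pathlib.Path("/data/grids"))  ->  /data/grids        (absolute, kept as-is)
# absolute_or_pineko(pathlib.Path("logs/eko"))     ->  /project/logs/eko  (resolved against pineko.toml)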

Factorization scale variations for grids.

At the moment we have only implemented renormalization scale variations for grids that are missing the scale dependence.
It would be good to add the factorization scale dependence as well.

This might become unnecessary if we get the last few grids with scale variations that we are missing, but I'm opening the issue just to keep it in mind.

We should already have the information necessary at NLO.

(I'm assigning myself to it since I don't want to inflict this pain onto others, but cc possible interested parties @cschwan @felixhekhorn @andreab1997)

FONLL-B DIS FK table need different PTO

I'm sorry to say again, but NLO FK tables are still wrong (only FK tables this time though - so it's not expensive):
https://github.com/N3PDF/pineko/blob/e12c6482a32c7c8f902bfa2d2504e4bfe451fd19/src/pineko/theory.py#L325

this is wrong in the case of NLO (so PTO=1) with FONLL-B, because FONLL-B also contains pieces at $O(a_s^2)$ which will be neglected by this statement ...

the solution is

  • find out (from the theory card) whether we're in FONLL-B
  • check whether we're looking at a DIS grid (from the lumi, I guess - we can cross check with NNPDF/nnpdf#1529)
  • if needed, correct max_as accordingly (see the sketch below)
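
A minimal sketch of such a correction (the helper name and the flags are hypothetical; it only encodes the reasoning above):

def correct_max_as(max_as, fns, is_dis_grid):
    # PTO=1 gives max_as=2, but a FONLL-B DIS FK table also needs the O(a_s^2) pieces,
    # so one extra order has to be kept
    if fns == "FONLL-B" and is_dis_grid and max_as == 2:
        return max_as + 1
    return max_as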

KeyError: 'nf0'

I'm trying to generate an FK table to test the performance of NNPDF/pineappl#103. However, when I run ./run.py, I get the following error message:

Traceback (most recent call last):
  File "/scratch/cschwan/pineko/./run.py", line 58, in <module>
    ensure_eko(pineappl_path, myoperator_path)
  File "/scratch/cschwan/pineko/./run.py", line 37, in ensure_eko
    ops = eko.run_dglap(theory_card=theory_card, operators_card=operators_card)
  File "/home/cschwan/projects/pineappl/pineappl_py/env/lib/python3.9/site-packages/eko/__init__.py", line 27, in run_dglap
    r = runner.Runner(theory_card, operators_card)
  File "/home/cschwan/projects/pineappl/pineappl_py/env/lib/python3.9/site-packages/eko/runner.py", line 84, in __init__
    tc = ThresholdsAtlas.from_dict(theory_card)
  File "/home/cschwan/projects/pineappl/pineappl_py/env/lib/python3.9/site-packages/eko/thresholds.py", line 179, in from_dict
    nf_ref = theory_card["nf0"]
KeyError: 'nf0'

FK logs

In pineko.toml I have specified, as suggested

[paths.logs]
eko = "logs/eko"
fk = "logs/fk"

The first time I run
$ pineko theory fks <theory_number> <dataset_name>
I get the error
FileNotFoundError: [Errno 2] No such file or directory: '/home/enocera/Documents/NNPDF/nnpdfgit/pineko/logs/fk/<theory_number>-<dataset_name>-None.log'
If I create the fk/ directory by hand, then everything works.

So it seems to me that the logs/eko dir is created (if not existing yet) when running
$ pineko theory ekos <theory_number> <dataset_name>
but the logs/fk dir is not created (if not existing) when running
$ pineko theory fks <theory_number> <dataset_name>
Is this correct?
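
A minimal sketch of a possible fix, creating the missing directory before opening the log file (fk_log_path is a hypothetical pathlib.Path pointing at the FK log file):

from pathlib import Path

fk_log_path = Path("logs/fk") / "some-run.log"  # hypothetical log file location
fk_log_path.parent.mkdir(parents=True, exist_ok=True)  # create logs/fk if it does not exist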

Road map for NNPDF40MHOU

Let me collect here the steps that we are missing in order to complete NNPDF40MHOU (so that we can keep track of progress)

  • Implement numerical FONLL (in charge: AB)
  • Compute jets FKtables (to be investigated)
  • Include NNLO k-factors in the FKtables (to be discussed...)
  • merge #75 (pineappl pre-release needed)
  • Use the automatic scale variations tool to add the sv orders to the hadronic grids (in charge: AB)
  • Recompute all the FKtables
  • Test thcovmat and fits
  • Do the fits

Let me also tag all the people that may be involved or just interested in this (just for you to know) @cschwan @felixhekhorn @alecandido @giacomomagni @scarlehoff

Integrability FK tables

When using pineko 0.3.3 there is a problem with integrability FKtables.
The EKOs, instead, are produced correctly.

Configurations loaded from '/data/theorie/gmagni/N3PDF/pineko/pineko.toml'
Analyze INTEGXT3
┌───────────────┐
│ Computing ... │
└───────────────┘
   /data/theorie/gmagni/N3PDF/pineko/data/grids/440/NNPDF_INTEG_XT3_40.pineappl.lz4
 + /data/theorie/gmagni/N3PDF/pineko/data/ekos/440/NNPDF_INTEG_XT3_40.tar
 = /data/theorie/gmagni/N3PDF/pineko/data/fktables/440/NNPDF_INTEG_XT3_40.pineappl.lz4
 with max_as=3, max_al=0, xir=1.0, xif=1.0
Traceback (most recent call last):
  File "/project/theorie/gmagni/miniconda3/envs/nnpdf/bin/pineko", line 5, in <module>
    command()
  File "/project/theorie/gmagni/miniconda3/envs/nnpdf/lib/python3.9/site-packages/click/core.py", line 1130, in call
    return self.main(*args, **kwargs)
  File "/project/theorie/gmagni/miniconda3/envs/nnpdf/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/project/theorie/gmagni/miniconda3/envs/nnpdf/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/project/theorie/gmagni/miniconda3/envs/nnpdf/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/project/theorie/gmagni/miniconda3/envs/nnpdf/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/project/theorie/gmagni/miniconda3/envs/nnpdf/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/theorie/gmagni/N3PDF/pineko/src/pineko/cli/theory_.py", line 93, in fks
    theory.TheoryBuilder(
  File "/data/theorie/gmagni/N3PDF/pineko/src/pineko/theory.py", line 448, in fks
    self.iterate(self.fk, tcard=tcard, pdf=pdf)
  File "/data/theorie/gmagni/N3PDF/pineko/src/pineko/theory.py", line 201, in iterate
    f(name, grid, **kwargs)
  File "/data/theorie/gmagni/N3PDF/pineko/src/pineko/theory.py", line 417, in fk
    _grid, _fk, comparison = evolve.evolve_grid(
  File "/data/theorie/gmagni/N3PDF/pineko/src/pineko/evolve.py", line 165, in evolve_grid
    operators, targetgrid=eko.interpolation.XGrid(x_grid)
  File "/data/theorie/gmagni/N3PDF/eko/src/eko/interpolation.py", line 430, in init
    raise ValueError(f"xgrid needs at least 2 points, received {len(xgrid)}")
ValueError: xgrid needs at least 2 points, received 1

Drop Q2 everywhere

We should update code and docs not to use $Q^2$ in place of $\mu_F^2$, in order to avoid ambiguity, and resulting confusion and bugs.

  • fix docs
  • fix code

Also $x$ would be better replaced by $z$ (but this is less crucial).

More details in #53 (comment)

Fktables for NLO theory

This is the NLO equivalent of NNPDF/fktables#5

We list here the Fktables which we cannot compute at the moment for the central NLO theory (208):

  • jets (so needs NNPDF/eko#105)
    • ATLAS_2JET_7TEV_R06
    • ATLAS_1JET_8TEV_R06
    • CMS_1JET_8TEV
    • CMS_2JET_7TEV
  • missing grids
    • ATLAS_WCHARM_WP_DIFF_7TEV (take from dom /media/Fk/fktables/data/appl_subgrids)
    • ATLAS_WCHARM_WM_DIFF_7TEV (take from dom /media/Fk/fktables/data/appl_subgrids)
    • CMSWCHARMTOT (Write the yamldb for the composition CMSWCHARM_WP and CMSWCHARM_WM with operand "add", take the two grids from /media/Fk/fktables/data/appl_subgrids )
    • CMSWCHARMRAT (same as the previous but with operand "ratio")
    • CMS_WCHARM_DIFF_UNNORM_13TEV (same as the previous but with operand "add" and different grids: CMS_WCHARM_13TEV_WMC and CMS_WCHARM_13TEV_WPCB)
  • We miss ren sv:
    • ATLAS_TOPDIFF_DILEPT_8TEV_TTRAPNORM (because we are missing ren sv for CMSTTBARTOT8TEV-TOPDIFF8TEVTOT, coming from Sherpa)
    • ATLAS_TTBARTOT_13TEV_FULLLUMI (because we are missing ren sv for ATLAS_TTBARTOT_13TEV_FULLLUMI-TOPDIFF13TEVTOT, coming from Sherpa)
    • ATLASTTBARTOT7TEV (because of ATLASTTBARTOT7TEV-TOPDIFF7TEVTOT, coming from Sherpa)
    • ATLASTTBARTOT8TEV (because of ATLASTTBARTOT8TEV-TOPDIFF8TEVTOT, this should be the same as CMSTTBARTOT8TEV-TOPDIFF8TEVTOT)
    • ATLAS_WM_JET_8TEV_PT ( because of ATLAS_WM_JET_8TEV_PT-atlas-atlas-wjets-arxiv-1711.03296-xsec003 which is coming from ploughshare)
    • ATLAS_WP_JET_8TEV_PT ( because of ATLAS_WP_JET_8TEV_PT-atlas-atlas-wjets-arxiv-1711.03296-xsec002 which is coming from ploughshare)
    • ATLASZPT8TEVMDIST (it is coming from MCFM)
    • ATLASZPT8TEVYDIST (it is coming from MCFM)
    • CMS_1JET_8TEV (coming from NLOjet++)
    • CMSTOPDIFF8TEVTTRAPNORM (because of CMSTOPDIFF8TEVTTRAPNORM-TOPDIFF8TEVTTRAP, coming from Sherpa and total cross-section see above)
    • CMSTTBARTOT13TEV (because of CMSTTBARTOT13TEV-TOPDIFF13TEVTOT, coming from Sherpa)
    • CMSTTBARTOT8TEV (because of CMSTTBARTOT8TEV-TOPDIFF8TEVTOT, coming from Sherpa)
    • CMSTTBARTOT7TEV (because of CMSTTBARTOT7TEV-TOPDIFF7TEVTOT, coming from Sherpa)
    • CMSZDIFF12 (coming from MCFM)
    • ATLAS_WCHARM_WP_DIFF_7TEV
    • ATLAS_WCHARM_WM_DIFF_7TEV
    • CMSWCHARMTOT
    • CMSWCHARMRAT
    • CMS_WCHARM_DIFF_UNNORM_13TEV
  • FTDY (needs VRAP)
    • DYE886R_dw_ite
    • DYE886P
    • DYE906R_dw_ite
  • Things to check:
    • D0ZRAP_40 (working for NNLO, check the bug)
    • D0WMASY
    • ATLASWZRAP36PB
    • ATLASDY2D8TEV
    • ATLAS_DY_2D_8TEV_LOWMASS
    • ATLASPHT15_SF (two bins of each FKtable are correct, the others are not)
    • CMSWEASY840PB
    • CMSWMASY47FB
    • CMSWMU8TEV
  • Things with a conjectured solution
    • ATLAS_TTB_DIFF_8TEV_LJ_TRAPNORM (maybe just a 2.0 conversion factor)
    • ATLAS_TTB_DIFF_8TEV_LJ_TTRAPNORM (same)
    • CDFZRAP_NEW (see issue NNPDF/fktables#17)
    • CMS_TTBAR_2D_DIFF_MTT_TRAP_NORM (this is actually correct NNPDF/fktables#5 )

Scaffolding

At the moment, we need a given hierarchy of folders to run.

Unfortunately, this may complicate the usage for a new user (or any non-developer in general), since you have to manage part of the structure manually, but then it has to respect some constraints.

To improve UX, I propose to implement the following features, keeping them simple but useful:

  • a single subcommand to manage the project structure, let's call it pineko scaffold (name to be decided, take it as a placeholder)
  • a command to set up a new project, pineko scaffold new (possibly aliased to pineko new for simplicity)
    • it should create all the directories, and possibly download some resources (e.g. yamldb)
  • a function to verify the correctness of the project structure, as declared in the config file pineko.toml: check_folders() (a sketch is given below)
    • it should be run by every subcommand of pineko scaffold before running (simple to implement in click, we just need to put the call in the group function)
    @command.group("scaffold")
    def subcommand():
        check_folders()
  • a subcommand pineko scaffold check to just run check_folders() (and print a simple and nice report if something goes wrong)

Since paths here are crucial, this is related to #51
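
A minimal sketch of check_folders(), assuming the folders declared in pineko.toml are available as a name-to-path mapping (the signature and return convention are illustrative, not the actual implementation):

import pathlib

def check_folders(paths):
    """Return the names of the declared folders that are missing on disk."""
    missing = []
    for name, folder in paths.items():
        if not pathlib.Path(folder).is_dir():
            missing.append(name)
    return missing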

Optionally:

  • a subcommand pineko scaffold update to download a new version of remote resources (e.g. yamldb)
    • in case, better to keep the old one as backup (put in a .old-data folder somewhere, and mangle with a timestamp)
  • a subcommand pineko scaffold repair, to fix the structure according to the current one spelled out in pineko.toml

Strong coupling constants are wrongly calculated

This was originally reported here: NNPDF/pineappl#226, but I'm quite confident that this is a bug in Pineko, specifically these lines:

pineko/src/pineko/evolve.py

Lines 200 to 207 in f8c3261

alphas_values = [
    4.0
    * np.pi
    * sc.a_s(
        xir * xir * muf2 / xif / xif,
    )
    for muf2 in muf2_grid
]

which evaluates the strong coupling at scales that do not correspond to the renormalization scales given here:

xir * xir * mur2_grid,

I believe the patch is to change these lines as follows, with which I get the correct evolution (tested with CMS_TTB_5TEV_TOT):

@@ -201,9 +201,9 @@ def evolve_grid(
         4.0
         * np.pi
         * sc.a_s(
-            xir * xir * muf2 / xif / xif,
+            xir * xir * mur2,
         )
-        for muf2 in muf2_grid
+        for mur2 in mur2_grid
     ]
     # We need to use ekompatibility in order to pass a dictionary to pineappl
     fktable = grid.evolve(

Pineko doesn't respect `_template.yaml`

I've noticed that pineko doesn't respect all options in _template.yaml. In particular, n_integration_cores gets filled as 0 regardless of what I put in the template.

This is with a fresh installation of pineline[full].

I've noticed this one because my computer crashed after eko took every possible resource, but there might be other options that are not respected. I'll have a look at whether there's something else (relevant for physics) being ignored or whether it is a problem of the pineline package on PyPI.

Fix click help - avoid running group code

The problem is here: this code is always run, even when the user is just asking for help.

I propose this should be solved in this PR, at least for this subcommand.

Originally posted by @alecandido in #69 (comment)

It's a complex task, and not critically relevant, so I now propose to postpone it.
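
A possible workaround, sketched under the assumption that the group callback just needs to be skipped when only the help text is requested (the command and function names are placeholders):

import sys

import click

def expensive_setup():
    """Placeholder for the group code that should not run for a bare --help."""

@click.group()
def cli():
    # skip the setup when the user is only asking for help
    if "--help" in sys.argv:
        return
    expensive_setup()

@cli.command()
def check():
    click.echo("ok")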

Default ren scale different from fact

@andreab1997 Neither of those two options, it will be evaluated at xir * xir * mur2, where xir is 1.0 in your example and mur2 is set internally and may differ w.r.t. Q2. As I said this usually isn't the case because in pinefarm we always set mur = muf, but for some grids this is the case, see for instance here NNPDF/pineappl#25 (comment).

Originally posted by @cschwan in #53 (comment)

We should take into account that sometimes we can not use the central fact scale (as coming from EKO) as the central ren scale.

Configuration database(s)

Current status

  • 1 theory database out of which a single record is selected for a fit
  • the record contains tentatively all information needed for evolution and DIS
  • (this is of course mainly due to historic reasons)

Problems

  • FNS actually should only matter for cross sections, but instead is also (ab-)used for evolution
    • FONLL is not involved in evolution and its threshold does not play any role in evolution
    • instead a dedicated threshold for cross section is needed, that can be chosen freely and independent of evolution thresholds
  • some of the settings are redundant: M_Z, M_W, sin(theta_w), G_F, alphaqed are not linearly independent
  • in principle the card could be divided into two parts (evolution <-> cross sections) with some few settings shared

Configurations ("o-cards")

  • both eko and yadism will be shipped default-less (in big contrast to APFEL)
  • both eko and yadism require some additional configurations:
    • in eko "operators": the target scale of the operators, the discretization and some numerical details
    • in yadism "observables": the discretization, DIS configurations (currents, hadron, lepton) and the target functions, e.g. F2total(x=0.1, Q2=90)
  • we require a mapping of the old settings to the new settings (which both programs already use at this point)
    • our current implementation of this remapping is given here
    • the current status of eko already ignores the FNS setting, but instead it has to be fed with the correct kcThr
    • yadism has to be fed with the correct kDIScThr (the name can of course be changed to kxscThr or similar) to determine the thresholds and FNS to determine the (re-)combination of heavy/light coefficient functions
    • in order to implement FONLL correctly (to our understanding) we need to deactivate the charm threshold in eko, but not in yadism

Proposed Workflow

  • pineko should determine a consistent configuration for both eko and yadism
  • ask them each in turn to compute their ingredients
  • join their respective outcome to provide what is needed: a mapping f_j(x,Q_0) -> theory prediction

Questions

  • how are the "observables" currently determined?
  • how can we organize and maintain the configurations?
  • how can we ensure the consistency between (experimental) dataset and theory?

Prepend `theory_` to theory card name

In order to have the possibility of storing all the theory cards in a single place I propose to prepend the string theory_ to the theory ID and change this line accordingly:

return configs.configs["paths"]["theory_cards"] / f"{theory_id}.yaml"
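
With the proposed prefix the line would become:

return configs.configs["paths"]["theory_cards"] / f"theory_{theory_id}.yaml"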

This should make the syntax compatible with the one used in pinefarm and in validphys.

Create the absolute theory

Rather than trimming as proposed in #61 (or together with a trimming of unnecessary information), let's define here in pineko the absolute theory. As discussed during this Wednesday's code meeting, there is nothing in n3fit/vp that uses the theory card that is not also used by pineko (PTO being the only thing that is actually used, but, as mentioned, the order needs to be known by pineko to create the fktables/ekos).

Once we have an absolute theory we just swap the one in the nnpdf repository for whatever we decide here.

Make opcard template theory dependent

A practical limitation is the current pineko design, because pineko currently assumes a global operator_card_template:

operator_card_template = "data/operator_cards/_template.yaml"

  • which defines the interpolation_xgrid, i.e. the internal grid used to compute all operators
  • but it could even hold an inputgrid, which instead defines the grid exposed to the fit (consider that for eko input = fitting scale)
  • I guess we should change this design to make the template theory dependent, i.e. scope it by the theory ID like all the other stuff

Originally posted by @felixhekhorn in #42 (comment)

Interpolation

This is something that is not strictly related to pineko, but it involves all three projects pineappl, eko, and yadism, simply from the opposite end.

                    yadism
                 /          \
interpolation  --  pineappl  --  pineko
                 \          /
                     eko

Issue

At the moment eko, yadism and pineappl are all making use of interpolation, and in theory it's the same one, but in practice:

  • pineappl uses its own implementation, which has to be separately maintained and kept consistent (e.g. the weight function is present in pineappl, but this does not mean that it is automatically used in the other two projects)
  • eko has its own implementation as well
  • yadism makes use of the implementation in eko, but this yields a dependency on eko that conceptually is not needed; interpolation has to be implemented somewhere, and by chance it was already implemented in eko

Main motivation

During the yadism-pineappl integration we are acquiring a new dependency in yadism, and so we are thinking about dropping the dependency on eko, since without alpha_s (stripped by the pineappl grid filling) and without interpolation the two are going to be very loosely coupled, if coupled at all (there is just one residual small bit we could take care of on our own).

Proposal

Should we join the three dependencies in a single package?

Downsides

  1. it might be implemented in rust, i.e. it should be the pineappl one, since we are able to write python bindings from rust (as for pineappl, so in particular they are already available) but not the opposite (if it even made sense at all...), but then we don't know whether numba could complain about having rust in the game
     • this may also be just a worry, because maybe numba does not interact enough to complain, but maybe it does...
  2. Christopher, please add yours
     • I don't know how much effort it would be for you to split interpolation into a separate crate

More pre-commit

Taking some inspiration from poetry https://github.com/python-poetry/poetry/blob/master/.pre-commit-config.yaml I suggest updating our pre-commit config

  1. by opting into
      - id: check-merge-conflict
      - id: check-case-conflict
      - id: check-ast
      - id: check-docstring-first
    
  2. adding
     - repo: https://github.com/pre-commit/pre-commit
       rev: v2.20.0
       hooks:
         - id: validate_manifest
    
  3. opting into https://pre-commit.ci/ (with autofix_prs: false) to enforce people running pre-commit (right @andreab1997 ? 🙃 )

what do you think? @alecandido @andreab1997 @niclaurenti @giacomomagni

Improve Couplings treatment

We can improve the computation of the necessary couplings for PineAPPL here

pineko/src/pineko/evolve.py

Lines 181 to 188 in 63af934

sc = eko.couplings.Couplings(
    tcard.couplings,
    tcard.order,
    evmod,
    quark_masses,
    hqm_scheme=tcard.quark_masses_scheme,
    thresholds_ratios=np.power(list(iter(tcard.matching)), 2.0),
)

by

  • using the physical masses, i.e. if we want MSbar masses actually use them
  • determine the $n_f$ of the central scale and pass that as nf_to

Grids used for new theories are ordered in the wrong way

With the help of @scarlehoff, I found the problem with the new theories. Doing for example pineappl convolute BCDMS_NC_EM_D_F2.pineappl.lz4 NNPDF40_nnlo_as_01180 for both the grids used for theory 405 (theory 4 in dom) and theory 424 (theory 24 in dom), what you get is

bin     Q2           x          F2d      scale uncertainty
---+-----+-----+-----+-----+------------+--------+--------
  0  8.75  8.75  0.07  0.07 3.8331501e-1  -10.65%    5.72%
  1 10.25 10.25  0.07  0.07 3.8615391e-1   -7.25%    5.42%
  2 10.25 10.25   0.1   0.1 3.6238832e-1   -6.00%    3.72%
  3 11.75 11.75   0.1   0.1 3.6313899e-1   -5.42%    3.55%
  4 11.75 11.75  0.14  0.14 3.3887777e-1   -4.40%    1.97%
  5 11.75 11.75  0.18  0.18 3.1553538e-1   -3.66%    1.25%
  6 11.75 11.75 0.225 0.225 2.8649654e-1   -2.97%    2.22%
  7 13.25 13.25   0.1   0.1 3.6377078e-1   -5.18%    3.40%
  8 13.25 13.25  0.14  0.14 3.3826829e-1   -4.19%    1.89%
  9 13.25 13.25  0.18  0.18 3.1408120e-1   -3.47%    1.18%
 10 13.25 13.25 0.225 0.225 2.8441184e-1   -2.80%    2.11%
bin     Q2           x          F2d      scale uncertainty
---+-----+-----+-----+-----+------------+--------+--------
  0  8.75  8.75  0.07  0.07 3.7227338e-1  -11.74%    5.16%
  1 10.25 10.25  0.07  0.07 3.7563029e-1  -10.90%    4.83%
  2 10.25 10.25   0.1   0.1 3.5371954e-1  -10.01%    4.38%
  3 11.75 11.75   0.1   0.1 3.5478369e-1   -9.34%    4.16%
  4 13.25 13.25   0.1   0.1 3.5576112e-1   -8.75%    3.96%
  5 11.75 11.75  0.14  0.14 3.3177244e-1   -8.04%    5.73%
  6 13.25 13.25  0.14  0.14 3.3149059e-1   -7.52%    5.51%
  7    15    15  0.14  0.14 3.3118411e-1   -7.02%    5.29%
  8    17    17  0.14  0.14 3.3076224e-1   -6.59%    5.09%
  9    19    19  0.14  0.14 3.3045852e-1   -6.21%    4.92%
 10 11.75 11.75  0.18  0.18 3.0924455e-1   -6.60%    7.35%

So, as you can see, while the grid for theory 424 is ordered by q2 (and this is the correct way, since the commondata are ordered in the same way), the grid for theory 405 is ordered by x value. This problem then propagates to the FKs and then to the predictions, causing the problems we have seen.
@felixhekhorn @alecandido @cschwan Do you have some idea of why the two grids are ordered in different ways? Maybe something changed in the runcards?

FK Spec

Background

How the fktable is loaded to be used in the fit can be seen here, while the actual convolution is performed here and here.
Some variables that I use there and that I'm going to use to define the stuff below:

  • ndata: number of data points
  • nbasis: number of channels that are different from 0
  • nx: number of points in the grid in x

Input

I will ask for the fktable by giving the following information:

theoryid: 200
dataset_inputs:
    - {dataset: ATLAS_WZ_TOT_13TEV, cfac: [NRM, QCD]}

If you apply the c-factors for me it is much appreciated. Otherwise validphys will do so.
The way it works right now is that validphys checks whether the theoryid exists in <prefix>/share/NNPDF/data and if it doesn't it downloads the json to see whether it exists in the server.

If it exists it downloads the theory and then loops over all datasets opening the file, reading the data, applying the c-factors if needed, etc...

Actual specs

Once the theory has been downloaded and opened I need to get the following information:

  • fktable
    A tensor of shape (ndata, nbasis, nx, nx) for hadronic processes or (ndata, nbasis, nx) for DIS processes.
  • xgrid
    The grid in x. If/when pineappl tables use the same grid for all fktables this can come from somewhere else or be a pointer to some common place.
  • basis
    List of channels that are different from 0. For DIS fits this is a list of PDF flavours like (1, 2, 3, -3) and anything more complicated than that I guess would be silly. For hadronic fits you can decide how to give me the information. The way I use it is by creating a PDF x PDF = Luminosity tensor and then masking it with a boolean tensor of size (flavours, flavours).

Then some metadata that would make my life easier (but the information is contained in the objects above):

  • ndata: number of experimental data points
  • nx: number of points in the x-grid
  • nbasis: number of non-zero channels
  • hadronic: boolean flag telling me whether the process is hadronic or not

Note that I don't put any constraints on the type of the different objects. To actually operate with them I will turn them into tensorflow tensors, so numpy arrays are the most natural choice, but I can work with anything.
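
For illustration, a minimal numpy sketch of how the shapes above would be contracted (the function names are hypothetical, and c-factors and the basis mask are ignored):

import numpy as np

def convolute_dis(fktable, pdf):
    # fktable: (ndata, nbasis, nx), pdf: (nbasis, nx) evaluated on xgrid
    return np.einsum("dbx,bx->d", fktable, pdf)

def convolute_hadronic(fktable, lumi):
    # fktable: (ndata, nbasis, nx, nx), lumi: (nbasis, nx, nx) = PDF x PDF luminosity
    return np.einsum("dbxy,bxy->d", fktable, lumi)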

Wishlist

I'm not entirely sure you can compress the current .dat files much more than they currently are (I don't know how much effort was put into compressing them back in the day) but it would make me much happier if instead of downloading 3 GB to do a fit I could download just 1.

Scaffold from the theories repository

One thing that might be useful is to add an option to scaffold to download some relevant data from the theories repo.

So one can do:

pineko scaffold --theory 400

and it will automatically download the theory runcard for 400 (and put it in the right place) and the operator template.

In favour/against?

Conflicting numpy requirements

Compiling PineAPPL's Python interface I get the following error:

   Compiling pineappl v0.5.0-beta.6 (/home/cschwan/projects/pineappl/pineappl)
   Compiling pineappl_py v0.5.0-beta.6 (/home/cschwan/projects/pineappl/pineappl_py)
    Finished release [optimized] target(s) in 15.83s
📦 Built wheel for CPython 3.9 to /tmp/.tmppj76Gp/pineappl-0.5.0_beta.6-cp39-cp39-linux_x86_64.whl
⚠️  Warning: pip raised a warning running ["-m", "pip", "--disable-pip-version-check", "install", "--force-reinstall", "/tmp/.tmppj76Gp/pineappl-0.5.0_beta.6-cp39-cp39-linux_x86_64.whl"]:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.55.1 requires numpy<1.22,>=1.18, but you have numpy 1.22.1 which is incompatible.
🛠  Installed pineappl-0.5.0-beta.6

Allow explicit assumptions in the theory

Just the same as it is allowed with convolute.
We need it for the theory that allows for c-cbar.

Btw, convolute doesn't work for me.

pineko convolute data/grids/200/E906deut_bin_09.pineappl.lz4 data/ekos/200/E906deut_bin_09.tar test 0 2                                                                           
┌───────────────┐
│ Computing ... │
└───────────────┘
   data/grids/200/E906deut_bin_09.pineappl.lz4
 + data/ekos/200/E906deut_bin_09.tar
 = test
 with max_as=0, max_al=2, xir=1.0, xif=1.0
Traceback (most recent call last):
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/bin/pineko", line 8, in <module>
    sys.exit(command())
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/lib/python3.10/site-packages/pineko/cli/convolute.py", line 52, in subcommand
    _grid, _fk, comp = evolve.evolve_grid(
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/lib/python3.10/site-packages/pineko/evolve.py", line 133, in evolve_grid
    alphas_values = [op["alphas"] for op in operators["Q2grid"].values()]
  File "/home/juacrumar/.cache/pypoetry/virtualenvs/pinefarm-HwI1B8mU-py3.10/lib/python3.10/site-packages/pineko/evolve.py", line 133, in <listcomp>
    alphas_values = [op["alphas"] for op in operators["Q2grid"].values()]
KeyError: 'alphas'

More Unit Tests

I know this was almost the title of #25, but many of those tests relied on LHAPDF, so they have been moved to benchmarks (not a great name at this point: they are simply not unit tests).

But what is left is largely insufficient, so we definitely need more unit tests. Even mocking, if needed.

We need to keep track of cfactors names

Since we have different names for the datasets (and for the operands when the observable has an operation) we need either to:

  1. Keep track of the names from the compound in the yaml database for loading the cfactors.
  2. Change the names of the cfactors. <--- We might want to do this in the future (since the cfactors should eventually match the pineappl names) but at the moment I think we need option 1 (or create a new theory in which all names are changed).

One practical example, our yaml entry for D0WMASY is:

appl: true
conversion_factor: 1.0
operands:
- - D0WMASY-grid-40-6-15-3-Wplus_wly_pt25
- - D0WMASY-grid-40-6-15-3-Wminus_wly_pt25
operation: ASY
target_dataset: D0WMASY

but the cfactor files for those two tables are: CF_QCD_D0WMASY_WM.dat and CF_QCD_D0WMASY_WP.dat

When the name happens to be the same it works.

Simplify `max_as/l` application

When max_as/l are specified to use grids at a lower order than available (e.g. computing an NLO theory from NNLO grids), everything might be done consistently by just cutting the grids during loading: everything else (checks and operations) will then be done adaptively according to the loaded grids, which will always be used completely.

This will have the advantage of only loading (or keeping in memory) the required subgrids, lowering the memory impact of Pineko in these cases, and the even greater advantage of simplifying the code, since there will be no need to separately consider the case of grids containing orders higher than needed. A sketch of the idea is given below.
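
A deliberately simplified sketch of the idea, only along the alpha_s direction (this is not the pineappl API; it assumes each subgrid order carries its as power as the first entry of a tuple):

def cut_orders(orders, max_as):
    """Keep only the first max_as QCD orders, counted from the lowest one present in the grid."""
    lo_as = min(o[0] for o in orders)
    return [o for o in orders if o[0] < lo_as + max_as]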

MHOU implement scheme B+C

We should implement schemes B and C of https://inspirehep.net/literature/1741422

  • scheme B: use fact scale variation from evolution (of course ren scale still in processes)
    • fact scale variation should be resummed (exponentiated) or not (expanded) according to the chosen evolution mode, see EKO docs
  • pineko needs to request for scheme B the central scale (and not the shifted one, as it is doing at the time of writing); effectively eko takes full care of the SV
  • scheme C: both scale variations coming from processes

Both schemes do not involve refitting (because they are defined this way).

SV check does not take into account the PTO

Currently our SV check does not take into account the fact that renormalization sv only starts at NLO, where NLO actually means the next order with respect to the first non-zero order of the process.
Therefore, it does not allow the computation of a sv FKtable for DIS when PTO=1 (which in the DIS case is the first non-zero order).

Check available ekos

At this point we start having a few ekos computed, and as we found out with @felixhekhorn computing one of them is quite an expensive task (at least for development, since it's blocking us).

Moreover, I believe that a few ekos can really be recycled, e.g. I think that most of the LHC pineappl grids will have the same x_grid and Q2_grid.

However, it is a bit annoying to check which one of the available ekos is the correct one, but we have all the tools required to automate the process.

Proposal

Let's make a trivial function (and maybe provide a subcommand) that, given a folder, explores the files inside (or even all the files in the tree) for ekos, and outputs all the compatible ones that have been found; a sketch is given below.
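
A minimal sketch of such a function (load_eko_grids is a hypothetical helper: anything able to read the x-grid and Q2-grid stored in an eko tarball would do):

import pathlib

def find_compatible_ekos(folder, x_grid, q2_grid, load_eko_grids):
    """Return the eko files under `folder` whose x-grid and Q2-grid match the requested ones."""
    compatible = []
    for path in pathlib.Path(folder).rglob("*.tar"):
        xg, q2g = load_eko_grids(path)
        if list(xg) == list(x_grid) and list(q2g) == list(q2_grid):
            compatible.append(path)
    return compatible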

Note

This is a step forward towards toolchain automation and assets management. I'm considering that maybe we'll simply need to progressively improve the tasks that we're doing manually at the moment, instead of designing an actual automation pipeline all at once.

Logo

I would put pineapple slips on top of this:

[image: a pine cone]

  • we can remove the eyes
  • we can just show the shadow

We can even avoid putting slips, and scramble colors: we push the top to greenish and the rest to oranges, to have a pine cone colored like a pineapple :)

Whenever I have spare time I'll provide candidates ^^

Missing metadata?

The following lines look a bit fishy:

pineko/src/pineko/evolve.py

Lines 186 to 194 in 60bb344

fktable.write_lz4(str(fktable_path))
# compare before/after
comparison = None
if comparison_pdf is not None:
    comparison = comparator.compare(
        grid, fktable, max_as, max_al, comparison_pdf, xir, xif
    )
    fktable.set_key_value("results_fk", comparison.to_string())
    fktable.set_key_value("results_fk_pdfset", comparison_pdf)

You add two key-value-pairs, but the FK table is written before, and that's the only call to write_lz4 in the project. That probably means that the metadata is never propagated to the file on disk.
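
A possible fix, sketched directly from the snippet above: attach the metadata first and write the FK table only afterwards.

comparison = None
if comparison_pdf is not None:
    comparison = comparator.compare(
        grid, fktable, max_as, max_al, comparison_pdf, xir, xif
    )
    fktable.set_key_value("results_fk", comparison.to_string())
    fktable.set_key_value("results_fk_pdfset", comparison_pdf)
# write only after the key-value pairs have been set
fktable.write_lz4(str(fktable_path))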
