nnpdf / yadism
Yet Another DIS Module
Home Page: https://yadism.readthedocs.io
License: GNU General Public License v3.0
Steps:
Functions integrated by scipy.integrate.quad are currently composed by nesting multiple lambdas. Calling these objects can be a bottleneck, so:
- yadism, while producing a "DIS operator", should speed up the functions passed to quad by constructing a proper numba object

We are wondering if it is possible, on general grounds, to reorganize something of this kind, using constant coefficients for the distribution part. If this is possible:
- reorganize DistributionVec a little, in order to collect all the singular bits in a single one (so we will have a fixed amount of components, equal to 3, for all orders): each log(1-z) ** k / (1-z) term multiplies pdf(x/z)/z - pdf(x). This is available since what we are currently doing is multiplying each one of these two terms by a factor that is constant in the second expression (\tilde{c}_k). A sketch follows.
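A minimal sketch of the collected singular piece, under the assumption stated above (constant coefficients \tilde{c}_k multiplying the common subtracted combination); the toy pdf is invented and the local boundary term of the plus prescription is omitted:

# minimal sketch: the singular bits log(1-z)**k / (1-z) all multiply the same
# subtracted combination pdf(x/z)/z - pdf(x), weighted by constant c~_k
import numpy as np
from scipy.integrate import quad

def singular_conv(pdf, x, coeffs):
    def integrand(z):
        subtracted = pdf(x / z) / z - pdf(x)
        weight = sum(c * np.log(1.0 - z) ** k for k, c in enumerate(coeffs))
        return weight / (1.0 - z) * subtracted

    return quad(integrand, x, 1.0)

# toy usage with an invented pdf and three constant coefficients
res, err = singular_conv(lambda z: z * (1.0 - z) ** 3, 0.1, [1.0, 0.5, 0.25])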
Some ideas of visualizations we may add to yadmark (as they do not belong into yadism itself):
- for the true structure functions predicted by a certain PDF, we should show the traditional DIS plot (e.g. Figure 18.2 in the PDG): y-axis F, x-axis x, Q2 by offset
- for the DIS operators we could do something similar to EKO; at LO this should generate something like:
| o o x
| o x o
| x o o
-------
because this represents the delta-function
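A possible yadmark sketch for such an operator heatmap, assuming the operator is available as a 2D numpy array (here a toy identity stands in for the LO delta):

# hypothetical visualization sketch: at LO the operator is (close to) an
# identity in the interpolation basis, hence the delta-like pattern above
import numpy as np
import matplotlib.pyplot as plt

op = np.eye(30)  # toy stand-in for a DIS operator matrix
plt.imshow(op, origin="lower")
plt.xlabel("input grid index")
plt.ylabel("output grid index")
plt.colorbar(label="operator weight")
plt.show()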
Serve master/stable from the base domain (or write a home page). Take care of .nojekyll, the redirecting, and the automatically deployed subfolders ($dest_dir, of course before building).
It seems useful to be able to look at the developing docs online, all the more if there are multiple authors working on them, or if you want to present something to someone else. On the other hand, it seems a really bad idea to replace the master documentation with some unstable version, so it is important that the main (landing) one is the one related to the current release (or master for the time being).
Split the ParentTest class:
- conftest.py will contain everything needed at runtime, imported as a fixture (a sketch follows this list)
- something-else.py will contain all the management and analysis tools
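A minimal sketch of the conftest.py side of the split; the fixture name and content are hypothetical:

# conftest.py - runtime pieces exposed as fixtures (hypothetical example)
import pytest

@pytest.fixture
def workdir(tmp_path):
    # everything needed at runtime, e.g. a per-test working folder
    # where benchmark inputs/outputs are stored
    d = tmp_path / "benchmarks"
    d.mkdir()
    return d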
Add _modified/_creation_time metadata to inputs:
Investigation work in progress. At some point we might deserve another round of analysis on:

Steps:
- the run_dis function
- xspace-bench
Keeping investigating from #29, we decided to have a look into a proper way to replace and speed up the hard part of this library. The situation is the following:
- we deal with f(x) = sum(alpha**k * f_k(x)), which involves multiplying and summing functions
- we are searching for a proper tool; currently we use lambdas, and the only efficiency measure is to try to keep the nesting as minimal as possible, based on a manual type check (the best we can achieve with lambdas only)

What we would like: tensorflow, but maybe what we need is just the compile part, which maybe is xla (I'm still trying to understand the tf internals), and so jax would be enough (it just ships autograd, which we don't need, together with xla and nothing more, rather minimal if compared with tf). Could tfp.math.ode.Solver be used to target performance? A sketch of the jax option follows.
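A minimal sketch of the jax option, assuming the f_k are jax-traceable callables (the ones used here are placeholders):

# compile the whole sum f(x) = sum(alpha**k * f_k(x)) once with jax.jit,
# instead of nesting lambdas
import jax
import jax.numpy as jnp

def make_f(alpha, fks):
    @jax.jit
    def f(x):
        return sum(alpha**k * fk(x) for k, fk in enumerate(fks))
    return f

f = make_f(0.118, [jnp.sin, jnp.cos])  # placeholder f_k
print(f(jnp.array(0.3)))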
Since apfel is static (from the point of view of its implementation), as long as we keep the same input there is no reason to rerun it separately in every workflow, so we can store the cache somewhere accessible from the workflow.
The following two strategies (and a mixed one) are available for generation:
In order to keep the test size constant (or at least not growing exponentially with features), we can always run the tests for isolated features, but select randomly only some combinations for each run, since there is no principle to decide a priori that some combinations are more important than others. A sketch of the selection follows.
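A minimal sketch of the random selection; the feature names are invented:

# isolated features are always tested; combinations are sampled per run
import itertools
import random

features = ["TMC", "FNS", "IC", "projectile"]  # invented feature names
combos = [
    c
    for r in range(2, len(features) + 1)
    for c in itertools.combinations(features, r)
]
picked = random.sample(combos, k=3)  # only a few combinations per run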
Both the light and heavy structure functions at NLO are still off by a few percent (even more in some cases).
This should be fixed, stripping the difference due to interpolation.
In order to import external formulas, like the ones in vogt, that are usually encoded in small Fortran functions, it would be a good idea to download and collect them in a single 'external' folder, and write a Makefile to compile this folder into a corresponding one importable from Python.
Then the Makefile should provide an install command to install the new Python-importable coefficient functions' expressions in the yadism package.
In the package everything should be organized as it is, but the ESF implementations (F2light, F2charm, ...), instead of defining the coefficient functions themselves, should only import them.
Which is the number of active flavors?
- the Threshold object is always the main responsible for the number of flavors (in both yadism and eko)
- MaxNfPDFs should affect the number of flavors relative to the input (in both yadism and eko)
- MaxNfAlphas should affect the number of flavors in evolution

Compare with a cache of APFEL's results, if available.
Make a separate test for each observable: F2light, FLlight, F2charm, ... (a parametrization sketch follows).
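A minimal sketch of the parametrization; the test body is a placeholder:

import pytest

@pytest.mark.parametrize("obs", ["F2light", "FLlight", "F2charm"])
def test_observable(obs):
    # run yadism for `obs` and compare against the APFEL numbers here
    assert isinstance(obs, str)  # placeholder assertion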
When introducing a new perturbative order, the contribution of the new order should be compared with the corresponding one from APFEL, instead of the full object.
Maybe the two things should be combined (if the correction is too irrelevant, maybe it does not have to be too accurate).
Probably neither of the two checks makes sense on its own, so maybe if the relative error is big enough the absolute one should also be checked against something else (what?), and if it is too small maybe it's not really an error. Scan over the xs and Q2s.

APFEL is doing something strange about the alpha_s evolution order and the FNS (doing one thing for "FFNS && NC" and another one for everything else).
We should answer the following two questions:
Make another object for managing the ESF output structure, that for the moment consists in:
Furthermore, it's quite uncomfortable not to be able to sum output objects or multiply them by a scalar, since they are dictionaries, so it would be good to implement proper methods in the new class and use them for the TMC calculation (a sketch follows).
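A minimal sketch of such a class, assuming the output maps channel keys to numeric values or arrays; the name is invented:

# hypothetical wrapper adding the arithmetic needed by the TMC calculation
class ESFOutput(dict):
    def __add__(self, other):
        return ESFOutput({k: v + other[k] for k, v in self.items()})

    def __mul__(self, scalar):
        return ESFOutput({k: scalar * v for k, v in self.items()})

    __rmul__ = __mul__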
The keyword is decoupling.
Since an ESF is a huge object collecting PartonicChannels and more (weights, and it is aware of the computing methods), it is a bad idea to pass to the PartonicChannels a reference to all this mess.
Moreover, ESF itself holds a reference to SF, and so literally to everything.
Q: What do we need for the PartonicChannels from ESF?
A: cd cc; grep SF && cd nc; grep SF
Seriously, there are only 4 objects needed:
- nf, needed only for light
- Q2 and M2hq, needed for both asy and heavy
- x, needed only for heavy (to set the threshold)

Just find a way to pass this stuff down by value (a sketch follows).
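A minimal sketch of passing the four values down by value; the container name is invented:

# hypothetical container holding the only four values PartonicChannels need
from dataclasses import dataclass

@dataclass(frozen=True)
class ESFInfo:
    x: float      # needed only for heavy (to set the threshold)
    Q2: float     # needed for both asy and heavy
    nf: int       # needed only for light
    M2hq: float   # needed for both asy and heavy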
Since we are using a bunch of lambdas and function calls inside other functions (cf. the full DistributionVec API), at some point this can slow down the execution.
Instead of finding a way to avoid this nesting and so on, rewriting stuff in a proper fashion with even more tricks, it would be nice to find a way to precompile the functions before using them (e.g. for integration in scipy.integrate.quad, where a lot of function calls are involved).
Of course, if needed, this can involve rewriting the function definitions in a proper fashion, adding suitable decorators or whatsoever, but the goal is to avoid critically changing the structure because of this.
My idea of precompiling: of course I'm not an expert, otherwise I would have already figured out the proper way, but in the worst scenario it means defining a function that takes another one as input and goes through the calls, collecting everything in a single expanded expression, to be evaluated again as Python code (hopefully an external library can do this for us, I don't know if numba or what else; a sketch follows).
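A minimal sketch of the numba option; the integrand is a toy stand-in for the nested-lambda chain:

import numba
import numpy as np
from scipy.integrate import quad

@numba.njit
def integrand(z, x):
    # toy integrand, compiled ahead of the many calls quad will make
    return np.log(1.0 - z) / (1.0 - z) * (x / z)

# quad can call the compiled function like any other callable
res, err = quad(integrand, 0.1, 0.9, args=(0.1,))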
To make things easy for the user, the current setup is not the best one.
There are two bases involved:
Since the first one is needed only for internal purposes, we should:
Numerics question: is it a good idea to join R+S+L under one integral? Because:
- in order to explore this, one would need to compute an exact convolution with, say, x^a (1-x)^b for various a and b, and check the error and performance
- a first, but not followed-up, attempt was made in #28

A benchmark sketch follows.
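A minimal sketch of such a check, with toy regular (R) and singular (S) pieces and an x^a (1-x)^b test function (all the kernels are invented):

import time
import numpy as np
from scipy.integrate import quad

a, b, x = 0.5, 3.0, 0.1
pdf = lambda z: z**a * (1.0 - z) ** b  # exact test function

reg = lambda z: 1.0 + z**2                           # toy R
sing = lambda z: 2.0 * np.log(1.0 - z) / (1.0 - z)   # toy S (subtracted below)

def joint(z):
    # R and S share the evaluation of pdf(x/z)/z under one integral
    p = pdf(x / z) / z
    return reg(z) * p + sing(z) * (p - pdf(x))

t0 = time.perf_counter()
res_joint, err_joint = quad(joint, x, 1.0)
t_joint = time.perf_counter() - t0

t0 = time.perf_counter()
res_r, err_r = quad(lambda z: reg(z) * pdf(x / z) / z, x, 1.0)
res_s, err_s = quad(lambda z: sing(z) * (pdf(x / z) / z - pdf(x)), x, 1.0)
t_split = time.perf_counter() - t0

print(res_joint, res_r + res_s, t_joint, t_split)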
TL;DR: defaults and/or fallbacks with reported parameters, or raising errors for anything, including missing arguments?
I'm wondering how to do this (actually, whether to do this at all).
In apfel, once the defaults are applied, conflicts are solved and so on, it prints the parameters it is actually using, for both the evolution (not in this project) and the DIS observables (if you also initialize apfelDIS).
Discussing with @felixhekhorn this week, he suggested that this feature may not be reimplemented; instead, it's better to raise an error if there is any conflict at all.
I quite agree with Felix: errors are more pythonic than hidden fallbacks for sure, and I think it is also better in general, but now I'm considering whether to include any default at all.
So the question is: if there is any input, any default of any kind, I think it would be better to report the parameters used in some way (dumping to a file, printing as output, ...); otherwise we should also raise an error for any missing argument.
Given the success of the new eko output format implementation (NNPDF/eko#76), I propose to follow the same path here, and so to split the operator from the metadata.
The main reason supporting this is that we'll save a lot of loading and dumping time.
Here the typical structure of the output is a bit different from the one in eko, and a bit more complicated (but still perfectly suitable).
An Output object is still basically a dictionary, e.g. with the following keys:
[ins] In [19]: out.keys()
Out[19]: dict_keys(['XSHERACC', 'interpolation_is_log', 'interpolation_polynomial_degree', 'interpolation_xgrid', 'pids', 'projectilePID'])
At the moment we need an input runcard in order to be able to load the observables, because otherwise we don't know the observables' names (e.g. XSHERACC) in the output object.
There are two possible solutions:
- store the observables in the object instance
- add an observables entry with a list of the names of the observables

While every other value is just a scalar (at most a list/array of scalars), each observable is a very nested object.
Each observable is a list of ESF, potentially of different length. In turn, each ESF has the same structure.
[ins] In [14]: e = out["XSHERACC"][0]
[ins] In [15]: len(out["XSHERACC"])
Out[15]: 42
[ins] In [16]: np.array(e.get_raw()["orders"][0]["values"]).shape
Out[16]: (14, 50)
[ins] In [17]: e.get_raw()["orders"][0].keys()
Out[17]: dict_keys(['order', 'values', 'errors'])
[ins] In [18]: e.get_raw().keys()
Out[18]: dict_keys(['x', 'Q2', 'nf', 'orders'])
Thus we can make it an array; what we need is:
- to store x, Q2, and nf somewhere else, i.e. in the metadata (like pids and interpolation_xgrid)
- a [value, error] pair per entry

We need x, Q2, and nf for each ESF in each observable (so one list per observable, made of 3-tuples). A sketch of the layout follows.
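A minimal sketch of the proposed layout; the sizes come from the example above, the variable names are invented:

import numpy as np

n_esf, n_pids, n_x = 42, 14, 50
# one array per observable; the last axis holds [value, error]
values = np.zeros((n_esf, n_pids, n_x, 2))
# per-ESF kinematics move to the metadata: one 3-tuple (x, Q2, nf) per ESF
kinematics = [(1e-2, 90.0, 4)] * n_esf  # placeholder values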
Literature:
Issues:
For IC=0, the current scenario is:
- IC=0 for NC at the level of the coefficient functions (LO not provided for heavy structure functions); not sure at the level of the weights (maybe they are provided also for heavy inputs)
- IC=1 (but also IB=1 and IT=1) for CC: all the weights are always implemented, and the coefficient functions as well, for both light and heavy inputs

With IntrinsicQuark (=IQ) specified in an [m,n) range, it should be:
- intrinsic up to n (excluded), because all the heavy non-intrinsic inputs should not be present as PDFs
- the FNS scheme should be consistent, so only the flavors up to m (excluded) can be considered as light in the fitting region (then, if it is a VFNS, it can change at higher Q2)

Note that "heavy" here refers to the input, not to the output/object coupling to the vector boson, so the heavy structure functions should always be implemented for both heavy and light inputs, and it is the weights that control whether they are needed or not, not the coefficient functions (so the fact that it is actually working for the NC is just a chance).
Info printed during yadism runs is still partial; more has to be included.
To be reorganized:
- get_weights: it's reasonable that some values are constant per-run, like MZ and so on, and yet they are printed over and over
- log the constant values (e.g. MZ) once, while not logging for each and every distribution_vec?

Properly deal with them, even though at the moment it looks like doing it improperly is not an issue for some observables...
From the email exchange on 19.08.20 between SF, ERN, SC, CS, FH, AC we can already give two references
At the moment the theory runcard is defining three kinds of heavy quark masses (taking charm as the example):
- mc: the mass of the charm
- Qmc: the reference scale at which mc is evaluated as a Lagrangian parameter in a renormalization scheme (say MSbar)
- kcthr: the actual new mass is given by kcthr * mc, and it is the threshold used by any VFNS for charm

For yadism they all currently coincide, because:
- schemes other than POLE are not implemented (so Qmc is irrelevant)
- kcthr is always considered to be one

Let's fix at least the second, and we will deal with the first as soon as HQMS is implemented.
There is still some stuff that I would like to improve at some point:
Steps:
- Set up a release-based workflow
- F2-FL, and also the new ones / x * z
Just put:
Steps:
This code must contain at least 2 modules:
In the first module we should restrict ourselves to C-style programming; this is due to the limitations imposed by numba.
How can I add the dependence on lhapdf and eko in meta.yaml?
The second one prevents me from importing interpolation.py (that I'm currently copying), and the first one is definitely causing the failure of the build.
Already at LO, APFEL depends on Q2 while we do not: why?
Also:
- xiF, xiR != 1.0
- in the PineAPPL generation we're hard-coding the hadronic target at the moment: this might become a problem with NuTeV and CHORUS (the lepton is already correct: https://github.com/N3PDF/yadism/blob/23c413de082d550b2161512b90a2157e2e69c3c2/src/yadism/output.py#L241)
Some virtual contributions to DIS seem to be missing from the calculations; there will surely be a good reason, but we haven't found it.
In particular, the first diagram would make a virtual correction proportional to nf at NLO, and both to nf^2 at NNLO, but it is easy to see that there is nothing like that around...
Add log info (and a verbose mode) with the Python logging library.
Instead of letting quad clutter the output with integration errors, logging them is the proper thing to do: just use the full_output argument of quad, collect the messages, and put them in the global logs (a sketch follows).
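A minimal sketch of routing quad messages to the log; note that with full_output=1 quad appends an explanation message to the returned tuple only when the integration is problematic:

import logging
from scipy.integrate import quad

logger = logging.getLogger("yadism")

def logged_quad(func, a, b, **kwargs):
    out = quad(func, a, b, full_output=1, **kwargs)
    if len(out) > 3:
        # a warning message is appended only on troublesome integrations
        result, error, info, msg = out[0], out[1], out[2], out[3]
        logger.warning("quad: %s", msg)
    else:
        result, error, info = out
    return result, error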
Document anything written up to this point in yadism.
Document also the tests:
The Heavy Quark Mass Scheme implementation is not so hard; it will require:
- eko

Indeed, at NLO there is nothing to do on NC (apart from the IntrinsicCharm case), because it is just LO heavy quark.
Since to convolve with a distribution a distribution is needed, it makes sense to think of the convolution function as a feature of a distribution itself (even more so since the second operand can't be another distribution, but should be a regular test function).
So:
- make convnd a method of DistributionVec
- __add__ method, able to sum a dvec with a dvec and a dvec with regular functions
- __radd__ and __iadd__
- __mul__ method, to multiply by a scalar
- __rmul__ and __imul__
- __iter__ method, to be used in the convnd function
- __getitem__, once done
- convnd: take care of integration limits properly

A sketch of this interface follows.
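A minimal sketch of that interface; the three-component layout (regular, singular, local) and the combination rules are assumptions:

class DistributionVec:
    def __init__(self, regular=None, singular=None, local=None):
        zero = lambda z: 0.0
        self.components = [regular or zero, singular or zero, local or zero]

    def __add__(self, other):
        # a plain callable is promoted to a purely regular distribution
        if callable(other):
            other = DistributionVec(regular=other)
        return DistributionVec(
            *[
                (lambda f, g: lambda z: f(z) + g(z))(f, g)
                for f, g in zip(self.components, other.components)
            ]
        )

    __radd__ = __add__

    def __mul__(self, scalar):
        # multiplication by a scalar, component by component
        return DistributionVec(
            *[(lambda f: lambda z: scalar * f(z))(f) for f in self.components]
        )

    __rmul__ = __mul__

    def __iter__(self):
        return iter(self.components)

    def __getitem__(self, k):
        return self.components[k]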
We noticed that there is a common source of differences (rel_err[%]) between us and APFEL coming from the Q2 interpolation (affecting also anything APFEL-like, i.e. Q2-interpolating).
In yadmark we can get the numbers for the separate channels from yadism (the "DIS operator") by simply calling runner.get_result(), and only afterwards calling .apply_pdf() on the result. Flag the ESF where a cancellation between flavors is happening (just looking for the maximum of each flavor channel and computing the absolute ratio with the summed result).
The actual number to choose for the absolute error is difficult to find, because the proper value should be the sum of the absolute errors on the individual flavor channels; but in order to find this we should be able to break down APFEL's result on the individual flavors as well, and if those results were available we could have compared the channels directly instead of the recombined sum.
My personal guess for this number is a function of the sum/max(flavors) ratio, protecting against the yadism recombined result going to 0 (and so ratio=0 -> rel_err=100%), and protecting the full yadism result where the cancellation is still present but it is more likely for APFEL to go to 0 (1%<ratio<20% -> abs_err=yad_result). A sketch of this rule follows.
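A sketch of that guessed rule; the thresholds are the ones quoted above, while the default branch is an assumption:

def tolerances(yad_result, channels):
    # channels: the individual flavor-channel results from yadism
    ratio = abs(sum(channels)) / max(abs(c) for c in channels)
    if ratio == 0.0:
        return {"rel_err": 1.0}  # recombined result vanishes: allow 100%
    if 0.01 < ratio < 0.20:
        return {"abs_err": abs(yad_result)}  # APFEL more likely to cross 0
    return {"rel_err": 0.01}  # assumed default: plain 1% relative check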
Define unit tests for all the ...StructureFunction classes and logic:
- StructureFunction
- ESF
- ESFTMC

Useful ideas:
- pytest.monkeypatch, to fake intermediate objects that are difficult to access (you can register functions/objects to be used in place of something else, even without entering inside); a generic sketch follows
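A generic sketch of the monkeypatch mechanism (not yadism-specific): math.pi stands in for an intermediate object that is hard to construct for real:

import math

def test_with_fake_pi(monkeypatch):
    # register a replacement; pytest restores the original after the test
    monkeypatch.setattr(math, "pi", 3.0)
    assert math.pi == 3.0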