
Precompile nested functions · yadism (closed)

alecandido commented on August 22, 2024
Precompile nested functions

Comments (17)

alecandido commented on August 22, 2024

Probably we should take a look at numba; if so, #27 may be enough to solve this one as well.

alecandido commented on August 22, 2024

Numba is doing more damage than good...

@scarrazza @scarlehoff do you have any advice?

Fri Apr 17 17:32:26 2020    apfel_w_numbify.prof

         169133690 function calls (160232623 primitive calls) in 183.925 seconds

   Ordered by: cumulative time
   List reduced from 8601 to 11 due to restriction <11>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1382/1    0.013    0.000  184.000  184.000 {built-in method builtins.exec}
        1    0.000    0.000  184.000  184.000 benchmark_against_apfel.py:4(<module>)
        7    0.001    0.000  182.688   26.098 benchmark_against_apfel.py:55(run_against_apfel)
        7    0.000    0.000  181.581   25.940 /home/alessandro/projects/N3PDF/dis/src/yadism/runner.py:190(apply)
        7    0.018    0.003  181.581   25.940 /home/alessandro/projects/N3PDF/dis/src/yadism/runner.py:140(__call__)
        7    0.000    0.000  181.116   25.874 /home/alessandro/projects/N3PDF/dis/src/yadism/runner.py:130(get_output)
        7    0.000    0.000  181.116   25.874 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/__init__.py:49(get_output)
      126    0.000    0.000  181.116    1.437 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/EvaluatedStructureFunction.py:65(get_output)
      126    0.002    0.000  181.115    1.437 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/EvaluatedStructureFunction.py:35(_compute)
      252    0.065    0.000  181.114    0.719 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/EvaluatedStructureFunction.py:48(_compute_component)
    14868    0.361    0.000  181.019    0.012 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/convolution.py:139(convolution)

Fri Apr 17 17:29:04 2020    apfel_wo_numbify.prof

         61928349 function calls (61900659 primitive calls) in 61.382 seconds

   Ordered by: cumulative time
   List reduced from 8070 to 11 due to restriction <11>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1382/1    0.015    0.000   61.399   61.399 {built-in method builtins.exec}
        1    0.000    0.000   61.399   61.399 benchmark_against_apfel.py:4(<module>)
        7    0.001    0.000   60.217    8.602 benchmark_against_apfel.py:55(run_against_apfel)
        7    0.000    0.000   59.400    8.486 /home/alessandro/projects/N3PDF/dis/src/yadism/runner.py:190(apply)
        7    0.018    0.003   59.400    8.486 /home/alessandro/projects/N3PDF/dis/src/yadism/runner.py:140(__call__)
        7    0.000    0.000   58.927    8.418 /home/alessandro/projects/N3PDF/dis/src/yadism/runner.py:130(get_output)
        7    0.000    0.000   58.927    8.418 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/__init__.py:49(get_output)
      126    0.001    0.000   58.926    0.468 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/EvaluatedStructureFunction.py:65(get_output)
      126    0.002    0.000   58.926    0.468 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/EvaluatedStructureFunction.py:35(_compute)
      252    0.058    0.000   58.924    0.234 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/EvaluatedStructureFunction.py:48(_compute_component)
    14868    0.292    0.000   58.839    0.004 /home/alessandro/projects/N3PDF/dis/src/yadism/structure_functions/convolution.py:139(convolution)

felixhekhorn commented on August 22, 2024

see also

scarlehoff commented on August 22, 2024

Without knowing which functions you are actually trying to compile:
numba doing worse than no-numba usually means you are compiling things you shouldn't, or running through numba functions that do not need it.

As a general rule, try to have one function that takes as many things as possible as arguments, compile that one, and pass everything in as arguments.
I.e., ensure that you are not, for instance, recompiling the call for each different value of the momentum.

Another possible pitfall is having intermediate objects that prevent numba from fully compiling (basically, complicated Python objects, where "complicated" I think starts at dictionaries, though I can't remember right now).

For more information I guess I'd have to read the code you are trying to numba.
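
For concreteness, a minimal sketch of the suggested pattern (illustrative only, not yadism/eko code): one jitted kernel that receives everything it needs as plain arrays and scalars, so it compiles once per type signature instead of once per value or per object:

```python
import numba as nb
import numpy as np

@nb.njit
def convolve_kernel(xgrid, weights, q2):
    # everything the kernel needs comes in as plain arrays/scalars,
    # so this compiles once for (float64[:], float64[:], float64)
    total = 0.0
    for x, w in zip(xgrid, weights):
        total += w * np.log(x) * q2
    return total

xgrid = np.linspace(0.1, 1.0, 50)
weights = np.ones_like(xgrid)
convolve_kernel(xgrid, weights, 10.0)   # first call triggers compilation
convolve_kernel(xgrid, weights, 100.0)  # same types: no recompilation
```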

alecandido commented on August 22, 2024

Thanks, we thought this could be the issue, and we will try to address it in some way.

The main problem is that inside yadism we are not really calling numba ourselves: we are calling methods from eko that are numbified (BasisFunction in particular), and as you said this generates the issue.

scarlehoff commented on August 22, 2024

Ah, right: there, the structure/organization of what is compiled and when was designed with the specific case of eko in mind, which might be highly suboptimal here.

felixhekhorn commented on August 22, 2024

Well, I think not really: we're dealing with BasisFunctions here and there, and they should compile once and for all (for a given configuration). I don't know whether we (in eko) really do this, though, and how can we actually check it?

scarlehoff commented on August 22, 2024

A quick way is checking where nb.njit is called and printing "compiling!" just before.

(when used as a decorator it wouldn't be much of a problem because I think we only use it like that in a few functions that are very basic, nothing fancy there)
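
A minimal sketch of that check (assuming the njit calls are ordinary numba dispatchers; logged_njit and basis_function below are just placeholders):

```python
import numba as nb

def logged_njit(*args, **kwargs):
    # drop-in replacement for nb.njit at the call sites we want to monitor:
    # prints once when the dispatcher is created
    print("compiling!")
    return nb.njit(*args, **kwargs)

@logged_njit
def basis_function(x):
    return x * x

basis_function(0.5)
# the dispatcher also records every signature it actually compiled (numba is
# lazy, so this only fills up at call time), which gives a second check:
print(basis_function.signatures)  # e.g. [(float64,)]
```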

alecandido commented on August 22, 2024

@scarlehoff we gave it a try and discovered that we are indeed compiling only once per BasisFunction; the problem is that, for the limited size of the required output, it is not worth the effort. At this stage we are wondering what to do:

  • implement some kind of threshold (purely on our side, exposed to the user, not something numba provides) to decide whether to compile or not
  • parallelize compilation; but for this we need eager compilation (numba is lazy by default) and a way to tell Python to continue while the compilation threads run on their own, until it hits a line where it needs a function numba is still compiling on another thread (see the sketch after this list)
    • or we can just add a compilation block for the basis functions when instantiating the InterpolatorDispatcher, or right after, compiling them in parallel and waiting for the last one to finish
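
For reference, a minimal sketch of the lazy vs eager behaviour mentioned above (illustrative, not the actual eko interface): giving numba an explicit signature makes it compile at decoration time, so the cost could be paid up front, e.g. when the InterpolatorDispatcher is built. Note that, as far as I know, numba holds a global compiler lock, so compiling from several threads may not help as much as hoped.

```python
import numba as nb
import numpy as np

# Lazy (default): compilation happens at the first call, per type signature.
@nb.njit
def lazy_basis(x):
    return np.log(x) * x

# Eager: an explicit signature triggers compilation at decoration time,
# so it can be scheduled right after building the interpolation basis.
@nb.njit("float64(float64)")
def eager_basis(x):
    return np.log(x) * x
```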

scarlehoff commented on August 22, 2024

I think I put a flag to avoid compilation? Or maybe I just thought about putting it... I did it for vegasflow for sure :_

In any case, if compilation is a problem, parallelizing just creates the problem in every thread; the right thing is not to compile.

(maybe I'm failing to understand the issue, but looking at the benchmark you linked before, it looks like what you want is to avoid compiling; lowering the time from 1 minute will create more problems than it will solve)
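
A minimal sketch of such a flag (illustrative, not the actual eko/vegasflow implementation; the environment variable name is made up). Numba also honours the NUMBA_DISABLE_JIT setting, which disables compilation globally:

```python
import os
import numba as nb

# hypothetical switch: export YADISM_DISABLE_NUMBA=1 to skip compilation
DISABLE_NUMBA = bool(int(os.environ.get("YADISM_DISABLE_NUMBA", "0")))

def maybe_njit(func):
    # apply nb.njit unless compilation has been disabled
    return func if DISABLE_NUMBA else nb.njit(func)

@maybe_njit
def basis_function(x):
    return x * x
```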

felixhekhorn commented on August 22, 2024

relevant timings are:

  • /wo numba: 1.5s = 27s/17 = time spent in log_evaluate_x / number of observables (=: n_obs)
  • /w numba: 2.5s = 145s/59 = time spent in _compile_for_args / number of basis functions (=: n_bf)

so our current setup is

  • n_obs = 17 and n_bf = 59, and we are losing about 120s = 2.5*59 - 1.5*17
  • if we had e.g. 180 observables, we would instead gain about 120s

or, more generally: it is worth running numba if and only if n_obs/n_bf > 5/3.

The common number of data points in a DIS-only PDF fit is ~3000.
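
Spelling out the break-even logic behind these numbers (with t_c the per-basis-function compile time, t_e the per-observable evaluation time saved without numba, and the compiled evaluation time assumed negligible):

```latex
T_\text{numba} \approx t_c \, n_\text{bf}, \qquad
T_\text{plain} \approx t_e \, n_\text{obs}
\quad\Longrightarrow\quad
\text{numba pays off} \iff
\frac{n_\text{obs}}{n_\text{bf}} > \frac{t_c}{t_e} = \frac{2.5}{1.5} = \frac{5}{3}.
```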

scarlehoff commented on August 22, 2024

Ok, so you are "losing" time in the trial benchmark, but "in real life" you would prefer to compile. Is that it?

If that's the case I would just avoid compilation during development (but keep it in tests to ensure it works).

alecandido commented on August 22, 2024

Ok, if all those points are going to be included in the relevant fits, our threshold on the xgrid size is about 2000, so we can safely say that if you are going to use an xgrid with fewer than 1000 points it is worth having numba compile the BasisFunctions.

alecandido commented on August 22, 2024

> relevant timings are:
>
> • /wo numba: 1.5s = 27s/17 = time spent in `log_evaluate_x` / number of observables (=: n_obs)
> • /w numba: 2.5s = 145s/59 = time spent in `_compile_for_args` / number of basis functions (=: n_bf)
>
> so our current setup is
>
> • n_obs = 17 and n_bf = 59, and we are losing about 120s = 2.5*59 - 1.5*17
> • if we had e.g. 180 observables, we would instead gain about 120s
>
> or, more generally: it is worth running numba if and only if n_obs/n_bf > 5/3.
>
> The common number of data points in a DIS-only PDF fit is ~3000.

We updated the convolution function, using some clever caching to speed it up. Because of this the trade-off value also moved, mainly because we optimized the number of calls to log_evaluate_x and to the coefficient functions.

The updated trade-off report:

  • /wo numba: 0.6s = 10s/17 = time spent in log_evaluate_x / number of observables (=: n_obs)
  • /w numba: 2.6s = 157s/59 = time spent in _compile_for_args / number of basis functions (=: n_bf)

So the new trade-off is: n_obs/n_bf > 5

Conclusion: still convenient, but exactly 3 times less so...

alecandido commented on August 22, 2024

Ok, compiling eko with numba has been sufficiently investigated, and we also looked into the possibility of compiling the coefficient functions, again with numba.

We concluded that, with the current structure of the code, it is not possible to compile the coefficient functions inside Python, because they depend on external objects (quark_0 is a function, to be summed with quark_1 multiplied by eko's alpha, and these are three different functions to be joined), so we will keep plain Python evaluation for the time being.

@scarlehoff if you have any idea about compiling sums and products of functions with numba that would be perfect, but from our experiments it really does not like functional programming :)
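
For concreteness, a minimal sketch of the pattern that gives numba trouble and the kind of manual inlining that works instead (quark_0/quark_1 here are just illustrative stand-ins, not the actual yadism code):

```python
import numba as nb

# Pattern numba struggles with: composing plain Python callables at runtime.
def quark_0(z):
    return 1.0 - z

def quark_1(z):
    return (1.0 - z) ** 2

def coefficient(alpha):
    # returns a closure over Python functions: hard to njit as-is
    return lambda z: quark_0(z) + alpha * quark_1(z)

# Workaround in the spirit of the advice above: inline everything into a
# single jitted function and pass alpha in as a plain argument.
@nb.njit
def coefficient_inlined(z, alpha):
    return (1.0 - z) + alpha * (1.0 - z) ** 2
```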

scarlehoff commented on August 22, 2024

> it really does not like functional programming :)

Because it's Python, not JS!

In any case, as I said, if timing is not a problem, over-optimizing is not worth it.
But if at some point you need better speed, it is better to keep in mind the limitations of numba and slowly move away from passing lambdas around. In particular because eventually I want to work on making eko run on GPUs*, as the people from numba seem to be working heavily on GPU support now. This should be very good for eko.**

*personally I'm ignoring it for now, but making eko run on GPU is in the "cool things to do" section of my TODO list (cc @scarrazza, they claim to be compatible with AMD cards)

** @felixhekhorn (I'm using github as a slack chat at this point) about eko: maybe we need to have a chat at some point to decide priorities, whether we want to keep it on the side for now or whether we want to start pushing forward to have it ready for the summer meeting (I'll be busy for a few weeks with n3fit-related stuff but will be much freer afterwards)

alecandido commented on August 22, 2024

Probably, if in the end speed does turn out to be an issue, the solution to inlining lambdas is working out a proper computational graph with a suitable library.

Does this mean TensorFlow?
