Tools for producing sweights using classic methods or custom orthogonal weight functions (COWs) and for correcting covariance matrices for weighted data fits.

License: MIT License

Python 99.50% Makefile 0.50%

sweights's Introduction

sweights

pip install sweights

We provide several tools for projecting component weights ("sweights") in one or more control variables using one or more discriminating variables. These include the traditional sPlot method and Custom Orthogonal Weight functions (COWs). For details, see Dembinski, Kenzie, Langenbruch, Schmelling, arXiv:2112.04574, published as NIM A 1040 (2022) 167270.

Please cite as:

Dembinski, H., Kenzie, M., Langenbruch, C. and Schmelling, M., "Custom Orthogonal Weight functions (COWs) for event classification", NIM A 1040 (2022) 167270

We also provide tools for correcting the covariance matrix of fits to weighted data. For details please see Dembinski, Kenzie, Langenbruch, Schmelling - arXiv:2112.04574 (Sec IV) and Langenbruch - arXiv:1911.01303.

Documentation

Please head over to sweights.readthedocs.io for the latest documentation.

sweights's People

Contributors

matthewkenzie

sweights's Issues

API (re)design

This is incomplete, but it is a start.

General principles

Moving to functional API

The current API uses an object-oriented approach; I propose moving to a functional API. Objects are great for data, but not well suited to transformation algorithms. For transformations (data goes in, data goes out), it is natural to use functions. Functions are easier to reason about because they have no state, and they communicate clearly that they can be run in parallel and have no hidden dependencies. Because objects are so powerful, none of this is obvious when you are handed an object. In C++, it sometimes makes sense to use an object (functor), because C++ does not have keyword arguments and passing many positional arguments gets confusing. So when an algorithm needs many parameters, people in C++ sometimes use a class and set the parameters more elegantly through methods. In Python, this is not necessary: it is normal for functions to have many optional keyword arguments.

Examples of popular libraries with a functional API in Python: numpy, scipy, matplotlib.
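
As a rough illustration of the style argued for above, here is a minimal sketch of a stateless transformation function with optional keyword-only arguments. The function name weighted_histogram and its parameters bins and bounds are made up for this example and are not part of the sweights API.

```python
import numpy as np


def weighted_histogram(x, weights, *, bins=10, bounds=None):
    """Return (sums, errors, edges) for a histogram of weighted events.

    Hypothetical helper: the result depends only on the arguments,
    so calls are independent and can run in parallel.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sum of weights in each bin.
    sums, edges = np.histogram(x, bins=bins, range=bounds, weights=weights)
    # Sum of squared weights in each bin estimates the variance of the sum.
    sum_w2, _ = np.histogram(x, bins=edges, weights=weights**2)
    return sums, np.sqrt(sum_w2), edges


# Data goes in, data comes out; options are passed by keyword.
sums, errors, edges = weighted_histogram(
    [0.1, 0.2, 0.8], [1.0, 2.0, -0.5], bins=2, bounds=(0.0, 1.0)
)
print(sums, errors, edges)
```

Nothing has to be constructed or configured before the call, which is the main difference from an object that carries settings as state.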

Low-level and high-level functions

We want to accommodate casual and power users. For casual users we need high-level functions that take inputs and produce directly what most people need. These high-level functions should call a bunch of low-level functions which we also include to accommodate power users. Power users want to be able to replace parts of the calculations with their own stuff or have advanced use cases where they may want to avoid superfluous calculations that the high-level functions may do.

Here is a rough example. To compute sweights, we need the W matrix. So there should be a high-level function that takes the component pdfs, an array containing the discriminating variable and then returns an array with the sweights. There should also be a low-level function that just computes the W matrix, which is called internally by the high-level function.
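
The split described above could look roughly like the sketch below. The names compute_w_matrix, compute_sweights and variance_function are assumptions chosen for illustration and do not describe the existing package. The formulas follow the COW form, W_kl = integral of g_k(m) g_l(m) / I(m), with weights w_k(m) = sum_l A_kl g_l(m) / I(m) and A the inverse of W. A uniform I(m) is assumed when none is given.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def compute_w_matrix(pdfs, mrange, variance_function):
    """Low-level: W[k, l] = integral over mrange of g_k(m) g_l(m) / I(m)."""
    n = len(pdfs)
    w = np.empty((n, n))
    for k in range(n):
        for l in range(k, n):  # W is symmetric, so fill both halves at once

            def integrand(m, k=k, l=l):
                return float(pdfs[k](m) * pdfs[l](m) / variance_function(m))

            w[k, l] = w[l, k] = quad(integrand, *mrange)[0]
    return w


def compute_sweights(m, pdfs, mrange, *, variance_function=None):
    """High-level: return an array of shape (n_components, n_events)."""
    m = np.asarray(m, dtype=float)
    if variance_function is None:
        width = mrange[1] - mrange[0]

        def variance_function(x):
            # Assumed default: uniform I(m) over mrange.
            return np.full_like(np.asarray(x, dtype=float), 1.0 / width)

    w = compute_w_matrix(pdfs, mrange, variance_function)
    a = np.linalg.inv(w)  # A matrix
    g = np.array([pdf(m) for pdf in pdfs])  # component pdfs at each event
    return (a @ g) / variance_function(m)


def background(m):
    # Unnormalised exponential shape; enough for this sketch.
    return np.exp(-0.5 * np.asarray(m, dtype=float))


mrange = (-1.0, 1.0)
signal = norm(loc=0.0, scale=0.2).pdf
m = np.linspace(*mrange, 5)
weights = compute_sweights(m, [signal, background], mrange)
print(weights.shape)
```

A power user could call compute_w_matrix once, cache the result, and apply it to many data sets, while a casual user only ever calls compute_sweights.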

Problems with COWs

Hello,

I am considering using COWs for a case where the discriminating variable is correlated with the control variable. However, I have encountered a problem when the discriminating variable has negative values.
Even if only the lower bound of the mrange argument is negative, all values of the W-matrix are nan and the computation fails with the following exception:

/cvmfs/belle.cern.ch/el7/externals/v01-12-01/Linux_x86_64/common/lib/python3.8/site-packages/sweights/cow.py:112: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.
  N = quad(f, *self.mrange)[0]
/cvmfs/belle.cern.ch/el7/externals/v01-12-01/Linux_x86_64/common/lib/python3.8/site-packages/sweights/cow.py:124: RuntimeWarning: divide by zero encountered in divide
  return self.gk[k](m) * self.gk[j](m) / self.Im(m)
Initialising COW:
/cvmfs/belle.cern.ch/el7/externals/v01-12-01/Linux_x86_64/common/lib/python3.8/site-packages/sweights/cow.py:127: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.
  return quad(integral, self.mrange[0], self.mrange[1])[0]
    W-matrix:
	[[nan nan]
	  [nan nan]]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [291], line 10
      8 mrange = (binning[0], binning[-1])
      9 Im = 1
---> 10 cw = Cow(mrange, sig_pdf_obj, bkg_pdf_obj, Im, renorm=True, verbose=True)
     11 sweighter = SWeight(test_mc_df[c_var].values,
     12                     [sig_pdf_obj,bkg_pdf_obj],
     13                     [len(test_mc_df)*1.0-len(app_bkg_df),len(app_bkg_df)*1.],
   (...)
     16                     compnames=('sig','bkg'),
     17                     verbose=True, checks=False )

File /cvmfs/belle.cern.ch/el7/externals/v01-12-01/Linux_x86_64/common/lib/python3.8/site-packages/sweights/cow.py:105, in Cow.__init__(self, mrange, gs, gb, Im, obs, renorm, verbose)
    102     print("\t" + str(self.Wkl).replace("\n", "\n\t "))
    104 # invert for Akl matrix
--> 105 self.Akl = linalg.solve(self.Wkl, np.identity(len(self.Wkl)), assume_a="pos")
    106 if verbose:
    107     print("    A-matrix:")

File ~/.local/lib/python3.8/site-packages/scipy/linalg/_basic.py:140, in solve(a, b, sym_pos, lower, overwrite_a, overwrite_b, check_finite, assume_a, transposed)
    137 # Flags for 1-D or N-D right-hand side
    138 b_is_1D = False
--> 140 a1 = atleast_2d(_asarray_validated(a, check_finite=check_finite))
    141 b1 = atleast_1d(_asarray_validated(b, check_finite=check_finite))
    142 n = a1.shape[0]

File ~/.local/lib/python3.8/site-packages/scipy/_lib/_util.py:287, in _asarray_validated(a, check_finite, sparse_ok, objects_ok, mask_ok, as_inexact)
    285         raise ValueError('masked arrays are not supported')
    286 toarray = np.asarray_chkfinite if check_finite else np.asarray
--> 287 a = toarray(a)
    288 if not objects_ok:
    289     if a.dtype is np.dtype('O'):

File ~/.local/lib/python3.8/site-packages/numpy/lib/function_base.py:627, in asarray_chkfinite(a, dtype, order)
    625 a = asarray(a, dtype=dtype, order=order)
    626 if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
--> 627     raise ValueError(
    628         "array must not contain infs or NaNs")
    629 return a

ValueError: array must not contain infs or NaNs

One can reproduce this result with the tutorial notebook by setting mrange to (-1,1).
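
The "divide by zero" warning in the traceback means that Im(m) evaluates to zero somewhere inside mrange. One possible cause, which is a guess and has not been checked against the sweights source, is that the default uniform Im is built by passing the two bounds straight to scipy.stats.uniform. That function takes loc and scale, not lower and upper bounds, so a negative lower bound would leave part of the range with zero density. The sketch below only demonstrates the scipy behaviour; the variable names are made up for illustration.

```python
import numpy as np
from scipy.stats import uniform

low, high = -1.0, 1.0

# scipy.stats.uniform(loc, scale) is supported on [loc, loc + scale].
as_bounds = uniform(low, high)  # bounds passed as (loc, scale): covers [-1, 0]
as_loc_scale = uniform(low, high - low)  # intended range: covers [-1, 1]

m = 0.5  # a point inside (low, high)
print(as_bounds.pdf(m))  # 0.0, so dividing by it gives inf or nan
print(as_loc_scale.pdf(m))  # 0.5
```

With a non-negative lower bound, [loc, loc + scale] still contains the requested range, which would be consistent with the failure appearing only for negative lower bounds.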

Make some proper documentation

At the moment, documentation exists only in the README file and in comments in the code. It would be good to make this a bit more sophisticated with sphinx -> readthedocs or similar.
