Coder Social home page Coder Social logo

gundam-organization / gundam Goto Github PK

View Code? Open in Web Editor NEW
13.0 4.0 10.0 46.73 MB

GUNDAM, for Generalized and Unified Neutrino Data Analysis Methods, is a suite of applications which aims at performing various statistical analysis with different purposes and setups.

License: GNU Lesser General Public License v2.1

CMake 2.32% Makefile 0.04% C++ 66.98% Shell 14.39% C 13.62% Dockerfile 0.05% Cuda 0.23% JavaScript 0.49% CSS 1.88%
physics likelihood uncertainties neutrino particle-physics statistical-analysis

gundam's People

Contributors

anfe29 avatar casparschloesser avatar clarkmcgrew avatar cuddandr avatar jaafar-chakrani avatar jiayuji avatar lgiannes avatar nadrino avatar riccioc avatar sjdolan avatar suvorov21 avatar ulyevarou avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

gundam's Issues

Compilation issue for C++17

While compiling get this error:
/sps/t2k/amunoz/work/gundam/repo/gundam/src/Applications/src/gundamFitReader.cxx: In function 'int main(int, char**)':
/sps/t2k/amunoz/work/gundam/repo/gundam/src/Applications/src/gundamFitReader.cxx:122:6: error: call of overloaded 'readObject(std::shared_ptr<TFile, gnu_cxx::S_atomic>::element_type*, , main(int, char**)::<lambda(TNamed*)>)' is ambiguous
122 | });
| ^
In file included from /sps/t2k/amunoz/work/gundam/repo/gundam/src/Applications/src/gundamFitReader.cxx:8:
/sps/t2k/amunoz/work/gundam/repo/gundam/src/Utils/include/GundamUtils.h:72:45: note: candidate: 'static bool GundamUtils::ObjectReader::readObject(TDirectory*, const std::vector<std::basic_string >&, const std::function<void(T*)>&) [with T = TNamed]'
72 | template inline static bool readObject( TDirectory* f
, const std::vectorstd::string& objPathList
, const std::function<void(T*)>& action
= {} ){
| ^~~~~~~~~~
/sps/t2k/amunoz/work/gundam/repo/gundam/src/Utils/include/GundamUtils.h:87:45: note: candidate: 'static bool GundamUtils::ObjectReader::readObject(TDirectory*, const string&, const std::function<void(T*)>&) [with T = TNamed; std::string = std::basic_string]'
87 | template inline static bool readObject( TDirectory* f, const std::string& objPath, const std::function<void(T*)>& action_ = {} ){ return readObject(f_, std::vectorstd::string{objPath_}, action_); }
| ^~~~~~~~~~
Whereas while compiling with C++14 there is no issue

Include the implementation of Fritsh-Carlson

Fritsch-Carlson splines, also known as Fritsch-Carlson monotone cubic splines, are a type of interpolation
method used to construct smooth and monotonic curves from a given set of data points. These splines were
introduced by Fritsch and Carlson in 1980 as an improvement over the traditional cubic spline interpolation.

Unlike traditional cubic splines, Fritsch-Carlson splines guarantee monotonicity between adjacent data points.
This means that the resulting curve will not exhibit unwanted oscillations or overshoots, which can occur with
other interpolation methods.

Fritsch-Carlson splines achieve monotonicity by modifying the interpolation algorithm based on the local
behavior of the data. The main idea is to ensure that the slopes of the spline segments preserve the
monotonicity of the data.

The algorithm for constructing Fritsch-Carlson splines involves the following steps:

  1. Calculate the slopes at each data point using a suitable method, such as finite differences or local regression.
  2. Adjust the slopes to enforce monotonicity, ensuring that they do not change the direction between adjacent data points.
  3. Construct cubic spline segments between each pair of adjacent data points, using the adjusted slopes to determine the shape of the curve within that interval.

By incorporating monotonicity constraints, Fritsch-Carlson splines provide a reliable and smooth interpolation technique, particularly suitable for datasets that exhibit monotonic trends or require smooth monotonic curves. These splines have found applications in various fields, such as computer graphics, data visualization, and numerical analysis.

Nominal weight formula option

Based on the main branch (checked at 2023.11.21), the traditional nominal weight formula

nominalWeightFormula: "weight"

does not work anymore. It works if the variable is defined in this way:

nominalWeightFormula:
  - "weight"

[Feature request] Generalizing variable expressions

As we are validating the latest OA with GUNDAM, the p-value study requires to select different truth variables depending on the neutrino event that was selected by the psyche toys.

One idea to generalise the event variable definition is to handle the TTree reading with a proper memory buffer. This class should handle the parsing of variable expression (using a TLeaf or a TTreeFormula) and point to the place in memory a TTree (via TBranch) has been asked to write the entry data.

This will allow the user to define abitrary expressions for the variables he needs for the engine to run.

Screenshot 2023-09-12 at 15 31 31

Some versions of NVCC are being given invalid compilation options

The way NVCC handles command line options has apparently changed between 11.2.62 and 11.2.152. In the later version (152), some compile options (e.g. -Wall) added in the CMakeLists.txt file are accepted, but they cause errors in the earlier version (62). An untested fix may be to change add_compile_option(blah) to add_compile_option($<COMPILE_LANGUAGE:CXX> blah).

Finish USE_NEW_DIALS implementation in Cache::Manager

The USE_NEW_DIALS changes (e.g. generalized dials) have not been fully implemented in Cache::Manager, so this functionality has been lost. All that needs to be done to add it is to finish the Cache::Manager::Build implementation which copies the spline information to the GPU.

Save MCMC information needed to calculate the PPP

All of the information needed to calculate the PPP for a chain is created when as the chain is being generated, but isn't being saved into the output file. All that will need to be saved is the data "histogram", and the MC expectation "histogram" (contents and errors) for a sub-sample of the steps.

1.7.0 crash because of the configScan

When we add "llhStatPerSamplePerBin: true" and/or "weightPerSamplePerBin: true" in the configScan.yaml file, the 1.7.0 version crashes before doing the breakdown event.

Links between wiki pages are broken

The wiki page links are working during the edit preview, but in the "live" version of the wiki they result in a 404.

My guess is that either the format of the links being used are wrong, or there is some setting needed to make the wiki visible.

DialBase uses extra vtable pointer

The DialBase class tree is using multiple-inheritance as part of the visible public interface. Besides exposing the internal implementation to other libraries, this adds an extra VTABLE pointer to the object (this class is speed and size critical).

The class tree can be reimplemented using single inheritance, while maintaining the same visible public interface. Transitioning to single-inheritance will also let us eventually make the visible interface more concise.

Proposed Changes:

  1. Move the "handler" implementation into the main dial classes (e.g. GeneralSplineHandler and GeneralSpline get merged). This will eliminate the use of multiple inheritance
  2. Change DialBaseCache into a "curiously recurring template" that moves the cache implementation from a base class to the top of the class tree (and hides the implementation).

I have a testbed implementation essentially finished, and will push it for discussion as soon as I have time.

Test coverage for DialCollection code paths

Some of the code paths for DialCollection are not covered by the current tests (mostly the binned spline paths). This issue is to remind us (me!) that we need to make sure all of the (important) "dialSetDefinition" YAML options have a test. The code paths being flagged for checks are are in DialCollection.cpp.

Cache::Manager miscalculation predicted model statistical error

The statistical error on the filled MC histograms is being overestimated by the Cache::Manager filling. This only affects the Cache::Manager (so mostly the GPU, and mostly the MCMC which is too slow without the Cache::Manager). This showed up will calculating the PPP for the MCMC posteriors, but would change the Minuit fit if the Cache::Manager is used.

Note: The cache manager weight calculation exactly matches the propagator weight calculation. This has a small effect on the LLH when Barlow-Beeston likelihoods are used, but not on the expected bin rates.

Removing deprecated old Dial class

As the dial classes are now handled separatly from the ParameterSet lib, we could safelty remove the old implementation.

This would imply to remove all the USE_NEW_DIALS checks along with the code

EventDialCache thread safety

The EventDialCache locks don't ensure thread-safety. It should either be made thread-safe, or made to explicitly crash when it cannot maintain thread safety.

The EventDialCache is serving bare pointers to the indexed cache entries in fetchNextCacheEntry() which applies a lock so it can get a new index into the vector. But, it can also allocate a new indexed cache entry and that might force the cache vector to be moved which will invalidate the pointers being used in any other thread accessing the cache.

The fetchNextCacheEntry allocation is mostly prevented by the preallocation of the cache in DataDispenser::preAllocateMemory. The check should probably be changed into a trap against capacity and throw if the indexed cache vector needs to be resized. An alternative would be to change to a collection that won't invalidate the pointer (e.g. forward_list).

Set target C++ standard in CMakeList.txt

This needs to compile with several different versions of gcc, each with a different default version of c++. Support for C++17 exists since gcc5, so that's probably the best target. I don't see any compiler with full C++20 support, so it's off the table.

Implement Catmull-Rom splines for non-uniform point spacing

The current Catmull-Rom implementation (CalculateCompactSpline.h and CalculateMonotonicSpline.h) are specifically targeted at minimizing the amount of memory required, and they add the constraint that the knots need to be uniformly spaced. That's not required by the Catmull-Rom spline slope calculation.

We need a slope calculation that will apply the Catmull-Rom slope calculation independent of the restricted implementations.

Joint probability Barlow-Beeston divide by zero

There is a (rare) special case in all of the OA2021 Barlow-Beeston joint probability calculations that results in a divide-by-zero. It happens when the data and predicted values are exactly equal. I've added a divide-by-zero check for now, but this needs a more careful look after the collaboration meeting

[I think this shows up in the MCMC because it calls the likelihood so many times for values well away from the minimum]

Towards dials that require event variables to evaluate

Starting here a thread about implementing dials that need PhysicsEvent stored variable to evaluate.

As an example, the oscillation (survival) probability for SBL analysis:
$P(\nu_e \rightarrow \nu_e) = 1 - \sin^2(2\theta_{14})\sin^2\left(\Delta m^2_{41}L/4E\right)$.

Two fit parameters need to evaluate this formula $\theta_{14}$ and $\Delta m^2_{41}$, while $E$ is the neutrino energy which is different for each event.

DialInputBuffer seems to be the right candidate to handle the different inputs.

[TO BE CONTINUED]

MnSeedGenerator Negative G2 found

I am running the OA2021 data fit using the develop branch of Gundam. Gundam could not find the minimum and didn't run HESSE. The message printed out was:

12/09/2023 16:43:44 [FitterEngine] FitterEngine.cpp:458: Skipping post-fit error calculation since the minimizer did not converge.

and the error message was:

12/09/2023 16:19:43 [LikelInfo in : MnSeedGenerator Negative G2 found - new state:

If I run v1.7.2 the fit converge and HESSE is run but the error message just above (MnSeedGenerator Negative G2 found) is print anyway.

Enable multithreaded CPU calculations with hemi

The HEMI code used by the Cache::Manager calculation is structured for a single CPU thread, but that's because the single-thread branch was used to make debugging easier. The multi-CPU flavor is a bit simple (not much control), but it would be easy to adapt for our needs.

Complete and edit GUNDAM v1.7.0 release note

The draft has been created under Releases. We need to

  1. Finish writing the release notes
  2. Make sure they are complete
  3. Edit for grammar
  4. Create the tag
  5. Make the release!

I've assigned this to a wide cast of characters, but I'd guess @nadrino and I are the leads.

I think we should wait on sign-off from @LenaOsu and @ulyevarou. There is currently an on-going test with @LenaOsu to isolate errors that are seen.

buildDial() virtual methods in DialBase

Is it absolutly necessary that we predefine those buildDial virtual methods?

With this I understand we don't need to recast to the derived class in order to call the specific instance. However, I think buildDial() itself makes no sense in DialBase as it introduces unused dependencies (like #include "TGraph.h"). Also this means each new derivative would required to add a specific instance of the virtual method.

Wouldn't it be better to actually cast back to the derived class within the dial factories and use specific setters and buildDial method instead?

/// Build the dial with no input arguments. Mostly here for completeness!
virtual void buildDial(const std::string& option_="") {throw std::runtime_error("Not implemented");}
/// Build the dial using up to up to three vectors of doubles. Mostly
/// an internal tool to make spline and graph builders work the same.
virtual void buildDial(const std::vector<double>& v1,
const std::vector<double>& v2,
const std::vector<double>& v3,
const std::string& option_="") {throw std::runtime_error("Not implemented");}
/// Build the dial using a graph (usually a leaf in the input file
virtual void buildDial(const TGraph& grf, const std::string& option_="") {throw std::runtime_error("Not implemented");}
/// Build the dial using a spline (usually a leaf in the input file.
virtual void buildDial(const TSpline3& spl, const std::string& option_="") {throw std::runtime_error("Not implemented");}
/// Generic builder
virtual void buildDial(void* inputDataPtr_, const std::string& option_="") {throw std::runtime_error("Not implemented");}

Add the ability to load data histograms from an external file

As we are trying to reproduce a particular toy fit from the BANFF, the only way to get the same pseudo-experiment loaded is to dumb and load the histogram content from BANFF output file.

This will be implemented by adding a feature in the DatasetLoader. The BANFF toy fit will be loaded as a "data fit" not as a "toy fit" as no throwing is required.

Make Calculate[]Spline function interfaces uniform

The Calculate{Monotonic,Compact,Uniform,General}Spline functions all define an input parameter named dim, but it means different things for each of the functions. Specifically

  1. CalculateMonotonicSpline -- dim means the number of knots
  2. CalculateCompactSpline -- dim means the number of knots
  3. CalculateUniformSpline -- dim means the number of elements in the input array (it is 2+2*knots)
  4. CalculateGeneralSpline -- dim means the number of elements in the input array (it is 2+3*knots)

The least surprising interface will be to have dim always be in the input array size. That would mean that for Monotonic and Compact, dim will be changed to be equal to (2+knots).

Software development policy

I wrote a paragraph on the software policy in the TN on the validation but now I think a better place is the home page of GUNDAM so that every user can be aware of the few rules we have agreed on internally.

[Critical] 1.7.0 does not compile with the c++ 14

1.7.0 does not compile with the c++ 14 and ROOT6 builds 14 or later. The problem is in simple-cpp-logger.

Scanning dependencies of target GundamUtils
[  1%] Building CXX object src/Utils/CMakeFiles/GundamUtils.dir/src/GlobalVariables.cpp.o
In file included from /home/riccioc/gundam/v1.7.0/submodules/simple-cpp-logger/include/Logger.h:278,
                 from /home/riccioc/gundam/v1.7.0/src/Utils/src/GlobalVariables.cpp:6:
/home/riccioc/gundam/v1.7.0/submodules/simple-cpp-logger/include/implementation/Logger.impl.h: In static member function ‘static void {anonymous}::Logger::setIsMuted(bool)’:
/home/riccioc/gundam/v1.7.0/submodules/simple-cpp-logger/include/implementation/Logger.impl.h:57:5: error: ‘_isMuted_’ was not declared in this scope; did you mean ‘isMuted’?
   57 |     _isMuted_ = isMuted_;
      |     ^~~~~~~~~
      |     isMuted_

Fix dial type interface before "public" release

In the next release, the internal dial interface will more flexible and efficient. The YAML dial interface needs to be upgraded to support the new functionality in a way that describes the functionality of a dial, not how it is implemented.

Background: The existing (1.6.1 and before) interface supported a limited number of dial types (e.g. normalization, spline, graph) while the new dial interface can support multiple types of splines, and even multi-dimensional dials. For debugging, the user interface has been tied very closely to the dial implementation, but this will become unmaintainable as the number of implemented dial types increases.

Proposal: Design a YAML dial interface that describes how a dial should behave, but which is not closely tied to the implementation. (Comments will include proposal options)

Fixing X-section calculator app

With the introduction of the new dial engine, events don't keep a reference of the list of dials their weight will be affected by.

Therefore events from the truth samples can't be associated by simple copies as the propagation engine wouldn't reweight them anymore.

This fix will lead into a redesign of the gundamCalcXsec app, taking advantage of the override feature of config introduced in 1.7.1. This will allow to reload the events in the propagator with the same data loader as the fitter.

Clean up unused code

There is code that was imported from the xsLLhFitter that is not used and can be removed.

Get rid of the parallel workers defined in the global scope

In order to handle the parallelisation, I implemented parallel workers that would execute lambdas of each class where parallelisation could be used.

These parallel workers have been implemented in the GlobalObjects namepace for making sure every class would have access to it. However, whenever debugging is required global object are difficult to track.

The only parameter one need to have in global is the number of threads GUNDAM would define in a runtime.

Record parameter physical bounds seperately from allowed range

Since parameters can be extended into non physical regions and still give mathematically (if not physically) correct results, we should keep track of the range where the parameter is defined separately from the range where it is physically meaningful. For example a two neutrino disappearance can be fit with (alpha*sin^2(1.27*L/E)) where alpha is interpreted as sin^(2*theta). The function is physical for alpha between zero and one, but defined for -inf to inf.

The physical range for a parameter should be saved separately from the range over which the parameter can be defined. This will be useful when looking at the difference between frequentist and Bayesian results (i.e. comparing MINUIT and MCMC results). It will also give us tools to speak about where the likelihood is defined as mathematical function versus where it is physically meaningful.

I propose adding new fitParameter fields analogous to parameterRange and parameterLimits called physicalRange and physicalLimits. The parameter range fields will be documented as the domain over which the model is defined, and the physical range fields will be documented as the domain over which the model is physically meaningful.

Note: The range over which the model is mirrored is a third range. It's already separately defined by the dials and accessible through the FitParameter objects.

Poisson likelihood evaluated with wrong formula

The function to compute the Poisson likelihood PoissonLLH::eval inside src/FitSamples/src/JointProbability.cpp is using a formula which employs the Gamma function (likely to compute a factorial). However, since we are actually computing a likelihood ratio, these factorials cancel out and there is no need for a Gamma function.

The current implementation returns a nonzero chi2_stat for Asimov fits, when the number of "MC" and "data" events is identical. It should be changed back to this:
https://github.com/nadrino/gundam/blob/78bf753189f6bb5155c0abb79219b06002146155/src/Utils/include/Likelihoods.hh

Documentation

Let me start a broader discussion using the issues tracker. Since we have the GitHub wiki I'd consider moving our documentation there as we did for BANFF. Thoughts are welcome.

Bug with CompactSpline eval

I've made a few tests and printouts to splot where the evaluation is failing when using CompactSplines. Here is the printout.

It seems that the code is looking for the element of the input array data[4] while its size is only 4.

@ClarkMcGrew I don't understand very well all these numbers, do you know what are they supposed to do?

13/05/2023 15:34:24 [EventDialCache]: Invalid dial response:
13/05/2023 15:34:24 [CompactSpline]: d43 = -nan
13/05/2023 15:34:24 [CompactSpline]: data[2+d43_1] = -nan
13/05/2023 15:34:24 [CompactSpline]: dim = 4
13/05/2023 15:34:24 [CompactSpline]: d43_1 = 2
13/05/2023 15:34:24 [CompactSpline]: d43_0+1 = 2
13/05/2023 15:34:24 [CompactSpline]: ix+1 = 1
13/05/2023 15:34:24 [EventDialCache]: CompactSpline: input({ Cross-Section Systematics/#57_MPi_Multi_Shape=0 }) supervisor(minResponse=0, maxResponse=100) response=-nan
13/05/2023 15:34:24 [EventDialCache]: CompactSpline: spline data = { 0, 1, 1, 1.667 }
13/05/2023 15:34:24 [EventDialCache]: CompactSpline: defined bounds = { 0, 1 }
13/05/2023 15:34:24 [EventDialCache]: CompactSpline: allow extrapolation ? 1
13/05/2023 15:34:24 [EventDialCache]: CompactSpline: EVAL MANUAL at 0:-nan

Provide generalized interface to the spline calculation

In future oscillation analysis fits, we need to use more general spline implementations (e.g. monotonic splines for detector systematics). There is an urgent request from the OA to have this implemented by "last week" so it can be used in the upcoming analysis.

I've created an RFC for a possible design and placed it on the (newly created) gundam wiki.

Spline point uniformity causes exceptions

Inputs expecting Catmull-Rom rom splines are currently failing because the code is expecting that the input point spacing is uniform with a tolerance of ~1e-8. However, we see points out of tolerance by a few parts per million. Two paths can be followed

  1. Do not throw an error and only print messages
  2. Allow the user to set a tolerance
    My current bias is to follow both for 1.7.0, and then revert to throw an error for later releases (e.g. 1.7.1 or later).

Validate CompactSpline with two knots

The compact spline is used when there are only two knots since it is the most efficient spline calculation. We have suspicions that it might not be working as expected, so there needs to be a check that a two knot compact spline is equivalent to a line.

Syntax error, apparently in Logger.h

Still need to look at the code, but here's the message

/home/mcgrew/work/projects/gundam/submodules/simple-cpp-logger/include/Logger.h:219:38: error: in-class initialization of static data member 'std::unordered_set<void*> {anonymous}::Logger::onceLogList' of non-literal type

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.