danielwinterbottom / ICHiggsTauTau

CMSSW package for Imperial group analysis code

Home Page: http://danielwinterbottom.github.io/ICHiggsTauTau


ichiggstautau's People

Contributors: adewit, ajgilbert, albertdow, amagnan, danielwinterbottom, gputtley, mhassans, padraic-padraic, uttleygp

ichiggstautau's Issues

Re-integrate the electron conversion veto flag into this package

It is currently calculated and provided (as a ValueMap) by this external package: https://github.com/ajgilbert/ICAnalysis-ElectronConversionCalculator

The code was split off at the time due to compiler problems in CMSSW_4_X_Y and early CMSSW_5_X_Y releases.

Someone should check:

  • if the two plugins defined in this package compile ok in our current supported releases
  • If not, can we use the #if CMSSW_MAJOR_VERSION >= X preprocessor checks to work around this
  • Is there a new definition/algorithm for the conversion veto in CMSSW_7_X_Y for run 2? If so, this should be added as a new plugin that produces a ValueMap in the same format.
    • Related to this: check how this is calculated in PAT/MINIAOD in 7_3_X/7_4_X and make sure we know how to produce it consistently in AOD directly on the gedGsfElectrons
  • Is there a possibility to re-calculate on the level of miniAOD? (probably not, but worth knowing)

Clean up of FnPredicates and FnPairs

Inspired by #26 - should go through FnPredicates and FnPairs and:

  • remove old functions that aren't used anywhere (and won't be used again)
  • organise and document functions, ideally with links to twiki pages or other documentation for ID/iso selectors. Should aim to replicate the style of Plotting.h.

This can be considered a low priority task :-)

Corrupted double-linked list

If running ./bin/HTT on a local file with EventChecker enabled, it will run to the end and write the output tree, then crash complaining about a corrupted double-linked list. This never happens when running off files on dcache, or when running on a local file without using EventChecker.
So there is some memory problem, though at the moment I fail to understand why it only appears under these very specific circumstances.

Anyway, low priority, but one to fix should we run out of stuff to do.

New electron ID variable

Appears the new cut-based electron ID has a new variable we don't save at the moment:

https://twiki.cern.ch/twiki/bin/viewauth/CMS/CutBasedElectronIdentificationRun2#Working_points_for_2016_data_for

dEtaInSeed => appears to be calculated as:

http://cmslxr.fnal.gov/source/RecoEgamma/ElectronIdentification/plugins/cuts/GsfEleDEtaInSeedCut.cc?v=CMSSW_8_0_21#0030

also same definition here:
https://twiki.cern.ch/twiki/bin/view/CMS/HEEPElectronIdentificationRun2#Selection_Cuts_HEEP_V5_1

If we want to add this we should check that it works on miniAOD (it should), i.e. that:

ele->superCluster().isNonnull() && ele->superCluster()->seed().isNonnull()

evaluates to true for the electrons we run over.

Event numbers for CheckEvents

Currently CheckEvents reads the events to be checked from within HTTSequence, which means that you have to recompile every time you add or remove event numbers. It should be modified so that it can read the event numbers from an external file instead.
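A minimal sketch of the proposed change, assuming a plain-text file with one run:lumi:event triple per line; the file format and the `read_check_events` helper are assumptions, not existing code:

```python
def read_check_events(path):
    """Parse a text file of events to check, one 'run:lumi:event'
    triple per line; blank lines and '#' comments are skipped."""
    events = []
    with open(path) as f:
        for line in f:
            line = line.split('#', 1)[0].strip()
            if not line:
                continue
            run, lumi, evt = (int(x) for x in line.split(':'))
            events.append((run, lumi, evt))
    return events
```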

Memory leak (in GetPtr/GetPtrVec?)

I'm not sure if anybody else is affected by this, but as I can't figure out how to solve this problem...

This is what happens:

  • When running HTT.cpp with more than 39 input files the job crashes throwing a bad_alloc. The same thing happens when running HiggsTauTau.cpp after 35 input files.
  • When I added a dry run through the filelist to determine the needed vector size and reserved enough space in the vector of files, the job crashes at the same file, now spitting out a bunch of these errors:
R__unzipLZMA: error 5 in lzma_code
Error in <TBasket::ReadBasketBuffers>: fNbytes = 4312, fKeylen = 91, fObjlen = 30012, noutot = 0, nout=0, nin=4221, nbuf=30012
Error in <TBranchElement::GetBasket>: File: root://xrootd.grid.hep.ph.ic.ac.uk//store/user/adewit/July08_MC_74X/DYJetsToLL_M-50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/crab_DYJetsToLL-2/150710_084817/0000/EventTree_964.root at byte:187660, branch:genParticles.pdgid_, entry:2, badread=1, nerrors=1, basketnumber=0

before throwing a bad_alloc

  • I ran with valgrind but I'm not sure the output is hugely useful, for example
==10381== 7,024 (80 direct, 6,944 indirect) bytes in 2 blocks are definitely lost in loss record 269,416 of 269,644
==10381==    at 0x4806FB5: operator new(unsigned long) (in /cvmfs/cms.cern.ch/slc6_amd64_gcc481/external/valgrind/3.10.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10381==    by 0x4935D60: std::vector<ic::Tau*, std::allocator<ic::Tau*> >& ic::TreeEvent::GetPtrVec<ic::Tau>(std::string const&, std::string) (in /vols/cms04/amd12/CMSSW_7_2_0/src/UserCode/ICHiggsTauTau/Analysis/HiggsTauTau/lib/libICHiggsTauTau.so)
==10381==    by 0x49DB4F7: ic::SimpleFilter<ic::Tau>::Execute(ic::TreeEvent*) (in /vols/cms04/amd12/CMSSW_7_2_0/src/UserCode/ICHiggsTauTau/Analysis/HiggsTauTau/lib/libICHiggsTauTau.so)
==10381==    by 0x4B13F1F: ic::AnalysisBase::RunAnalysis() (in /vols/cms04/amd12/CMSSW_7_2_0/src/UserCode/ICHiggsTauTau/Analysis/Core/lib/libICCore.so)
==10381==    by 0x40DF11: main (in /vols/cms04/amd12/CMSSW_7_2_0/src/UserCode/ICHiggsTauTau/Analysis/HiggsTauTau/bin/HTT)

This suggests GetPtrVec is causing a memory leak. I don't see what the issue with it is though. Has anybody else seen this before/any ideas where else to look for the problem?

Package is broken in 7_6_0...

... and I mean really broken:

  • this doesn't compile anymore:

      edm::RefToBaseProd<T> reftobase = edm::RefToBaseProd<T>((handle->refAt(0)));
    
  • we are now forced to use the edm::consumes mechanism. This isn't available in 5_3_X so will need some sort of workaround - ideally without a huge amount of #ifdef'ing.

  • Need to check if CMSSW producers work ok in multi-threaded mode. Is this enabled by default?

Large number of SVFit jobs with new workflow

For @ajgilbert and @adewit: the number of events in the output trees is now much larger than it was in Run 1, because cuts like isolation have moved to after the output tree. This is a problem for our SVFit workflow, which currently requires more than 10 000 jobs for the full set of MC samples (at 7000 events per job, roughly what we used in Run 1 and still several hours per job), and this number will only grow as we add the exclusive samples. The question is whether we live with this and simply risk longer waits for SVFit jobs (given that so many of our cuts are now at ntuple level we should in principle need to rerun less often, although the fairly frequent changes of triggers and other pre-output ntuple-level choices have so far meant otherwise), or whether it would be worth altering our workflow to apply some preselection and only run the calculation for a subset of the events in the tree (and when creating the tree, fill the SVFit mass branch with -999 for the events with no calculation). Thoughts? Experiences of trying to get >10000 jobs of that length through any kind of batch system?
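The preselection option can be sketched as follows; `pass_preselection` and `compute_svfit_mass` are hypothetical stand-ins for the real selection and the (expensive) SVFit calculation:

```python
SVFIT_DEFAULT = -999.0  # sentinel for events with no calculation

def svfit_branch(events, pass_preselection, compute_svfit_mass):
    """Fill the SVFit mass branch, running the expensive calculation
    only for events passing a preselection; all others get a sentinel."""
    return [compute_svfit_mass(e) if pass_preselection(e) else SVFIT_DEFAULT
            for e in events]
```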

Need to rename Analysis/*/data directories

We currently store various root files and inputs that are needed for the analyses here (JEC files, mva trainings etc). Unfortunately it turns out that crab scans recursively for any directory named data under $CMSSW_BASE/src and ships this off with each job, which wastes storage space and time packing and sending these files to the crab server. I suggest we rename the folder from "data" to "input" to avoid this.

Treatment of GenParticles in miniAOD

Have to deal with two separate collections:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD#MC_Truth

The prunedGenParticles are a normal reco::GenParticle collection and should contain everything we need in the analysis (matrix element, full tau chain decay, final state electrons/muons). We can just save this as normal with ICGenParticleProducer.

The packedGenParticles are the new pat::PackedGenParticle type and contain all status-1 particles up to some high rapidity, and are mainly intended for clustering gen jets.

Plan A:

  • Would like to be able to produce an ic::GenParticle from a PackedGenParticle. Can either template ICGenParticleProducer on the type, or write a brand new producer
  • As pointed out on the miniAOD twiki, some care must be taken when following mother/daughter relations that span the two collections. Currently this wouldn't be possible if we saved two separate ic::GenParticle collections. So if we want to support this we have to come up with a recipe - possibly trying to merge the two collections into one before we write it

Plan B:

  • decide we don't care about the packedGenParticles at all in the analysis and don't bother with any of this
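For reference, the index bookkeeping needed for the Plan A merge can be sketched like this (a simplified stand-in for ic::GenParticle, with mother/daughter indices as plain lists; real cross-collection mother links would still need a separate matching step, which this sketch does not attempt):

```python
def merge_gen_collections(pruned, packed):
    """Concatenate two gen-particle collections, offsetting the
    mother/daughter indices of the second collection so that its
    within-collection references stay valid in the merged list.
    Each particle is a dict with 'mothers' and 'daughters' lists."""
    offset = len(pruned)
    merged = [dict(p) for p in pruned]
    for p in packed:
        q = dict(p)
        q['mothers'] = [i + offset for i in p['mothers']]
        q['daughters'] = [i + offset for i in p['daughters']]
        merged.append(q)
    return merged
```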

Jet flavour calculator for miniAOD reclustered jets

The jet flavour calculator for reclustered miniAOD jets fails because of course the prunedGenParticles don't contain all status 2 and 3 particles. I don't think there's much we can do about this as these particles are simply dropped from the event. If we did need the jet flavour for reclustered jets we could match them to the slimmedJets collection and use the stored flavour (if the reclustered jets are ak4 CHS too).
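The matching fallback could look like the following sketch (plain-tuple jets and a hypothetical `match_flavour` helper, not code from this package):

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Standard deltaR, with phi wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def match_flavour(jet, slimmed_jets, max_dr=0.4):
    """Return the flavour of the closest slimmed jet within max_dr,
    or None if no match. Slimmed jets are (eta, phi, flavour) tuples;
    the reclustered jet is an (eta, phi) tuple."""
    best = None
    best_dr = max_dr
    for eta, phi, flav in slimmed_jets:
        dr = delta_r(jet[0], jet[1], eta, phi)
        if dr < best_dr:
            best_dr, best = dr, flav
    return best
```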

handling event weights in aMC@NLO (or other NLO generators)

Two issues here:

  • Need to extract signed event weight for NLO MC events (until now we haven't needed this). These weights also have a magnitude, such that summing over all weights in a sample gives the NLO xsec calculated by the generator. We probably don't care about the magnitude, and these numbers are often superseded or replaced by a higher order calculation. Therefore I'd propose we only store the sign of the weight. From a filling-histograms point of view it's also easier to then normalise to luminosity. Need to adapt ICEventInfoProducer to extract this weight from the LHE.
  • Not as urgent, but still interesting, is that the MadGraph5_aMC@NLO samples should contain additional weights that account for systematic variations, e.g. there should be weights for shifts of the renormalisation and factorisation scales. Could be useful to add an option to store these too - we already have the ability to add a weight in ic::EventInfo but have it disabled in the total_weight() calculation. In the analysis doing the systematic shifts would then be as simple as switching the desired weight on.

I found these slides useful in explaining the details:
https://indico.cern.ch/event/388914/contribution/0/material/slides/0.pdf

I will try and look into this next week, but if someone else wants to get started in the meantime please go ahead.
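The sign-only scheme described above is simple enough to sketch directly: store sign(w) per event, and normalise to luminosity using the effective event count N_pos - N_neg, i.e. the sum of the stored signs:

```python
def weight_sign(lhe_weight):
    """Keep only the sign of the LHE event weight (+1.0 or -1.0)."""
    return 1.0 if lhe_weight >= 0.0 else -1.0

def effective_events(signs):
    """Effective event count for luminosity normalisation:
    N_pos - N_neg, which is just the sum of the stored signs."""
    return sum(signs)
```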

ICGenParticlePruner doesn't copy status flags

@adewit @rcl11 While working with the new status flags I found that our ICGenParticlePruner doesn't keep the status flag information. The ICGenParticlePruner seems to be a copy of an older version of the cmssw GenParticlePruner, the new version of which does keep the status flags. I've switched from ICGenParticlePruner to GenParticlePruner in my config and not seen any issues.

Need to revisit job splitting

In the latest of IC batch improvements we can now only use < 1/3 of the short queue at the same time, so need to do something slightly cleverer with the hundreds of jobs that we have at the moment... Or face half day-long waits to rerun everything.
I haven't got any ideas (or time to mess about with it), but definitely something to sort out in the next few months.

JECs in CMSSW 7_6_X

Not very urgent but something I noticed when testing the code in CMSSW 7_6:
The L1FastjetCorrectionESProducer (which we currently use to apply JECs to reclustered jets) doesn't work anymore. Haven't found a straightforward workaround/alternative yet, though I'm almost certain there must be one. If not I think we can fix the module ourselves as the only reason it doesn't work is that getByLabel is used without a consumes call, which shouldn't be too hard to add in.

Code updates towards future analyses

Non-exhaustive list of code updates that will be needed in the next 3-4 months, towards being able to analyse the full 2016 dataset and future analyses

CMSSW-facing part of the code/CMSSW config:

  • Switch to at least CMSSW 8_0_20
  • Switch off filtering mode of met filters
  • Add new recipe for updating T1 corrected PFMet / extracting covariance matrix
  • Check if there are any other updated ID's and store extra variables where necessary (e.g. #161 but possibly other cases too)
  • Clean up config (which still contains snippets of code for running on AOD even though we do not need this anymore/possibly loads of other unused code)
  • Test the on-the-fly miniAOD generation (in case we do want to run on AOD, though we probably won't want to do this any time soon so not urgent)
    (update 25/11):
  • Rewrite jet producer (can drop some of the jet sources/remove support for calo and jpt jets --> can get rid of the jetSrcHelper and jetDestHelper. Could take a little while to do)

Analysis code:

  • Remove 8 TeV code (already in progress)
  • Adapt for full 2016 rereco dataset (+ MC when it becomes available)
  • Apply tau energy scale shift to the full tau collection before selecting taus (more correct than current implementation). This requires gen matching of all reconstructed hadronic taus, not just the ones part of a selected pair, at the start of the chain.
  • Rewrite plotting code (to make more flexible and understandable)
  • Include option for making 2D datacards/plots in case this remains the norm in H->tautau
  • Implement fake rate method
  • Many other analysis modules could probably be rewritten to run faster (I suspect the PairGenInfo module could be more efficient), but probably not urgent
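The tau energy-scale item above amounts to scaling the four-momentum of every hadronic tau before any pt selection; a minimal sketch with (pt, eta, phi, mass) tuples (the 3% shift in the usage is purely illustrative):

```python
def shift_tau_energy(taus, shift):
    """Scale pt and mass of every tau in the collection by (1 + shift),
    so that subsequent pt cuts act on the shifted collection."""
    return [(pt * (1.0 + shift), eta, phi, m * (1.0 + shift))
            for (pt, eta, phi, m) in taus]

def select_taus(taus, pt_min=20.0):
    """Select taus after the shift has been applied."""
    return [t for t in taus if t[0] > pt_min]
```

Applying the shift first means a tau at 19.5 GeV with a +3% shift migrates into a 20 GeV selection, which the current per-pair implementation would miss.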

Need to use FileBased splitting with 1 file per job for ntuple production when including mvamet

_NOT AN ISSUE ON OUR SIDE_ but still relevant to anyone trying to use the code to produce ntuples with mvamet included (so mainly @pjdunne and @amagnan) - in CMSSW_7_4_12 (and _15), puJetIdForPFMVAMEt crashes if you try to run on more than one file in the same job.
The solution (for now) is to use FileBased splitting and run one file per job - this behaviour has been flagged up to the puJetID people, will report back when I know more.
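In a CRAB3 configuration the workaround corresponds to a fragment like this (the parameter names are the standard CRAB3 ones; the rest of the config is omitted):

```python
# CRAB3 config fragment: process exactly one input file per job,
# as a workaround for the puJetIdForPFMVAMEt multi-file crash.
from CRABClient.UserUtilities import config
config = config()
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 1
```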

Issues blocking 5_3_7 OOTB

Compilation:
/afs/cern.ch/work/a/agilbert/CMSSW_TEST/CMSSW_5_3_7/src/UserCode/ICHiggsTauTau/plugins/ICMetProducer.cc:16:62: fatal error: DataFormats/METReco/interface/PFMEtSignCovMatrix.h: No such file or directory
--> PFMEtSignCovMatrix no longer needed, as we won't be importing an external MET covariance matrix anymore. Will remove this option from the code. Possible knock-on effect to any cfg files where the option "InputSig" is defined.

/afs/cern.ch/work/a/agilbert/CMSSW_TEST/CMSSW_5_3_7/src/UserCode/ICHiggsTauTau/plugins/ICPhotonProducer.hh:10:73: fatal error: EgammaAnalysis/ElectronTools/interface/PFIsolationEstimator.h: No such file or directory
--> Pending

Daughters and status codes in new MC have changed

The new pythia 8 status codes don't provide a direct replacement for status 3. The new status range 21-29 is similar, but differs in that, for instance, the lepton from a W->lnu decay isn't always status 21-29.

I've emailed Josh Bendavid to ask if there is a recommendation, and it also appears from slides from Gen group meetings in early May that they are preparing a new set of "status flags" to try and solve this issue.

Job output (number of open files)

The IC disk server doesn't really like too many files being open at the same time, which means running systematic shifts as well as the central values from the same job makes vols super slow (and our colleagues who then can't do any work super annoyed). Can work around this by running syst shifts separately, but at some point should investigate better options, for example writing a separate tree for each shift into one file, then writing them to separate files after the jobs finish. Not sure this would actually be better than current workaround/wouldn't overload the disk server in some other way, but we should check.

JECs in ICPFJetProducer (for jets from PFCands)

The JECs calculated by ICPFJetProducer for jets built from (packed) PF candidates are different from the ones calculated by PAT, need to understand why if we want to produce jets without running the pat jet sequence.

What to do with HTTAnalysisTools

This concerns @adewit and @ajgilbert - so far we have kept HTTSequence capable of running the paper2013 strategy to produce flat trees in sync with those we made for the run 1 analyses. However, it is potentially more complicated to keep the flat-tree-reading code (i.e. HTTAnalysisTools linking up with HiggsTauTauPlot4) compatible with both old and new strategies. Background methods and aliases are coded very specifically for the run 1 selections (using the "method" quantity), and HTTAnalysisTools is already 1700 lines long - do we want to try to keep both strategies available in this part of the code? Of course if we don't, then we cannot remake 8 TeV datacards without using an old branch. If we do keep both, I expect defining a new set of "method"s would probably be simplest. Thoughts?
