pandaphysics / pandatree

Data format for Panda.

Home Page: https://codedocs.xyz/PandaPhysics/PandaTree/

C++ 88.99% Python 10.34% Objective-C 0.09% HTML 0.02% PHP 0.25% JavaScript 0.12% CSS 0.12% Perl 0.07%

pandatree's People

Contributors

blallen, dabercro, dr-stringfellow, dylanhsu, sidnarayanan, yiiyama

pandatree's Issues

Deep double B

Maybe related to #111, but this requires CMSSW_9_4_6.
RecoBTag/Combined/python/deepFlavour_cff.py has changed subtly between CMSSW_9_4_4 and CMSSW_9_4_6.

Add a run & lumi summary tree

In Bambu, one of the framework modules used to save a TGraph with the list of runs and lumis that were processed. We can save a similar structure to the panda tree so that e.g. lumi calculation can be simpler. It can be a tree structure with branches run:lumi:number of events.
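
A minimal sketch of the bookkeeping behind such a summary, in plain Python rather than ROOT. The function name `summarize_run_lumi` and the input format (one `(run, lumi)` tuple per processed event) are assumptions for illustration only; the real tree would be filled in the producer.

```python
from collections import Counter


def summarize_run_lumi(events):
    """Count processed events per (run, lumi) pair.

    `events` is any iterable of (run, lumi) tuples, one per event.
    Returns sorted (run, lumi, n_events) rows, i.e. one row per entry
    of the proposed run:lumi:number-of-events tree.
    """
    counts = Counter(events)
    return [(run, lumi, n) for (run, lumi), n in sorted(counts.items())]


# Hypothetical input: four events spread over three lumi sections.
rows = summarize_run_lumi([(1, 10), (1, 10), (1, 11), (2, 5)])
for run, lumi, n_events in rows:
    print(run, lumi, n_events)
```

With one row per lumi section, a lumi calculation only has to read this small tree instead of looping over all events.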

All gen particles

For the deep substructure studies, we'll need the full shower history of each jet at truth level. Unfortunately this means saving all gen particles (huge!). I don't want to hold up 003 because of this, so let's target 004. Proposal:

  • Define a tiny gen particle class (4 mom, pdgid, parent Ref, isfinal)

  • Save extra gen particles in a separate branch, so that most analyses can avoid reading this extra stuff.

I'm even okay making this configurable, so it can be turned on at will. I would just need to set up private submission on SubMIT, with stage-out to T2.
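
A rough sketch of what the tiny class could hold, written in Python for brevity (the real class would be C++ in PandaTree). The names `TinyGenParticle` and `shower_history`, and the choice of an integer index for the parent reference, are assumptions and not the actual design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TinyGenParticle:
    # Four-momentum, stored here as (pt, eta, phi, mass).
    pt: float
    eta: float
    phi: float
    mass: float
    pdgid: int
    # Index of the parent in the same collection, None if there is no parent.
    parent: Optional[int]
    is_final: bool


def shower_history(particles: List[TinyGenParticle], index: int) -> List[int]:
    """Follow parent links from particles[index] up to the root.

    Returns the list of indices visited, starting with `index`.
    The `seen` set guards against a malformed, cyclic parent chain.
    """
    chain = []
    seen = set()
    current: Optional[int] = index
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = particles[current].parent
    return chain


# Hypothetical three-particle chain: gluon -> quark -> final-state pion.
particles = [
    TinyGenParticle(100.0, 0.0, 0.0, 0.0, 21, None, False),
    TinyGenParticle(60.0, 0.1, 0.0, 0.0, 1, 0, False),
    TinyGenParticle(30.0, 0.1, 0.1, 0.14, 211, 1, True),
]
history = shower_history(particles, 2)
```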

Trigger degeneracy

Trigger bits should be stored in 32-bit unsigned ints to avoid misalignment of bits. Workaround in #3 for 002.
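
To illustrate the kind of misalignment meant here, a small sketch of packing trigger decisions into fixed-width words. The helper names `pack_trigger_bits` and `trigger_fired` are hypothetical and this is not the PandaTree implementation; it only shows that writer and reader must agree on the word width.

```python
def pack_trigger_bits(fired, word_size=32):
    """Pack a list of booleans into unsigned words of `word_size` bits."""
    n_words = (len(fired) + word_size - 1) // word_size
    words = [0] * n_words
    for i, bit in enumerate(fired):
        if bit:
            words[i // word_size] |= 1 << (i % word_size)
    return words


def trigger_fired(words, index, word_size=32):
    """Read back bit `index`, assuming words of `word_size` bits."""
    return bool((words[index // word_size] >> (index % word_size)) & 1)


# Hypothetical menu of 40 paths with paths 0, 31 and 33 firing.
fired = [False] * 40
for i in (0, 31, 33):
    fired[i] = True

words = pack_trigger_bits(fired)

# Reading with the same width recovers the decision for path 33 ...
ok = trigger_fired(words, 33)
# ... but reading the same words as if they were 64 bits wide does not.
misread = trigger_fired(words, 33, word_size=64)
```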

Signal weights missing

We originally assumed that any signal weight would have a non-integer ID (integer IDs are reserved for QCD variations), but it appears this is not always the case. So the signal weights fail [1] and possibly incorrectly enter the scale/PDF uncertainty calculations. Unfortunately the full weight name is also missing in the LHE XML header. For the sample (montop scalar) in question, the weight IDs are 1...25. Is it safe to exclude this range from what's assumed to be QCD variations? @maierbenedikt

[1] https://github.com/PandaPhysics/PandaProd/blob/master/Producer/src/WeightsFiller.cc#L272

Duplicate gen particles?

With the following snippet of code:

for (auto &p : genParticles) {
  if (!p.finalState)
    continue;
  for (auto &q : genParticles) {
    if (!q.finalState)
      continue;
    if (&p == &q)
      continue;
    bool parentage = (q.parent.isValid() && q.parent.get() == &p) ||
                     (p.parent.isValid() && p.parent.get() == &q);
    if (DeltaR2(q.eta(), q.phi(), p.eta(), p.phi()) < 0.00001) {
      PDebug("",Form("%f,%f,%f,%i matches with %f,%f,%f,%i; parentage=%s",
                     q.pt(), q.eta(), q.phi(), q.pdgid, p.pt(), p.eta(), p.phi(), p.pdgid,
                     parentage ? "true" : "false"));
    }
  }
}

run on t3home000:/tmp/snarayan/zptt.root, I see the following:

0.553223,-3.487533,1.761084,2212 matches with 0.553223,-3.487167,1.761181,2212; parentage=false
0.294189,-3.343608,2.775576,11 matches with 0.294189,-3.343608,2.775576,11; parentage=false
0.027145,-3.230445,2.885635,-11 matches with 0.027145,-3.230445,2.885733,-11; parentage=false
7.058594,0.346263,-0.127346,310 matches with 7.058594,0.346263,-0.127444,310; parentage=false
33.750000,0.375195,-0.093265,310 matches with 33.750000,0.375195,-0.093363,310; parentage=false
67.000000,0.417859,-0.246882,2212 matches with 67.000000,0.417859,-0.246882,2212; parentage=false
1.167969,2.477676,2.851837,2212 matches with 1.167969,2.477859,2.851837,2212; parentage=false
0.424561,-0.197028,-2.128674,2212 matches with 0.424561,-0.197211,-2.128674,2212; parentage=false
0.553223,-3.487167,1.761181,2212 matches with 0.553223,-3.487533,1.761084,2212; parentage=false
1.167969,2.477859,2.851837,2212 matches with 1.167969,2.477676,2.851837,2212; parentage=false
0.424561,-0.197211,-2.128674,2212 matches with 0.424561,-0.197028,-2.128674,2212; parentage=false
67.000000,0.417859,-0.246882,2212 matches with 67.000000,0.417859,-0.246882,2212; parentage=false
7.058594,0.346263,-0.127444,310 matches with 7.058594,0.346263,-0.127346,310; parentage=false
33.750000,0.375195,-0.093363,310 matches with 33.750000,0.375195,-0.093265,310; parentage=false
2.439453,1.076144,2.817473,22 matches with 9.750000,1.073946,2.817370,321; parentage=false
0.927246,-3.398175,-0.225309,2212 matches with 0.927246,-3.398358,-0.225211,2212; parentage=false
0.173096,-0.156377,1.809136,11 matches with 0.173096,-0.156194,1.809038,11; parentage=false
0.060577,-0.224311,1.880624,-11 matches with 0.060577,-0.224311,1.880527,-11; parentage=false
9.750000,1.073946,2.817370,321 matches with 2.439453,1.076144,2.817473,22; parentage=false
0.173096,-0.156194,1.809038,11 matches with 0.173096,-0.156377,1.809136,11; parentage=false
0.060577,-0.224311,1.880527,-11 matches with 0.060577,-0.224311,1.880624,-11; parentage=false
0.927246,-3.398358,-0.225211,2212 matches with 0.927246,-3.398175,-0.225309,2212; parentage=false
0.294189,-3.343608,2.775576,11 matches with 0.294189,-3.343608,2.775576,11; parentage=false
0.027145,-3.230445,2.885733,-11 matches with 0.027145,-3.230445,2.885635,-11; parentage=false

Note that:

  • The duplicate particles appear to be identical in pT, but only approximately identical in eta/phi
  • These are final state particles (supposedly)
  • When duplicates are found, neither particle is a parent of the other.

Either:

  • This is somehow a consequence of the Monte Carlo itself, in which case we don't have to worry
  • We are saving duplicated gen particles, in which case we should worry
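
For cross-checks outside the framework, a standalone Python sketch of the same search. The original loop is symmetric, which is why every pair shows up twice in the output above; this version visits each unordered pair once. The function names and the tuple layout `(pt, eta, phi, pdgid)` are assumptions for illustration.

```python
import math
from itertools import combinations


def delta_r2(eta1, phi1, eta2, phi2):
    """Squared angular distance, with delta-phi wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    deta = eta1 - eta2
    return deta * deta + dphi * dphi


def find_close_pairs(particles, max_dr2=1e-5):
    """Return index pairs (i, j), i < j, of particles closer than max_dr2.

    `particles` is a list of (pt, eta, phi, pdgid) tuples that are
    assumed to be final-state particles already.
    """
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(particles), 2):
        if delta_r2(p[1], p[2], q[1], q[2]) < max_dr2:
            pairs.append((i, j))
    return pairs


# The first two entries are taken from the output above; the third is far away.
sample = [
    (0.553223, -3.487533, 1.761084, 2212),
    (0.553223, -3.487167, 1.761181, 2212),
    (67.000000, 0.417859, -0.246882, 2212),
]
pairs = find_close_pairs(sample)
```

Requiring equal pdgid in addition to the distance cut would drop accidental overlaps such as the photon/kaon pair (22 vs 321) in the list above.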

Photon isolation variables

Photon isolation variables are corrected inconsistently in 002. We are using the Spring16 set of effective areas for the rho correction, but the leakage correction (the pt-dependent part) is from Spring15.

DeepFlavor configuration changed

@SIDN Do you know what we should do with the configuration change in DeepFlavor in 94X?

With these lines in setupBTag.py
pfDeepCSVJetTags = btag.pfDeepCSVJetTags.clone( src = cms.InputTag(deepCSVInfosName) )

in 80X we get

cms.EDProducer("DeepFlavourJetTagsProducer", NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepFlavourNoSL.json'), src = cms.InputTag("pfDeepCSVTagInfosPuppi") )

but with 94X we get

cms.EDProducer("DeepFlavourJetTagsProducer", NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepFlavourNoSL.json'), checkSVForDefaults = cms.bool(False), meanPadding = cms.bool(False), src = cms.InputTag("pfDeepCSVTagInfosAK8PFchsSubjets"), toAdd = cms.PSet( probcc = cms.string('probc') ) )

Apparently toAdd sets up tags to be imported externally and not from the json (?). The EDProducer does not produce the products given in toAdd, so the jets sequence crashes because we request probcc in makeJets.

PF MET significance

PF MET significance is complicated to compute, is already computed in MiniAOD, and should be saved as an event-wide float.

Trigger objects

We now store all trigger objects, so there is in principle no need to have the trigger object pre-matched to leptons and photons. Is anyone using triggerMatch bits? I am, but if no one else is, maybe we can drop it to reduce the maintenance load (we need to make sure we have the right HLT filter names every time there is a new version).

Need a release validation mechanism

I've made too many mistakes that could have been avoided if we had simply made a plot dump of all branches. We should write something like event.dumpPlots(tree) that creates histograms for all branches and saves them to a web page.
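
A toy sketch of the idea in pure Python, with no ROOT and no web output. The names `histogram` and `dump_plots` and the dictionary input are assumptions; an actual event.dumpPlots(tree) would loop over the tree's branches instead.

```python
def histogram(values, nbins=10):
    """Return (lo, hi, counts) using equal-width bins over the data range."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Constant branch: put everything in the first bin so it stands out.
        return lo, hi, [len(values)] + [0] * (nbins - 1)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        # The maximum value falls on the upper edge; keep it in the last bin.
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    return lo, hi, counts


def dump_plots(branches, nbins=10):
    """Histogram every non-empty branch.

    `branches` maps a branch name to a flat list of numbers.
    Returns a dict of name -> (lo, hi, counts), ordered by branch name.
    """
    return {
        name: histogram(values, nbins)
        for name, values in sorted(branches.items())
        if values
    }


# Hypothetical branch contents.
plots = dump_plots({"a": [0, 1, 2, 3], "b": [5, 5], "empty": []}, nbins=2)
```

A branch that is never filled, or is filled with a constant, is immediately visible in such a dump.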

Add Vertex Info

Monophoton needs to measure the rate of correctly identifying the primary vertex in order to improve a background estimate and remove an inefficient cut from the photon ID.

To first order, I believe we just need to save the reconstructed primary vertex and the generator level primary vertex in MC events. Depending on the results of the first studies, we might also want to include enough information to recalculate the PV in Z events after removing the leptons.

GenVertex

genVertex in 003 is filled from vx, vy, vz of the 0th genParticle. Turns out it's identically 0,0,0 for the 0th. Need to go over the gen particles and find the first non-null position.
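
The proposed fix amounts to the following, sketched in Python with positions given as `(vx, vy, vz)` tuples. The function name `first_nonnull_vertex` and the input format are assumptions; the actual filler is C++.

```python
def first_nonnull_vertex(positions):
    """Return the first (vx, vy, vz) that is not exactly (0, 0, 0).

    `positions` is the list of gen particle production vertices in
    collection order. Returns None if every position is null.
    """
    for vx, vy, vz in positions:
        if (vx, vy, vz) != (0.0, 0.0, 0.0):
            return (vx, vy, vz)
    return None


# Hypothetical event: the first two particles carry a null position.
vertex = first_nonnull_vertex(
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.01, -0.02, 3.5), (0.5, 0.5, 4.0)]
)
```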

Verbosity of BranchList

Setting the event BranchList should have a verbose option that does either or both of the following:

  • Print everything each pattern matches

  • Warn if a pattern matches nothing

This way, if a branch name changes, it doesn't just fail silently. Ideally, it would also be impossible to access the Event member if the branch isn't activated, but I don't think that's trivial to do.

I can work on this feature. Just putting it here to remind myself
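
A possible shape for this, sketched in Python with shell-style wildcards from the standard `fnmatch` module. The function name `resolve_branch_list`, the branch names, and the wildcard syntax are all assumptions; the real BranchList may match patterns differently.

```python
import fnmatch
import warnings


def resolve_branch_list(patterns, available, verbose=False):
    """Expand branch patterns against the available branch names.

    Warns when a pattern matches nothing, and prints the matches of
    each pattern when `verbose` is set. Returns the selected branch
    names without duplicates, in first-match order.
    """
    selected = []
    for pattern in patterns:
        matches = fnmatch.filter(available, pattern)
        if not matches:
            warnings.warn(f"branch pattern {pattern!r} matches nothing")
        elif verbose:
            print(f"{pattern}: {', '.join(matches)}")
        for name in matches:
            if name not in selected:
                selected.append(name)
    return selected


# Hypothetical branch names.
available = ["muons.pt_", "muons.eta_", "electrons.pt_"]
selected = resolve_branch_list(["muons.*", "*.pt_"], available, verbose=True)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    nothing = resolve_branch_list(["taus.*"], available)
```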

GenJet quantities

There are 4 members of GenJet, all of which are wrong:

  • pdgid is not the PDG ID, but rather the hadron flavor. This should be renamed
  • partonFlavor, numB, and numC are not even filled

Not sure how this happened, but it needs to be fixed in a new release.

MadGraph reweights from LHE

In MadGraph (LO) samples, GenEventInfo.weight() and LHEEventProduct.originalXWGTUP() have distinct values. We are supposed to normalize LHE reweights with the LHE originalXWGTUP, but instead we are using eventInfo.weight().
In 002 we were doing this properly, but during the restructuring for 003 I switched to using only weight() for code simplicity. I was only checking on aMC@NLO and powheg samples.
Since this only affects the values of the genReweight branches and not the data format, we will treat the fix as a patch to 003. Existing 003 files will be moved to a temporary location while the new files are being produced.
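
In other words, each stored reweight factor should be the ratio of the LHE weight to originalXWGTUP. A small Python sketch with made-up numbers follows; the function name `normalize_reweights` is hypothetical and the values are for illustration only.

```python
def normalize_reweights(lhe_weights, nominal):
    """Divide each LHE weight by the nominal weight used for normalization."""
    if nominal == 0:
        raise ValueError("nominal weight must be non-zero")
    return [w / nominal for w in lhe_weights]


# Made-up event in which the two nominal weights differ.
lhe_weights = [2.0, 1.0, 3.0]
original_xwgtup = 2.0   # stands in for LHEEventProduct.originalXWGTUP()
gen_weight = 4.0        # stands in for GenEventInfo.weight()

correct = normalize_reweights(lhe_weights, original_xwgtup)
wrong = normalize_reweights(lhe_weights, gen_weight)
```

When the two nominal weights coincide, both normalizations give the same factors; they differ only when the weights differ, as reported here for the MadGraph LO samples.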

More lepton variables

So far I got r9 for electrons. Suggestion is to make it a SuperCluster variable. Let's see if that's what CMSSW does too.

Auto-generate library

Since we are not using any custom streamers, we should actually be able to use SCRAM's library generation. Need to figure out how.

Tracks

Add a track collection with the following information:

  • ptError
  • dxy
  • dz

Order the tracks in sync with PFCandidates so that the two collections can be linked "offline". Impact parameters are to be computed with respect to the associated vertex of the PF candidate.

Make getEntry work with TChain

Each CollectionBase currently holds a pointer to a TBranch so that it can look ahead at the collection size (to prepare for a potential resize and reallocation of the collection). This causes a segfault when we pass a TChain as input, because the TBranch pointer is only valid for the first tree of the chain. We need to think of a better way.

This update will not affect the data format, so we should be able to accommodate it within 003.

Important HLTBits bug

I found that the prod-004 tag of PandaTree was reading triggers incorrectly in 004 files. I found a number of events in which the trigger should have fired, but was not picked up as such (weirdly enough it was firing a different trigger...might be the 32/64 bit issue?). Anyway, I reverted to a branch that I knew worked [1]. Will investigate this later (this was discovered trying to do some last-minute studies), but this needs to be understood and properly tagged.

[1] https://github.com/sidnarayanan/PandaTree/tree/testing-hlt

Update electron MVA id

2017 electron MVA id is split into Iso and NoIso. I guess we can reuse the existing mva90 and mva80 branches for the NoIso bits and add mva90Iso and mva80Iso branches. Or the other way around. I don't really care either way.

Missing MET filters

We somehow lost two of the filters in [1]. Snippet to include them:

process.load('RecoMET.METFilters.BadPFMuonFilter_cfi')
process.BadPFMuonFilter.muons = cms.InputTag("slimmedMuons")
process.BadPFMuonFilter.PFCandidates = cms.InputTag("packedPFCandidates")
process.BadPFMuonFilter.taggingMode = cms.bool(True)

process.load('RecoMET.METFilters.BadChargedCandidateFilter_cfi')
process.BadChargedCandidateFilter.muons = cms.InputTag("slimmedMuons")
process.BadChargedCandidateFilter.PFCandidates = cms.InputTag("packedPFCandidates")
process.BadChargedCandidateFilter.taggingMode = cms.bool(True)

These are less important given the muon fix, but we might want to keep them anyway.

[1] https://twiki.cern.ch/twiki/bin/viewauth/CMS/MissingETOptionalFiltersRun2#Moriond_2017

Fix PF Met from re-miniAOD

PF Met taken out of the box from the re-miniAOD is broken. We can either run an official CMS producer from Zeynep or we can add Sid's private fix, but we should fix this for version 003 so we don't need to run it on top of panda anymore.

Add genDict to PandaTree

Hey @sidnarayanan can you add your upgraded genDict script to PandaTree? We've already put the LinkDef and related files in the dict/ directory of Framework and Objects, and there isn't really a reason not to support CLING with Panda.

statusFlags of genparticles

We are taking status 1 particles from packedGenParticles, which destroys the statusFlags information. We can instead decide to ignore the packed candidate when there is a match between pruned and packed.
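
A sketch of that merge in Python. How a pruned and a packed candidate are declared a match is not specified above, so the matching rule here (same pdgid, pT/eta/phi within a tolerance) and the tolerance value are pure assumptions, as are the function name and the dictionary layout.

```python
import math


def merge_gen_particles(pruned, packed, tol=1e-3):
    """Keep all pruned particles; add packed ones only if unmatched.

    `pruned` entries are dicts with pt, eta, phi, pdgid and status_flags.
    `packed` entries are dicts with pt, eta, phi and pdgid only.
    Unmatched packed entries are appended with status_flags=None.
    """
    def same(a, b):
        dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
        return (
            a["pdgid"] == b["pdgid"]
            and abs(a["pt"] - b["pt"]) <= tol * max(abs(a["pt"]), 1.0)
            and abs(a["eta"] - b["eta"]) <= tol
            and abs(dphi) <= tol
        )

    merged = list(pruned)
    for cand in packed:
        if not any(same(cand, p) for p in pruned):
            merged.append(dict(cand, status_flags=None))
    return merged


# Hypothetical inputs: the first packed candidate duplicates the pruned muon.
pruned = [{"pt": 10.0, "eta": 0.5, "phi": 1.0, "pdgid": 13, "status_flags": 0b101}]
packed = [
    {"pt": 10.0, "eta": 0.5001, "phi": 1.0, "pdgid": 13},
    {"pt": 2.0, "eta": -1.0, "phi": 0.3, "pdgid": 211},
]
merged = merge_gen_particles(pruned, packed)
```

The matched muon keeps the statusFlags from the pruned collection, and only the unmatched pion is taken from the packed one.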
