pandaphysics / pandatree

Data format for Panda.

Home Page: https://codedocs.xyz/PandaPhysics/PandaTree/

C++ 88.99% Python 10.34% Objective-C 0.09% HTML 0.02% PHP 0.25% JavaScript 0.12% CSS 0.12% Perl 0.07%

pandatree's People

sidnarayanan blallen dabercro arapyan siewyan mcremone lpc-dm jrtalbot ellenlee1 kpark1 riemanntensor1729 penguin2207

pandatree's Issues

Deep double B

Maybe related to #111, but this requires CMSSW_9_4_6.
RecoBTag/Combined/python/deepFlavour_cff.py has changed subtly between CMSSW_9_4_4 and CMSSW_9_4_6.

.dump() prints weird things for chars

When calling event.dump(), I notice that ptype and charge are printed as strange symbols.
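One plausible cause, assuming ptype and charge are stored as 8-bit chars, is that the printer treats the stored byte as a character instead of a small integer. The Python sketch below only illustrates that effect; as_small_int is a hypothetical helper and not part of PandaTree.

```python
def as_small_int(raw: bytes) -> int:
    """Interpret a single stored byte as a signed 8-bit integer."""
    if len(raw) != 1:
        raise ValueError("expected exactly one byte")
    return int.from_bytes(raw, byteorder="little", signed=True)


# A charge of -1 stored in one signed byte.
charge_byte = (-1).to_bytes(1, byteorder="little", signed=True)

# Printing the byte as a character gives a symbol, not a number.
as_text = charge_byte.decode("latin-1")

# Converting to an integer first gives the value a reader expects.
as_number = as_small_int(charge_byte)
```

If this is indeed the cause, converting char-typed members to int before printing in dump() would be enough.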

In Bambu, one of the framework modules used to save a TGraph with the list of runs and lumis that were processed. We can save a similar structure to the panda tree so that e.g. lumi calculation can be simpler. It can be a tree structure with branches run:lumi:number of events.
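A minimal sketch of the run:lumi:number-of-events bookkeeping, written in plain Python so it does not depend on the actual tree-writing code. The function name summarize_lumis and the run and lumi numbers are made up for illustration.

```python
from collections import Counter


def summarize_lumis(events):
    """Count processed events per (run, lumi) pair.

    `events` is any iterable of (run, lumi) tuples. The result is a list
    of (run, lumi, n_events) rows sorted by run and then by lumi, i.e. the
    content one would write into the three branches.
    """
    counts = Counter(events)
    return [(run, lumi, n) for (run, lumi), n in sorted(counts.items())]


# Four processed events spread over two runs (made-up numbers).
rows = summarize_lumis([(1, 10), (1, 10), (1, 11), (2, 5)])
```

Each row corresponds to one entry of the proposed tree, so a lumi calculation only needs to read this small structure and not the full event tree.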

egamma GSFix - original matching

Seems like MET POG is using a simple dR < 0.1 matching between GSFixed and original egamma objects.
cms-met/cmssw@ea443b6

We should align the implementation in ElectronsFiller and PhotonsFiller.
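For reference, a dR < 0.1 matching can be sketched as follows. This is not the MET POG code or the filler code: picking the closest original object when several fall inside the cone is an assumption, and the function names are hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def match_by_dr(fixed, originals, max_dr=0.1):
    """Match each GSFixed object to an original object.

    Both inputs are lists of (eta, phi) tuples. For every entry of `fixed`
    the index of the closest entry of `originals` with dR < max_dr is
    returned, or -1 if there is none (closest-match is an assumption).
    """
    matches = []
    for eta, phi in fixed:
        best_index, best_dr = -1, max_dr
        for i, (o_eta, o_phi) in enumerate(originals):
            dr = delta_r(eta, phi, o_eta, o_phi)
            if dr < best_dr:
                best_index, best_dr = i, dr
        matches.append(best_index)
    return matches


# The first pair sits on either side of the phi = +-pi boundary.
result = match_by_dr([(0.5, 3.10), (1.0, 0.0)], [(0.5, -3.12), (2.0, 0.0)])
```

The phi wrapping matters here: without it the first pair would appear to be more than 6 apart in phi and would not be matched.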

All gen particles

For the deep substructure studies, we'll need the full shower history of each jet at truth level. Unfortunately this means saving all gen particles (huge!). I don't want to hold up 003 because of this, so let's target 004. Proposal:

Define a tiny gen particle class (four-momentum, pdgid, parent Ref, isfinal)
Save extra gen particles in a separate branch, so that most analyses can avoid reading this extra stuff.

I'm even okay making this configurable, so it can be turned on at will. I would just need to set up private submission on SubMIT, with stage-out to T2.
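To make the proposal concrete, here is a rough Python sketch of such a tiny record. The field names, the (pt, eta, phi, mass) representation of the four-momentum, and the use of an integer index in place of the parent Ref are all assumptions for illustration, not the panda.def definition.

```python
from dataclasses import dataclass


@dataclass
class TinyGenParticle:
    pt: float
    eta: float
    phi: float
    mass: float
    pdgid: int
    parent: int = -1        # index into the same list, -1 if no parent
    is_final: bool = False


def ancestry(particles, index):
    """Return the indices from a particle up to its root ancestor."""
    chain = []
    while index >= 0:
        if len(chain) > len(particles):
            raise RuntimeError("cyclic parent links")
        chain.append(index)
        index = particles[index].parent
    return chain


# A toy three-step history: Z -> mu -> mu (final state).
particles = [
    TinyGenParticle(0.0, 0.0, 0.0, 91.2, 23),
    TinyGenParticle(45.0, 0.1, 0.2, 0.106, 13, parent=0),
    TinyGenParticle(44.0, 0.1, 0.2, 0.106, 13, parent=1, is_final=True),
]
history = ancestry(particles, 2)
```

Walking the parent links like this is what the substructure studies would need in order to reconstruct the shower history of a final-state particle.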

Run GenHFHadronMatcher in 010 producer

Not sure how we want to store it yet, but we should somehow include the official CMSSW GenHFHadronMatcher subroutine described here:

https://twiki.cern.ch/twiki/bin/view/CMSPublic/GenHFHadronMatcher

The CFI we want to run is here
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/JetMCAlgos/python/GenHFHadronMatcher_cfi.py

Trigger degeneracy

Trigger bits should be stored in 32 bit unsigned ints to avoid misalignment of bits. Workaround in #3 for 002.
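As an illustration of fixed-width storage, the sketch below packs trigger decisions into 32-bit words. The convention that path i lives in bit i % 32 of word i // 32 is an assumption, as are the function names; the point is only that writer and reader must agree on the word width.

```python
WORD_BITS = 32


def pack_bits(fired, n_paths):
    """Pack the indices of fired paths into a list of 32-bit words."""
    n_words = (n_paths + WORD_BITS - 1) // WORD_BITS
    words = [0] * n_words
    for index in fired:
        if not 0 <= index < n_paths:
            raise IndexError(f"path index {index} out of range")
        words[index // WORD_BITS] |= 1 << (index % WORD_BITS)
    return words


def is_fired(words, index):
    """Read back one decision using the same 32-bit convention."""
    return bool((words[index // WORD_BITS] >> (index % WORD_BITS)) & 1)


words = pack_bits([0, 31, 32, 40], n_paths=64)
```

Reading the same words with a 64-bit stride would look up path 40 in bit 40 of the first word, which is exactly the kind of misalignment the issue describes.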

Signal weights missing

We originally assumed that any signal weight would have a non-integer ID (integer IDs are reserved for QCD variations), but it appears this is not always the case. So the signal weights fail [1] and possibly incorrectly enter the scale/PDF uncertainty calculations. Unfortunately the full weight name is also missing in the LHE XML header. For the sample (montop scalar) in question, the weight IDs are 1...25. Is it safe to exclude this range from what's assumed to be QCD variations? @maierbenedikt

[1] https://github.com/PandaPhysics/PandaProd/blob/master/Producer/src/WeightsFiller.cc#L272

genReweight booked many times

WeightsFiller has a bug: the genReweight branch is booked (number of events in the tree) - 100 times.

Remove recoil filter for substructure calculations

I totally forgot this was in place until I just tried to do an inclusive study. We've already established this isn't the most computationally intensive part of the code anyway.

Fix copy construction of Elements

Is the copy constructor even implemented? Maybe I meant to do it later and never worked on it.

Electron matchedGen is identically invalid

electrons.matchedGen_ is always -1. MINIAOD does have non-null genParticleRef from electrons, so somewhere in the panda production we lose information.

Duplicate gen particles?

With the following snippet of code:

for (auto &p : genParticles) {
  if (!p.finalState)
    continue;
  for (auto &q : genParticles) {
    if (!q.finalState)
      continue;
    if (&p == &q)
      continue;
    bool parentage = (q.parent.isValid() && q.parent.get() == &p) ||
                     (p.parent.isValid() && p.parent.get() == &q);
    if (DeltaR2(q.eta(), q.phi(), p.eta(), p.phi()) < 0.00001) {
      PDebug("",Form("%f,%f,%f,%i matches with %f,%f,%f,%i; parentage=%s",
                     q.pt(), q.eta(), q.phi(), q.pdgid, p.pt(), p.eta(), p.phi(), p.pdgid,
                     parentage ? "true" : "false"));
    }
  }
}

run on t3home000:/tmp/snarayan/zptt.root, I see the following:

0.553223,-3.487533,1.761084,2212 matches with 0.553223,-3.487167,1.761181,2212; parentage=false
0.294189,-3.343608,2.775576,11 matches with 0.294189,-3.343608,2.775576,11; parentage=false
0.027145,-3.230445,2.885635,-11 matches with 0.027145,-3.230445,2.885733,-11; parentage=false
7.058594,0.346263,-0.127346,310 matches with 7.058594,0.346263,-0.127444,310; parentage=false
33.750000,0.375195,-0.093265,310 matches with 33.750000,0.375195,-0.093363,310; parentage=false
67.000000,0.417859,-0.246882,2212 matches with 67.000000,0.417859,-0.246882,2212; parentage=false
1.167969,2.477676,2.851837,2212 matches with 1.167969,2.477859,2.851837,2212; parentage=false
0.424561,-0.197028,-2.128674,2212 matches with 0.424561,-0.197211,-2.128674,2212; parentage=false
0.553223,-3.487167,1.761181,2212 matches with 0.553223,-3.487533,1.761084,2212; parentage=false
1.167969,2.477859,2.851837,2212 matches with 1.167969,2.477676,2.851837,2212; parentage=false
0.424561,-0.197211,-2.128674,2212 matches with 0.424561,-0.197028,-2.128674,2212; parentage=false
67.000000,0.417859,-0.246882,2212 matches with 67.000000,0.417859,-0.246882,2212; parentage=false
7.058594,0.346263,-0.127444,310 matches with 7.058594,0.346263,-0.127346,310; parentage=false
33.750000,0.375195,-0.093363,310 matches with 33.750000,0.375195,-0.093265,310; parentage=false
2.439453,1.076144,2.817473,22 matches with 9.750000,1.073946,2.817370,321; parentage=false
0.927246,-3.398175,-0.225309,2212 matches with 0.927246,-3.398358,-0.225211,2212; parentage=false
0.173096,-0.156377,1.809136,11 matches with 0.173096,-0.156194,1.809038,11; parentage=false
0.060577,-0.224311,1.880624,-11 matches with 0.060577,-0.224311,1.880527,-11; parentage=false
9.750000,1.073946,2.817370,321 matches with 2.439453,1.076144,2.817473,22; parentage=false
0.173096,-0.156194,1.809038,11 matches with 0.173096,-0.156377,1.809136,11; parentage=false
0.060577,-0.224311,1.880527,-11 matches with 0.060577,-0.224311,1.880624,-11; parentage=false
0.927246,-3.398358,-0.225211,2212 matches with 0.927246,-3.398175,-0.225309,2212; parentage=false
0.294189,-3.343608,2.775576,11 matches with 0.294189,-3.343608,2.775576,11; parentage=false
0.027145,-3.230445,2.885733,-11 matches with 0.027145,-3.230445,2.885635,-11; parentage=false

Note that:

The duplicate particles appear to be identical in pT, but only approximately identical in eta/phi
These are final state particles (supposedly)
When duplicates are found, neither particle is a parent of the other.

Either:

This is somehow a consequence of the Monte Carlo itself, in which case we don't have to worry
We are saving duplicated gen particles, in which case we should worry
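The printout above lists every match twice, once in each direction. A Python sketch of the same check that reports each pair only once is given below; the dictionary-based particle representation is an assumption made to keep the example free of ROOT, and the test values are taken from the first line of the printout.

```python
import math
from itertools import combinations


def delta_r2(eta1, phi1, eta2, phi2):
    """Squared angular distance with the phi difference wrapped."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return (eta1 - eta2) ** 2 + dphi ** 2


def find_close_pairs(particles, max_dr2=1e-5):
    """Return index pairs (i, j), i < j, of final-state particles that
    are closer than max_dr2 in squared delta-R."""
    finals = [i for i, p in enumerate(particles) if p["final"]]
    return [
        (i, j)
        for i, j in combinations(finals, 2)
        if delta_r2(particles[i]["eta"], particles[i]["phi"],
                    particles[j]["eta"], particles[j]["phi"]) < max_dr2
    ]


particles = [
    {"pt": 0.553223, "eta": -3.487533, "phi": 1.761084, "pdgid": 2212, "final": True},
    {"pt": 0.553223, "eta": -3.487167, "phi": 1.761181, "pdgid": 2212, "final": True},
    {"pt": 67.0, "eta": 0.417859, "phi": -0.246882, "pdgid": 2212, "final": True},
    {"pt": 67.0, "eta": 0.417859, "phi": -0.246882, "pdgid": 2212, "final": False},
]
pairs = find_close_pairs(particles)
```

Comparing pt and pdgid for each reported pair would then separate the near-identical copies from accidental overlaps such as the photon and kaon pair in the printout.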

Photon isolation variables

Photon isolation variables are corrected inconsistently in 002. We are using the Spring16 set of effective areas for the rho correction, but the leakage correction (the pt-dependent part) is from Spring15.

DeepFlavor configuration changed

@SIDN Do you know what we should do with the configuration change in DeepFlavor in 94X?

With these lines in setupBTag.py
pfDeepCSVJetTags = btag.pfDeepCSVJetTags.clone( src = cms.InputTag(deepCSVInfosName) )

in 80X we get

cms.EDProducer("DeepFlavourJetTagsProducer",
    NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepFlavourNoSL.json'),
    src = cms.InputTag("pfDeepCSVTagInfosPuppi")
)

but with 94X we get

cms.EDProducer("DeepFlavourJetTagsProducer",
    NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepFlavourNoSL.json'),
    checkSVForDefaults = cms.bool(False),
    meanPadding = cms.bool(False),
    src = cms.InputTag("pfDeepCSVTagInfosAK8PFchsSubjets"),
    toAdd = cms.PSet(
        probcc = cms.string('probc')
    )
)

Apparently toAdd sets up tags to be imported externally and not from the json (?). The EDProducer does not produce the products given in toAdd, so the jets sequence crashes because we request probcc in makeJets.

PF MET significance

PF MET significance is complicated, is already computed in MiniAOD, and should be saved as an event-wide float.

Add Charged PF Veto for Photons

Need to implement a new type of electron veto for photons.

@yiiyama already implemented this in the monophoton analysis code [1] and as far as data structures go, it's just adding another boolean to the Photon object.

We should probably have this in for the next main version, definitely by the version that is meant to run on 2016 and 2017 together.

[1] https://github.com/MiT-HEP/MonoX/blob/master/monophoton/main/operators.cc#L1835-L1861

Add significance of the impact parameter for the leptons?

Add muon isStandalone flag

Various isXYZ flags to do out-in efficiency studies.
New selectors.

Trigger objects

We now store all trigger objects, so there is in principle no need to have the trigger object pre-matched to leptons and photons. Is anyone using triggerMatch bits? I am, but if no one else is, maybe we can drop it to reduce the maintenance load (we need to make sure we have the right HLT filter names every time there is a new version).

GenParticle limit should be raised

It's set to 256 now, but we save all gen particles, so this limit is too low.

https://github.com/PandaPhysics/PandaTree/blob/master/panda.def#L453

Need to check if this messes up the hard-scattering particles that all analyses need. If not, this fix can go in for 004.

Fix GenParticlesFiller on Sherpa

GenParticlesFiller (PNodeWithPtr::fillPanda) goes into a recursive infinite loop on sherpa samples.
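The cause is not identified here. If, and this is only an assumption, the Sherpa event record contains cyclic mother-daughter links, a traversal that remembers visited nodes would terminate where a naive recursion does not. The sketch below shows that guard on a plain dictionary; it is not the PNodeWithPtr code.

```python
def collect_descendants(daughters, root):
    """Visit every particle reachable from `root` exactly once.

    `daughters` maps a particle id to the list of its daughter ids.
    The `seen` set is what protects against cycles in the record.
    """
    seen = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already handled, do not descend again
        seen.add(node)
        order.append(node)
        # reversed() keeps the daughters in their original order
        stack.extend(reversed(daughters.get(node, [])))
    return order


# Particle 3 lists its own mother 1 as a daughter, forming a cycle.
graph = {0: [1, 2], 1: [3], 3: [1], 2: []}
visited = collect_descendants(graph, 0)
```

Using an explicit stack also avoids deep recursion on long decay chains.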

Need a release validation mechanism

I've made too many mistakes which could have been avoided if we simply made a plot dump of all branches. We should write something like event.dumpPlots(tree) that creates histograms for all branches and saves it into a web page.
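A rough sketch of the idea in plain Python follows. It histograms every branch of a dictionary of values and leaves out the plotting and web page steps; the name dump_plots and the input layout are hypothetical and do not correspond to an existing PandaTree function.

```python
def histogram(values, n_bins=10):
    """Return (low edge, high edge, bin counts) for a list of numbers."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant branch: everything in bin 0
    counts = [0] * n_bins
    for value in values:
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    return lo, hi, counts


def dump_plots(branches, n_bins=10):
    """Histogram every non-empty branch; `branches` maps name -> values."""
    return {
        name: histogram(values, n_bins)
        for name, values in branches.items()
        if values
    }


summary = dump_plots(
    {"muons.pt": [0, 1, 2, 3, 4], "genReweight": [], "npv": [5, 5, 5]},
    n_bins=5,
)
```

A branch that is never filled shows up immediately in such a dump, either as missing or as a single spike at its default value.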

Add Vertex Info

Monophoton needs to measure the rate of correctly identifying the primary vertex in order to improve a background estimate and remove an inefficient cut from the photon ID.

To first order, I believe we just need to save the reconstructed primary vertex and the generator level primary vertex in MC events. Depending on the results of the first studies, we might also want to include enough information to recalculate the PV in Z events after removing the leptons.

GenVertex

genVertex in 003 is filled from vx, vy, vz of the 0th genParticle. Turns out it's identically 0,0,0 for the 0th. Need to go over the gen particles and find the first non-null position.
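The proposed fix amounts to a short scan, sketched here in Python on a list of (vx, vy, vz) tuples. The function name and the choice to return None when every position is null are assumptions.

```python
def first_nonzero_vertex(positions):
    """Return the first (vx, vy, vz) that is not exactly (0, 0, 0).

    `positions` is the list of production vertices of the gen particles,
    in their stored order. Returns None if all positions are null.
    """
    for vx, vy, vz in positions:
        if (vx, vy, vz) != (0.0, 0.0, 0.0):
            return (vx, vy, vz)
    return None


vertex = first_nonzero_vertex([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.01, -0.02, 3.5)])
```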

Verbosity of BranchList

Setting the event BranchList should have a verbose option that does either or both of:

Print everything each pattern matches
Warn if a pattern matches nothing

This way, if a branch name changes, it doesn't just fail silently. Ideally, it would also be impossible to access the Event member if the branch isn't activated, but I don't think that's trivial to do.
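The two behaviours could look roughly like the Python sketch below. It assumes shell-style wildcard patterns, which may differ from the actual BranchList syntax, and resolve_branches is a hypothetical name.

```python
import fnmatch


def resolve_branches(patterns, available, verbose=False):
    """Expand wildcard patterns against the available branch names.

    Returns (selected, unmatched): the matched branch names in order of
    first appearance, and the patterns that matched nothing.
    """
    selected = []
    unmatched = []
    for pattern in patterns:
        hits = [name for name in available if fnmatch.fnmatchcase(name, pattern)]
        if not hits:
            unmatched.append(pattern)
            print(f"WARNING: pattern '{pattern}' matches no branch")
        elif verbose:
            print(f"pattern '{pattern}' matches: {', '.join(hits)}")
        for name in hits:
            if name not in selected:
                selected.append(name)
    return selected, unmatched


selected, unmatched = resolve_branches(
    ["muons.*", "taus.*"],
    ["muons.pt", "muons.eta", "electrons.pt"],
)
```

Returning the unmatched patterns, in addition to printing a warning, lets a caller turn a renamed branch into a hard error if desired.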

I can work on this feature. Just putting it here to remind myself

GenJet quantities

There are 4 members of GenJet, all of which are wrong:

pdgid is not the PDG ID, but rather the hadron flavor. This should be renamed
partonFlavor, numB, and numC are not even filled

Not sure how this happened, but it needs to be fixed in a new release.

MadGraph reweights from LHE

In MadGraph (LO) samples, GenEventInfo.weight() and LHEEventProduct.originalXWGTUP() have distinct values. We are supposed to normalize LHE reweights with LHE originalXWGTUP, but instead are using eventInfo.weight().
In 002 we were doing this properly, but during the restructuring for 003 I switched to using only weight() for code simplicity. I was checking on aMC@NLO and powheg samples only.
Since this only affects the values of the genReweight branches and not the data format, we will treat the fix as a patch to 003. Existing 003 files will be moved to a temporary location while the new files are being produced.
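In other words, each stored reweight factor should be the ratio of the LHE weight to originalXWGTUP. A minimal Python sketch with made-up numbers, where normalize_reweights is a hypothetical name:

```python
def normalize_reweights(lhe_weights, original_xwgtup):
    """Divide each LHE weight by the LHE nominal weight originalXWGTUP."""
    if original_xwgtup == 0.0:
        raise ValueError("originalXWGTUP is zero, cannot normalize")
    return [weight / original_xwgtup for weight in lhe_weights]


# Made-up values: nominal LHE weight 2.0, three variations.
factors = normalize_reweights([2.0, 1.0, 3.0], original_xwgtup=2.0)
```

If the denominator is GenEventInfo.weight() and that value differs from originalXWGTUP, every factor is scaled by the same wrong constant for that event.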

merge.py creates "events" tree with multiple cycle numbers for large files

This leads to problems down the line. We need to either use TFile::Delete to remove the older cycle numbers if they exist, or write the tree with TObject::kOverwrite.

More lepton variables

So far I got r9 for electrons. Suggestion is to make it a SuperCluster variable. Let's see if that's what CMSSW does too.

Review trigger objects

Do we have the right set of trigger objects?

Auto-generate library

Since we are not using any custom streamers, we should actually be able to use SCRAM's library generation. Need to figure out how.

Run DNN b-jet regression in Panda 010

https://twiki.cern.ch/twiki/bin/view/Main/BJetRegression

Tracks

Add a track collection with the following information:

ptError
dxy
dz
Order the tracks in sync with PFCandidates so that the two collections can be linked "offline". Impact parameters are to be computed with respect to the associated vertex of the PF candidate.

Pruned mass

Needed for monojet

Missing electron IP

IPs were removed from VID. Two proposals:

Save dz and dxy
Apply the eta-dependent cuts [1] as part of the electron ID bits

Any strong feelings either way @maierbenedikt and @yiiyama ?

[1] https://twiki.cern.ch/twiki/bin/viewauth/CMS/CutBasedElectronIdentificationRun2

Gen particles

Copy http://cmslxr.fnal.gov/source/DataFormats/HepMCCandidate/interface/GenStatusFlags.h?v=CMSSW_8_0_20 to the GenParticles class, and add status to GenParticle.

Need general purpose Electron MVA value

Information on the electron MVA:

https://twiki.cern.ch/twiki/bin/view/CMS/MultivariateElectronIdentificationRun2#MVA_recipes_for_2016_data_and_Sp

It will take a little work to get this running in PandaProd.

Make getEntry work with TChain

Each CollectionBase currently holds a pointer to a TBranch to look ahead at the collection size (to prepare for a potential resize and reallocation of the collection). This causes a segfault when we pass a TChain as an input, because the TBranch pointer is only valid for the first tree of the chain. We need to think of a better way.

This update will not affect the data format, so we should be able to accommodate it within 003.

Important HLTBits bug

I found that the prod-004 tag of PandaTree was reading triggers incorrectly in 004 files. I found a number of events in which the trigger should have fired, but was not picked up as such (weirdly enough it was firing a different trigger...might be the 32/64 bit issue?). Anyway, I reverted to a branch that I knew worked [1]. Will investigate this later (this was discovered trying to do some last-minute studies), but this needs to be understood and properly tagged.

[1] https://github.com/sidnarayanan/PandaTree/tree/testing-hlt

Update electron MVA id

2017 electron MVA id is split into Iso and NoIso. I guess we can reuse the existing mva90 and mva80 branches for the NoIso bits and add mva90Iso and mva80Iso branches. Or the other way around. I don't really care either way.

Missing MET filters

We somehow lost two of the filters in [1]. Snippet to include them:

process.load('RecoMET.METFilters.BadPFMuonFilter_cfi')
process.BadPFMuonFilter.muons = cms.InputTag("slimmedMuons")
process.BadPFMuonFilter.PFCandidates = cms.InputTag("packedPFCandidates")
process.BadPFMuonFilter.taggingMode = cms.bool(True)

process.load('RecoMET.METFilters.BadChargedCandidateFilter_cfi')
process.BadChargedCandidateFilter.muons = cms.InputTag("slimmedMuons")
process.BadChargedCandidateFilter.PFCandidates = cms.InputTag("packedPFCandidates")
process.BadChargedCandidateFilter.taggingMode = cms.bool(True)

These are less important given the muon fix, but we might want to keep them anyway.

[1] https://twiki.cern.ch/twiki/bin/viewauth/CMS/MissingETOptionalFiltersRun2#Moriond_2017
