pandaphysics / pandatree
Data format for Panda.
Home Page: https://codedocs.xyz/PandaPhysics/PandaTree/
Maybe related to #111, but this requires CMSSW_9_4_6.
RecoBTag/Combined/python/deepFlavour_cff.py has changed subtly between CMSSW_9_4_4 and CMSSW_9_4_6.
I notice that ptype and charge print out as strange symbols when calling event.dump().
We need all 100.
In Bambu, one of the framework modules used to save a TGraph with the list of runs and lumis that were processed. We can save a similar structure to the panda tree so that e.g. lumi calculation can be simpler. It can be a tree structure with branches run:lumi:number of events.
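A minimal sketch of the bookkeeping behind such a run:lumi:number-of-events structure, in plain Python rather than ROOT. The function names and the input format (an iterable of (run, lumi) pairs, one per processed event) are assumptions made for illustration, not part of PandaTree.

```python
from collections import Counter


def count_events_per_lumi(events):
    """Count processed events per (run, lumi) pair.

    `events` is any iterable of (run, lumi) tuples, one per event.
    """
    return Counter((run, lumi) for run, lumi in events)


def to_rows(counts):
    """Flatten the counts into sorted (run, lumi, n_events) rows.

    Each row corresponds to one entry of the proposed tree.
    """
    return [(run, lumi, n) for (run, lumi), n in sorted(counts.items())]


# Hypothetical input: four events spread over three lumi sections.
events = [(273158, 1), (273158, 1), (273158, 2), (273302, 5)]
rows = to_rows(count_events_per_lumi(events))
```

Each row would then be filled into the three branches, so a lumi calculation only has to read this small tree instead of looping over events.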
Seems like MET POG is using a simple dR < 0.1 matching between GSFixed and original egamma objects.
cms-met/cmssw@ea443b6
We should align the implementation in ElectronsFiller and PhotonsFiller.
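For reference, a dR < 0.1 matching can be sketched as below. This is not the MET POG code: objects are reduced to (eta, phi) tuples, and picking the closest candidate inside the cone (rather than the first one found) is an assumption.

```python
import math


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def match_by_dr(fixed, original, max_dr=0.1):
    """For each GSFixed object, return the index of the closest original
    object with dR < max_dr, or -1 if there is none."""
    matches = []
    for eta, phi in fixed:
        best_index, best_dr = -1, max_dr
        for index, (orig_eta, orig_phi) in enumerate(original):
            dr = delta_r(eta, phi, orig_eta, orig_phi)
            if dr < best_dr:
                best_index, best_dr = index, dr
        matches.append(best_index)
    return matches
```

The phi wrap-around matters here: two objects at phi = 3.13 and phi = -3.13 are close neighbours and should match.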
For the deep substructure studies, we'll need the full shower history of each jet at truth level. Unfortunately this means saving all gen particles (huge!). I don't want to hold up 003 because of this, so let's target 004. Proposal:
Define a tiny gen particle class (four-momentum, pdgid, parent Ref, isfinal)
Save extra gen particles in a separate branch, so that most analyses can avoid reading this extra stuff.
I'm even okay making this configurable, so it can be turned on at will. I would just need to set up private submission on SubMIT, with stage-out to T2.
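To make the proposal concrete, here is a rough sketch of what the tiny class could hold. The field names, the (pt, eta, phi, mass) representation of the four-momentum, and the use of an integer index in place of a parent Ref are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class TinyGenParticle:
    # Four-momentum, stored here as (pt, eta, phi, mass) by assumption.
    pt: float
    eta: float
    phi: float
    mass: float
    pdgid: int
    # Index of the parent in the same collection; -1 means no parent.
    parent: int = -1
    is_final: bool = False


def ancestry(particles, index):
    """Return the pdgid chain from particle `index` up to its root ancestor.

    The `seen` set stops the walk if the parent links ever form a cycle.
    """
    chain = []
    seen = set()
    while index >= 0 and index not in seen:
        seen.add(index)
        chain.append(particles[index].pdgid)
        index = particles[index].parent
    return chain


# Hypothetical two-particle history: a Z boson and a final-state muon.
particles = [
    TinyGenParticle(100.0, 0.0, 0.0, 91.2, 23),
    TinyGenParticle(50.0, 0.1, 0.2, 0.1, 13, parent=0, is_final=True),
]
```

Walking the parent links this way is what a shower-history study would do for every final-state particle in a jet.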
Not sure how we want to store it yet, but we should somehow include the official CMSSW GenHFHadronMatcher subroutine described here:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/GenHFHadronMatcher
The CFI we want to run is here:
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/JetMCAlgos/python/GenHFHadronMatcher_cfi.py
Trigger bits should be stored in 32 bit unsigned ints to avoid misalignment of bits. Workaround in #3 for 002.
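A sketch of the intended layout, with trigger path i stored in bit i % 32 of word i / 32. The helper names are hypothetical; Python integers are unbounded, so the 32-bit width is enforced with an explicit mask.

```python
WORD_BITS = 32
MASK32 = 0xFFFFFFFF


def pack_trigger_bits(fired, n_paths):
    """Pack the indices of fired trigger paths into 32-bit unsigned words."""
    n_words = (n_paths + WORD_BITS - 1) // WORD_BITS
    words = [0] * n_words
    for index in fired:
        if not 0 <= index < n_paths:
            raise IndexError("trigger index %d out of range" % index)
        words[index // WORD_BITS] |= 1 << (index % WORD_BITS)
    # Mask to make the 32-bit width explicit.
    return [word & MASK32 for word in words]


def trigger_fired(words, index):
    """Read back a single trigger bit from the packed words."""
    return bool((words[index // WORD_BITS] >> (index % WORD_BITS)) & 1)
```

With a fixed word width on both the writing and the reading side, the bit position of a given path cannot shift between the two.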
We originally assumed that any signal weight would have a non-integer ID (integer IDs are reserved for QCD variations), but it appears this is not always the case. So the signal weights fail [1] and possibly incorrectly enter the scale/PDF uncertainty calculations. Unfortunately the full weight name is also missing in the LHE XML header. For the sample (montop scalar) in question, the weight IDs are 1...25. Is it safe to exclude this range from what's assumed to be QCD variations? @maierbenedikt
[1] https://github.com/PandaPhysics/PandaProd/blob/master/Producer/src/WeightsFiller.cc#L272
WeightsFiller has a bug and the genReweight branch is booked (Number of events in the tree) - 100 times.
I totally forgot this was in place until I just tried to do an inclusive study. We've already established this isn't the most computationally intensive part of the code anyway.
Is copy Ctor even implemented? Maybe I meant to do it later and never worked on it..
electrons.matchedGen_ is always -1. MINIAOD does have non-null genParticleRef from electrons, so somewhere in the panda production we lose information.
With the following snippet of code:
for (auto &p : genParticles) {
  if (!p.finalState)
    continue;
  for (auto &q : genParticles) {
    if (!q.finalState)
      continue;
    if (&p == &q)
      continue;
    bool parentage = (q.parent.isValid() && q.parent.get() == &p) ||
                     (p.parent.isValid() && p.parent.get() == &q);
    if (DeltaR2(q.eta(), q.phi(), p.eta(), p.phi()) < 0.00001) {
      PDebug("",Form("%f,%f,%f,%i matches with %f,%f,%f,%i; parentage=%s",
                     q.pt(), q.eta(), q.phi(), q.pdgid, p.pt(), p.eta(), p.phi(), p.pdgid,
                     parentage ? "true" : "false"));
    }
  }
}
run on t3home000:/tmp/snarayan/zptt.root, I see the following:
0.553223,-3.487533,1.761084,2212 matches with 0.553223,-3.487167,1.761181,2212; parentage=false
0.294189,-3.343608,2.775576,11 matches with 0.294189,-3.343608,2.775576,11; parentage=false
0.027145,-3.230445,2.885635,-11 matches with 0.027145,-3.230445,2.885733,-11; parentage=false
7.058594,0.346263,-0.127346,310 matches with 7.058594,0.346263,-0.127444,310; parentage=false
33.750000,0.375195,-0.093265,310 matches with 33.750000,0.375195,-0.093363,310; parentage=false
67.000000,0.417859,-0.246882,2212 matches with 67.000000,0.417859,-0.246882,2212; parentage=false
1.167969,2.477676,2.851837,2212 matches with 1.167969,2.477859,2.851837,2212; parentage=false
0.424561,-0.197028,-2.128674,2212 matches with 0.424561,-0.197211,-2.128674,2212; parentage=false
0.553223,-3.487167,1.761181,2212 matches with 0.553223,-3.487533,1.761084,2212; parentage=false
1.167969,2.477859,2.851837,2212 matches with 1.167969,2.477676,2.851837,2212; parentage=false
0.424561,-0.197211,-2.128674,2212 matches with 0.424561,-0.197028,-2.128674,2212; parentage=false
67.000000,0.417859,-0.246882,2212 matches with 67.000000,0.417859,-0.246882,2212; parentage=false
7.058594,0.346263,-0.127444,310 matches with 7.058594,0.346263,-0.127346,310; parentage=false
33.750000,0.375195,-0.093363,310 matches with 33.750000,0.375195,-0.093265,310; parentage=false
2.439453,1.076144,2.817473,22 matches with 9.750000,1.073946,2.817370,321; parentage=false
0.927246,-3.398175,-0.225309,2212 matches with 0.927246,-3.398358,-0.225211,2212; parentage=false
0.173096,-0.156377,1.809136,11 matches with 0.173096,-0.156194,1.809038,11; parentage=false
0.060577,-0.224311,1.880624,-11 matches with 0.060577,-0.224311,1.880527,-11; parentage=false
9.750000,1.073946,2.817370,321 matches with 2.439453,1.076144,2.817473,22; parentage=false
0.173096,-0.156194,1.809038,11 matches with 0.173096,-0.156377,1.809136,11; parentage=false
0.060577,-0.224311,1.880527,-11 matches with 0.060577,-0.224311,1.880624,-11; parentage=false
0.927246,-3.398358,-0.225211,2212 matches with 0.927246,-3.398175,-0.225309,2212; parentage=false
0.294189,-3.343608,2.775576,11 matches with 0.294189,-3.343608,2.775576,11; parentage=false
0.027145,-3.230445,2.885733,-11 matches with 0.027145,-3.230445,2.885635,-11; parentage=false
Note that:
Either:
Photon isolation variables are corrected inconsistently in 002. We are using the Spring16 set of effective areas for the rho correction, but the leakage correction (the pt-dependent part) is from Spring15.
@SIDN Do you know what we should do with the configuration change in DeepFlavor in 94X?
With these lines in setupBTag.py
pfDeepCSVJetTags = btag.pfDeepCSVJetTags.clone( src = cms.InputTag(deepCSVInfosName) )
in 80X we get
cms.EDProducer("DeepFlavourJetTagsProducer",
    NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepFlavourNoSL.json'),
    src = cms.InputTag("pfDeepCSVTagInfosPuppi")
)
but with 94X we get
cms.EDProducer("DeepFlavourJetTagsProducer",
    NNConfig = cms.FileInPath('RecoBTag/Combined/data/DeepFlavourNoSL.json'),
    checkSVForDefaults = cms.bool(False),
    meanPadding = cms.bool(False),
    src = cms.InputTag("pfDeepCSVTagInfosAK8PFchsSubjets"),
    toAdd = cms.PSet( probcc = cms.string('probc') )
)
Apparently toAdd sets up tags to be imported externally and not from the json (?). The EDProducer does not produce the products given in toAdd, so the jets sequence crashes because we request probcc in makeJets.
is complicated, is already computed in MiniAOD, and should be saved as an event-wide float.
Need to implement a new type of electron veto for photons.
@yiiyama already implemented this in the monophoton analysis code [1] and as far as data structures go, it's just adding another boolean to the Photon object.
We should probably have this in for the next main version, definitely by the version that is meant to run on 2016 and 2017 together.
[1] https://github.com/MiT-HEP/MonoX/blob/master/monophoton/main/operators.cc#L1835-L1861
We now store all trigger objects, so there is in principle no need to have the trigger object pre-matched to leptons and photons. Is anyone using triggerMatch bits? I am, but if no one else is, maybe we can drop it to reduce the maintenance load (we need to make sure we have the right HLT filter names every time there is a new version).
It's set to 256 now, but we save all gen particles, so this limit is too low.
https://github.com/PandaPhysics/PandaTree/blob/master/panda.def#L453
Need to check if this messes up the hard-scattering particles that all analyses need. If not, this fix can go in for 004.
GenParticlesFiller (PNodeWithPtr::fillPanda) goes into a recursive infinite loop on sherpa samples.
I've made too many mistakes which could have been avoided if we simply made a plot dump of all branches. We should write something like event.dumpPlots(tree) that creates histograms for all branches and saves them into a web page.
Monophoton needs to measure rate of correctly identifying the primary vertex in order to improve a background estimate and remove an inefficient cut from the photon ID.
To first order, I believe we just need to save the reconstructed primary vertex and the generator level primary vertex in MC events. Depending on the results of the first studies, we might also want to include enough information to recalculate the PV in Z events after removing the leptons.
genVertex in 003 is filled from vx, vy, vz of the 0th genParticle. Turns out it's identically 0,0,0 for the 0th. Need to go over the gen particles and find the first non-null position.
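The fix described above amounts to a short scan. As a sketch, with each gen particle reduced to its (vx, vy, vz) tuple (an assumption; the real filler reads these from the MINIAOD gen particle objects):

```python
def first_nonnull_vertex(gen_particles):
    """Return the first (vx, vy, vz) that is not exactly (0, 0, 0).

    `gen_particles` is an iterable of (vx, vy, vz) tuples in collection order.
    Returns None if every position is null.
    """
    for vx, vy, vz in gen_particles:
        if (vx, vy, vz) != (0.0, 0.0, 0.0):
            return (vx, vy, vz)
    return None
```

The exact comparison with zero is deliberate, since the null positions are described as identically 0,0,0 rather than merely small.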
Setting the event BranchList should have a verbose option that does one or both of the following:
Print everything each pattern matches
Warn if a pattern matches nothing
This way, if a branch name changes, it doesn't just fail silently. Ideally, it would also be impossible to access the Event member if the branch isn't activated, but I don't think that's trivial to do.
I can work on this feature. Just putting it here to remind myself
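A rough sketch of both behaviours. It assumes shell-style wildcard patterns (via Python's fnmatch), which may not be the pattern syntax BranchList actually uses, and select_branches is a hypothetical helper, not an existing PandaTree function.

```python
import fnmatch
import warnings


def select_branches(patterns, available, verbose=False):
    """Return the branches in `available` matched by any pattern.

    Warns when a pattern matches nothing; with verbose=True, also prints
    everything each pattern matches.
    """
    selected = []
    for pattern in patterns:
        matched = fnmatch.filter(available, pattern)
        if not matched:
            warnings.warn("pattern %r matches no branch" % pattern)
        elif verbose:
            print("%s -> %s" % (pattern, ", ".join(matched)))
        for name in matched:
            if name not in selected:
                selected.append(name)
    return selected
```

With this, a renamed branch shows up as a warning at setup time instead of as a silently empty member later on.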
There are 4 members of GenJet, all of which are wrong:
pdgid is not the PDG ID, but rather the hadron flavor. This should be renamed.
partonFlavor, numB, and numC are not even filled.
Not sure how this happened, but it needs to be fixed in a new release.
In MadGraph (LO) samples, GenEventInfo.weight() and LHEEventProduct.originalXWGTUP() have distinct values. We are supposed to normalize LHE reweights with LHE originalXWGTUP, but instead are using eventInfo.weight().
In 002 we were doing this properly, but during the restructuring for 003 I switched to using only the weight() for code simplicity. I was checking on aMC@NLO and powheg samples only.
Since this only affects the values of genReweight branches and not the data format, we will treat the fix as a patch to 003. Existing 003 files will be moved to a temporary location while the new files are being produced.
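In other words, each stored reweight should be the ratio of the LHE weight to originalXWGTUP. A minimal sketch of that normalization (the function name is hypothetical, and raising on a zero nominal weight is a choice made here, not something the filler is known to do):

```python
def normalize_reweights(lhe_weights, original_xwgtup):
    """Divide each LHE weight by the nominal LHE weight (originalXWGTUP)."""
    if original_xwgtup == 0:
        raise ValueError("originalXWGTUP is zero, cannot normalize")
    return [weight / original_xwgtup for weight in lhe_weights]
```

Dividing by GenEventInfo.weight() instead gives different ratios whenever the two nominal weights differ, which is the case for the MadGraph (LO) samples.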
This leads to problems down the line. Need to use TFile::Delete to remove the older cycle numbers if they exist, or write the tree with TObject::kOverwrite.
So far I got r9 for electrons. Suggestion is to make it a SuperCluster variable. Let's see if that's what CMSSW does too.
Do we have the right set of trigger objects?
Since we are not using any custom streamers, we should actually be able to use SCRAM's library generation. Need to figure out how.
Add a track collection with the following information:
Needed for monojet
IPs were removed from VID. Two proposals:
Save dz and dxy
Apply the eta-dependent cuts [1] as part of the electron ID bits
Any strong feelings either way @maierbenedikt and @yiiyama ?
[1] https://twiki.cern.ch/twiki/bin/viewauth/CMS/CutBasedElectronIdentificationRun2
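For the second proposal, the logic folded into the ID bit would look roughly like this. The thresholds and the barrel/endcap boundary below are placeholder assumptions and must be checked against [1] before use; the function name is hypothetical.

```python
# Placeholder values (cm), assumed for illustration; take the real ones from [1].
BARREL_MAX_ABS_ETA = 1.479
IP_CUTS = {
    "barrel": (0.05, 0.10),  # (max |dxy|, max |dz|)
    "endcap": (0.10, 0.20),
}


def passes_ip_cuts(sc_eta, dxy, dz):
    """Apply eta-dependent impact parameter cuts to one electron."""
    region = "barrel" if abs(sc_eta) <= BARREL_MAX_ABS_ETA else "endcap"
    max_dxy, max_dz = IP_CUTS[region]
    return abs(dxy) < max_dxy and abs(dz) < max_dz
```

Saving dz and dxy instead (the first proposal) keeps this decision offline, at the cost of two extra floats per electron.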
Copy http://cmslxr.fnal.gov/source/DataFormats/HepMCCandidate/interface/GenStatusFlags.h?v=CMSSW_8_0_20 to GenParticles class
and add status to GenParticle
Information on the electron MVA:
This will take a little work to get it working in PandaProd
Each CollectionBase currently holds a pointer to a TBranch to look ahead at the collection size (to prepare for a potential resize and reallocation of the collection). This causes a segfault when we pass a TChain as an input, because the TBranch pointer is only valid for the first tree of the chain. We need to think of a better way.
This update will not affect the data format, so we should be able to accommodate it within 003.
I found that the prod-004 tag of PandaTree was reading triggers incorrectly in 004 files. I found a number of events in which the trigger should have fired, but was not picked up as such (weirdly enough it was firing a different trigger...might be the 32/64 bit issue?). Anyway, I reverted to a branch that I knew worked [1]. Will investigate this later (this was discovered trying to do some last-minute studies), but this needs to be understood and properly tagged.
[1] https://github.com/sidnarayanan/PandaTree/tree/testing-hlt
2017 electron MVA id is split into Iso and NoIso. I guess we can reuse the existing mva90 and mva80 branches for the NoIso bits and add mva90Iso and mva80Iso branches. Or the other way around. I don't really care either way.
We somehow lost two of the filters in [1]. Snippet to include them:
process.load('RecoMET.METFilters.BadPFMuonFilter_cfi')
process.BadPFMuonFilter.muons = cms.InputTag("slimmedMuons")
process.BadPFMuonFilter.PFCandidates = cms.InputTag("packedPFCandidates")
process.BadPFMuonFilter.taggingMode = cms.bool(True)
process.load('RecoMET.METFilters.BadChargedCandidateFilter_cfi')
process.BadChargedCandidateFilter.muons = cms.InputTag("slimmedMuons")
process.BadChargedCandidateFilter.PFCandidates = cms.InputTag("packedPFCandidates")
process.BadChargedCandidateFilter.taggingMode = cms.bool(True)
These are less important given the muon fix, but we might want to keep them anyway.
[1] https://twiki.cern.ch/twiki/bin/viewauth/CMS/MissingETOptionalFiltersRun2#Moriond_2017
Tiny bit bigger, but saves a lot of time offline for special studies.
Needed for VBF
PF Met taken out of the box from the re-miniAOD is broken. We can either run an official CMS producer from Zeynep or we can add Sid's private fix, but we should fix this for version 003 so we don't need to run it on top of panda anymore.
Hey @sidnarayanan can you add your upgraded genDict script to PandaTree? We've already put the LinkDef and related files in the dict/ directory of Framework and Objects, and there isn't really a reason not to support CLING with Panda.
We are taking status 1 particles from packedGenParticles, which destroys the statusFlags information. We can instead decide to ignore the packed candidate when there is a match between pruned and packed.