
flashgg's People

Contributors

alesauva, andreh7, arnabpurohit, bmarzocc, camilocarrillo, cippy, edjtscott, emanueledimarco, ferriff, gkrintir, innakucher, junquantao, kmondal, malcles, martinamalberti, matteosan1, maxgalli, mdonega, michelif, musella, olivierbondu, panwarlsweet, saghosh, sam-may, sethzenz, simonepigazzini, threiten, vtavolar, yhaddad, youyingli


flashgg's Issues

Clean up configurations and analysis scripts

Except maybe for a few things like simple_Producer_test.py and simple_Tag_test.py, we should move configurations and analysis scripts to a common place outside the producer directories. (Perhaps a folder under Commissioning?) I think the divide should be as follows:

  • A well-defined validation procedure uses a few configurations/scripts in the producer test directories. The validation procedure and its expected output are documented and used to make sure that new code doesn't break anything.
  • Everything else is moved.

unused beamspot

MicroAODProducers/test/simple_Producer_test.py

This is a leftover, not used anymore:
L28 BeamSpotTag=cms.untracked.InputTag('offlineBeamSpot'),

Add examples of workspace/ntuple dumping to simple_Tag_test.py

Currently, when evaluating PRs that I'm not directly familiar with, I rely heavily on running the standard MicroAOD followed by Taggers/test/simple_Tag_test.py. It's clear that a lot of the dumpers etc. aren't being tested, and this is often exactly the functionality that new PRs change. I would welcome either an update to the tag test or suggestions for additional standard jobs that exercise everything that might break.

Tags: interleaved sorting

Allow, for example, VBF 0 > Untagged 0 > VBF 1 > ... Configure the TagSorter with a VPSet listing (tagName, minCat, maxCat) tuples, as sketched below.
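
A minimal sketch of what such a configuration could look like; the producer and parameter names (TagPriorityRanges, MinCategory, MaxCategory) are hypothetical placeholders rather than a final interface:

import FWCore.ParameterSet.Config as cms

# Each PSet is one priority slot; the sorter walks the list in order, so
# VBF category 0 outranks Untagged category 0, which outranks VBF category 1.
flashggTagSorter = cms.EDProducer(
    'FlashggTagSorter',
    TagPriorityRanges = cms.VPSet(
        cms.PSet( TagName = cms.InputTag('flashggVBFTag'),   MinCategory = cms.int32(0), MaxCategory = cms.int32(0) ),
        cms.PSet( TagName = cms.InputTag('flashggUntagged'), MinCategory = cms.int32(0), MaxCategory = cms.int32(0) ),
        cms.PSet( TagName = cms.InputTag('flashggVBFTag'),   MinCategory = cms.int32(1), MaxCategory = cms.int32(1) ),
    )
)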

Change object format for DzVertexMap object

TL;DR: we should get rid of the maps we put in the event. I will take care of this, but let me know if you need a map and I can help you replace it. More details below if you're curious.

I have been advised by CMSSW memory-management experts that maps in the event can create issues or be unstable, and that map<T,vector<U> > is especially bad. [Here T=Ptr<vertex> and U=PackedCandidate.] For a one-to-many map, this construct can be replaced with some clever handling of vector<pair<T,U> >, in which all the U's corresponding to a given T are stored contiguously, and you rely on that ordering to look at only the relevant part of the vector (see the sketch below).

Also, for reference, a one-to-one map<T,U> should simply be replaced with a vector<U>, relying on the fact that the new collection is in the same order as the old one. I had significant trouble creating the map anyway, because CMSSW implicitly requires a vector<T> dictionary for a map<T,U> even if you don't think you're using vector<T>. I eventually got rid of the map (in code I will open a pull request for later this morning) and just used the vector<U>.
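
To illustrate the one-to-many lookup pattern, here is a sketch (in Python for brevity; the real replacement would of course be a C++ vector<pair<T,U> > in the data format, and all names here are invented for the example):

# All values for a given key are stored contiguously, so a lookup can scan
# forward and stop as soon as it leaves the matching block.
pairs = [('vtx0', 'cand0'), ('vtx0', 'cand1'), ('vtx1', 'cand2')]

def values_for(key, pairs):
    out, in_block = [], False
    for k, v in pairs:
        if k == key:
            out.append(v)
            in_block = True
        elif in_block:
            break  # we have passed the contiguous block for this key
    return out

print(values_for('vtx0', pairs))  # ['cand0', 'cand1']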

Modify README to clone upstream, not personal repo

As the instructions stand, users clone their own fork of the repository and then add cms-analysis/flashgg as upstream.

This is error prone because in general the user's master branch will not be in sync with the flashgg one.

The instructions should be modified so that users clone the flashgg master directly; they can then set their personal fork as a second remote, as sketched below.
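
A minimal sketch of the proposed workflow (the remote name "myfork" and the username placeholder are illustrative):

git clone git@github.com:cms-analysis/flashgg.git
cd flashgg
git remote add myfork git@github.com:<your-username>/flashgg.git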

I can make the necessary changes if people agree.

Diphoton candidates are stored twice

Hi,

For now, the diphoton candidates are stored twice in the final diphoton collection:
https://github.com/cms-analysis/flashgg/blob/master/MicroAODProducers/plugins/DiPhotonProducer.cc#L87
and
https://github.com/cms-analysis/flashgg/blob/master/MicroAODProducers/plugins/DiPhotonProducer.cc#L95

We are currently working with @swagata87 on implementing the photon four-momentum kinematics changes due to the vertex assignment around these lines. We will probably correct this when we are ready to open a pull request; this issue is to make sure people are aware of it in the meantime.

cleanup ?

In MicroAODProducers/test/simple_Producer_test.py I would move

process.flashggVertexMapUnique = cms.EDProducer('FlashggDzVertexMapProducer',
…
process.flashggVertexMapNonUnique = cms.EDProducer('FlashggDzVertexMapProducer',
…
process.flashggJets = cms.EDProducer('FlashggJetProducer',

into python fragments under /python.

MicroAOD size tuning

Current content (as of phys14 v2 production) is ~50% of MiniAOD.

We need to achieve a factor 2.5 reduction in order to meet the 1/5 size goal.

I made a small exercise to see what could be done to meet the target.
https://musella.web.cern.ch/musella/higgs/flashgg/miniAOD.xls

Summary is that we need to:

  • avoid photon info duplication in the di-photon collection
  • preselect leptons
  • drop cluster information and store a subset in the photons
  • only store photon ID info for vertexes associated with at least one di-photon pair
  • reduce the size of the jet collection

prepareCrabJobs.py

flashgg/MetaData/work/prepareCrabJobs.py

108 if options.dumpCfg:
109 print ( dumpCfg(cfg) )
110 exit(0)

it should be:
109 print ( dumpCfg(options) )

Keep all vertexes info in VertexSelector

In order to be able to perform a training of the vertex selection algorithm, the information about all vertexes should be stored in the DiPhoton candidates.

For space reasons, one could limit the total number of vertexes to be stored, but the functionality of the algorithms and the data format needs to be extended.

"Reroute" diphoton processing in tag step

The default tag sequence currently uses the DiPhotonCollection directly as input. I propose to switch it to start with the preselectedDiPhotonCollection and use the central value of the new systematics producer (see the sketch below). This should not have any drastic downstream effects, and has to be done anyway, but I will check that the tag output is more or less the same. Any other comments/concerns?
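
A minimal sketch of the proposed rerouting, with a hypothetical tag-producer label standing in for the real modules in the Taggers package:

import FWCore.ParameterSet.Config as cms

# Hypothetical example module; the point is only the DiPhotonTag change.
flashggExampleTag = cms.EDProducer(
    'FlashggExampleTagProducer',
    # was: DiPhotonTag = cms.untracked.InputTag('flashggDiPhotons')
    DiPhotonTag = cms.untracked.InputTag('flashggPreselectedDiPhotons'),
)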

./prepareCrabJobs.py --load <previous_config.json>

The "--load" features doesn't work because of two bugs in optpars_utils.py

The first one is:
L32 if origin:
L33 origin += ",%s",value

should be

L32 if origin:
L33 origin += " "
L34 origin += ''.join(value)

The second is:

L50 if attr and type(attr) == list:
L51 attr.extend(v)
L52 setter(dest,k,v)

which should be

L50 if attr and type(attr) == list:
L51 attr.extend(v)
L52 setter(dest,k,attr)
L53 else:
L54 setter(dest,k,v)

Switch to PFCHSLeg jets

With the improved track-vertex association for PFCHS collection building (see the talk linked from https://twiki.cern.ch/twiki/bin/viewauth/CMS/FLASHggFramework#2015_03_02 ), the PFCHSLeg jet collection is OK. I propose to make it the default for the tags that use jets -- not as a final choice, but it is sensible in a way that PFCHS0 is not. We should watch carefully that this does not create memory problems on the grid, but I have run successfully with the default 2 GB. Any comments/concerns about making this change in the default sequence?

Migrate to 73X

It appears that CMSSW_7_3_2 is a sufficiently stable migration target. However, some of the effort (e.g. on jet tools) is non-trivial. I propose we do this after the Higgs Workshop to avoid confusion.

oldval / newval print statements from GBRLikelihood

We get these statements from the diphoton MVA code:

oldval = 0.010000, newval = -1.289817, evaluate = 0.010000
oldval = 1.000000, newval = -0.111341, evaluate = 1.000000
oldval = 2.000000, newval = -1.542650, evaluate = 2.000000
oldval = 2.000000, newval = -1.542650, evaluate = 2.000000

They come from here:

https://github.com/bendavid/GBRLikelihood/blob/4fda233acf853c38ce657313fd7259957bd874b7/src/RooHybridBDTAutoPdf.cc#L281

We get that version by using the following tag:

git clone -b hggpaperV8 https://github.com/bendavid/GBRLikelihood

TODO: fork and make a flashgg branch (as we already do for GBRLikelihoodEGTools) to get rid of these messages, then update the instructions.

Reduce size of PileupSummaryInfos

Currently PileupSummaryInfos_addPileupInfo takes up the second-largest amount of space in the expanded MicroAOD:

File file:myMicroAODOutputFile.root Events 259
Branch Name                               | Average Uncompressed Size (Bytes/Event) | Average Compressed Size (Bytes/Event)
flashggJets_flashggJets__FLASHggMicroAOD. |                                 76707.5 |                               9590.08
PileupSummaryInfos_addPileupInfo__HLT.    |                                 14749.1 |                               5354.72

TODO: filter or otherwise reduce the size of this collection

Conversions in VertexSelector

LegacyVertexSelector.cc#L481

float nConv = conversionsVector.size();

The MVA should receive as input how many of the two photons are converted (i.e. 0, 1 or 2).
Instead, I guess it is currently looking at the number of conversions in the whole event, which it gets from the DiPhotonProducer:

Handle<View<reco::Conversion> > conversions;
evt.getByToken(conversionToken_,conversions);
const PtrVector<reco::Conversion>& conversionPointers = conversions->ptrVector();

A possible patch for the code, at L477, could be:

float nConv = 0;
if (IndexMatchedConversionLeadPhoton != -1) ++nConv;
if (IndexMatchedConversionTrailPhoton != -1) ++nConv;

float pull_conv = -999.;
if (nConv != 0) {
    double zconv  = getZFromConvPair(g1,g2,IndexMatchedConversionLeadPhoton,IndexMatchedConversionTrailPhoton,conversionsVector,beamSpot);
    double szconv = getsZFromConvPair(g1,g2,IndexMatchedConversionLeadPhoton,IndexMatchedConversionTrailPhoton,conversionsVector);

    if (szconv != 0) pull_conv = fabs(vtx->position().z()-zconv)/szconv;
    else pull_conv = 10.;

    if (pull_conv > 10.) pull_conv = 10.;
}

logsumpt2_ = log(sumpt2_in+sumpt2_out);
ptbal_ = ptbal;
pull_conv_ = pull_conv;
nConv_ = nConv;

I cross-checked with Pasquale that this line is indeed correct:
if (pull_conv > 10.) pull_conv = 10.;

Can't read in flashggPreselectedDiPhotons

For packages downstream from diphoton preselection, this works:

DiPhotonTag=cms.untracked.InputTag('flashggDiPhotons')

But this doesn't:

DiPhotonTag=cms.untracked.InputTag('flashggPreselectedDiPhotons')

Error message below. To-do: figure out why not.

----- Begin Fatal Exception 06-Oct-2014 16:50:58 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
[0] Processing run: 1 lumi: 39 event: 3801
[1] Running path 'p'
[2] Calling event method for module FlashggDiPhotonMVAProducer/'flashggDiPhotonMVA'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for a container with elements of type: flashgg::DiPhotonCandidate
Looking for module label: flashggPreselectedDiPhotons
Looking for productInstanceName:

Tags: sorted collections

Set up the Tag producers so that tags are automatically ordered with operator< within each tag producer. Use ordered CMSSW collections if possible; check how it's done for PatCandidates. (Check whether this feature simplifies the tag sorter logic.)

Tab length

This annoys me regularly, so I am opening an issue, but feel free to close it if I am the only one in this case... One example of how this affects code readability would be [1].

Is there a way to agree on some common setting ? Like tabs implemented as 4 spaces ?

There used to be some settings like that in globe [2], but they are file-by-file; would anyone know a way to define them for the full repository ?

Cheers,
Olivier

[1] InnaKucher@e5b6f70#diff-0
[2] https://github.com/h2gglobe/h2gglobe/blob/master/PhotonAnalysis/src/PhotonAnalysis.cc#L6902-L6908

Add electron ID and MVA cuts to tags using electrons

Since PR #152 there are no longer any cuts applied to electrons in the ElectronProducer. (I kept the code but added an ApplyCuts flag that is set to false by default.) These cuts should instead be applied in the Tag producers that use electrons.

BeamSpotHandle not valid

In this file:

root://eoscms//eos/cms/store/cmst3/user/gpetrucc/miniAOD/v1/GluGluToHToGG_M-125_13TeV-powheg-pythia6_Flat20to50_PAT.root

which appears to contain the beamspot:

reco::BeamSpot "offlineBeamSpot" "" "RECO"

the beam-spot handle's isValid() method returns false (and if you try to read anything you get a ProductNotFound exception). To-do: figure out why.

method in conversions

MicroAODAlgos/plugins/ZerothVertexSelector.cc

The variable

int method=0;

appears in several places, each time initialised to the default value of zero. It should become a single shared setting, to avoid the different functions going out of sync.

Additional gen information

Additional gen-level information should be kept. In particular:

  • all hard-process particles should be added to flashggPrunedGenParticles (see the sketch below).
  • MC matching should be added to the photon producer.
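
For the first item, a minimal sketch using the standard GenParticlePruner; the keep strings are illustrative, since the relevant status codes depend on the generator:

import FWCore.ParameterSet.Config as cms

flashggPrunedGenParticles = cms.EDProducer(
    'GenParticlePruner',
    src = cms.InputTag('prunedGenParticles'),
    select = cms.vstring(
        'drop *',
        'keep status == 3',                 # hard-process particles (Pythia6 convention)
        'keep status > 20 && status < 30',  # hard-process particles (Pythia8 convention)
    )
)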

Metadata and AAA

The MetaData scripts always look for files on EOS unless useAAA=1 is set. This is OK for FWLite, but in CMSSW (full framework) the default behaviour when a /store/... path is encountered should be to use the site configuration.

Make eventContent cff for microAOD

I guess that it should go to

MicroAODProducers/python/flashggMicroAODOutputCommands_cff.py

or something similar.

Current output in test scripts is:

outputCommands = cms.untracked.vstring("drop *",
"keep *_flashgg*_*_*",
"drop *_flashggVertexMap*_*_*",
"keep *_offlineSlimmedPrimaryVertices_*_*",
"keep *_reducedEgamma_reduced*Clusters_*",
"keep *_reducedEgamma_*PhotonCores_*",
"keep *_slimmedElectrons_*_*",
"keep *_slimmedMuons_*_*",
"keep *_slimmedMETs_*_*",
"keep *_slimmedTaus_*_*",
"keep *_fixedGridRhoAll_*_*"
)

At the very least one needs to add the beamspot and gen info; a sketch is below.
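
A minimal sketch of such a cff, reusing the current commands and adding the beamspot and gen collections (the gen label assumes the flashggPrunedGenParticles collection discussed in the gen-information issue above):

import FWCore.ParameterSet.Config as cms

microAODDefaultOutputCommands = cms.untracked.vstring(
    "drop *",
    "keep *_flashgg*_*_*",
    "drop *_flashggVertexMap*_*_*",
    "keep *_offlineSlimmedPrimaryVertices_*_*",
    "keep *_reducedEgamma_reduced*Clusters_*",
    "keep *_reducedEgamma_*PhotonCores_*",
    "keep *_slimmedElectrons_*_*",
    "keep *_slimmedMuons_*_*",
    "keep *_slimmedMETs_*_*",
    "keep *_slimmedTaus_*_*",
    "keep *_fixedGridRhoAll_*_*",
    # additions requested in this issue:
    "keep *_offlineBeamSpot_*_*",
    "keep *_flashggPrunedGenParticles_*_*",
)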

EB cut

MicroAODAlgos/plugins/ZerothVertexSelector.cc

Everywhere (e.g. L205, L246, L278):

pho->eta() < 1.5 ---> fabs(pho->eta()) < 1.5

Use standard PileupJetId recipes/recommendations

Eventually there should be a method for JME-recommended pileup jet id selection, and we should use that instead of a raw MVA value cut. However, currently all we have is a float MVA output, from an MVA training based on antiKt5 jets in Run1. In MiniAOD, in fact, the float output for this is all that is saved for the jets. It's run before the jet is slimmed for miniAOD, and then saved in the miniAOD jets as a userFloat: see https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD#Jets for more on this.

For MicroAOD, we have written a new method to use the DzVertexMap and rerun the MVA w.r.t. a non-standard vertex. In fact, this method is ahead of JME, which as of the last time I checked had not provided any code yet for computing Pu Jet ID on miniAOD. I have been in communication with the JME person responsible for this update and provided him with our code in case it's useful.

As they become available, we should adopt the most standard tools that do the job we need, and we should provide feedback to JME if there is an additional feature they should add that would let us use a more standardized recipe.

MetaData scripts completion

Opening this issue to make sure we don't forget this.

As they are, MetaData scripts need to be completed to:

  • Have the ability to resubmit failed jobs.
  • Be more accurate at ensuring the reproducibility of task running.
    For the most important jobs we should store a tgz with the config and libraries and
    set up a scram scratch space at run time (à la crab).
  • Compute the per-sample PU distribution and generate PU weights.
  • Keep an eye on the duty cycle and automatically resubmit stuck jobs to cope
    with eos shortcomings.
  • Define a generic batch interface to support non-LSF clusters.
  • Use the fwk job report to ensure that the full dataset was processed.

Embed rechits in flashgg::Photon

Some recHits should be embedded in the flashgg::Photons.

The seed-crystal recHit is the bare minimum; we may also consider keeping the full 5x5.

One potential issue with the 5x5 may be the data duplication between the photon and di-photon collections. If this becomes an issue, we could zero all the recHits except the seed just before copying to the di-photon object.

This is in fact a more general issue, related to the duplication of several variables which we inherit directly from the pat and reco photons.

Some flashgg::Photon methods/data members shadow pat::Photon ones

In particular those related to regression inputs:

https://github.com/cms-analysis/flashgg/blob/master/DataFormats/interface/Photon.h#L25

http://cmslxr.fnal.gov/lxr/source/DataFormats/PatCandidates/interface/Photon.h#0234

Also, even in cases where the methods are not explicitly shadowed, we end up generating confusion about which method is being called.

I would suggest adding an "fgg" prefix (or similar) to both the data member names and the getters/setters.
