Welcome to the CMS Tier 0 Repository!
License: Apache License 2.0
Need to take a look at what WMCore offers and if this can be used.
Need to look at EndOfRun and Periodic jobsplitters in WMCore and decide if they can be used as-is for the Tier0 harvesting.
Since filesets in the Tier0 are now very restrictive and only contain the data that is harvested together, the jobsplitter's task for the harvesting is very simple indeed: either all files go into one job when the fileset is closed, or all files are harvested periodically until the fileset is closed, followed by one last job on all files.
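If the WMCore splitters cannot be reused directly, the decision logic described above is simple enough to sketch. A minimal illustration (this is not WMCore API; the fileset interface is a made-up stand-in):

```python
# Minimal sketch of the harvesting split decision described above.
# The fileset/job interfaces are hypothetical stand-ins, not WMCore API.

def harvest_split(fileset, periodic_interval=None, now=0, last_split=0):
    """Return lists of files to put into harvesting jobs.

    EndOfRun mode (periodic_interval is None): one job over all files,
    but only once the fileset is closed.
    Periodic mode: all files every interval while the fileset is open,
    then one last job over all files when it closes.
    """
    if periodic_interval is None:
        # EndOfRun: wait for the fileset to close, then harvest everything.
        return [fileset.files] if not fileset.open else []
    if not fileset.open:
        # Fileset closed: one final job on all files.
        return [fileset.files]
    if now - last_split >= periodic_interval:
        # Interval expired: harvest everything seen so far.
        return [fileset.files]
    return []
```

Either mode always ends with one job over the full fileset, which matches the restrictive per-harvest filesets described above.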
These can be taken as-is from CVS:
T0/src/python/T0/State/Database/Oracle/InsertSplitLumis.py
T0/src/python/T0/State/Database/Oracle/MarkStreamersUsed.py
since they are already WMBS-conformant and are still needed for repack and express job splitting.
As for the code structure of the T0 SVN module and where we want to put the DAOs, how about
T0/src/python/T0/WMBS/Oracle
Would be uniform with the way things are setup in the WMCore module.
We need a WMSpec that will run the repacking. It needs two tasks, first the actual repacking and then a merge step.
Each Repack WMSpec is stream specific. Embedded in the WMSpec is the dataset to trigger path mapping for the given stream. This is passed at runtime to Configuration.DataProcessing and returns a valid repacking configuration. This system is not commissioned yet, so for early testing we can also make up a repacking configuration, store it in the ConfigCache and embed the id in the WMSpec.
The post-repacking merge step cannot be implemented as a standard WMCore merge step. This is because of error datasets: if the repacker size protections kick in, we need to decide at merge time whether the output goes to the normal dataset or an error dataset. The way we'll implement this is with a custom repack merge job splitting algorithm that passes the normal/error dataset decision to the job. At runtime the job then evaluates this flag and configures one of the two normal/error dataset output modules. Both output module to fileset mappings need to be defined in the WMSpec though.
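The split-time decision and the runtime flag evaluation could look roughly like this sketch (the names and the size threshold are made up for illustration; the real protections live in the repacker):

```python
# Sketch of the custom repack merge splitting described above: the
# splitter tags each job with the normal/error dataset decision, and
# the job evaluates the flag at runtime to enable the matching output
# module. The threshold and names are made-up illustration values.

MAX_MERGE_SIZE = 4 * 1024**3  # hypothetical size protection threshold

def split_repack_merge(file_groups):
    """Turn pre-grouped merge candidates into jobs, each carrying the
    normal/error dataset decision made at merge (split) time."""
    jobs = []
    for group in file_groups:
        # If the size protection kicked in for any input file, the
        # merged output is routed to the error dataset instead.
        use_error = any(f["size"] > MAX_MERGE_SIZE for f in group)
        jobs.append({"files": group, "useErrorDataset": use_error})
    return jobs

def output_module_for(job):
    """Runtime side: pick one of the two output modules; both fileset
    mappings must already be defined in the WMSpec."""
    return "errorDataset" if job["useErrorDataset"] else "normalDataset"
```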
These are needed to run Tier0 unit tests that use the database. Might have to be adjusted later depending on how #2196 is resolved.
Even though the actual scheduling is not using location, I need to load it and set it in the File object, otherwise the pickled job object has no location information, which screws up job submission.
Decide whether to use the standard WMAgent cleanup jobs or a special cleanup component like in the old Tier0 to handle unmerged file cleanup. Tier0 has some special workflows that have unmerged input and unmerged output, have to check if the WMAgent cleanup support can handle this.
The Tier0 needs to send a notification about the processing status of streamer files to P5, because the StorageManager uses this to decide when it's safe to delete a streamer file.
Likely this will be part of the Tier0Auditor, with a check on when a streamer is processed and a call to an external script to inject a message into the P5->CERN transfer system. Might need a new status field in the streamer table.
We need a setup.py file for the T0 to perform build and installation. At the moment, just collecting all the python files and putting them into lib is sufficient; there is nothing else to do yet.
Separation into core, datasvc and monitoring will come later.
Assume patch from #1960.
The Tier0Feeder.FeedStreamers DAO contains two separate queries since AFAICT there is no way to handle a multi-table insert and an update in a third table as a single SQL statement.
To make the operation atomic, I tried wrapping them into a transaction and protecting it with a rollback, but that failed. The second query did not see the modifications made by the first (which it needs to).
For now the calls are committed separately, but that exposes us to a potential race condition.
Find a way to handle this better (PL/SQL or better transaction handling or ...)
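One option along the PL/SQL line: move both statements into a single anonymous block, so they run in one transaction and one round trip with a single commit. A hedged sketch with made-up table and column names (the real FeedStreamers SQL differs), assuming the WMCore `dbi.processData` call:

```python
# Hedged sketch: all table/column names here are illustrative, not the
# real T0AST schema. The point is the single PL/SQL anonymous block,
# which makes the insert and the update one atomic unit.

FEED_STREAMERS_SQL = """
BEGIN
  INSERT INTO fileset_streamer_assoc (fileset_id, streamer_id)
    SELECT assoc.fileset_id, streamer.id
    FROM streamer
    INNER JOIN run_stream_fileset_assoc assoc
       ON assoc.run_id = streamer.run_id
      AND assoc.stream_id = streamer.stream_id
    WHERE streamer.used = 0;

  -- runs in the same transaction and session, so it sees the rows
  -- touched by the insert above; one COMMIT covers both statements
  UPDATE streamer SET used = 1 WHERE used = 0;
  COMMIT;
END;
"""

def feed_streamers(dbi):
    # single round trip to Oracle, no separate commits to race between
    dbi.processData(FEED_STREAMERS_SQL)
```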
At the moment the RunConfig code that configures run/stream settings also configures everything happening with PromptReco and later.
This should be factored out into settings needed immediately for run/stream handling (repacking and RAW file handling) and settings only required later when we release PromptReco for the run.
A feature we had since PA times is that the job submission includes at least -J <WF_TYPE>-<RUN_NUMBER>-<JOB_INDEX>.
We can survive without it, but it would be great (and it seems like not much effort) to make the same type of submission, so we can identify what we're looking at in the LSF queue.
The piece of LsfPlugin that does it is:
{{{
# // Submit LSF job
# //
command = 'bsub'
command += ' -q %s' % self.queue
if self.resourceReq is not None:
    command += ' -R "%s"' % self.resourceReq
command += ' -g %s' % self.jobGroup
command += ' -J %s' % self.scriptFile
}}}
We just need to find where to get the WF name; maybe we would have to pass it along from the previous script.
{{{
206643634 cmsprod PEND cmst0 vocms13 - WMAgentJob Jan 27 10:37
206646633 cmsprod PEND cmst0 vocms13 - WMAgentJob Jan 27 11:04
206654938 cmsprod PEND cmst0 vocms13 - /data/cmsprod/wmagent/0.8.23/apps/wmcore/etc/submit.sh Jan 27 11:46
206656778 cmsprod PEND cmst0 vocms13 - /data/cmsprod/wmagent/0.8.23/apps/wmcore/etc/submit.sh Jan 27 11:50
}}}
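Composing the desired -J value is straightforward once the plugin has the pieces. In this sketch the keys on the job dict are hypothetical, since it is not clear yet what the JobSubmitter passes down:

```python
# Hypothetical sketch: the 'wfType', 'runNumber' and 'index' keys are
# assumptions about what could be passed down to the LsfPlugin.

def lsf_job_name(job):
    """Build the <WF_TYPE>-<RUN_NUMBER>-<JOB_INDEX> name for bsub -J."""
    return "%s-%s-%s" % (job["wfType"], job["runNumber"], job["index"])

# inside LsfPlugin, replacing the scriptFile based name:
# command += ' -J %s' % lsf_job_name(job)
```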
If I'm unable to do it, you can assign this to me.
We need a WMSpec that will run Express.
It will look like a mix of other WMSpecs. The express processing part is very similar to the repacking, except the configurations are more complex. Secondary steps are merges for FEVT and DQM and alcaskims for ALCARECO. Both of them look like their bulk counterparts (standard WMCore merges and ReReco AlcaSkim) with the addition of latency triggers and enforcing lumi order in the job splitting.
Also needs to include DQM and Alca Harvesting.
Oli is concerned about these sorts of things -- a query I use (and then he asked for the inverse) is this:
{{{
select distinct blockid, blockstatus, runnum, hoursold, pds, runstatus,
                lastupdate, site, skimname, procver
from (
  select t1skim_config.run_id as runnum,
         ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * 86400 - run.start_time) / 3600 as hoursold,
         substr(primary_dataset.name, 1, 20) as pds,
         block.id as blockid,
         substr(block_status.status, 1, 10) as blockstatus,
         substr(run_status.status, 1, 30) as runstatus,
         ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * 86400 - run.last_updated) / 3600 as lastupdate,
         substr(storage_node.name, 1, 20) as site,
         substr(t1skim_config.skim_name, 1, 20) as skimname,
         substr(t1skim_config.proc_version, 1, 20) as procver
  from block
  inner join block_run_assoc on block.id = block_run_assoc.block_id
  inner join t1skim_config on t1skim_config.run_id = block_run_assoc.run_id
  inner join primary_dataset on primary_dataset.id = t1skim_config.primary_dataset_id
  inner join dataset_path on dataset_path.primary_dataset = t1skim_config.primary_dataset_id
  inner join block_status on block_status.id = block.status
  inner join run on run.run_id = t1skim_config.run_id
  inner join run_status on run.run_status = run_status.id
  inner join phedex_subscription on phedex_subscription.run_id = t1skim_config.run_id
  inner join storage_node on phedex_subscription.node_id = storage_node.id
  where phedex_subscription.primary_dataset_id = t1skim_config.primary_dataset_id
    and storage_node.name <> 'T0_CH_CERN_MSS'
    and dataset_path.id = block.dataset_path_id
    and block.status < 6
  order by t1skim_config.run_id, primary_dataset.name, block.status
)
order by runnum, site, skimname;
}}}
This gives everything waiting to be skimmed that should be. It's probably horribly unoptimized, but it gives what I want, formatted usefully enough.
The inverse of this just flips the status condition from < 6 to >= 6. Oli wanted to see the list of runs that should have been skimmed.
{{{
select distinct blockid, blockstatus, runnum, hoursold, pds, runstatus,
                lastupdate, site, skimname, procver
from (
  select t1skim_config.run_id as runnum,
         ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * 86400 - run.start_time) / 3600 as hoursold,
         substr(primary_dataset.name, 1, 20) as pds,
         block.id as blockid,
         substr(block_status.status, 1, 10) as blockstatus,
         substr(run_status.status, 1, 30) as runstatus,
         ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * 86400 - run.last_updated) / 3600 as lastupdate,
         substr(storage_node.name, 1, 20) as site,
         substr(t1skim_config.skim_name, 1, 20) as skimname,
         substr(t1skim_config.proc_version, 1, 20) as procver
  from block
  inner join block_run_assoc on block.id = block_run_assoc.block_id
  inner join t1skim_config on t1skim_config.run_id = block_run_assoc.run_id
  inner join primary_dataset on primary_dataset.id = t1skim_config.primary_dataset_id
  inner join dataset_path on dataset_path.primary_dataset = t1skim_config.primary_dataset_id
  inner join block_status on block_status.id = block.status
  inner join run on run.run_id = t1skim_config.run_id
  inner join run_status on run.run_status = run_status.id
  inner join phedex_subscription on phedex_subscription.run_id = t1skim_config.run_id
  inner join storage_node on phedex_subscription.node_id = storage_node.id
  where phedex_subscription.primary_dataset_id = t1skim_config.primary_dataset_id
    and storage_node.name <> 'T0_CH_CERN_MSS'
    and dataset_path.id = block.dataset_path_id
    and block.status >= 6
  order by t1skim_config.run_id, primary_dataset.name, block.status
)
order by runnum desc, site, skimname;
}}}
CVS:
T0/src/python/T0/JobSplitting/Repack.py
T0/src/python/T0/State/Database/Oracle/ListFilesForRepack.py
Both job splitting algorithm and main data discovery DAO need to be migrated. Some changes need to be made as well.
The job splitting will change to be based on an input fileset and available files. Data discovery is still going to be non-standard though, because it filters on closed lumis, i.e. just because a streamer file is available does not mean it can be assigned to a job. Another option is to apply that filter in the data feeders populating the repack input filesets. Then we could get by with a standard data discovery based on available files here.
Data discovery also currently handles run closeout, but we might want to move that to a separate query/DAO. It's somewhat easy to return run status in the current query, but in the new one it would be a stretch. Better to break up the queries.
Job splitting will stay more or less the same, but we have to rework where the split parameters are defined. Development can get by with reasonable defaults, but eventually they need to come from the Tier0 configuration.
The repack and express job splitter were broken due to some schema changes introduced with the implementation of RunConfig and Tier0Feeder.
Add RunConfig related code and daos, needed by #1960
Current unit tests cover all the basic functionality, except for one thing, the creation of active split lumi records.
Some placeholder code is there, commented out; to complete this I need to finish adding database constraints and writing DAOs to populate the RunConfig related database records.
Do not return filesize for express streamers because it's not needed for the express jobsplitting.
This is a placeholder ticket. At the moment, streamer files are stored on a separate castor pool and cleanup is taken care of by a 30 day garbage collection policy. So nothing to do for us.
In the long run streamers will go to EOS. Does EOS support garbage collection, or special garbage collection per directory? If not, we would need to handle the cleanup ourselves, either from within the Tier0 or running externally (manually, cron jobs etc).
Restarting from scratch when losing a head node is not an option for the Tier0. We need to be able to install a new Tier0/WMAgent on a different machine, point it to the Oracle instance of the old system and resync from there. For the most part that should be possible, but the WMAgent design stores some files on the local filesystem that are required (WMSpecs etc).
For the WMSpecs I imagine looping over all Repack, Express and PromptReco workflows that are not fully finished and recreate the WMSpecs etc from information in RunConfig.
If there are job directories etc we rely on, we need to be able to retry the jobs so they can succeed.
When PromptReco is released, we need to know the RAW merged output fileset based on run and dataset. That information needs to be stored in T0AST in an association table during the Repack WMSpec creation. Either we pass in an id/name to the Repack WMSpec creation which is then used internally for that fileset. Or after the Repack WMSpec creation we have a way to retrieve the RAW merged output fileset id from the spec.
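A minimal sketch of such an association table and lookup; the table and column names are made up for illustration, and sqlite merely stands in for Oracle here to exercise the statements:

```python
# Sketch of the run/dataset -> RAW merged output fileset association
# discussed above. Names are illustrative; the real T0AST schema may
# differ.

RAW_FILESET_ASSOC_CREATE = """
CREATE TABLE run_primds_fileset_assoc (
    run_id     INTEGER NOT NULL,
    primds_id  INTEGER NOT NULL,
    fileset_id INTEGER NOT NULL,
    PRIMARY KEY (run_id, primds_id)
)
"""

# filled during Repack WMSpec creation
RAW_FILESET_ASSOC_INSERT = """
INSERT INTO run_primds_fileset_assoc (run_id, primds_id, fileset_id)
VALUES (:run, :primds, :fileset)
"""

# looked up when PromptReco is released for the run
RAW_FILESET_ASSOC_SELECT = """
SELECT fileset_id FROM run_primds_fileset_assoc
WHERE run_id = :run AND primds_id = :primds
"""
```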
Base this on code in the current Tier0. Regular merge, except we have to deal with split lumis and wait for them to be fully available.
Currently the repack job splitting has a unit test, but that test only checks that the class can be instantiated. Once the problem of setting up an Oracle test instance and using it in unit tests has been solved, we want to expand on that.
There are already placeholders for the basic checks we want to run in the unit test.
Had a chat with Simon, who suggested writing the unit tests for just the splitting algorithm, without a database. It's non-trivial because the repack jobsplitting uses custom DAOs. Not sure it's worth it, given that I have to figure out a way to run with the database anyway.
All the major pieces are in place, a few minor things still need to be tweaked to get express jobs to run successfully.
Add run release for express to Tier0Auditor (port PopConLogDB check over from old Tier0). Also add the check for express release to the express job splitting data discovery query so that no express jobs will be created until the run is released for express.
Requires #1961
CVS:
T0/src/python/T0/JobSplitting/Express.py
T0/src/python/T0/State/Database/Oracle/ListFilesForExpress.py
Both job splitting algorithm and main data discovery DAO need to be migrated. Some changes need to be made as well.
The job splitting will change to be based on an input fileset and available files. Data discovery is still going to be non-standard though, because it filters on closed lumis, i.e. just because a streamer file is available does not mean it can be assigned to a job. Another option is to apply that filter in the data feeders populating the express input filesets. Then we could get by with a standard data discovery based on available files here.
Job splitting will stay more or less the same, but we have to rework where the split parameters are defined. Development can get by with reasonable defaults, but eventually they need to come from the Tier0 configuration.
When the Tier0 detects new data, it will have to read its configuration file and set up RunConfig in T0AST. Then it has to create WMSpecs for repacking and express and set up the input filesets for these. It then has to employ data feeders to populate these filesets. With the WMSpecs and input filesets, work requests can go into an internal (shallow) Tier0 workqueue.
For the data feeders to work, they will use association tables between data type (run/stream) and input filesets. These association tables can be populated in the component, which means the data feeder can be somewhat generic, driven by the content of these association tables.
All the RunConfig related database code should be reviewed and possibly simplified and improved in the process of porting it over. It should also be changed to DAOs to be consistent with other database access code in WMBS/Tier0.
Relies on #2149 and #2967. In addition needs to support setting up the extra schema and extra config options for the Tier0.
The new component Tier0Auditor will handle all the processing checks and advancements for data/work that is inside the Tier0.
Consistency checks against the StorageManager and T0AST databases to determine whether we have complete data for lumis and runs. We should forget about the currently used closeout code and only port the still-to-be-deployed new system based on the EoR and EoLS records from the StorageManager.
Streamer files can be processed in units smaller than a lumi section (normal for express, error overflow case for repack). Special accounting is done to track when all streamers for a lumi are fully processed.
Checks conditions for advancing run state and changes the run state if conditions are met.
Determine if express processing for run can start (PopConLogDB check).
Determine if PromptReco for run can start (simple timeout).
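The special accounting for sub-lumi processing mentioned above boils down to a per-lumi streamer count. A sketch outside any database (the real bookkeeping would live in T0AST):

```python
# Sketch of the special accounting described above: a lumi counts as
# fully processed only when every one of its streamers has been
# processed, even when jobs ran on units smaller than a lumi section.
from collections import defaultdict

def fully_processed_lumis(streamers):
    """streamers: iterable of dicts with 'lumi' and 'processed' keys.
    Returns the set of lumis whose streamers are all processed."""
    total = defaultdict(int)
    done = defaultdict(int)
    for s in streamers:
        total[s["lumi"]] += 1
        if s["processed"]:
            done[s["lumi"]] += 1
    return {lumi for lumi in total if done[lumi] == total[lumi]}
```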
At the moment the scram versions are hardcoded in the specs. I could add them to the Tier0 configuration, but is this really needed? Changing scram versions is a big deal; maybe it's enough to sync that switch with a switch of the deployed Tier0 code.
If we add them to the Tier0 configuration, at what granularity? Adding a scram version for every cmssw version used for every stream or dataset config is the most flexible, but might be overkill...
This is a placeholder ticket for more extensive unit tests covering lumi and run closeout. What is currently in place in the Tier0Feeder unit test covers the normal conditions, using a run picked at random and checking that everything works.
Problem is, what we are really interested in are the cases where conditions are not normal. Given that such conditions are temporary and fix themselves, or are fixed manually later in the StorageManager database, there is no input to test these cases though.
Will bring up the issue with the StorageManager team. We might have to set up some special StorageManager database records (either fake runs/lumis in the standard SM DB or another SM DB altogether) to be able to test everything we want.
The Tier0 configuration (OfflineConfiguration.py) is currently stored in a flat python file. We want to get away from that eventually.
Interaction with the configuration is already factored out and only done in the two components which call an imported method to read the configuration.
Whatever we replace this with should preferably not change the return format of the config loading method, just replace the backend storage.
Only closed lumis are processed by the Tier0. The old system had the filter in the repack/express data discovery, but in the new system this would leave filesets/subscriptions with input data that is never processed, something we want to avoid.
New system will only feed closed lumis to the repack/express input filesets by putting an inner join on closed lumis into the FeedStreamers query. Therefore the code handling lumi closing should also run in the Tier0Feeder just before calling the FeedStreamers DAO.
Doing it in the Tier0Auditor as originally intended would cause more latency for no good reason and also would not be conceptually clean anymore as lumi closing is now essential to the data feeding functionality.
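An illustrative fragment of what the closed-lumi filter in FeedStreamers amounts to; the table names are placeholders, not the real T0AST schema:

```python
# Illustration only: table names are placeholders. Only streamers from
# closed lumis survive the inner join, so open lumis never reach the
# repack/express input filesets.

FEED_STREAMERS_FILTER = """
SELECT streamer.id
FROM streamer
INNER JOIN lumi_section_closed closed
   ON closed.run_id = streamer.run_id
  AND closed.stream_id = streamer.stream_id
  AND closed.lumi_id = streamer.lumi_id
WHERE streamer.used = 0
"""
```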
The old Tier0 had a concept of run closeout, progressing through various run states that triggered various options.
The new system does not have an equivalent anymore because everything is tracked in terms of run/stream or run/dataset units. Even for those, once we close the top level filesets in a subscription, closeout should automatically trickle down the processing chain, with lower-level filesets closing as well as processing finishes.
Add top level fileset closing for run/stream (EoR) and run/dataset (repacking ends) to Tier0Auditor.
We will need the Tier 0 dependency system available as an RPM via the normal deploy/manage system, to enable reviewers to do complete tests of patches for the Tier 0.
Tier0 will make heavy use of the Oracle DBCore interface. The methods should be reviewed for correctness and possible improvements, especially for bulk operations. Some of this will happen gradually as we start developing and testing a WMAgent based Tier0; some of it might require a campaign of going through the code.
At the moment the deploy/manage scripts do not fully configure the Tier0; there are still some tweaks needed. The reason is that we need to add component settings and database settings that are Tier0 specific and therefore cannot be known to the generic WMAgent config template and config modification script.
I do not want to add Tier0 specifics to the agent templates or scripts and I also do not want to have a fully independent template for the Tier0 (would duplicate a lot of settings and code).
What I want to do is to use the agent template, configure the agent settings and then in a second step call a Tier0 modification script that configures the Tier0 specific settings.
Output of the Tier0 needs to be as complete as possible: otherwise we would have unrecoverable data loss (repack), bad conditions for the reconstruction (express), or incomplete statistics for analysis (promptreco).
Due to the interconnected nature of the Tier0 workflows and/or the special latency requirements, the usual procedure to handle incomplete workflows via ACDC and retries is not useful. Definitely not for repack or express and even for promptreco it could cause problems with the integration with promptskimming.
What exactly will be needed to implement this depends on #3114
Tier0 presentation / paper CHEP 2012 abstract
Currently the express job splitting has a unit test, but that test only checks that the class can be instantiated. Once the problem of setting up an Oracle test instance and using it in unit tests has been solved, we want to expand on that.
There are already placeholders for the basic checks we want to run in the unit test.
Had a chat with Simon, who suggested writing the unit tests for just the splitting algorithm, without a database. It's non-trivial because the express jobsplitting uses custom DAOs. Not sure it's worth it, given that I have to figure out a way to run with the database anyway.
The InsertSplitLumis DAO currently expects run/lumi/stream as bind variables. In the new system with the way repack and express job splitting work, this changes to subscription/lumi.
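The changed bind shape could look like this sketch (the SQL and table names are illustrative, assuming the WMCore-style `dbi.processData(sql, binds)` signature):

```python
# Sketch of the new bind shape for InsertSplitLumis described above:
# subscription/lumi instead of run/lumi/stream. The SQL and names are
# illustrative, not the real T0AST schema.

INSERT_SPLIT_LUMIS = """
INSERT INTO split_lumis (subscription, lumi_id)
VALUES (:subscription, :lumi)
"""

def insert_split_lumis(dbi, binds):
    """binds: list of {'subscription': ..., 'lumi': ...} dicts."""
    dbi.processData(INSERT_SPLIT_LUMIS, binds)
```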
Get rid of the use of Periodic and EndOfRun splitting algorithms and especially the use of parent/child filesets and triggering on fileset closing for EndOfRun. Do it all in one job splitter, with one input fileset and one subscription. Needed features:
We'll need a WMSpec for PromptReco and everything after (DQM, RECO and AOD merges, AlcaSkimming, ALCARECO merges, DQM harvesting).
To first order we should be able to use the ReReco WMSpec from WMCore, this is a placeholder ticket to make sure it really supports all we need from it.
Complete the schema, needed for #1960
Based on CVS:
T0/src/python/T0/JobSplitting/ExpressMerge.py
T0/src/python/T0/State/Database/Oracle/ListFilesForExpressMerge.py
The old code should not be used as-is, because in the new system we are dealing with subscriptions for only a single dataset, not the one global expressmerge subscription.
We still cannot use the standard merge subscription because for express merges we need two special features:
These features are already supported in the current code and the code fragment dealing with them should be usable.
PS: We do not need to support splitting express data into primary datasets during the express merge. That is a feature which is supported by the current Tier0, but which has never been used.
Assumes patch from #1960. Output of JobCreator:
{{{
2011-12-22 19:10:11,135:INFO:JobCreatorPoller:Beginning JobCreator.pollSubscriptions() cycle.
2011-12-22 19:10:11,155:ERROR:JobCreatorPoller:WMWorkloadURL ./TestWorkload/WMSandbox/WMWorkload.pkl is empty
2011-12-22 19:10:11,156:ERROR:JobCreatorPoller:Have no task for workflow 1
Aborting Subscription 1
2011-12-22 19:10:11,163:ERROR:JobCreatorPoller:WMWorkloadURL ./TestWorkload/WMSandbox/WMWorkload.pkl is empty
2011-12-22 19:10:11,163:ERROR:JobCreatorPoller:Have no task for workflow 3
Aborting Subscription 3
}}}