Welcome to the CMS Tier 0 Repository!
License: Apache License 2.0
Need to take a look at what WMCore offers and if this can be used.
Need to look at EndOfRun and Periodic jobsplitters in WMCore and decide if they can be used as-is for the Tier0 harvesting.
Since filesets in the Tier0 are now very restrictive and only contain the data that is harvested together, the jobsplitter's task for the harvesting is very simple indeed: either all files go into one job when the fileset is closed, or all files are harvested periodically until the fileset is closed, followed by one last job on all files.
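If the WMCore splitters cannot be reused directly, the decision logic described above is simple enough to sketch. A minimal illustration (this is not WMCore API; the fileset interface is a made-up stand-in):

```python
# Minimal sketch of the harvesting split decision described above.
# The fileset/job interfaces are hypothetical stand-ins, not WMCore API.

def harvest_split(fileset, periodic_interval=None, now=0, last_split=0):
    """Return lists of files to put into harvesting jobs.

    EndOfRun mode (periodic_interval is None): one job over all files,
    but only once the fileset is closed.
    Periodic mode: all files every interval while the fileset is open,
    then one last job over all files when it closes.
    """
    if periodic_interval is None:
        # EndOfRun: wait for the fileset to close, then harvest everything.
        return [fileset.files] if not fileset.open else []
    if not fileset.open:
        # Fileset closed: one final job on all files.
        return [fileset.files]
    if now - last_split >= periodic_interval:
        # Interval expired: harvest everything seen so far.
        return [fileset.files]
    return []
```

Either mode always ends with one job over the full fileset, which matches the restrictive per-harvest filesets described above.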
These can be taken as-is from CVS:
T0/src/python/T0/State/Database/Oracle/InsertSplitLumis.py
T0/src/python/T0/State/Database/Oracle/MarkStreamersUsed.py
since they are already WMBS-conformant and are still needed for repack and express job splitting.
As for the code structure of the T0 SVN module and where we want to put the DAOs, how about
T0/src/python/T0/WMBS/Oracle
Would be uniform with the way things are setup in the WMCore module.
We need a WMSpec that will run the repacking. It needs two tasks, first the actual repacking and then a merge step.
Each Repack WMSpec is stream specific. Embedded in the WMSpec is the dataset to trigger path mapping for the given stream. This is passed at runtime to Configuration.DataProcessing and returns a valid repacking configuration. This system is not commissioned yet, so for early testing we can also make up a repacking configuration, store it in the ConfigCache and embed the id in the WMSpec.
The post-repacking merge step cannot be implemented as a standard WMCore merge step. This is because of error datasets: if the repacker size protections kick in, we need to decide at merge time whether the output goes to the normal dataset or an error dataset. The way we'll implement this is with a custom repack merge job splitting algorithm that passes the normal/error dataset decision to the job. At runtime the job then evaluates this flag and configures one of the two normal/error dataset output modules. Both output module to fileset mappings need to be defined in the WMSpec though.
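The split-time decision and the runtime flag evaluation could look roughly like this sketch (the names and the size threshold are made up for illustration; the real protections live in the repacker):

```python
# Sketch of the custom repack merge splitting described above: the
# splitter tags each job with the normal/error dataset decision, and
# the job evaluates the flag at runtime to enable the matching output
# module. The threshold and names are made-up illustration values.

MAX_MERGE_SIZE = 4 * 1024**3  # hypothetical size protection threshold

def split_repack_merge(file_groups):
    """Turn pre-grouped merge candidates into jobs, each carrying the
    normal/error dataset decision made at merge (split) time."""
    jobs = []
    for group in file_groups:
        # If the size protection kicked in for any input file, the
        # merged output is routed to the error dataset instead.
        use_error = any(f["size"] > MAX_MERGE_SIZE for f in group)
        jobs.append({"files": group, "useErrorDataset": use_error})
    return jobs

def output_module_for(job):
    """Runtime side: pick one of the two output modules; both fileset
    mappings must already be defined in the WMSpec."""
    return "errorDataset" if job["useErrorDataset"] else "normalDataset"
```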
These are needed to run Tier0 unit tests that use the database. Might have to be adjusted later depending on how #2196 is resolved.
Even though the actual scheduling is not using location, I need to load it and set it in the File object, otherwise the pickled job object has no location information, which screws up job submission.
Decide whether to use the standard WMAgent cleanup jobs or a special cleanup component like in the old Tier0 to handle unmerged file cleanup. Tier0 has some special workflows that have unmerged input and unmerged output, have to check if the WMAgent cleanup support can handle this.
The Tier0 needs to send a notification about the processing status of streamer files to P5, because the StorageManager uses this to decide when it's safe to delete a streamer file.
Likely this will be part of the Tier0Auditor, with a check on when a streamer is processed and a call to an external script to inject a message into the P5->CERN transfer system. Might need a new status field in the streamer table.
We need a setup.py file for the T0 to perform build and installation. At the moment, just collecting all the python files and putting them into lib is sufficient; there is nothing else to do yet.
Separation into core, datasvc and monitoring will come later.
Assume patch from #1960.
The Tier0Feeder.FeedStreamers DAO contains two separate queries since AFAICT there is no way to handle a multi-table insert and an update in a third table as a single SQL statement.
To make the operation atomic, I tried wrapping them into a transaction and protecting it with a rollback, but that failed. The second query did not see the modifications made by the first (which it needs to).
For now the calls are committed separately, but that exposes us to a potential race condition.
Find a way to handle this better (PL/SQL or better transaction handling or ...)
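One option along the PL/SQL line: move both statements into a single anonymous block, so they run in one transaction and one round trip with a single commit. A hedged sketch with made-up table and column names (the real FeedStreamers SQL differs), assuming the WMCore `dbi.processData` call:

```python
# Hedged sketch: all table/column names here are illustrative, not the
# real T0AST schema. The point is the single PL/SQL anonymous block,
# which makes the insert and the update one atomic unit.

FEED_STREAMERS_SQL = """
BEGIN
  INSERT INTO fileset_streamer_assoc (fileset_id, streamer_id)
    SELECT assoc.fileset_id, streamer.id
    FROM streamer
    INNER JOIN run_stream_fileset_assoc assoc
       ON assoc.run_id = streamer.run_id
      AND assoc.stream_id = streamer.stream_id
    WHERE streamer.used = 0;

  -- runs in the same transaction and session, so it sees the rows
  -- touched by the insert above; one COMMIT covers both statements
  UPDATE streamer SET used = 1 WHERE used = 0;
  COMMIT;
END;
"""

def feed_streamers(dbi):
    # single round trip to Oracle, no separate commits to race between
    dbi.processData(FEED_STREAMERS_SQL)
```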
At the moment the RunConfig code that configures run/stream settings also configures everything happening with PromptReco and later.
This should be factored out into settings needed immediately for run/stream handling (repacking and RAW file handling) and settings only required later when we release PromptReco for the run.
A feature we had since PA times is that the job submission includes at least -J <WF_TYPE>-<RUN_NUMBER>-<JOB_INDEX>.
We can survive without it, but it would be great (and it seems like not much effort) to make the same type of submission, so we can identify what we're looking at in the LSF queue.
The piece of LsfPlugin that does it is:
{{{
# // Submit LSF job
# //
command = 'bsub'
command += ' -q %s' % self.queue
if self.resourceReq is not None:
    command += ' -R "%s"' % self.resourceReq
command += ' -g %s' % self.jobGroup
command += ' -J %s' % self.scriptFile
}}}
We just need to find where to get the WF name; maybe we would have to pass it along from the previous script.
{{{
206643634 cmsprod PEND cmst0 vocms13 - WMAgentJob Jan 27 10:37
206646633 cmsprod PEND cmst0 vocms13 - WMAgentJob Jan 27 11:04
206654938 cmsprod PEND cmst0 vocms13 - /data/cmsprod/wmagent/0.8.23/apps/wmcore/etc/submit.sh Jan 27 11:46
206656778 cmsprod PEND cmst0 vocms13 - /data/cmsprod/wmagent/0.8.23/apps/wmcore/etc/submit.sh Jan 27 11:50
}}}
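Composing the desired -J value is straightforward once the plugin has the pieces. In this sketch the keys on the job dict are hypothetical, since it is not clear yet what the JobSubmitter passes down:

```python
# Hypothetical sketch: the 'wfType', 'runNumber' and 'index' keys are
# assumptions about what could be passed down to the LsfPlugin.

def lsf_job_name(job):
    """Build the <WF_TYPE>-<RUN_NUMBER>-<JOB_INDEX> name for bsub -J."""
    return "%s-%s-%s" % (job["wfType"], job["runNumber"], job["index"])

# inside LsfPlugin, replacing the scriptFile based name:
# command += ' -J %s' % lsf_job_name(job)
```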
If I'm unable to do it, you can assign this to me.
We need a WMSpec that will run Express.
It will look like a mix of other WMSpecs. The express processing part is very similar to the repacking, except the configurations are more complex. Secondary steps are merges for FEVT and DQM and alcaskims for ALCARECO. Both of them look like their bulk counterparts (standard WMCore merges and ReReco AlcaSkim) with the addition of latency triggers and enforcing lumi order in the job splitting.
Also needs to include DQM and Alca Harvesting.
Oli is concerned about these sorts of things -- a query I use (and then he asked for the inverse) is this:
{{{
select distinct blockid, blockstatus, runnum, hoursold, pds, runstatus,
                lastupdate, site, skimname, procver
from (
  select t1skim_config.run_id as runnum,
         ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * 86400 - run.start_time) / 3600 as hoursold,
         substr(primary_dataset.name, 1, 20) as pds,
         block.id as blockid,
         substr(block_status.status, 1, 10) as blockstatus,
         substr(run_status.status, 1, 30) as runstatus,
         ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * 86400 - run.last_updated) / 3600 as lastupdate,
         substr(storage_node.name, 1, 20) as site,
         substr(t1skim_config.skim_name, 1, 20) as skimname,
         substr(t1skim_config.proc_version, 1, 20) as procver
  from block
  inner join block_run_assoc on block.id = block_run_assoc.block_id
  inner join t1skim_config on t1skim_config.run_id = block_run_assoc.run_id
  inner join primary_dataset on primary_dataset.id = t1skim_config.primary_dataset_id
  inner join dataset_path on dataset_path.primary_dataset = t1skim_config.primary_dataset_id
  inner join block_status on block_status.id = block.status
  inner join run on run.run_id = t1skim_config.run_id
  inner join run_status on run.run_status = run_status.id
  inner join phedex_subscription on phedex_subscription.run_id = t1skim_config.run_id
  inner join storage_node on phedex_subscription.node_id = storage_node.id
  where phedex_subscription.primary_dataset_id = t1skim_config.primary_dataset_id
    and storage_node.name <> 'T0_CH_CERN_MSS'
    and dataset_path.id = block.dataset_path_id
    and block.status < 6
  order by t1skim_config.run_id, primary_dataset.name, block.status
)
order by runnum, site, skimname;
}}}
This gives everything waiting to be skimmed that should be. It's probably horribly unoptimized, but it gives what I want, formatted usefully enough.
The inverse of this just flips the status condition from < 6 to >= 6. Oli wanted to see the list of runs that should have been skimmed.
{{{
select distinct blockid, blockstatus, runnum, hoursold, pds, runstatus,
                lastupdate, site, skimname, procver
from (
  select t1skim_config.run_id as runnum,
         ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * 86400 - run.start_time) / 3600 as hoursold,
         substr(primary_dataset.name, 1, 20) as pds,
         block.id as blockid,
         substr(block_status.status, 1, 10) as blockstatus,
         substr(run_status.status, 1, 30) as runstatus,
         ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * 86400 - run.last_updated) / 3600 as lastupdate,
         substr(storage_node.name, 1, 20) as site,
         substr(t1skim_config.skim_name, 1, 20) as skimname,
         substr(t1skim_config.proc_version, 1, 20) as procver
  from block
  inner join block_run_assoc on block.id = block_run_assoc.block_id
  inner join t1skim_config on t1skim_config.run_id = block_run_assoc.run_id
  inner join primary_dataset on primary_dataset.id = t1skim_config.primary_dataset_id
  inner join dataset_path on dataset_path.primary_dataset = t1skim_config.primary_dataset_id
  inner join block_status on block_status.id = block.status
  inner join run on run.run_id = t1skim_config.run_id
  inner join run_status on run.run_status = run_status.id
  inner join phedex_subscription on phedex_subscription.run_id = t1skim_config.run_id
  inner join storage_node on phedex_subscription.node_id = storage_node.id
  where phedex_subscription.primary_dataset_id = t1skim_config.primary_dataset_id
    and storage_node.name <> 'T0_CH_CERN_MSS'
    and dataset_path.id = block.dataset_path_id
    and block.status >= 6
  order by t1skim_config.run_id, primary_dataset.name, block.status
)
order by runnum desc, site, skimname;
}}}
CVS:
T0/src/python/T0/JobSplitting/Repack.py
T0/src/python/T0/State/Database/Oracle/ListFilesForRepack.py
Both job splitting algorithm and main data discovery DAO need to be migrated. Some changes need to be made as well.
The job splitting will change to be based on an input fileset and available files. Data discovery is still going to be non-standard though, because it filters on closed lumis, i.e. just because a streamer file is available does not mean it can be assigned to a job. Another option is to apply that filter in the data feeders populating the repack input filesets. Then we could get by with a standard data discovery based on available files here.
Data discovery also currently handles run closeout, but we might want to move that to a separate query/DAO. It's somewhat easy to return run status in the current query, but in the new one it would be a stretch. Better to break up the queries.
Job splitting will stay more or less the same, but we have to rework where the split parameters are defined. Development can get by with reasonable defaults, but eventually they need to come from the Tier0 configuration.
The repack and express job splitter were broken due to some schema changes introduced with the implementation of RunConfig and Tier0Feeder.
Add RunConfig related code and daos, needed by #1960
Current unit tests cover all the basic functionality, except for one thing, the creation of active split lumi records.
Some placeholder code is there, commented out; to complete this I need to finish adding database constraints and writing DAOs to populate the RunConfig related database records.
Do not return filesize for express streamers because it's not needed for the express jobsplitting.
This is a placeholder ticket. At the moment, streamer files are stored on a separate castor pool and cleanup is taken care of by a 30 day garbage collection policy. So nothing to do for us.
In the long run streamers will go to EOS. Does EOS support garbage collection, or special garbage collection per directory? If not, we would need to handle the cleanup ourselves, either from within the Tier0 or running externally (manually, cron jobs etc).
Restarting from scratch when losing a head node is not an option for the Tier0. We need to be able to install a new Tier0/WMAgent on a different machine, point it to the Oracle instance of the old system and resync from there. For the most part that should be possible, but the WMAgent design stores some files on the local filesystem that are required (WMSpecs etc).
For the WMSpecs I imagine looping over all Repack, Express and PromptReco workflows that are not fully finished and recreate the WMSpecs etc from information in RunConfig.
If there are job directories etc we rely on, we need to be able to retry the jobs so they can succeed.
When PromptReco is released, we need to know the RAW merged output fileset based on run and dataset. That information needs to be stored in T0AST in an association table during the Repack WMSpec creation. Either we pass in an id/name to the Repack WMSpec creation which is then used internally for that fileset. Or after the Repack WMSpec creation we have a way to retrieve the RAW merged output fileset id from the spec.
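A minimal sketch of such an association table and lookup; the table and column names are made up for illustration, and sqlite merely stands in for Oracle here to exercise the statements:

```python
# Sketch of the run/dataset -> RAW merged output fileset association
# discussed above. Names are illustrative; the real T0AST schema may
# differ.

RAW_FILESET_ASSOC_CREATE = """
CREATE TABLE run_primds_fileset_assoc (
    run_id     INTEGER NOT NULL,
    primds_id  INTEGER NOT NULL,
    fileset_id INTEGER NOT NULL,
    PRIMARY KEY (run_id, primds_id)
)
"""

# filled during Repack WMSpec creation
RAW_FILESET_ASSOC_INSERT = """
INSERT INTO run_primds_fileset_assoc (run_id, primds_id, fileset_id)
VALUES (:run, :primds, :fileset)
"""

# looked up when PromptReco is released for the run
RAW_FILESET_ASSOC_SELECT = """
SELECT fileset_id FROM run_primds_fileset_assoc
WHERE run_id = :run AND primds_id = :primds
"""
```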
Base this on code in the current Tier0. Regular merge, except we have to deal with split lumis and wait for them to be fully available.
Currently the repack job splitting has a unit test, but that test only checks that the class can be instantiated. Once the problem of setting up an Oracle test instance and using it in unit tests has been solved, we want to expand on that.
There are already placeholders for the basic checks we want to run in the unit test.
Had a chat with Simon, who suggested writing the unit tests for just the splitting algorithm, without a database. It's non-trivial because the repack jobsplitting uses custom DAOs. Not sure it's worth it, given that I have to figure out a way to run with the database anyway.
All the major pieces are in place, a few minor things still need to be tweaked to get express jobs to run successfully.
Add run release for express to Tier0Auditor (port PopConLogDB check over from old Tier0). Also add the check for express release to the express job splitting data discovery query so that no express jobs will be created until the run is released for express.
Requires #1961
CVS:
T0/src/python/T0/JobSplitting/Express.py
T0/src/python/T0/State/Database/Oracle/ListFilesForExpress.py
Both job splitting algorithm and main data discovery DAO need to be migrated. Some changes need to be made as well.
The job splitting will change to be based on an input fileset and available files. Data discovery is still going to be non-standard though, because it filters on closed lumis, i.e. just because a streamer file is available does not mean it can be assigned to a job. Another option is to apply that filter in the data feeders populating the express input filesets. Then we could get by with a standard data discovery based on available files here.
Job splitting will stay more or less the same, but we have to rework where the split parameters are defined. Development can get by with reasonable defaults, but eventually they need to come from the Tier0 configuration.
When the Tier0 detects new data, it will have to read its configuration file and set up RunConfig in T0AST. Then it has to create WMSpecs for repacking and express and set up the input filesets for these. It then has to employ data feeders to populate these filesets. With the WMSpecs and input filesets, work requests can go into an internal (shallow) Tier0 workqueue.
For the data feeders to work, they will use association tables between data type (run/stream) and input filesets. These association tables can be populated in the component, which means the data feeder can be somewhat generic, driven by the content of these association tables.
All the RunConfig related database code should be reviewed and possibly simplified and improved in the process of porting it over. It should also be changed to DAOs to be consistent with other database access code in WMBS/Tier0.
Relies on #2149 and #2967. In addition needs to support setting up the extra schema and extra config options for the Tier0.
The new component Tier0Auditor will handle all the processing checks and advancements for data/work that is inside the Tier0.
Consistency checks against the StorageManager and T0AST databases to determine whether we have complete data for lumis and runs. We should forget about the currently used closeout code and only port the still-to-be-deployed new system based on the EoR and EoLS records from the StorageManager.
Streamer files can be processed in units smaller than a lumi section (normal for express, error overflow case for repack). Special accounting is done to track when all streamers for a lumi are fully processed.
Checks conditions for advancing run state and changes the run state if conditions are met.
Determine if express processing for run can start (PopConLogDB check).
Determine if PromptReco for run can start (simple timeout).
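The special accounting for sub-lumi processing mentioned above boils down to a per-lumi streamer count. A sketch outside any database (the real bookkeeping would live in T0AST):

```python
# Sketch of the special accounting described above: a lumi counts as
# fully processed only when every one of its streamers has been
# processed, even when jobs ran on units smaller than a lumi section.
from collections import defaultdict

def fully_processed_lumis(streamers):
    """streamers: iterable of dicts with 'lumi' and 'processed' keys.
    Returns the set of lumis whose streamers are all processed."""
    total = defaultdict(int)
    done = defaultdict(int)
    for s in streamers:
        total[s["lumi"]] += 1
        if s["processed"]:
            done[s["lumi"]] += 1
    return {lumi for lumi in total if done[lumi] == total[lumi]}
```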
At the moment the scram versions are hardcoded in the specs. I could add them to the Tier0 configuration, but is this really needed? Changing scram versions is a big deal; maybe it's enough to sync that switch with a switch of the deployed Tier0 code.
If we add them to the Tier0 configuration, at what granularity? Adding a scram version for every cmssw version used for every stream or dataset config is the most flexible, but might be overkill...
This is a placeholder ticket for more extensive unit tests covering lumi and run closeout. What is currently in place in the Tier0Feeder unit test covers the normal conditions, using a run picked at random and checking that everything works.
Problem is, what we are really interested in are the cases where conditions are not normal. Given that such conditions are temporary and fix themselves, or are fixed manually later in the StorageManager database, there is no input to test these cases though.
Will bring up the issue with the StorageManager team. We might have to set up some special StorageManager database records (either fake runs/lumis in the standard SM DB or another SM DB altogether) to be able to test everything we want.
The Tier0 configuration (OfflineConfiguration.py) is currently stored in a flat python file. We want to get away from that eventually.
Interaction with the configuration is already factored out and only done in the two components which call an imported method to read the configuration.
Whatever we replace this with should preferably not change the return format of the config loading method, just replace the backend storage.
Only closed lumis are processed by the Tier0. The old system had the filter in the repack/express data discovery, but in the new system this would leave filesets/subscriptions with input data that is never processed, something we want to avoid.
New system will only feed closed lumis to the repack/express input filesets by putting an inner join on closed lumis into the FeedStreamers query. Therefore the code handling lumi closing should also run in the Tier0Feeder just before calling the FeedStreamers DAO.
Doing it in the Tier0Auditor as originally intended would cause more latency for no good reason and also would not be conceptually clean anymore as lumi closing is now essential to the data feeding functionality.
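An illustrative fragment of what the closed-lumi filter in FeedStreamers amounts to; the table names are placeholders, not the real T0AST schema:

```python
# Illustration only: table names are placeholders. Only streamers from
# closed lumis survive the inner join, so open lumis never reach the
# repack/express input filesets.

FEED_STREAMERS_FILTER = """
SELECT streamer.id
FROM streamer
INNER JOIN lumi_section_closed closed
   ON closed.run_id = streamer.run_id
  AND closed.stream_id = streamer.stream_id
  AND closed.lumi_id = streamer.lumi_id
WHERE streamer.used = 0
"""
```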
The old Tier0 had a concept of run closeout, progressing through various run states that triggered various options.
The new system does not have an equivalent anymore because everything is tracked in terms of run/stream or run/dataset units. Even for those, once we close the top level filesets in a subscription, closeout should automatically trickle down the processing chain, with lower-level filesets closing as well as processing finishes.
Add top level fileset closing for run/stream (EoR) and run/dataset (repacking ends) to Tier0Auditor.
We will need the Tier 0 dependency system available as an RPM via the normal deploy/manage system, to enable reviewers to do complete tests of patches for the Tier 0.
Tier0 will make heavy use of the Oracle DBCore interface. The methods should be reviewed for correctness and possible improvements, especially for bulk operations. Some of this will happen gradually as we start developing and testing a WMAgent based Tier0; some of it might require a campaign of going through the code.
At the moment the deploy/manage scripts do not fully configure the Tier0; there are still some tweaks needed. The reason is that we need to add component settings and database settings that are Tier0 specific and therefore cannot be known to the generic WMAgent config template and config modification script.
I do not want to add Tier0 specifics to the agent templates or scripts and I also do not want to have a fully independent template for the Tier0 (would duplicate a lot of settings and code).
What I want to do is to use the agent template, configure the agent settings and then in a second step call a Tier0 modification script that configures the Tier0 specific settings.
Output of the Tier0 needs to be as complete as possible: otherwise we would have unrecoverable data loss (repack), bad conditions for the reconstruction (express), or incomplete statistics for analysis (promptreco).
Due to the interconnected nature of the Tier0 workflows and/or the special latency requirements, the usual procedure to handle incomplete workflows via ACDC and retries is not useful. Definitely not for repack or express and even for promptreco it could cause problems with the integration with promptskimming.
What exactly will be needed to implement this depends on #3114
Tier0 presentation / paper CHEP 2012 abstract
Currently the express job splitting has a unit test, but that test only checks that the class can be instantiated. Once the problem of setting up an Oracle test instance and using it in unit tests has been solved, we want to expand on that.
There are already placeholders for the basic checks we want to run in the unit test.
Had a chat with Simon, who suggested writing the unit tests for just the splitting algorithm, without a database. It's non-trivial because the express jobsplitting uses custom DAOs. Not sure it's worth it, given that I have to figure out a way to run with the database anyway.
The InsertSplitLumis DAO currently expects run/lumi/stream as bind variables. In the new system with the way repack and express job splitting work, this changes to subscription/lumi.
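The changed bind shape could look like this sketch (the SQL and table names are illustrative, assuming the WMCore-style `dbi.processData(sql, binds)` signature):

```python
# Sketch of the new bind shape for InsertSplitLumis described above:
# subscription/lumi instead of run/lumi/stream. The SQL and names are
# illustrative, not the real T0AST schema.

INSERT_SPLIT_LUMIS = """
INSERT INTO split_lumis (subscription, lumi_id)
VALUES (:subscription, :lumi)
"""

def insert_split_lumis(dbi, binds):
    """binds: list of {'subscription': ..., 'lumi': ...} dicts."""
    dbi.processData(INSERT_SPLIT_LUMIS, binds)
```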
Get rid of the use of Periodic and EndOfRun splitting algorithms and especially the use of parent/child filesets and triggering on fileset closing for EndOfRun. Do it all in one job splitter, with one input fileset and one subscription. Needed features:
We'll need a WMSpec for PromptReco and everything after (DQM, RECO and AOD merges, AlcaSkimming, ALCARECO merges, DQM harvesting).
To first order we should be able to use the ReReco WMSpec from WMCore, this is a placeholder ticket to make sure it really supports all we need from it.
Complete the schema, needed for #1960
Based on CVS:
T0/src/python/T0/JobSplitting/ExpressMerge.py
T0/src/python/T0/State/Database/Oracle/ListFilesForExpressMerge.py
The old code should not be used as-is, because in the new system we are dealing with subscriptions for only a single dataset, not the one global expressmerge subscription.
We still cannot use the standard merge subscription because for express merges we need two special features:
These features are already supported in the current code and the code fragment dealing with them should be usable.
PS: We do not need to support splitting express data into primary datasets during the express merge. That is a feature which is supported by the current Tier0, but which has never been used.
Assumes patch from #1960. Output of JobCreator:
{{{
2011-12-22 19:10:11,135:INFO:JobCreatorPoller:Beginning JobCreator.pollSubscriptions() cycle.
2011-12-22 19:10:11,155:ERROR:JobCreatorPoller:WMWorkloadURL ./TestWorkload/WMSandbox/WMWorkload.pkl is empty
2011-12-22 19:10:11,156:ERROR:JobCreatorPoller:Have no task for workflow 1
Aborting Subscription 1
2011-12-22 19:10:11,163:ERROR:JobCreatorPoller:WMWorkloadURL ./TestWorkload/WMSandbox/WMWorkload.pkl is empty
2011-12-22 19:10:11,163:ERROR:JobCreatorPoller:Have no task for workflow 3
Aborting Subscription 3
}}}