autopyfactory's Introduction

PanDA WMS

This is the open source repository for the PanDA Workload Management System.

The PanDA (Production and Distributed Analysis) system was originally developed for the ATLAS experiment at CERN's Large Hadron Collider (LHC) by teams at Brookhaven National Laboratory and the University of Texas at Arlington.

PanDA continues to be developed for ATLAS, where it runs the experiment's large-scale, distributed, data-intensive production and analysis processing at over 100 sites globally (150k concurrent jobs around the clock, a million jobs a day, analyzing a data set currently 150 petabytes in size). It is also being generalized, extended and packaged for use by other scientific communities through the BigPanDA project, supported by the US Department of Energy.

For more information see:

PanDA twiki at CERN: https://twiki.cern.ch/twiki/bin/view/PanDA/PanDA

PanDA WMS project site: http://pandawms.org

ATLAS experiment: http://atlas.ch/

autopyfactory's People

Contributors

btovar, jhover, jose-caballero, lincolnbryant, ptrlv

autopyfactory's Issues

Agis, apf-queue label name

We didn't discuss this. APF queue [section] names need to be unique, so check whether ce_name is suitable; otherwise use ce_id. Note that ce_name can be rather verbose.

Labels are currently constructed from ce_name, but previously we used [nickname+ce_id].
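As a rough illustration of the uniqueness check, the sketch below keeps ce_name as the label when it is unique across the queue entries and falls back to ce_id otherwise. The function name `queue_labels` and the dictionary layout of the entries are assumptions made for this example; they are not taken from the APF code or the AGIS schema.

```python
from collections import Counter


def queue_labels(entries):
    """Return one label per entry: ce_name if unique, otherwise ce_id.

    `entries` is a hypothetical list of dicts with 'ce_name' and 'ce_id' keys.
    """
    # Count how often each ce_name occurs so duplicates can be detected.
    name_counts = Counter(entry["ce_name"] for entry in entries)
    labels = []
    for entry in entries:
        if name_counts[entry["ce_name"]] == 1:
            labels.append(entry["ce_name"])
        else:
            # ce_name is shared by several entries, so fall back to ce_id.
            labels.append(str(entry["ce_id"]))
    return labels
```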

Agis, separate default files

Add a config file, agis.conf, with sections for separate queries. This allows, for example, different defaults for analy/production (or other categories).
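One possible shape for such a file is sketched below, read with Python's standard configparser module. The section names (`analysis`, `production`) and the keys (`maxpending`, `maxpercycle`) are placeholders chosen for illustration; the issue does not specify the actual layout of agis.conf.

```python
import configparser

# Hypothetical agis.conf content: shared defaults plus one section per query.
SAMPLE_AGIS_CONF = """
[DEFAULT]
maxpending = 100

[analysis]
maxpercycle = 20

[production]
maxpercycle = 50
maxpending = 200
"""


def load_query_defaults(text):
    """Return {section: {key: value}} with [DEFAULT] values merged in."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Each section proxy already includes keys inherited from [DEFAULT].
    return {section: dict(parser[section]) for section in parser.sections()}


defaults = load_query_defaults(SAMPLE_AGIS_CONF)
for section, values in defaults.items():
    print(section, values)
```

With this layout a section only needs to list the values that differ from the shared defaults.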

Review schedplugin messages

APF-2.4 message example:

Ready:in=0;activated=18,offset=0,pending=16;ret=2;Scale:in=2;factor=0.5;ret=1;MaxPerCycle:in=1;maxpercycle=50;ret=1;MinPerCycle:in=1;minpercycle=0;ret=1;StatusTest:(not test);input=1; Return=1;StatusOffline:(not offline);in=1;ret=0;MaxPending:in=1;pending=16,maxpending=100;ret=1

What structure do these have? The monitor needs to tokenize this, so I suggest these changes:

  1. use a semicolon to separate scheds
  2. change ; to , within a single sched
  3. change StatusTest Return to ret (to be consistent)
  4. make the comment part consistent, e.g. (not offline)

Or can this be improved further?
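To show what the monitor side could look like if the suggested changes were adopted, here is a minimal tokenizer sketch. It assumes semicolons between scheds, commas within a sched, and parenthesised comments as bare tokens without `=`. The function name `parse_sched_message` and the example string are illustrative only; the example is the APF-2.4 message above rewritten by hand into the proposed format, not real factory output.

```python
def parse_sched_message(message):
    """Split a sched message into (plugin_name, fields, comments) tuples."""
    parsed = []
    for chunk in message.split(";"):  # one chunk per sched plugin
        chunk = chunk.strip()
        if not chunk:
            continue
        name, _, body = chunk.partition(":")
        fields = {}
        comments = []
        for token in body.split(","):  # key=value pairs or comments
            token = token.strip()
            if not token:
                continue
            if "=" in token:
                key, _, value = token.partition("=")
                fields[key.strip()] = value.strip()
            else:
                # Tokens without '=' are treated as comments, e.g. (not offline).
                comments.append(token.strip("()"))
        parsed.append((name.strip(), fields, comments))
    return parsed


example = (
    "Ready:in=0,activated=18,offset=0,pending=16,ret=2;"
    "Scale:in=2,factor=0.5,ret=1;"
    "StatusOffline:(not offline),in=1,ret=0"
)
for name, fields, comments in parse_sched_message(example):
    print(name, fields, comments)
```

Values are left as strings here; converting them to numbers would be a separate decision for the monitor.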

Common wrapper pilot STATUSCODE

This issue belongs in the wrapper repo once it is on GitHub.

Reminder to check whether the wrapper signals the pilot statuscode to apfmon.

Empty batchinfo

Occasionally the sched plugins report batchinfo = None. An example from APF-2.3.1 when using Condor:
ReadySchedPlugin.py:39 calcSubmitNum(): Missing info. wmsinfo is WMSQueueInfo: notready=590, ready=52, running=4658, done=0, failed=0, unknown=0 batchinfo is None

This is a showstopper for the scheduling: no pilots get scheduled. I restarted the factory and pilots are flowing again, but only after the queue config was refreshed. My question for this issue is: how can we clarify what's going on? Under what circumstances is batchinfo=None?

Throttle admin email notifications

(Requested by Peter Love.)
Especially for proxy problems, but useful for everything. Limit the number of admin email notifications by specifying a minimum delay between identical emails.
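The minimum-delay idea can be sketched as follows. This is not existing APF code: the class name `NotificationThrottle`, its method, and the choice of (subject, body) as the identity of an email are all assumptions for illustration.

```python
import time


class NotificationThrottle:
    """Suppress identical notifications sent within a minimum delay."""

    def __init__(self, min_delay_seconds, clock=time.time):
        self.min_delay_seconds = min_delay_seconds
        self.clock = clock  # injectable so the logic can be tested
        self._last_sent = {}  # (subject, body) -> time of last send

    def should_send(self, subject, body):
        """Return True if this email may be sent now, and record the send."""
        key = (subject, body)
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_delay_seconds:
            return False  # identical email was sent too recently
        self._last_sent[key] = now
        return True
```

A caller would wrap the existing send routine, for example `if throttle.should_send(subject, body): send_email(subject, body)`, where `send_email` stands in for whatever the factory already uses.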

ThrottleAborted sched plugin

This issue tracks progress on the "feature request" email of Feb 11, in which pilot submission is throttled based on the number of Removed jobs in condor_history.

An example is where the CE rejects jobs, such as this:
CREAM error: BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Maximum number of jobs already in queue for user MSG=total number of current user's jobs exceeds the queue limit: user [email protected], queue hmem_sl6-) N/A (j
