This project forked from dgruber/drmaa

Compute cluster (HPC) job submission library for Go (#golang) based on the open DRMAA standard.

License: BSD 2-Clause "Simplified" License

go-drmaa

This is a job submission library for Go (#golang) that is compatible with the DRMAA standard. The Go library is a wrapper around the DRMAA C library implementation provided by many distributed resource managers (cluster schedulers).

The library was developed using Univa Grid Engine's libdrmaa.so. It was tested with Grid Engine, Torque, and SLURM, but it should also work with other resource managers / cluster schedulers that provide libdrmaa.so.

The "gestatus" subpackage only works with Grid Engine (some values are only available on Univa Grid Engine).

The DRMAA (Distributed Resource Management Application API) standard is now also available in version 2. DRMAA2 provides more functionality around cluster monitoring and job session management. DRMAA and DRMAA2 are not compatible, so both libraries are expected to coexist for a while. A Go DRMAA2 library is available separately.

Note: Univa Grid Engine 8.3.0 and later add functions that allow you to submit a job on behalf of another user. This helps when building a DRMAA service (such as a web portal) that submits jobs. This functionality is available in the UGE83_sudo branch (https://github.com/dgruber/drmaa/tree/UGE83_sudo). The functions are RunJobsAs(), RunBulkJobsAs(), and ControlAs().

Compilation

First download the package:

   export GOPATH=${GOPATH:-~/src/go}
   mkdir -p $GOPATH
   go get -d github.com/dgruber/drmaa
   cd $GOPATH/src/github.com/dgruber/drmaa

Next, we need to compile the code.

For Univa Grid Engine and original SGE:

   source /path/to/grid/engine/installation/default/settings.sh
   ./build.sh
   cd examples/simplesubmit
   go build
   export LD_LIBRARY_PATH=$SGE_ROOT/lib/lx-amd64
   ./simplesubmit

For Son of Grid Engine ("loveshack"):

   source /path/to/grid/engine/installation/default/settings.sh
   ./build.sh --sog
   cd examples/simplesubmit
   go build
   export LD_LIBRARY_PATH=$SGE_ROOT/lib/lx-amd64
   ./simplesubmit

For Torque:

If your Torque drmaa.h header file is not located under /usr/include/torque, you will have to modify the build.sh script before running it.

   ./build.sh --torque
   cd examples/simplesubmit
   go build
   ./simplesubmit

For SLURM and the updated SLURM C drmaa binding:

   ./build.sh --slurm /usr/local

The example program submits a sleep job into the system and prints out detailed job information as soon as the job is started.

Short Introduction to Go DRMAA

A Go DRMAA application must open a DRMAA session before it can make any other DRMAA call. Opening a session usually establishes a connection to the cluster scheduler (distributed resource manager). Once no further DRMAA calls are needed, the application must call the session's Exit() method, which tears down that connection. If Exit() is never called, a communication handle can remain open on the cluster scheduler side, and it can take a while before it is removed automatically. Always call Exit(). The Go defer statement is a convenient way to do so, but remember that deferred functions are not executed when os.Exit() is called.

Creating a DRMAA session:

s, err := drmaa.MakeSession()
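One way to deal with the os.Exit() caveat is to keep the deferred cleanup in a helper function and call os.Exit() only after that helper has returned. The following is a minimal sketch of that pattern, not code from this library: the exiter interface, fakeSession, and run are hypothetical names, and the fake session stands in for a real DRMAA session so the program runs without a cluster. Only the Exit() method name is taken from the text above.

```go
package main

import (
	"fmt"
	"os"
)

// exiter is a hypothetical stand-in for the part of a DRMAA session
// used here; only the Exit() method name comes from the README.
type exiter interface {
	Exit() error
}

// fakeSession records whether Exit() was called, so the sketch runs
// without a cluster scheduler.
type fakeSession struct {
	closed bool
}

func (f *fakeSession) Exit() error {
	f.closed = true
	return nil
}

// run does the actual work and returns a process exit code. Because it
// returns normally, its deferred Exit() call is always executed.
func run(s exiter, fail bool) int {
	defer s.Exit()
	if fail {
		return 1
	}
	return 0
}

func main() {
	s := &fakeSession{}
	code := run(s, false)
	fmt.Println("session closed:", s.closed)
	// os.Exit is reached only after run has returned, so no deferred
	// cleanup is skipped.
	os.Exit(code)
}
```

With a real session, run would receive the value returned by drmaa.MakeSession() instead of the fake.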

DRMAA applications typically submit jobs and job workflows. To submit a job, a job template must first be allocated:

jt, errJT := s.AllocateJobTemplate()
if errJT != nil {
   fmt.Printf("Error during allocating a new job template: %s\n", errJT)
   return
}

Underneath, a C job template is allocated, which lies outside the reach of the Go garbage collector. The job template must therefore be deleted explicitly when it is no longer needed. The Go defer statement is useful here as well.

// prevent memory leaks by freeing the allocated C job template at the end
defer s.DeleteJobTemplate(&jt)
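Deferred calls in Go run in last-in, first-out order. If the session's Exit() is deferred right after the session is created and the template deletion is deferred afterwards, the template is freed before the session is closed. The sketch below only demonstrates that ordering with a hypothetical cleanupLog type; it does not call the DRMAA library. That the C library prefers this order is an assumption.

```go
package main

import "fmt"

// cleanupLog records the order in which steps run. The step names are
// illustrative; no DRMAA library is involved in this sketch.
type cleanupLog struct {
	steps []string
}

func (c *cleanupLog) record(step string) {
	c.steps = append(c.steps, step)
}

// submit mimics the shape of a DRMAA function: the session is opened
// first and the job template second, so the defers are registered in
// that order and run in reverse.
func submit(c *cleanupLog) {
	defer c.record("session exit")        // registered first, runs last
	defer c.record("delete job template") // registered second, runs first
	c.record("submit job")
}

func main() {
	c := &cleanupLog{}
	submit(c)
	fmt.Println(c.steps)
	// prints: [submit job delete job template session exit]
}
```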

The job template contains the specification of the job, such as the command to be executed and its parameters. These are set with the setter methods of the job template.

// set the application to submit
jt.SetRemoteCommand("sleep")
// set the parameter (use SetArgs() when having more parameters)
jt.SetArg("1")

A job is submitted with the session's RunJob() method. If the same command should be executed many times, it makes sense to run it as a job array. In Grid Engine, each instance is assigned a task ID, which the job can read from the SGE_TASK_ID environment variable (for normal jobs the variable is set to undefined). The task ID can be used to select the data set that the array job task needs to process. An array job is submitted with the RunBulkJobs() method.

jobID, errSubmit := s.RunJob(&jt)

// submitting 1000 instances of the same job
jobIDs, errBulkSubmit := s.RunBulkJobs(&jt, 1, 1000, 1)
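On the job side, an array task can use SGE_TASK_ID to pick its share of the work. The following is a minimal sketch of a program that could run as one array task, assuming the array starts at task 1 with a step of 1 as in the RunBulkJobs() call above. The taskIndex helper and the input file names are hypothetical; only the SGE_TASK_ID variable comes from the text.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// taskIndex converts the value of SGE_TASK_ID into a zero-based index.
// It reports ok=false when the value is not a positive integer, which
// is the case for jobs that are not array tasks.
func taskIndex(value string) (index int, ok bool) {
	id, err := strconv.Atoi(value)
	if err != nil || id < 1 {
		return 0, false
	}
	return id - 1, true
}

func main() {
	// Hypothetical input files, one per array task.
	inputs := []string{"part-000.csv", "part-001.csv", "part-002.csv"}

	i, ok := taskIndex(os.Getenv("SGE_TASK_ID"))
	if !ok || i >= len(inputs) {
		fmt.Println("not an array task or no input for this task")
		return
	}
	fmt.Println("processing", inputs[i])
}
```

Keeping the parsing in a small function makes the non-array case explicit, so the same binary can be submitted as a normal job without crashing on a non-numeric task ID.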

The state of a job can also be changed (suspended / resumed / put on hold / deleted):

errTerm := s.TerminateJob(jobID)

The JobInfo data structure contains the runtime information of the job, such as its exit status and the amount of resources used (memory / IO / etc.). It is returned by the Wait() method.

jinfo, errWait := s.Wait(jobID, drmaa.TimeoutWaitForever)

For more details please consult the documentation and the DRMAA standard specifications.

More examples can be found on my blog at http://www.gridengine.eu.

Contributors

dgruber, jbarber, aminiussi, alea55, cameronbrunner, ucappprofile
