
LA-SiGMA Execution Management Interest group

Formed by members of the LA-SiGMA community with a strong interest in execution management on LONI and other supercomputing platforms, the group meets to explore and discuss topics within LA-SiGMA that can benefit from execution management. It is currently led by T. Bishop, R. Hall, S. Jha, J. Moreno and J. (Ram) Ramanujam, together with several postdocs and students.

Upcoming event: LA-SiGMA Tutorial Announcement

"LASiGMA Execution Management Tutorial, Dec 2, Friday 1:00pm"

EVO Link

The execution management team will give a tutorial on submitting a large number of computations to LONI so that they run on whatever combination of LONI machines maximizes throughput, without any user intervention. The underlying tools, ManyJobs and BigJob, will be the subject of the demo/tutorial.
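To illustrate "whatever combination of machines optimizes throughput", here is a toy Python model of late binding: tasks are not pinned to a machine in advance, and each one goes to whichever slot becomes free first. This is a sketch under stated assumptions, not the ManyJobs or BigJob code. The `Machine` type, the `dispatch` function, the slot counts, the relative speeds and the task durations are all hypothetical.

```python
"""Toy model of late binding: each task goes to whichever slot frees up first.

Assumptions (illustrative only): machine names, slot counts, relative speeds
and task durations are made up, and this is not the ManyJobs/BigJob code.
"""
import heapq
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    slots: int          # concurrent tasks the machine can run (assumed)
    speed: float = 1.0  # relative speed; runtime = duration / speed (assumed)


def dispatch(machines, durations):
    """Hand tasks, in order, to the earliest-free slot on any machine.

    Returns (makespan, tasks_per_machine).
    """
    speed = {m.name: m.speed for m in machines}
    # Min-heap of (time the slot becomes free, machine name, slot index).
    free = [(0.0, m.name, i) for m in machines for i in range(m.slots)]
    heapq.heapify(free)
    counts = {m.name: 0 for m in machines}
    makespan = 0.0
    for d in durations:
        start, name, slot = heapq.heappop(free)  # next idle slot pulls a task
        end = start + d / speed[name]
        counts[name] += 1
        makespan = max(makespan, end)
        heapq.heappush(free, (end, name, slot))
    return makespan, counts


if __name__ == "__main__":
    # Hypothetical pool: names borrowed from the LONI machine list, sizes invented.
    pool = [Machine("queenbee", slots=4, speed=2.0), Machine("eric", slots=2)]
    print(dispatch(pool, [10.0] * 24))
```

Faster or larger machines end up with more tasks simply because their slots free up sooner. No per-task placement decision is needed from the user, which is the property the tutorial description refers to.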


*Prerequisites:* a LONI account, an active allocation, and the ability to ssh into all LONI machines using SSH keys (a separate ssh HOWTO will be offered at noon on Dec 2)
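One way to confirm the key-based ssh prerequisite before the tutorial is to try a non-interactive login to each machine. The Python sketch below does that with OpenSSH's `BatchMode=yes` option, which makes ssh fail rather than prompt. The functions `ssh_check_command` and `check_hosts` are hypothetical helpers, and no host names are supplied because the actual LONI login-node names are not given on this page.

```python
"""Check that key-based (non-interactive) SSH works to each login host.

Assumption: you pass in your own host names; none are listed here because
the actual LONI login-node names are not given on this page.
"""
import subprocess


def ssh_check_command(host, user=None, timeout=10):
    """Build an ssh command that succeeds only without any interactive prompt."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes disables password/passphrase prompts and host-key
    # confirmation, so a zero exit status means key-based login worked.
    return ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
            target, "true"]


def check_hosts(hosts, user=None, run=subprocess.run):
    """Return {host: True/False}; `run` is injectable for testing."""
    results = {}
    for host in hosts:
        proc = run(ssh_check_command(host, user), capture_output=True, text=True)
        results[host] = proc.returncode == 0
    return results
```

A `False` entry means that machine still needs attention. Because batch mode also refuses to accept an unknown host key, connect once interactively to each host first. A passphrase-protected key must also be loaded into `ssh-agent`.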

Contact Lucy Kiruri (lkirur1 AT tigers.lsu.edu) for additional information. Please circulate this announcement. See also: LA-SiGMA Execution Management Software

Activities
Meetings: Fridays 1:00pm, Johnston 244 or on EVO
The LA-SiGMA Execution Management group meets almost every week on Fridays at 1:00pm. If you are at LSU, you may attend in person in Johnston 244; you may also participate remotely via EVO.


Weekly Friday meeting: EVO Link. Get the latest info by subscribing to the Execution Management mailing list.

Topics:
Many-Jobs: ManyJobs is a Python-based tool for managing many tasks on widely distributed supercomputing resources. To date it has been used in production on the following machines: Abe@NCSA, Lonestar@TACC, Ranger@TACC, QueenBee@LONI, Kraken@NICS, Steele@RCAC, (eric, louie, poseidon, painter, oliver)@LONI, and the CCS Cluster at Tulane's CCS. All you need to run ManyJobs yourself is an account with a user allocation on one of these machines. ManyJobs was developed as part of a LONI Institute project awarded to Tom Bishop, Shantenu Jha and Hideki Fujioka, with Fujioka as the main developer. It provides a simple, self-contained implementation of the SAGA-based BigJob concept, and has received further support from the NSF Cybertools and LA-SiGMA projects (EPS-1003897).
BiG-Job and SAGA: A pilot job allows jobs to run without queuing each one individually. A pilot job is started through the regular Grid resource manager and provides a container for many sub-jobs; that is, applications submit these sub-jobs through the pilot job rather than through the resource manager. A major advantage of this approach is that it avoids the waiting time at the local resource manager, which usually contributes significantly to the overall time-to-completion. It also gives the application control over sub-job execution. The SAGA BigJob framework is a SAGA-based pilot-job implementation. Unlike other common pilot-job systems, SAGA BigJob (i) natively supports MPI jobs and (ii) works on a variety of back-end systems, reflecting the general advantage of a SAGA-based approach.
Other codes of interest: We would like to expand our activities to include all researchers who make extensive use of LONI and other large supercomputing clusters. We are interested in exploring both home-grown and off-the-shelf packages that increase productivity through better job execution management on large computing platforms.
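The pilot-job idea behind BigJob and ManyJobs can be sketched in a few lines of Python. A single request goes to the resource manager, and every sub-job is then handed to the pilot's own pool instead of being queued separately. This is an in-process illustration under stated assumptions, not the SAGA BigJob or ManyJobs API. The classes `ResourceManager` and `PilotJob` and the `demo` function are hypothetical. The batch queue is simulated by a submission counter, and the allocation by a thread pool.

```python
"""Minimal in-process sketch of the pilot-job pattern.

Assumptions: the batch "resource manager" is simulated by a counter and the
allocation by a thread pool; sub-jobs are plain Python callables. This shows
the idea only and is not the SAGA BigJob or ManyJobs API.
"""
from concurrent.futures import ThreadPoolExecutor


class ResourceManager:
    """Stand-in for a batch queue; every submission would wait in the queue."""

    def __init__(self):
        self.submissions = 0

    def submit(self, slots):
        self.submissions += 1
        return ThreadPoolExecutor(max_workers=slots)


class PilotJob:
    """One allocation obtained from the resource manager, reused for sub-jobs."""

    def __init__(self, rm, slots):
        self._pool = rm.submit(slots)  # the only trip through the queue

    def submit(self, fn, *args):
        # Sub-jobs go to the pilot, not to the resource manager.
        return self._pool.submit(fn, *args)

    def close(self):
        self._pool.shutdown(wait=True)


def demo(n_tasks=100, slots=4):
    rm = ResourceManager()
    pilot = PilotJob(rm, slots)
    futures = [pilot.submit(pow, i, 2) for i in range(n_tasks)]
    results = [f.result() for f in futures]
    pilot.close()
    return rm.submissions, results


if __name__ == "__main__":
    submissions, results = demo()
    print(submissions, len(results))  # prints "1 100"
```

With 100 sub-jobs, only one request passes through the (simulated) queue. Submitting each task directly would have meant 100 queue waits. A real pilot holds compute nodes rather than threads, but the control flow is the same.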
Mailing list: Please contact T. Bishop (bishop- AT -latech.edu) if you are interested in the group's activities. Get the latest info by subscribing to the LA-SiGMA Execution Management mailing list.
Wiki: More details on execution management can be found on the LA-SiGMA Execution Management Wiki page.