Argonne National Laboratory BG/L System


Using the job resource manager on BGL: commands, options and examples

This document shows how to submit jobs on the Argonne BGL system and how to query the status of jobs, the availability of partitions, and related information. For an introduction to using the job resource manager and running jobs on BGL, see Running Jobs on the BGL System.

How To Examples and Results
Submit a job request

Use cqsub to submit a job. Scripts and interactive jobs are not supported at this time.

Run the compiled binary exe1 on 10 nodes for a maximum of 1 hour and 30 minutes (the -t option takes the wall time in minutes):

cqsub -n 10 -t 90 exe1
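The -t values in the examples here are given in minutes (for example, -t 30 for a 30-minute limit). As a convenience, a wall time written as hours:minutes can be converted before it is passed to cqsub. The helper below is a minimal sketch; the name to_minutes is hypothetical and is not part of the resource manager.

```shell
# Hypothetical helper (not part of the BGL tools): convert H:MM to minutes
# so that the result can be passed to cqsub -t.
to_minutes() {
    # awk treats both fields as decimal numbers, so "1:05" yields 65.
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}
```

With this helper defined, a request for 1 hour and 30 minutes could be written as cqsub -n 10 -t "$(to_minutes 1:30)" exe1.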

There is a special queue used for development work called short. This queue is only for jobs that meet the following criteria:

  • Requested walltime is 30 minutes or less.
  • Requested number of nodes is 64 nodes or less.

To submit jobs to this queue, use cqsub -q short. To run the compiled binary exe1 with 10 nodes for a maximum of 30 minutes in the development queue:

cqsub -q short -n 10 -t 30 exe1
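The two criteria for the short queue can be checked before submitting. The following is a sketch only: the function names fits_short_queue and build_cqsub are hypothetical, and the second function prints the cqsub command line instead of running it, so that it can be inspected first. Requests that do not qualify are printed without -q, as in the first example in this section.

```shell
# Hypothetical helpers, not part of the BGL tools.
# fits_short_queue NODES MINUTES
# Succeeds when the request meets both criteria for the short queue:
# wall time of 30 minutes or less and 64 nodes or fewer.
fits_short_queue() {
    nodes=$1
    minutes=$2
    [ "$minutes" -le 30 ] && [ "$nodes" -le 64 ]
}

# build_cqsub NODES MINUTES EXECUTABLE
# Prints (does not run) a cqsub command line, adding -q short only
# when the request qualifies for the short queue.
build_cqsub() {
    nodes=$1
    minutes=$2
    exe=$3
    if fits_short_queue "$nodes" "$minutes"; then
        echo "cqsub -q short -n $nodes -t $minutes $exe"
    else
        echo "cqsub -n $nodes -t $minutes $exe"
    fi
}
```

For example, build_cqsub 10 30 exe1 prints the development-queue command shown above, while build_cqsub 128 20 exe1 prints a command without -q because it requests more than 64 nodes.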

Delete a job from the queue

To delete a job from the queue, use the cqdel command.

Cancel job 34586:

cqdel 34586

If the job fails to cancel (indicating that the resource manager is unable to kill the mpirun processes cleanly), try again with the force option:

cqdel -f 34586

If you do have to forcibly delete a job, please send mail to support@bgl.mcs.anl.gov with the job id so that we can do the necessary cleanup.
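The cancel-then-force sequence can be wrapped in a small function. This is a sketch under one stated assumption: that cqdel returns a non-zero exit status when it fails to cancel a job, which this document does not confirm. The function name cancel_job is hypothetical; only cqdel and cqdel -f come from the text above.

```shell
# Hypothetical wrapper, not part of the BGL tools.
# Assumes cqdel exits non-zero when the job could not be cancelled.
cancel_job() {
    jobid=$1
    if cqdel "$jobid"; then
        echo "job $jobid cancelled"
    elif cqdel -f "$jobid"; then
        # A forced delete needs manual cleanup by the support team.
        echo "job $jobid force-deleted; please mail support@bgl.mcs.anl.gov with this job id"
    else
        echo "could not delete job $jobid" >&2
        return 1
    fi
}
```

Calling cancel_job 34586 tries the normal delete first and falls back to the force option only if that fails, printing a reminder to report the job id.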

Query queue and job information

To find out about the state of the queue, the state of particular jobs, and so on, use the cqstat command.

To see a full summary of all jobs in all queues:

cqstat -f

Query partition availability

To determine which partitions are currently available to the scheduler, use the partlist command. This command lists the partitions and their state. For example:

% partlist
Name           Queue                   State
================================================
ANL_R00        short                   blocked
ANL_R000       short:default:reserved  blocked
ANL_R001       short:default           idle
R000_J102-32   default                 busy
R000_J102-64   default                 blocked
R000_J106-64   short:default           idle
R000_J111-64   short:default           idle
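The partlist output can be filtered to show only partitions that are idle, optionally restricted to one queue. The function below is a sketch that assumes the three-column layout shown above, with two header lines and queue names separated by colons; the name idle_partitions is hypothetical.

```shell
# Hypothetical filter, not part of the BGL tools.
# Reads partlist-style output on stdin and prints the names of idle
# partitions. With an argument, only partitions serving that queue are
# printed. Assumes two header lines and colon-separated queue names.
idle_partitions() {
    queue=${1:-}
    awk -v q="$queue" '
        NR > 2 && $3 == "idle" {
            if (q == "") { print $1; next }
            n = split($2, queues, ":")
            for (i = 1; i <= n; i++)
                if (queues[i] == q) { print $1; break }
        }'
}
```

It would be used as partlist | idle_partitions short. Applied to the sample output above, it prints ANL_R001, R000_J106-64, and R000_J111-64, one per line.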

Query partitions

To get information about BG/L partitions, use the bgl-listblocks command.

To see a summary of all active BG/L partitions (those that are booting, allocated, or in the process of being freed), use the command with no arguments:

bgl-listblocks

To see a complete list of all BG/L partitions currently defined for BGL:

bgl-listblocks --all

Note that blocks may overlap or may not be available to regular users (for example, blocks used only for running diagnostics).

To see a summary of a specific partition:

bgl-listblocks --id 

To see complete details about a partition:

bgl-listblocks --long --id 

Query jobs

To get information about BG/L jobs from the view of the BG/L database, use the bgl-listjobs command.

A job is active when its partition has completed booting and the job is running (R), is in the process of being deleted (D), or has completed with an error (E). To see a summary of all active jobs, use the command with no arguments:

bgl-listjobs

To see a complete list of all jobs ever run:

bgl-listjobs --all

To see a summary of a specific job:

bgl-listjobs --id 

To see complete details about a job:

bgl-listjobs --long --id 
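The one-letter job states mentioned above (R, D, and E) can be expanded into readable text when post-processing bgl-listjobs output. This is a small sketch: the function name job_state_name is hypothetical, and only the three codes described in this document are mapped.

```shell
# Hypothetical helper, not part of the BGL tools.
# Maps the job state codes described in this document to short descriptions.
job_state_name() {
    case "$1" in
        R) echo "running" ;;
        D) echo "being deleted" ;;
        E) echo "completed with an error" ;;
        *) echo "unknown state: $1" ;;
    esac
}
```

Any code other than R, D, or E is reported as unknown, since this document does not describe other states.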

