Using the job resource manager on BGL: commands, options and examples
This document provides examples of how to submit jobs on the Argonne BGL
system. It also provides examples of commands that can be used to query
the status of jobs, list the available partitions, and so on.
For an introduction to using the job resource manager and running jobs
on BGL, see Running Jobs on the BGL System.
Submit a job request
Use cqsub to submit a job. Scripts and interactive jobs are not
supported at this time.
Run the compiled binary exe1 on 10 nodes for a maximum of 1 hour and 30
minutes (the -t value is the walltime in minutes):
cqsub -n 10 -t 90 exe1
There is a special queue for development work called short. It accepts
only jobs that meet both of the following criteria:
- Requested walltime is 30 minutes or less.
- Requested number of nodes is 64 or fewer.
To submit jobs to this queue, use cqsub -q short. To run the compiled binary exe1
with 10 nodes for a maximum of 30 minutes in the development queue:
cqsub -q short -n 10 -t 30 exe1
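The short-queue limits above can be checked before submitting. The following is a minimal sketch, not part of the job resource manager: the function name cqsub_cmd is hypothetical, and it assumes that omitting -q sends the job to the default queue. It only prints the cqsub command line it would use, so you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper (not a BGL command): print a cqsub command line,
# choosing the short queue only when the request fits its limits
# (walltime <= 30 minutes and nodes <= 64).
# Assumption: with no -q option, cqsub uses the default queue.
cqsub_cmd() {
    nodes=$1      # number of nodes, passed to -n
    minutes=$2    # walltime in minutes, passed to -t
    binary=$3     # compiled binary to run
    if [ "$minutes" -le 30 ] && [ "$nodes" -le 64 ]; then
        echo "cqsub -q short -n $nodes -t $minutes $binary"
    else
        echo "cqsub -n $nodes -t $minutes $binary"
    fi
}

cqsub_cmd 10 30 exe1    # prints: cqsub -q short -n 10 -t 30 exe1
cqsub_cmd 10 120 exe1   # prints: cqsub -n 10 -t 120 exe1
```

Printing rather than executing keeps the helper safe to experiment with; once the output looks right, the printed line can be run as-is.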
Delete a job from the queue
To delete a job from the queue, use the cqdel command.
Cancel job 34586:
cqdel 34586
If the job fails to cancel (which indicates that the resource manager
is unable to kill the mpirun processes cleanly),
try again with the force option:
cqdel -f 34586
If you do have to forcibly delete a job, please send mail to
support@bgl.mcs.anl.gov with the job id so that we can do
the necessary cleanup.
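The cancel-then-force sequence above can be wrapped in a small shell function. This is a sketch: cancel_job is a hypothetical name, and it assumes that cqdel exits with a nonzero status when it cannot cancel the job, which you should confirm on the system before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: try a normal cqdel first and fall back to cqdel -f
# only if that fails.
# Assumption: cqdel returns a nonzero exit status when the cancel fails.
cancel_job() {
    jobid=$1
    if cqdel "$jobid"; then
        return 0
    fi
    echo "cqdel $jobid failed; retrying with -f" >&2
    if cqdel -f "$jobid"; then
        # A forced delete needs manual cleanup, so remind the user to report it.
        echo "Forced delete of job $jobid: mail support@bgl.mcs.anl.gov with the job id" >&2
        return 0
    fi
    echo "cqdel -f $jobid also failed" >&2
    return 1
}

# Example: cancel_job 34586
```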
Query queue and job information
To find out the state of the queue, the state of particular jobs,
and so on, use the cqstat command.
To see a full summary of all jobs in all queues:
cqstat -f
Query partition availability
To determine which partitions are currently available to the scheduler, use the
partlist command. It lists each partition, the queues that can use it
(separated by colons), and its state. For example:
% partlist
Name             Queue                    State
================================================
ANL_R00          short                    blocked
ANL_R000         short:default:reserved   blocked
ANL_R001         short:default            idle
R000_J102-32     default                  busy
R000_J102-64     default                  blocked
R000_J106-64     short:default            idle
R000_J111-64     short:default            idle
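Because the partlist output is whitespace-separated, a short awk filter can pick out the idle partitions that accept a given queue. This sketch assumes the three-column layout shown above (Name, Queue, State) with colon-separated queue names; idle_for_queue is a hypothetical helper that reads partlist output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the names of idle partitions whose Queue
# column includes the queue given as $1. Reads partlist output on stdin.
# The header line (State) and the ==== rule never match "idle", so no
# explicit skipping is needed.
idle_for_queue() {
    awk -v q="$1" '
        $3 == "idle" {
            n = split($2, queues, ":")    # "short:default" gives 2 queues
            for (i = 1; i <= n; i++)
                if (queues[i] == q) { print $1; break }
        }'
}

# Usage on the system: partlist | idle_for_queue short
```

Applied to the sample output above, `idle_for_queue short` would report ANL_R001, R000_J106-64 and R000_J111-64.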
Query partitions
To get information about BG/L partitions, use the
bgl-listblocks command.
To see a summary of all active BG/L partitions (a partition is active
if it is booting, allocated, or in the process of being freed), run the
command with no arguments:
bgl-listblocks
To see a complete list of all BG/L partitions currently defined on the system:
bgl-listblocks --all
Note that these blocks may overlap one another, and some may not be available to regular users (for example, blocks used only for running diagnostics).
To see a summary of a specific partition, give the partition name after --id:
bgl-listblocks --id <partition_name>
To see complete details about a partition:
bgl-listblocks --long --id <partition_name>
Query jobs
To get information about BG/L jobs as recorded in the BG/L database, use the
bgl-listjobs command.
To see a summary of all active jobs, run the command with no arguments.
A job is active if its partition has finished booting and the job is
running (R), is in the process of being deleted (D), or has completed
with an error (E):
bgl-listjobs
To see a complete list of all jobs ever run:
bgl-listjobs --all
To see a summary of a specific job, give the job ID after --id:
bgl-listjobs --id <job_id>
To see complete details about a job:
bgl-listjobs --long --id <job_id>