Re: [bgl-discuss] Something going on today? Jobs aren't terminating.
Hi Chad,
Something has gone wrong with the scheduler. I'm not quite certain how
to fix the problem.
Everyone,
Until we can fix the scheduler problem, please run your jobs the old way,
using mpirun with -partition. Use the partitions in the top midplane and
pick one that is not in use. Run 'bgl-listblocks --all | grep R001_J' to
get the names of the available partitions, and pick one with a status of
'F'.
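
For anyone who wants to script the lookup, here is a rough sketch of the
filtering step. It assumes each line of the listing starts with the
partition name and has the status as a separate whitespace-delimited
field; that layout is a guess, and the helper name free_blocks is made
up for illustration, so check the real output before relying on it.

```shell
# free_blocks: read a block listing on stdin and print the name of each
# top-midplane (R001_J) block whose status looks free.
# ASSUMPTIONS: the partition name is field 1, and the status appears as a
# later field that is exactly "F". Adjust if the real listing differs.
free_blocks() {
    awk '/R001_J/ {
        for (i = 2; i <= NF; i++) {
            if ($i == "F") { print $1; break }
        }
    }'
}

# On a BG/L login node this would be used as:
#   bgl-listblocks --all | free_blocks
# and one of the printed names then goes to mpirun's -partition option.
```

Filtering on an exact field match rather than a plain grep for 'F'
avoids picking up lines where the letter F merely appears inside a name.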
thanks,
Susan.
On Sun, 20 Mar 2005, Chad Glendenin wrote:
> Did I miss a notification about some activity or maintenance going on with
> BG/L today? I can run jobs, and they produce the expected output to the
> filesystem, but the jobs aren't terminating, and I'm not getting my
> stdout.
>
> chad@login1:~> qstat
> JobID  User  WallTime  Nodes  State
> =========================================
> 327    smc   00:00:10  32     running
> 328    chad  00:00:10  1      running
> 329    chad  00:00:01  1      running
>
> That one-minute job (329) started about 15 minutes ago, and the command
> was the following:
>
> qsub -t 1 -n 1 -c 1 /home/chad/vol-0.6/vol --usage
>
> All that command should do is print some help text to stdout and exit.
>
> "qdel 329" tells me my job is deleted, but it's still in the queue and
> there's no 329.output/error in $HOME.
>
> "qdel -f 329" manages to delete the job from the queue, but it also throws
> away my stdout.
>
> Do I need to do something to fix this?
>
> Thanks,
> ccg
>
> - --------------------------------------------------------------------
> To add or remove yourself from this mailing list, use the 'notifyme'
> command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.
>
>