Re: [bgl-discuss] Something going on today? Jobs aren't terminating.
>>>>> "Susan" == Susan Coghlan <smc@xxxxxxxxxxx> writes:
Susan> Hi Chad,
Susan> Something has gone wrong with the scheduler. I'm not quite
Susan> certain how to fix the problem.
Actually, the queue manager had stopped receiving process completion
requests, which is why jobs appeared to get stuck. We should run through
some diagnostics tomorrow. I am looking into why this happened, and I can
also add code to minimize the impact if it happens again.
Susan> Everyone,
Susan> Please use mpirun with -partition in the old way for your
Susan> jobs until we can fix the scheduler problem. Use the
Susan> partitions in the top midplane. Pick one that is not in use.
Susan> Use 'bgl-listblocks --all | grep R001_J' to get the name of
Susan> the partitions available and pick one with a status of 'F'.
Everything is back up and running now. I have run a test job through
the system, and everything worked fine. You can resume using the
queuing system.
-nld
---------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.