Re: [bgl-discuss] big-o-jobs
Mark,
We aren't set up for jobs larger than 32-way yet. The plan was to have
people still request a chunk of time for the larger jobs. 512-node jobs
can run during the day, but we need to know about them so we can put in
a reservation that keeps jobs from running on the 32-ways within the
midplane partition. The scheduler at this point has no way to determine
whether nodes are inside another partition. Your 128-way job ran because
the 512 top midplane partition was mistakenly not taken offline.
Running that job caused problems because the scheduler was trying to
run jobs on partitions inside that 512 partition (all the bottom 32-node
partitions were running jobs, so the scheduler attempted to run on
the top). Thus, we cannot leave a 512 partition active at the same time
as the 32-node partitions inside it.
Jobs can be queued up using qsub to run during that reserved time
period. We were going to continue allowing 1024-node jobs during the
night and on weekends.
I planned to write up an email describing this and send it out to the
list but hadn't had time. So, here is the plan:
32 node partitions are active at all times except when a reservation is in
place.
1024 node jobs may be run at any time during the night or weekend, as long
as a reservation for the full rack has been put into place.
512 node jobs may be run at any time during the day or night, as long as a
reservation for the partition has been put into place.
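
The three rules above can be read as a small decision function. The
following is only a sketch of that reading, not the scheduler's actual
logic: the name may_run and its arguments are made up for illustration,
the exact hours that count as "night" are not given in this mail (so
off_hours is passed in by the caller), and rejecting every other job
size is an assumption.

```python
def may_run(nodes, off_hours, reservation_in_place):
    """Return True if a job of this size may run under the stated plan.

    nodes: requested partition size (32, 512 or 1024).
    off_hours: True during the night or weekend; the exact hours are
        not defined in the mail, so the caller decides.
    reservation_in_place: for a 32-node job, True if a reservation
        currently covers its partition; for a 512 or 1024 node job,
        True if the required reservation exists.
    """
    if nodes == 32:
        # 32-node partitions are active unless a reservation covers them.
        return not reservation_in_place
    if nodes == 512:
        # Day or night, but only inside a reservation for the partition.
        return reservation_in_place
    if nodes == 1024:
        # Night or weekend only, with a reservation for the full rack.
        return reservation_in_place and off_hours
    # Assumption: any other size is not set up yet.
    return False
```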
To get a 512 node reservation, please send email to
support@xxxxxxxxxxxxxxx with the desired start time and duration. If
possible, a reservation for that time period will be put in place (if
any 32 node jobs are running in the 512 node partition, we will
contact their owners and ask whether we can kill those jobs, etc). If it
is not possible, you will be notified and asked to pick a later time.
Putting a reservation in place means that we tell the scheduler that at
time T, 32 node partitions X,Y,Z,... are reserved and 512 node partition
'A' becomes active.
As the time for a reservation approaches, the scheduler will not run 32
node jobs in the set of reserved partitions X,Y,Z... that would run into
the reserved time period.
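
The check described above amounts to an interval-overlap test between a
job's expected run window and each reservation on its partition. Here is
a minimal sketch under stated assumptions: the Reservation class, the
can_start function, and the idea that each job declares a walltime are
all hypothetical names for illustration, not the real scheduler's
interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    start: datetime        # time T at which the reservation begins
    duration: timedelta    # how long the partitions stay reserved
    partitions: frozenset  # names of the reserved 32 node partitions

    @property
    def end(self):
        return self.start + self.duration


def can_start(partition, now, walltime, reservations):
    """Return True if a 32 node job may start on `partition` at `now`.

    The job is refused if its window [now, now + walltime) overlaps a
    reservation covering that partition, i.e. it would run into the
    reserved period or would start while the reservation is active.
    """
    job_end = now + walltime
    for res in reservations:
        if partition in res.partitions and now < res.end and job_end > res.start:
            return False
    return True
```

With a reservation on X,Y,Z starting at 14:00, a 30 minute job submitted
for X at 13:00 would be allowed, while a 90 minute job would be held
back; a partition outside the reserved set is unaffected.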
Once the reservation start time arrives, any 512 node jobs will be
scheduled in partition 'A', FIFO. We are hoping to have 'owned'
reservations shortly. Then, only jobs owned by the person(s) with the
reservation will run.
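
As a rough illustration of the difference between the current FIFO
behavior and the planned 'owned' reservations, the selection step could
look like the sketch below. The Job tuple, the next_job function and the
owners argument are hypothetical; how owned reservations will really be
implemented is not described here.

```python
from typing import NamedTuple, Optional


class Job(NamedTuple):
    user: str
    nodes: int


def next_job(queue, owners=None) -> Optional[Job]:
    """Pick the next 512 node job from `queue` in FIFO order.

    queue: jobs in submission order, oldest first.
    owners: None for an unowned reservation (any user's job may run),
        otherwise the set of users holding the reservation.
    """
    for job in queue:
        if job.nodes != 512:
            continue  # only 512 node jobs go to partition 'A'
        if owners is None or job.user in owners:
            return job
    return None
```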
To run
Susan.
On Tue, 15 Mar 2005, Mark Hereld wrote:
> so. last night i queued a few fairly short jobs (probably only 15m to 30m)
> on 512 nodes. but they remained queued all night, despite the likelihood
> that i was the only bloke doing bgl bidnis last night. earlier in the day
> i successfully ran a 128 node short job, so i'm sure that it works in
> principle. a stack of 32 node jobs ran, taking only minutes each, and
> cleared the queue early in the night.
>
> what's up: bug, policy, harassment?
> -- mark
>
>
> Mark Hereld Futures Laboratory
> http://www.mcs.anl.gov/~hereld/ Mathematics & Computer Science
> Argonne National Laboratory
> Voice: 630 252 4170 9700 S. Cass Ave. #221
> FAX: 630 252 6424 Argonne, IL 60439
>
----------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.