
Re: [bgl-discuss] machine locked



At 07:50 AM 3/18/2005, Narayan Desai wrote:
>>>>> "Pete" == Pete Beckman <beckman@xxxxxxxxxxx> writes:

  Pete> At 10:00 PM -0600 3/17/05, Narayan Desai wrote:
  >> sorry, I meant standard output and standard error. Normal file
  >> I/O isn't limited in any way. Generally, we only see this
  >> quantity of stdio in the case of pretty serious bugs.

  Pete> Maybe, but why should the scheduler/resource manager care
  Pete> (unless the user asked it to limit output, like limiting core
  Pete> files)?  There should be no temp files, and the stdio should
  Pete> go directly to the user's file space.  If they generate too
  Pete> much, that's their problem, in the same way that if they
  Pete> generate too much data, that's their problem.  If we let users
  Pete> fill the entire filesystem with file writes, we should let
  Pete> them fill the data space with stdio...

the primary issue is that this data isn't only being written to disk;
it is also processed by the queue manager. (this is the mechanism we
will be able to use to determine if the job ran properly or not) The
reason we cut off stdio over a particular limit was because it caused
performance problems for all users on chiba.

I agree with Pete; stdout and stderr should be treated like any other I/O and should not flow through the queue manager. How does observing stdout or stderr tell you whether the job ran properly? Shouldn't the exit status of the process tell you that? I thought that the MPD design handled this by moving all of the user-related processing (including stdout/stderr aggregation and stdin forwarding) into user processes, in order to protect the process manager and to ensure that problems with a user's job (including flooding stdout) affect only that user.
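
To make the point concrete, here is a minimal sketch of what I mean, not a description of how MPD or the queue manager is actually implemented. It assumes a launcher that hands the job's stdout and stderr directly to files in the user's space and judges success only by the exit status. The name run_job and the file names are hypothetical, chosen for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_job(argv, out_path, err_path):
    """Run argv with stdout/stderr written straight to the given files.

    Returns the exit status of the job. The launcher never reads the
    job's output, so a job that floods stdout costs only disk space.
    """
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        # The child inherits these file descriptors and writes to them
        # itself; no output bytes pass through the launching process.
        proc = subprocess.run(argv, stdout=out, stderr=err)
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical job: writes a lot to stdout, then exits with status 3.
    job = [sys.executable, "-c",
           "import sys; sys.stdout.write('x' * 100000); sys.exit(3)"]
    workdir = tempfile.mkdtemp()
    out_file = os.path.join(workdir, "job.out")
    err_file = os.path.join(workdir, "job.err")
    status = run_job(job, out_file, err_file)
    # Success or failure is decided from the exit status alone.
    print("exit status:", status)
    print("stdout bytes on disk:", os.path.getsize(out_file))
```

In this arrangement the amount of output has no effect on the launcher; only the user's own file space fills up.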


Bill

 -nld

- --------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.

William Gropp
http://www.mcs.anl.gov/~gropp

