Re: [bgl-discuss] MPI failure, simplified
At 10:50 AM 2/21/2006, Pete Beckman wrote:
Are they saying that 80MB of data are sent in eager mode?
That's what I understood from their email.
"Our implementation is perhaps too aggressive in assuming there is
enough memory to accommodate the data"
I disagree with their interpretation of the standard, but the standard may
be ambiguous enough on this point to allow their interpretation. I will
note that the standard does say (section 3.7), in talking about nonblocking
sends and system resources, that "Quality implementations of MPI should
ensure that this happens only in 'pathological' cases. That is, an MPI
implementation should be able to support a large number of pending
nonblocking operations." I don't consider the example a pathological case,
but as there is no definition of pathological in the standard documents,
that is also in the eye of the beholder.
The problem with the IBM approach is that it greatly complicates coding for
more complex communication patterns. This was not the intent of the MPI
Forum. This example may be misleading IBM because it admits a simple fix;
not all codes and situations are so simple. The other (and I believe most
serious) problem with the IBM interpretation is that there is no way to
write a "safe" program with just Isend and Irecv (without using some
additional form of synchronization to ensure that the receive is always
posted before the send); I'm sure that the MPI Forum would be surprised by
this. To me, this is a deficiency in the standard, since we should have
been clear on what was required for a safe program.
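To make the ordering dependence concrete, here is a toy model (plain Python, no MPI involved) of the receiver's unexpected-message buffer when every send is pushed eagerly in full. The function name, the event format, and the choice of two 80MB messages are assumptions for illustration only; this is not a description of IBM's or MPICH2's internals.

```python
def peak_unexpected_bytes(events):
    """Peak bytes the receiver must buffer for data that arrives
    before its matching receive is posted (fully eager sends assumed).

    events: ("send", tag, nbytes) or ("recv", tag), in the order the
    receiver sees them. Tags are assumed unique per message.
    """
    posted = set()    # receives posted and still waiting for data
    buffered = {}     # tag -> bytes held in the unexpected-message buffer
    peak = 0
    for event in events:
        if event[0] == "recv":
            tag = event[1]
            if tag in buffered:
                del buffered[tag]      # data copied out to the user buffer
            else:
                posted.add(tag)
        else:
            _, tag, nbytes = event
            if tag in posted:
                posted.discard(tag)    # lands directly in the user buffer
            else:
                buffered[tag] = nbytes # no receive yet: must be buffered
        peak = max(peak, sum(buffered.values()))
    return peak


MB = 1_000_000
sends_first = [("send", 0, 80 * MB), ("send", 1, 80 * MB),
               ("recv", 0), ("recv", 1)]
recvs_first = [("recv", 0), ("recv", 1),
               ("send", 0, 80 * MB), ("send", 1, 80 * MB)]

print(peak_unexpected_bytes(sends_first))
print(peak_unexpected_bytes(recvs_first))
```

In this model the same set of sends and receives needs no buffering when the receives happen to be posted first, and needs buffering for every byte sent when they are not, which is why the program's safety ends up depending on extra synchronization.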
I also believe that the performance issue (which is real) can be solved by
sending only the first bandwidth-delay product's worth of bytes (rounded up
to something efficient). At that point, if you haven't received an ack that
the receive is posted, you should stop sending (so as to avoid tripping the
expectation of the user that they've written a "safe" program by using
nonblocking sends and receives); if you have received the ack, then you can
continue with little loss in performance (only the cost of the ack, which
is overlapped). In fact, we should implement this as at least an option in
MPICH2 for the faster networks.
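A rough sketch of that sender-side rule, again as a plain Python model rather than MPI code. The function names, the 4096-byte rounding unit, and the bandwidth and round-trip figures in the usage lines are all assumptions picked for illustration; only the 80MB message size comes from this thread.

```python
def eager_limit_bytes(bandwidth_bytes_per_s, rtt_us, chunk_bytes=4096):
    """Bandwidth-delay product in bytes, rounded up to whole chunks.

    Integer arithmetic throughout; rtt_us is the round-trip time in
    microseconds. chunk_bytes is a hypothetical 'efficient' unit.
    """
    bdp = (bandwidth_bytes_per_s * rtt_us + 999_999) // 1_000_000
    chunks = max(1, (bdp + chunk_bytes - 1) // chunk_bytes)
    return chunks * chunk_bytes


def simulate_send(message_bytes, limit_bytes, ack_arrives):
    """Return (sent_before_ack, sent_after_ack, stalled).

    The sender pushes at most limit_bytes without knowing whether the
    receive is posted. It continues only if the ack has arrived;
    otherwise it stops and waits, so the receiver never has to buffer
    more than limit_bytes for this message.
    """
    eager = min(message_bytes, limit_bytes)
    remaining = message_bytes - eager
    if remaining == 0:
        return eager, 0, False          # short message: fits in the eager part
    if ack_arrives:
        return eager, remaining, False  # receive is posted: keep streaming
    return eager, 0, True               # no ack yet: stop sending


# Assumed link: 1 GB/s with a 10 microsecond round trip.
limit = eager_limit_bytes(1_000_000_000, 10)
print(limit)
print(simulate_send(80_000_000, limit, ack_arrives=False))
print(simulate_send(80_000_000, limit, ack_arrives=True))
```

With these assumed numbers the unacknowledged part of an 80MB send is a few kilobytes instead of the whole message, while a sender that does get the ack in time never has to pause.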
Also, if you are going to use a barrier (Ack!) between send and receives,
the sends should be performed with MPI_Rsend or MPI_Irsend, since that is
exactly what the Ready-send mode is for; it avoids any additional
handshakes and should (slightly) simplify the internal logic.
Bill
-Pete
- --------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.
William Gropp
http://www.mcs.anl.gov/~gropp