
Re: [bgl-discuss] MPI failure, simplified



At 10:50 AM 2/21/2006, Pete Beckman wrote:
Are they saying that 80MB of data are sent in eager mode?

That's what I understood from their email.

"Our implementation is perhaps too aggressive in assuming there is
enough memory to accommodate the data"

I disagree with their interpretation of the standard, but the standard may be ambiguous enough on this point to allow their interpretation. I will note that the standard, in discussing nonblocking sends and system resources (section 3.7), does say that "Quality implementations of MPI should ensure that this happens only in "pathological" cases. That is, an MPI implementation should be able to support a large number of pending nonblocking operations." I don't consider the example a pathological case, but since the standard documents do not define "pathological", that too is in the eye of the beholder.


The problem with the IBM approach is that it greatly complicates coding for more complex communication patterns. This was not the intent of the MPI Forum. This example may be misleading IBM because it admits a simple change; not all codes and situations are so simple. The other (and I believe most serious) problem with the IBM interpretation is that there is no way to write a "safe" program with just Isend and Irecv (that is, without some additional form of synchronization to ensure that the receive is always posted before the send); I'm sure that the MPI Forum would be surprised by this. To me, this is a deficiency in the standard, since we should have been clear on what was required for a safe program.

I also believe that the performance issue (which is real) can be solved by sending only the first bandwidth-delay product's worth of bytes (rounded up to something efficient). At that point, if you haven't received an ack that the receive is posted, you should stop sending (so as not to violate the user's expectation that they've written a "safe" program by using nonblocking sends and receives); if you have received the ack, then you can continue with little loss in performance (only the cost of the ack, which is overlapped). In fact, we should implement this as at least an option in MPICH2 for the faster networks.
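
A minimal sketch, in C, of the sender-side arithmetic the paragraph above implies. Everything named here (link_params, initial_window, bytes_allowed, and the link numbers in the note that follows) is a hypothetical illustration, not code or configuration from MPICH2 or from the IBM implementation. It only computes how much of a message may be pushed before the "receive is posted" ack arrives; the actual packet injection and ack handling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link description; the values are supplied by the caller
   and are not taken from any real network or MPI implementation. */
typedef struct {
    double bandwidth_bytes_per_sec; /* sustained link bandwidth */
    double round_trip_sec;          /* time for data to arrive and an ack to return */
    size_t packet_bytes;            /* "something efficient" to round up to */
} link_params;

/* Bandwidth-delay product, rounded up to a whole number of packets. */
size_t initial_window(const link_params *lp)
{
    assert(lp->packet_bytes > 0);

    double bdp = lp->bandwidth_bytes_per_sec * lp->round_trip_sec;
    size_t bytes = (size_t)bdp;
    if ((double)bytes < bdp) {
        bytes++;                    /* round fractional bytes up */
    }

    size_t packets = (bytes + lp->packet_bytes - 1) / lp->packet_bytes;
    if (packets == 0) {
        packets = 1;                /* always allow at least one packet */
    }
    return packets * lp->packet_bytes;
}

/* Upper bound on the bytes the sender may have injected so far.
   Before the ack arrives, at most one window is sent; once the
   receiver has confirmed a posted receive, the whole message may go. */
size_t bytes_allowed(size_t msg_bytes, size_t window, int ack_received)
{
    if (ack_received || msg_bytes <= window) {
        return msg_bytes;
    }
    return window;
}
```

With an assumed 1 GB/s link, a 5 microsecond round trip, and 256-byte packets, initial_window gives 5120 bytes, so an unmatched 80MB send would leave about 5KB at the receiver rather than 80MB.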

Also, if you are going to use a barrier (Ack!) between the receives and the sends, the sends should be performed with MPI_Rsend or MPI_Irsend, since that is exactly what the Ready-send mode is for; it avoids any additional handshakes and should (slightly) simplify the internal logic.

Bill


-Pete

- --------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.

William Gropp
http://www.mcs.anl.gov/~gropp

