Fw: [bgl-discuss] MPI Buffer Problem




----- Forwarded by Bob Walkup/Watson/IBM on 06/02/2005 04:48 PM -----
Bob Walkup

06/02/2005 04:25 PM


        To:        Douglas Sondak <sondak@xxxxxxxxxx>
        cc:        
        From:        Bob Walkup/Watson/IBM@IBMUS
        Subject:        Re: [bgl-discuss] MPI Buffer Problem


This sounds like MPI rank 0 is running out of memory because it is allocating buffers for messages that were sent before a matching mpi_recv was posted.  This kind of problem is common, and generally occurs during operations that gather data onto one MPI process.  It should be possible to restructure the MPI code slightly to eliminate "unexpected" messages.  For example, code like this can fail:

if (myrank .eq. 0) then
   do pe = 1, numpes - 1   ! senders are ranks 1..numpes-1, assuming numpes is the total process count
      call mpi_recv(rbuf, count, type, pe, ...)
   end do
else
   call mpi_send(sbuf, count, type, 0, ...)
end if

In the code above, the mpi_recv() calls are blocking and complete one at a time, while all of the senders try to send at once.  That can force the receiver to allocate buffers for the "unexpected" messages (the messages that don't have a corresponding receive posted).  One solution is to introduce flow control.  For example:

if (myrank .eq. 0) then
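   do pe = 1, numpes - 1   ! senders are ranks 1..numpes-1, assuming numpes is the total process count
      call mpi_send(flag, 1, mpi_integer, pe, ...)   ! tell rank pe to go ahead
      call mpi_recv(rbuf, count, type, pe, ...)      ! then receive its data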
   end do
else
   call mpi_recv(flag, 1, mpi_integer, 0, ...)
   call mpi_send(sbuf, count, type, 0, ...)
end if

The modified code changes the sequence so that each MPI process sends its data to rank 0 only when rank 0 asks for it, and not before.  This eliminates unexpected messages, and fully serializes the "gather" operation.  There are other potential solutions, but the idea is to make sure that matching receives get posted before the sends start pouring in.
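
For reference, the handshake pattern above can be filled out into a complete program. This is only a sketch under stated assumptions: the message length n, the tag values 1 and 2, the double precision data type, and the use of MPI_COMM_WORLD are all illustrative choices that do not come from the original code, and numpes is assumed to be the total number of processes.

```fortran
program gather_with_handshake
   use mpi
   implicit none
   integer, parameter :: n = 1000          ! hypothetical message length
   integer, parameter :: tag_go = 1        ! hypothetical tag for the "go ahead" flag
   integer, parameter :: tag_data = 2      ! hypothetical tag for the data message
   integer :: ierr, myrank, numpes, pe, flag
   integer :: status(MPI_STATUS_SIZE)
   double precision :: sbuf(n), rbuf(n)

   call mpi_init(ierr)
   call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
   call mpi_comm_size(MPI_COMM_WORLD, numpes, ierr)

   flag = 1
   sbuf = dble(myrank)                     ! dummy payload for illustration

   if (myrank .eq. 0) then
      ! Ask each sender in turn, so at most one data message is outstanding.
      do pe = 1, numpes - 1
         call mpi_send(flag, 1, MPI_INTEGER, pe, tag_go, &
                       MPI_COMM_WORLD, ierr)
         call mpi_recv(rbuf, n, MPI_DOUBLE_PRECISION, pe, tag_data, &
                       MPI_COMM_WORLD, status, ierr)
         ! ... copy or process rbuf here before it is overwritten ...
      end do
   else
      ! Wait for permission from rank 0 before sending any data.
      call mpi_recv(flag, 1, MPI_INTEGER, 0, tag_go, &
                    MPI_COMM_WORLD, status, ierr)
      call mpi_send(sbuf, n, MPI_DOUBLE_PRECISION, 0, tag_data, &
                    MPI_COMM_WORLD, ierr)
   end if

   call mpi_finalize(ierr)
end program gather_with_handshake
```

The small flag message is itself cheap to buffer, so the senders' early receives of it are not a concern; the point is that the large data message is never sent until rank 0 is about to post the matching receive.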

The same kind of problem can occur on other platforms, but this would normally hit Blue Gene first, because of the small amount of memory on each node.  Also, this problem tends to be much more severe for very large parallel jobs (thousands of processes), but can still be managed by making the messages "expected".
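
One possible alternative way to make the messages "expected", not taken from the code above, is to have rank 0 pre-post non-blocking receives for every sender and then synchronize before any data is sent. The sketch below assumes rank 0 has enough memory for one user-level buffer per sender; the message length n, the tag value, the data type, and the barrier-based synchronization are illustrative assumptions.

```fortran
program gather_with_preposted_receives
   use mpi
   implicit none
   integer, parameter :: n = 1000          ! hypothetical message length
   integer, parameter :: tag_data = 1      ! hypothetical message tag
   integer :: ierr, myrank, numpes, pe
   integer, allocatable :: requests(:)
   double precision, allocatable :: rbuf(:,:)
   double precision :: sbuf(n)

   call mpi_init(ierr)
   call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
   call mpi_comm_size(MPI_COMM_WORLD, numpes, ierr)

   sbuf = dble(myrank)                     ! dummy payload for illustration

   if (myrank .eq. 0) then
      ! One column of rbuf per sender; each column is contiguous in memory.
      allocate(rbuf(n, numpes - 1), requests(numpes - 1))
      do pe = 1, numpes - 1
         call mpi_irecv(rbuf(1, pe), n, MPI_DOUBLE_PRECISION, pe, tag_data, &
                        MPI_COMM_WORLD, requests(pe), ierr)
      end do
   end if

   ! No sender passes this barrier until rank 0 has posted all its receives.
   call mpi_barrier(MPI_COMM_WORLD, ierr)

   if (myrank .eq. 0) then
      call mpi_waitall(numpes - 1, requests, MPI_STATUSES_IGNORE, ierr)
      print *, 'rank 0 received data from', numpes - 1, 'senders'
   else
      call mpi_send(sbuf, n, MPI_DOUBLE_PRECISION, 0, tag_data, &
                    MPI_COMM_WORLD, ierr)
   end if

   call mpi_finalize(ierr)
end program gather_with_preposted_receives
```

Compared with the handshake version, this does not serialize the senders, but it moves the memory cost into an explicit allocation on rank 0. For very large jobs on a small-memory node, posting the receives in limited batches may be a more practical variant.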

Regards,
Bob Walkup (walkup@xxxxxxxxxx, 914-945-1512)
---------------------------------------------------------------


Douglas Sondak <sondak@xxxxxxxxxx>
Sent by: owner-discuss@xxxxxxxxxxxxxxx

06/02/2005 03:09 PM

       
        To:        discuss@xxxxxxxxxxxxxxx
        cc:        sondak@xxxxxxxxxx
        Subject:        [bgl-discuss] MPI Buffer Problem



I'm getting the following error message when I try to run a
Fortran 90/MPI code on our newly-installed Blue Gene/L at Boston
University:

RVZ: cannot allocate unexpected buffer

The code has run successfully in the past on a wide variety of
platforms including IBM p690, SGI Origin3000, linux clusters, SGI
Altix, etc.

I am running on 4 processors.  (This is a test case.)  The routine in
which the problem is occurring collects arrays on one processor using
standard blocking sends and receives.  A total of 4 messages are sent:

proc. 1 sends one message to proc. 0
proc. 2 sends one message to proc. 0
proc. 3 sends two messages to proc. 0

Each message is 5,752,300 bytes.  The error occurs when receiving the
message from proc. 1.  The messages from procs. 2 and 3 work fine.  I
found that if I reduce the size of the proc. 1 message to 43,928
bytes, it works fine (leaving the sizes of the other 3 messages at
5,752,300 bytes).

I suspected a memory problem, so I tried eliminating the 3 messages
that work, and only sent the single message from proc. 1 to proc. 0.
This still fails with the same error message.

I looked at the postings at the ANL web site, and found a posting
about a similar error message, with the word "eager" rather than
"RVZ."  I tried changing the eager limit to 6,000,000 (larger than
the message), and this didn't help.  As a shot in the dark I also
tried a small eager limit (1024), and this didn't help either.

Has anyone seen anything like this?  Might anyone have a suggestion
about diagnosing the problem?  I'm now at something of a loss as to
how to proceed.  Thanks!

___________________________________________________________

Doug Sondak                Boston University
email: sondak@xxxxxx       Office of Information Technology
phone: (617)353-8273       111 Cummington Street
fax  : (617)353-6260       Boston, MA 02215
