
Re: Fw: [bgl-discuss] MPI Buffer Problem




When memory runs out with so few MPI processes and with messages under 10 MB, it suggests that the application is running very close to the hardware memory limit of 512 MB/node.  Anything you can do to free up memory should help.  Alternatively, if the memory requirement per MPI process decreases as you scale up, you might try 16 MPI tasks instead of 4, or maybe even more.  It will be a hard road if your application is always on the edge of running out of memory, but things should go more smoothly if there is plenty of memory left available for MPI etc.

Regards,
Bob Walkup (walkup@xxxxxxxxxx, 914-945-1512)
--------------------------------------------------------------


Douglas Sondak <sondak@xxxxxxxxxx>
Sent by: owner-discuss@xxxxxxxxxxxxxxx

06/03/2005 02:05 PM

       
        To:        discuss@xxxxxxxxxxxxxxx
        cc:        sondak@xxxxxxxxxx
        Subject:        Fw: [bgl-discuss] MPI Buffer Problem



>         From:   Bob Walkup/Watson/IBM@IBMUS
> This sounds like MPI rank 0 is running out of memory because it is
> allocating buffers for messages that have been sent before a matching
> mpi_recv was posted.
> ...
> One solution is
> to introduce control flow.  For example:
>
> if (myrank .eq. 0) then
>    do pe  = 1, numpes
>        call mpi_send(flag, 1, mpi_integer, pe, ...)
>        call mpi_recv(rbuf, count, type, pe, ...)
>    end do
> else
>    call mpi_recv(flag, 1, mpi_integer, 0, ...)
>    call mpi_send(sbuf, count, type, 0, ...)
> end if

Thanks very much for the help.  I tried this approach, and it worked
in one case.  I tried it in a different routine that works in a
similar way to the first routine, and I'm getting the same error
there.  I'll look into this further.

> This user, however, experiences this problem on 4 processes with large
> messages (5.7 MB), which I don't think will be sent as eager. And he
> experiences it even after reducing the eager limit to 1024. That's
> mysterious.

I see your point, and I don't understand this either.  I'll try it
again to make sure I can duplicate the behavior.

> MPI_Gatherv has to work in this case, otherwise it can be considered a bug
> in the implementation of gatherv.

Unfortunately I can't use a gather.  I'm collecting chunks of arrays,
for example, processor 1 might send x(:,3) to processor 0, processor
2 might send x(:,6) to processor 0, etc.
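
For illustration only, here is a minimal sketch of how the flow-control handshake quoted above could be combined with collecting one array column per processor. It is not the actual application code. The names nrows, tag_go, tag_data and col, and the choice of storing the column from rank pe in x(:,pe+1), are hypothetical. It also assumes the MPI library provides the Fortran "mpi" module; if not, include 'mpif.h' would be used instead.

```fortran
program collect_columns
  use mpi
  implicit none
  integer, parameter :: nrows = 1000            ! hypothetical column length
  integer, parameter :: tag_go = 1, tag_data = 2 ! hypothetical message tags
  integer :: ierr, myrank, nprocs, pe, flag
  integer :: status(MPI_STATUS_SIZE)
  double precision, allocatable :: x(:,:), col(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  flag = 1
  if (myrank == 0) then
     allocate(x(nrows, nprocs))
     x = 0.0d0
     do pe = 1, nprocs - 1
        ! Tell rank pe it may send, then post the matching receive right away,
        ! so at most one large message is in flight toward rank 0 at a time.
        call MPI_Send(flag, 1, MPI_INTEGER, pe, tag_go, MPI_COMM_WORLD, ierr)
        call MPI_Recv(x(:, pe+1), nrows, MPI_DOUBLE_PRECISION, pe, tag_data, &
                      MPI_COMM_WORLD, status, ierr)
     end do
     print *, 'rank 0 collected', nprocs - 1, 'columns'
  else
     allocate(col(nrows))
     col = dble(myrank)
     ! Wait for the go-ahead from rank 0 before sending the large message.
     call MPI_Recv(flag, 1, MPI_INTEGER, 0, tag_go, MPI_COMM_WORLD, status, ierr)
     call MPI_Send(col, nrows, MPI_DOUBLE_PRECISION, 0, tag_data, &
                   MPI_COMM_WORLD, ierr)
  end if

  call MPI_Finalize(ierr)
end program collect_columns
```

The loop on rank 0 runs over ranks 1 to nprocs-1, and each sender blocks until it receives its flag, so rank 0 should not need to buffer data for a receive that has not been posted yet. The cost is that the sends are serialized.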

Thanks for all the responses; they were quite helpful.
___________________________________________________________

Doug Sondak                Boston University
email: sondak@xxxxxx       Office of Information Technology
phone: (617)353-8273       111 Cummington Street
fax  : (617)353-6260       Boston, MA 02215
