When memory runs out with such a small
number of MPI processes and with messages under 10 MB, it suggests
that the application is running very close to the hardware memory
limit of 512 MB/node. Anything you can do to free up some memory
should help. Alternatively, if the memory requirement per MPI process
decreases as you scale up, you might try 16 MPI tasks instead of 4,
or maybe even more. It will be a relatively hard road if your application
is always on the edge of running out of memory, but things should go more
smoothly if there is plenty of memory left available for MPI etc.
Regards,
Bob Walkup (walkup@xxxxxxxxxx, 914-945-1512)
--------------------------------------------------------------
Douglas Sondak <sondak@xxxxxxxxxx>
Sent by: owner-discuss@xxxxxxxxxxxxxxx
06/03/2005 02:05 PM
To: discuss@xxxxxxxxxxxxxxx
cc: sondak@xxxxxxxxxx
Subject: Fw: [bgl-discuss] MPI Buffer Problem
> From: Bob Walkup/Watson/IBM@IBMUS
> This sounds like MPI rank 0 is running out of memory because it is
> allocating buffers for messages that have been sent before a matching
> mpi_recv was posted.
> ...
> One solution is to introduce flow control. For example:
>
> if (myrank .eq. 0) then
>    do pe = 1, numpes
>       call mpi_send(flag, 1, mpi_integer, pe, ...)
>       call mpi_recv(rbuf, count, type, pe, ...)
>    end do
> else
>    call mpi_recv(flag, 1, mpi_integer, 0, ...)
>    call mpi_send(sbuf, count, type, 0, ...)
> end if
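
To make the effect of the handshake quoted above concrete, here is a toy
model in plain Python (not MPI code). It assumes the worst case, namely that
every message reaching rank 0 before its matching receive is posted gets
buffered whole; a real MPI library may instead use a rendezvous protocol for
large messages. The function name peak_unexpected_bytes and the
5,700,000-byte message size (standing in for the 5.7 MB mentioned in this
thread) are illustrative assumptions.

```python
from collections import deque


def peak_unexpected_bytes(num_workers, msg_bytes, handshake):
    """Peak bytes rank 0 holds for messages that have no posted receive.

    Toy model only: assumes each unmatched message is buffered whole.
    """
    unexpected = deque()  # messages that arrived before their receive
    peak = 0

    if not handshake:
        # Worst case: every worker sends before rank 0 posts any receive.
        for worker in range(1, num_workers + 1):
            unexpected.append((worker, msg_bytes))
        peak = sum(size for _, size in unexpected)
        # Rank 0 then posts its receives one by one, draining the buffers.
        while unexpected:
            unexpected.popleft()
    else:
        for worker in range(1, num_workers + 1):
            # Rank 0 sends the flag; the worker may answer before rank 0
            # reaches its receive, so one message can still be unexpected.
            unexpected.append((worker, msg_bytes))
            peak = max(peak, sum(size for _, size in unexpected))
            # The matching receive frees that buffer before the next
            # worker is allowed to send.
            unexpected.popleft()

    return peak


if __name__ == "__main__":
    msg = 5_700_000  # assumed stand-in for a 5.7 MB message
    print(peak_unexpected_bytes(3, msg, handshake=False))
    print(peak_unexpected_bytes(3, msg, handshake=True))
```

In this model the handshake caps the buffered data at one message, while
without it the peak grows with the number of senders. For 3 senders either
figure is small next to 512 MB, which fits the suggestion that the job is
already close to the node's memory limit.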
Thanks very much for the help. I tried this approach, and it worked
in one case. I tried it in a different routine that works in a
similar way to the first routine, and I'm getting the same error
there. I'll look into this further.
> This user, however, experiences this problem on 4 processes with large
> messages (5.7 MB), which I don't think will be sent as eager. And he
> experiences it even after reducing the eager limit to 1024. That's
> mysterious.
I see your point, and I don't understand this either. I'll try it
again to make sure I can duplicate the behavior.
> MPI_Gatherv has to work in this case, otherwise it can be considered a bug
> in the implementation of gatherv.
Unfortunately I can't use a gather. I'm collecting chunks of arrays;
for example, processor 1 might send x(:,3) to processor 0, processor
2 might send x(:,6) to processor 0, etc.
Thanks for all the responses; they were quite helpful.
___________________________________________________________
Doug Sondak                  Boston University
email: sondak@xxxxxx         Office of Information Technology
phone: (617)353-8273         111 Cummington Street
fax  : (617)353-6260         Boston, MA 02215