Re: [bgl-discuss] discrete MPI_Wtime offset



On Tue, Jun 06, 2006 at 13:08:47 -0500, Anthony Chan wrote:

> I notice that the MPI_Wtime returned by BGL-MPI has a ~0.11 msec discrete
> offset for every 32 processes (BGL's MPI_WTIME_IS_GLOBAL is true).  This
> can be observed in MPI job with number of nodes, np = 64 and 256 but
> _NOT_ in np = 128 (strange?).

Interesting.  The fact that every 32 processes return the same value is
of course because the pset size on the Argonne BGL is 32 (i.e., one I/O
node per 32 compute nodes).  I verified for a 64-process job that the
results are the same within each pset.
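
For anyone who wants to repeat that check, here is a rough sketch of the
per-pset comparison in C.  It is not the code I actually ran.  It assumes
the timestamps were collected on rank 0 (e.g., each rank calls MPI_Wtime
right after an MPI_Barrier and the values are brought together with
MPI_Gather), and that ranks map to psets contiguously, pset_size ranks per
pset.  The helper names (pset_of_rank, pset_mean, pset_spread) are made up
for this example.

```c
#include <assert.h>

/* Pset index of a rank.  ASSUMPTION: ranks are assigned to psets
 * contiguously, pset_size ranks per pset (32 on the Argonne BGL). */
static int pset_of_rank(int rank, int pset_size)
{
    assert(rank >= 0 && pset_size > 0);
    return rank / pset_size;
}

/* Mean of the timestamps t[0..np-1] that belong to the given pset. */
static double pset_mean(const double *t, int np, int pset_size, int pset)
{
    int first = pset * pset_size;
    int last = first + pset_size;
    if (last > np)
        last = np;              /* the last pset may be partially filled */
    assert(first >= 0 && first < last);

    double sum = 0.0;
    for (int i = first; i < last; i++)
        sum += t[i];
    return sum / (last - first);
}

/* Spread (max - min) of the timestamps inside one pset.  A spread near
 * zero together with differing means across psets is the "discrete
 * offset" pattern described in this thread. */
static double pset_spread(const double *t, int np, int pset_size, int pset)
{
    int first = pset * pset_size;
    int last = first + pset_size;
    if (last > np)
        last = np;
    assert(first >= 0 && first < last);

    double lo = t[first], hi = t[first];
    for (int i = first + 1; i < last; i++) {
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return hi - lo;
}
```

On rank 0 one would then print pset_mean(t, np, 32, p) - pset_mean(t, np,
32, 0) and pset_spread(t, np, 32, p) for each pset p.  Since the barrier
exit itself is not perfectly simultaneous, the numbers are only meaningful
when the offset is large compared to the barrier skew.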

It is kind of strange that the problem is not present for 128 nodes.  As
far as I know, there is nothing special about the 128-node configuration,
certainly not compared to 64 or 256 nodes.

It would be interesting to see whether the offset also appears in larger
jobs of at least 512 nodes, as that's the minimum size that IBM really
cares about (smaller jobs can't use the torus)...

> My guess is that this is a bug in BGL's MPI implementation.

I agree it's a bug.  FYI, I verified that this behaviour is present with
either the factory-default or ZeptoOS I/O node kernel/ramdisk.

Kamil
