
Re: [bgl-discuss] discrete MPI_Wtime offset




On Tue, 6 Jun 2006, Kamil Iskra wrote:

> On Tue, Jun 06, 2006 at 13:08:47 -0500, Anthony Chan wrote:
>
> > I notice that the MPI_Wtime value returned by BGL-MPI has a discrete
> > offset of ~0.11 msec for every 32 processes (BGL's MPI_WTIME_IS_GLOBAL is
> > true).  This can be observed in MPI jobs with np = 64 and np = 256 nodes,
> > but _NOT_ with np = 128 (strange?).
>
> Interesting.  The fact that you see every 32 processes return the same
> value is of course due to the fact that the pset size on Argonne BGL is 32
> (i.e., one I/O node per 32 compute nodes).  I verified for a 64-process
> job that within each pset the results are the same.
>
> It is kind of strange that the problem is not present for 128 nodes.  As
> far as I know, there is nothing special about the 128-node config,
> certainly not compared to 64 or 256 nodes.
>
> It would be interesting to see how that works for larger jobs, of at least
> 512 nodes, as that's the minimum size that IBM really cares about (smaller
> jobs can't use the torus)...

A 512-node job does not have the problem.

> > My guess is that this is a bug in BGL's MPI implementation.
>
> I agree it's a bug.  FYI, I verified that this behaviour is present with
> either the factory-default or ZeptoOS I/O node kernel/ramdisk.
>

Thanks for verifying the problem.
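
For anyone who wants to check another partition, here is a minimal sketch
of the per-pset comparison. It is only an illustration, not the test that
was actually run: it assumes one MPI_Wtime sample per rank, taken right
after a barrier and collected on rank 0, and it assumes ranks are assigned
to psets contiguously (rank // 32). The function name pset_offsets and the
sample values are hypothetical.

```python
from collections import defaultdict

# Pset size reported in this thread for the Argonne BGL
# (one I/O node per 32 compute nodes).
PSET_SIZE = 32


def pset_offsets(times, pset_size=PSET_SIZE):
    """Compare per-rank timestamps pset by pset.

    times[i] is the MPI_Wtime value reported by rank i.
    Assumption: rank i belongs to pset i // pset_size.

    Returns (offsets, spreads):
      offsets[p] = mean time of pset p minus mean time of the first pset
      spreads[p] = max - min of the times within pset p
    """
    groups = defaultdict(list)
    for rank, t in enumerate(times):
        groups[rank // pset_size].append(t)

    psets = sorted(groups)
    means = {p: sum(groups[p]) / len(groups[p]) for p in psets}
    base = means[psets[0]]

    offsets = [means[p] - base for p in psets]
    spreads = [max(groups[p]) - min(groups[p]) for p in psets]
    return offsets, spreads


if __name__ == "__main__":
    # Synthetic values for illustration only (not measured data):
    # two psets whose clocks differ by 0.11 msec.
    sample = [100.0] * 32 + [100.00011] * 32
    offsets, spreads = pset_offsets(sample)
    for p, (off, spread) in enumerate(zip(offsets, spreads)):
        print(f"pset {p}: offset {off * 1e3:.3f} msec, spread {spread * 1e3:.3f} msec")
```

The samples themselves could be collected in a C test program with
MPI_Barrier, MPI_Wtime and MPI_Gather. A spread near zero within each pset
together with a nonzero offset between psets would match the pattern
described above. Ranks do not leave the barrier at exactly the same time,
so small differences well below the offset should be read as noise.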

--------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.