
Re: [bgl-discuss] Fwd: [bgl-support #302] May I ask you two questions?



> >>(1) Inconsistency of the timing results
> >>
> >>Suppose I execute the same code several times, on the same day or on
> >>different days. Every time, I use the command
> >>"cqsub -q short -t 00:30:00 -n 16 executed_program" to submit my
> >>request. So far the timing results have been as small as 114.760775
> >>seconds and as large as 117.027405 seconds. Does this make any sense?

So you get a spread of about 2%?  That doesn't sound too bad.
Then again, I'm doing file I/O performance testing, so I'm used to much
larger variations.

Your way of measuring time looks OK, BTW.
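
For reference, the 2% figure is just the gap between the slowest and
fastest run divided by the fastest run. A minimal sketch in C, where the
function name relative_spread is made up for illustration:

```c
#include <assert.h>

/*
 * Relative spread between the fastest and slowest run,
 * computed as (t_max - t_min) / t_min.
 * relative_spread is a hypothetical helper name, not part of any library.
 */
double relative_spread(double t_min, double t_max)
{
    assert(t_min > 0.0 && t_max >= t_min);
    return (t_max - t_min) / t_min;
}
```

Calling relative_spread(114.760775, 117.027405) with the two timings
quoted above gives a value just under 0.02, i.e. a bit less than 2%.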

> >>(2) MPI Behavior on MSC BG/L machine
> >>
> >>I have no idea how MPI behaves on the MSC BG/L machine, so I tested
> >>the max bandwidth on it by running a ping-pong program. For example,
> >>I ran it on 16 processors to measure the max bandwidth (MPI_Send &
> >>MPI_Recv behavior) between each pair of them. To my surprise, I got a
> >>range from 153 MB/s to 158 MB/s. To be honest, I expected a bigger
> >>difference here, since BG/L is a 3-D torus machine and different
> >>pairs of processors therefore correspond to different hop counts, on
> >>which the bandwidth should predominantly depend. How can these close
> >>results be explained?

Perhaps the links utilized by process pairs in your test happened to be
non-overlapping?  How did you run this experiment?  Which pairs did you
test (surely not all 120 at the same time)?  Did you use a custom
BGLMPI_MAPPING?
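
The figure of 120 comes from counting unordered pairs among 16
processes: 16 * 15 / 2. A small sketch in C that counts and lists the
pairs, e.g. to drive them one at a time so that they cannot compete for
links; pair_count and enumerate_pairs are hypothetical helper names:

```c
#include <assert.h>

/* Number of distinct unordered pairs among n processes: n * (n - 1) / 2. */
int pair_count(int n)
{
    assert(n >= 0);
    return n * (n - 1) / 2;
}

/*
 * Fill a[] and b[] with every unordered pair (i, j), i < j, of the
 * ranks 0..n-1. Both arrays must hold at least pair_count(n) entries.
 * Returns the number of pairs written.
 */
int enumerate_pairs(int n, int *a, int *b)
{
    int k = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            a[k] = i;
            b[k] = j;
            k++;
        }
    }
    return k;
}
```

With n = 16, both functions report 120 pairs, from (0, 1) up to (14, 15).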

Background info:

Each node on BG/L has 6 torus connections to its nearest neighbors, each
with a bandwidth of around 150 MB/s.  With small partitions like the one
you used, the network is actually not a torus but a mesh, and nodes use
between 3 and 5 links.  There are a number of ways to find the location
of any particular process within the torus topology:

The MPI_Get_processor_name() function puts it in the returned string, so
you can just print it out,

rts_coordinatesForRank(getpid(), &x, &y, &z, &t) (include <rts.h>; it's in
/bgl/BlueLight/ppcfloor/bglsys/include),

rts_get_personality(), followed by BGLPersonality_xCoord(),
BGLPersonality_yCoord(), etc.
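
Once you have the coordinates, you can work out how many links a node
uses and how many hops separate two nodes. The sketch below is plain C
and does not call any BG/L API. It assumes a 4x4x2 partition shape,
which is only a guess that happens to reproduce the "3 to 5 links"
range, and it assumes shortest-path routing. The names mesh_links and
hop_distance are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Links a node at coordinate c uses along one mesh dimension of `size`. */
static int links_in_dim(int c, int size)
{
    assert(size >= 1 && c >= 0 && c < size);
    if (size == 1)
        return 0;                       /* no neighbor in this dimension */
    return (c == 0 || c == size - 1) ? 1 : 2;   /* edge: 1, interior: 2 */
}

/* Total links used by node (x, y, z) in an nx * ny * nz mesh. */
int mesh_links(int x, int y, int z, int nx, int ny, int nz)
{
    return links_in_dim(x, nx) + links_in_dim(y, ny) + links_in_dim(z, nz);
}

/* Hops along one dimension; a torus may take the shorter wrap-around. */
static int hops_in_dim(int a, int b, int size, int torus)
{
    int d = abs(a - b);
    if (torus && size - d < d)
        d = size - d;
    return d;
}

/*
 * Shortest-path hop count between nodes p and q (x, y, z coordinates)
 * in a partition of shape dims. Pass torus = 0 for a mesh, 1 for a torus.
 */
int hop_distance(const int p[3], const int q[3], const int dims[3], int torus)
{
    int hops = 0;
    for (int i = 0; i < 3; i++)
        hops += hops_in_dim(p[i], q[i], dims[i], torus);
    return hops;
}
```

Under the assumed 4x4x2 shape, a corner node such as (0,0,0) uses 3
links and a node such as (1,1,0) uses 5. Opposite corners are 7 hops
apart on the mesh, but would be only 3 hops apart on a true torus.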

Kamil

-- 
Kamil Iskra, PhD
Argonne National Laboratory, Mathematics and Computer Science Division
9700 South Cass Avenue, Building 221, Argonne, IL 60439, USA
phone: +1-630-252-7197  fax: +1-630-252-5986
