Dear expert:
I currently have the following two questions regarding the MSC BG/L
machine:
(1) Inconsistency of the timing results
Suppose I run the same code several times, on the same day or on
different days. Each time I submit the job with the command
"cqsub -q short -t 00:30:00 -n 16 executed_program". So far the
timing results have been as small as 114.760775 seconds and as large
as 117.027405 seconds. Does this make sense? I expected only a very
small difference between runs, on the order of several centiseconds,
since the code has exclusive use of its nodes on every run. I collect
the wall-clock time in the following way:
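    double start, finish;
    MPI_Barrier(comm);            /* line up all ranks before timing */
    start = MPI_Wtime();
    /* ... code being timed ... */
    MPI_Barrier(comm);            /* wait until every rank has finished */
    finish = MPI_Wtime();
    if (my_rank == 0)
        printf("Elapsed time = %e seconds\n", finish - start);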
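For reference, the spread between the fastest and slowest runs quoted
above can be worked out with a few lines of plain C. This is only a
sketch of the arithmetic; the function names timing_spread and
relative_spread are made up for illustration and are not part of MPI
or of the timed program.

```c
#include <assert.h>

/* Absolute spread between the slowest and fastest run, in seconds.
   (Hypothetical helper, for illustration only.) */
double timing_spread(double t_min, double t_max)
{
    assert(t_max >= t_min);
    return t_max - t_min;
}

/* Spread relative to the fastest run (0.01 means 1%).
   (Hypothetical helper, for illustration only.) */
double relative_spread(double t_min, double t_max)
{
    assert(t_min > 0.0);
    return timing_spread(t_min, t_max) / t_min;
}
```

With the two timings above, timing_spread(114.760775, 117.027405) is
about 2.27 seconds and relative_spread(114.760775, 117.027405) is
about 0.0198, i.e. roughly 2% of the run time, compared with the few
centiseconds expected.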
(2) MPI Behavior on the MSC BG/L machine
I do not know how MPI behaves on the MSC BG/L machine, so I measured
the maximum bandwidth by running a ping-pong program. For example, I
ran it on 16 processors to find the maximum bandwidth (MPI_Send and
MPI_Recv behavior) between each pair of them. To my surprise, the
results ranged only from 153 MB/s to 158 MB/s. To be honest, I
expected a bigger difference, since BG/L is a 3-D torus machine and
different pairs of processors are therefore separated by different
numbers of hops, on which I assumed the bandwidth would mainly
depend. How can these close results be explained? I had supposed
BG/L to be a heterogeneous system, but the results suggest that it is
homogeneous as far as MPI behavior is concerned.
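The hop count referred to above can be sketched as follows, assuming
minimal routing on a 3-D torus in which each dimension wraps around.
The function names ring_hops and torus_hops are hypothetical, and the
actual partition shape and the mapping of MPI ranks to torus
coordinates on this machine are not known here. The sketch counts
hops only and says nothing about bandwidth.

```c
#include <assert.h>
#include <stdlib.h>

/* Hops along one torus dimension of the given size: the shorter of
   the two ways around the ring. (Hypothetical helper.) */
int ring_hops(int a, int b, int size)
{
    assert(size > 0);
    int d = abs(a - b) % size;
    return d < size - d ? d : size - d;
}

/* Minimal hop count between coordinates a and b on a 3-D torus whose
   extents are given in dims. (Hypothetical helper.) */
int torus_hops(const int a[3], const int b[3], const int dims[3])
{
    int hops = 0;
    for (int i = 0; i < 3; i++)
        hops += ring_hops(a[i], b[i], dims[i]);
    return hops;
}
```

As an example with an assumed 4 x 4 x 2 shape, nodes (0,0,0) and
(3,0,0) are 1 hop apart because of the wrap-around link, while
(0,0,0) and (2,2,1) are 5 hops apart.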
Best Regards,
Yongzhi