[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bgl-discuss] global tree bandwidth and socket write() performance



On Mon, Sep 26, 2005 at 09:21:21 -0500, Kamil Iskra wrote:

> There is a way to verify it.  ZeptoOS 1.1 (installed on ANL BG/L) lets you
> log on the I/O node and strace the "ciod" process (you will need root
> access to be able to do the latter).  You would then find out how large a
> buffer is being passed to the write() calls.  I could check it myself, but
> I don't have a suitable benchmark at hand, and I am too lazy to write one
> right now (it is Monday morning after all.. :-).

OK, it's no longer Monday morning, so:

I wrote a test much like yours, trying to write a 10 MB buffer to a socket.

Here's the relevant part from strace'ing ciod running on the I/O node:

write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4155104) = 73848
write(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4081256) = 73848
(and so it goes...)

The 4 MB you see above is probably the value of CIOD_RDWR_BUFFER_SIZE, a
config parameter of ciod.

Anyway, from the above I can't prove that the buffer is actually sent from
the compute node process to ciod every time write() is invoked, because the
communication protocol of the tree network is proprietary and does not go
via syscall interface.  The indications are however pretty strong that that
is indeed the case, so this would explain the performance decrease you've
observed.

Kamil

-- 
Kamil Iskra, PhD
Argonne National Laboratory, Division of Mathematics and Computer Science
9700 South Cass Avenue, Argonne, IL 60439, USA
phone: +1-630-252-7197  fax: +1-630-252-5986

- --------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.