
[bgl-discuss] 30% faster DGEMM than ESSL DGEMMS



Hi all,

Just in case this is useful to someone. 

Recently, I tuned a DGEMM code in C for the Double Hummer on BG/L. The code 
achieved a 30% speedup over DGEMMS in ESSL. I am sure this is not the best 
that can be achieved, and I didn't explore many alternatives, but it may 
serve as a reference for anyone serious about squeezing the last drop of 
performance out of the 440. 

http://www-unix.mcs.anl.gov/~jaewook/pub/dgemm.pub.c

I would like to hear from anyone who has attained a comparable or higher 
speedup on a single processor for square matrix multiplication.

Experimental settings

- 1024x1024 double precision matrices
- manual coding based on the built-in functions to use Double Hummer
- Two-level cache tiling: 256x256 for L3, 32x32 for L1
- Superword-Level Locality: 4x4x4 for i, j, k-loop
- Loop interchange and loop coalescing on the L1-tiled k-loop
- Some loop invariant code motion and redundant code elimination
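
For readers who want to see the loop structure, here is a minimal portable 
sketch of the two-level tiling with a 4x4 register block. It is not the code 
at the URL above: it uses plain C instead of the Double Hummer built-ins, 
omits the loop interchange/coalescing and the unrolling of the k-loop, and 
the names dgemm_tiled, TILE_L3, TILE_L1 and REG are made up for illustration. 
It assumes row-major n x n matrices with n a multiple of 256.

```c
#include <stddef.h>

/* Tile sizes from the list above; the identifiers are hypothetical. */
enum { TILE_L3 = 256, TILE_L1 = 32, REG = 4 };

/* C += A * B for row-major n x n matrices; n must be a multiple of TILE_L3. */
void dgemm_tiled(size_t n, const double *A, const double *B, double *C)
{
    /* Outer level: 256x256 tiles intended to stay in L3. */
    for (size_t i3 = 0; i3 < n; i3 += TILE_L3)
    for (size_t k3 = 0; k3 < n; k3 += TILE_L3)
    for (size_t j3 = 0; j3 < n; j3 += TILE_L3)
        /* Inner level: 32x32 tiles intended to stay in L1. */
        for (size_t i1 = i3; i1 < i3 + TILE_L3; i1 += TILE_L1)
        for (size_t k1 = k3; k1 < k3 + TILE_L3; k1 += TILE_L1)
        for (size_t j1 = j3; j1 < j3 + TILE_L3; j1 += TILE_L1)
            /* Register block: a 4x4 patch of C is kept in local variables. */
            for (size_t i = i1; i < i1 + TILE_L1; i += REG)
            for (size_t j = j1; j < j1 + TILE_L1; j += REG) {
                double acc[REG][REG];
                for (size_t r = 0; r < REG; ++r)
                    for (size_t c = 0; c < REG; ++c)
                        acc[r][c] = C[(i + r) * n + (j + c)];

                for (size_t k = k1; k < k1 + TILE_L1; ++k)
                    for (size_t r = 0; r < REG; ++r) {
                        double a = A[(i + r) * n + k]; /* reused across the row */
                        for (size_t c = 0; c < REG; ++c)
                            acc[r][c] += a * B[k * n + (j + c)];
                    }

                for (size_t r = 0; r < REG; ++r)
                    for (size_t c = 0; c < REG; ++c)
                        C[(i + r) * n + (j + c)] = acc[r][c];
            }
}
```

The point of the 4x4 block is that each loaded element of A and B is used 
four times before it is discarded, which is what makes paired (SIMD) 
multiply-adds worthwhile; how well a compiler maps this plain C version onto 
the Double Hummer is not something this sketch claims.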

The iteration counts are multiples of the tile sizes, so there are no 
trailing loops. If the matrix dimensions are not multiples of the tile 
sizes, you need trailing loops, which can simply be the original loop. One 
caveat: you have to guarantee that the start addresses of the arrays are 
aligned to 16-byte boundaries.
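
As a rough illustration of the two points above, the sketch below checks 
16-byte alignment and handles dimensions that are not a multiple of the tile 
size by running the original loop on the leftover region. It is only a 
sketch under my own assumptions: a single tile level of size t, row-major 
storage, and the helper names is_aligned16, dgemm_range and 
dgemm_with_trailing are hypothetical, not taken from dgemm.pub.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p starts on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* The original (untiled) loop restricted to an index range.
   It serves both as the body of one tile and as the trailing loop. */
void dgemm_range(size_t n, const double *A, const double *B, double *C,
                 size_t i0, size_t i1, size_t j0, size_t j1,
                 size_t k0, size_t k1)
{
    for (size_t i = i0; i < i1; ++i)
        for (size_t k = k0; k < k1; ++k)
            for (size_t j = j0; j < j1; ++j)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* C += A * B for any n. Tiles of size t cover [0, m)^3 with m = n - n % t;
   three trailing calls cover the rest of the iteration space exactly once. */
void dgemm_with_trailing(size_t n, size_t t,
                         const double *A, const double *B, double *C)
{
    size_t m = n - n % t; /* largest multiple of t not exceeding n */

    for (size_t i0 = 0; i0 < m; i0 += t)
        for (size_t k0 = 0; k0 < m; k0 += t)
            for (size_t j0 = 0; j0 < m; j0 += t)
                dgemm_range(n, A, B, C, i0, i0 + t, j0, j0 + t, k0, k0 + t);

    dgemm_range(n, A, B, C, 0, m, 0, m, m, n); /* leftover k */
    dgemm_range(n, A, B, C, 0, m, m, n, 0, n); /* leftover j, all k */
    dgemm_range(n, A, B, C, m, n, 0, n, 0, n); /* leftover i, all j and k */
}
```

A caller would check is_aligned16(A), is_aligned16(B) and is_aligned16(C) 
before entering a kernel that uses paired loads and stores. On toolchains 
that provide them, C11 aligned_alloc or POSIX posix_memalign are one way to 
obtain such buffers; I have not checked what the BG/L runtime offers.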

Compilation command:

> blrts_xlc -I/bgl/BlueLight/ppcfloor/bglsys/include -O5 -qarch=440d 
-qtune=440 -qmaxmem=64000 dgemm.pub.c -L/bgl/BlueLight/ppcfloor/bglsys/lib 
-lrts.rts -ldevices.rts -o dgemm.pub.exe -DPRINT -lessln 
-L/soft/tools/essl-rev1/ -lxlfmath -L/opt/ibmmath/lib -I/opt/ibmmath/include 
-L/opt/ibmcmp/xlf/bg/10.1/blrts_lib -lxlf90 -L/opt/ibmcmp/xlf/bg/10.1/lib 
-qlist -qsource


By the way, does anyone know whether DGEMMS in ESSL uses Double Hummer 
instructions? I read that a subset of ESSL routines use the SIMD 
instructions, but I didn't dare to disassemble the code. 

-jaewook-

- --------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.