
[bgl-discuss] some clarifications



BGL hackers: Since these issues keep arising, let me summarize my view of 
the state of simd'ization in the current compiler (please feel free to 
disprove or corroborate anything I say):

First, some background -- we have never upgraded our compilers. 

Next, my tests lead me to characterize the simd capabilities of the
compiler as "broken". This does not mean that -440d -O3 does nothing, just
that, without examining the assembler, I wouldn't trust it or read too
much into how it changes the results of your code, for slightly better or
worse. I base this on several things: a number of tests I've done on small
loops with obvious properties; the fact that .s files are never produced
with extended second-fpu instructions (while .lst files are); the fact
that the loop vectorizer report doesn't work; the fact that, regardless of
these things, the speedups of these simple loops are totally different
from what should be expected; and our data from a dozen or so full
application codes, none of which have reported anything too positive from
using double fpu capabilities. Furthermore, I have discussed these things
with IBM and nothing they said helped remedy or make sense of the
situation.  I'm left thinking that we should stop trying to analyze the
double-hummer results in a rational way and just upgrade and hope for the
best.
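
For anyone who wants to repeat this kind of check, here is a minimal
sketch of what a "small loop with obvious properties" might look like. It
is not one of the actual tests; the function names daxpy and daxpy_pairs
are hypothetical. The first loop is unit-stride with no aliasing and no
loop-carried dependence, so it is an easy candidate for the second fpu.
The second is the same loop written out two elements at a time, which is
roughly the pairing one would hope the compiler finds by itself.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: unit stride, independent iterations, no aliasing. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(x != y);            /* restrict promises the arrays are distinct */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Same computation, hand-paired: two elements per trip plus a tail.
   Illustrative only; this is scalar C, not double-fpu code. */
void daxpy_pairs(size_t n, double a, const double *restrict x, double *restrict y)
{
    size_t i = 0;
    assert(x != y);
    for (; i + 1 < n; i += 2) {
        y[i]     = a * x[i]     + y[i];
        y[i + 1] = a * x[i + 1] + y[i + 1];
    }
    if (i < n)                 /* odd n: one leftover element */
        y[i] = a * x[i] + y[i];
}
```

Building a file like this with and without the double-fpu option, timing
both functions, and then looking in the .lst listing for second-fpu
instructions is one way to see whether the compiler did anything with the
plain loop.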

If, given that -440d is broken, your code is still much slower than
expected on a node, the typical culprits have been 1) math intrinsics and
2)  differing cache size and structure of bg/l (compared to power3).

If anyone has any evidence to contradict this, please let me know! -andrew
