Re: [bgl-discuss] some clarifications
It sounds as if the main problem is that IBM has made the minimal number
of changes to their standard xlf compiler to get it to compile codes for
the BGL, and that it is probably a waste of time to do excessive
optimization of our codes for this machine until a more robust version
of the compiler exists, one which actually supports ALL the hardware
and does optimizations specific to this chip.
Don
PS what are the cache sizes and configurations on this chip?
On Thu, 10 Mar 2005, Andrew Siegel wrote:
>
> BGL hackers: Since these issues keep arising, let me summarize my view of
> the simd'ization state of the current compiler (please feel free to
> disprove or corroborate anything I say):
>
> First, some background -- we have never upgraded our compilers.
>
> Next, my tests lead me to characterize the simd capabilities of the
> compiler as "broken". This does not mean that -440d -O3 does nothing, just
> that, in the absence of an examination of the assembler, I wouldn't trust
> it or make too much out of how it changes the results of your code, for
> slightly better or worse. This conclusion rests on several observations:
> a number of tests I've done on small loops with obvious properties; the
> fact that .s files are never produced with extended second-fpu
> instructions (while .lst files are); a loop vectorizer report that
> doesn't work; speedups on these simple loops that, regardless of the
> above, are totally different from what should be expected; and our data
> from a dozen or so full application codes, none of which have reported
> anything too positive from using double fpu capabilities. Furthermore, I have
> discussed these things with IBM and nothing they said helped remedy or
> make sense of the situation. I'm left thinking that we should stop
> analyzing the double-hummer results in a rational way and just upgrade
> and hope for the best.
>
> If, given that -440d is broken, your code is still much slower than
> expected on a node, the typical culprits have been 1) math intrinsics and
> 2) differing cache size and structure of bg/l (compared to power3).
>
> If anyone has any evidence to contradict this, please let me know! -andrew
>
> - --------------------------------------------------------------------
> To add or remove yourself from this mailing list, use the 'notifyme'
> command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.
>
>