
[bgl-discuss] overlapping load/fma



This may be an obvious question, but I can't find it addressed anywhere:
does anyone know whether the BGL processor can overlap a quadword load
with a complex (vectorized) fma? I had assumed so, since overlapping loads
with fmas is the key to reaching peak on POWERx systems, but all of my
performance data suggests otherwise. For example, for an in-cache kernel
with a load/fma ratio of 1/4 I still get only about 80% of peak, whereas
on the POWER2 I can get near 100% with much less unrolling (1/1 ratio),
and on the POWER3 it takes about a 1/2 ratio to achieve peak.
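
To make the 1/4 ratio concrete, here is a minimal sketch in portable C of the kind of loop I mean. It is not my actual kernel: the name kernel_1_to_4, the argument layout, and the choice of four accumulators are made up for illustration. Each iteration reads one 16-byte pair of doubles (standing in for the quadword load) and reuses it in four paired multiply-adds whose coefficients and accumulators are meant to stay in registers. Whether a compiler actually turns this into one quadword load plus four vectorized fmas is an assumption, not something the code guarantees.

```c
#include <stddef.h>

/* Illustrative only: one two-double ("quadword") load feeds four paired
 * multiply-adds, giving a load/fma ratio of 1/4. All names are hypothetical. */
enum { NACC = 4 };

void kernel_1_to_4(const double *x, size_t npairs,
                   const double c[NACC], double acc[NACC][2])
{
    /* Copy coefficients and accumulators into locals so they can live in
     * registers for the whole loop. */
    double c0 = c[0], c1 = c[1], c2 = c[2], c3 = c[3];
    double a0r = acc[0][0], a0i = acc[0][1];
    double a1r = acc[1][0], a1i = acc[1][1];
    double a2r = acc[2][0], a2i = acc[2][1];
    double a3r = acc[3][0], a3i = acc[3][1];

    for (size_t i = 0; i < npairs; ++i) {
        /* The only memory access in the loop body: one pair of doubles. */
        double xr = x[2 * i];
        double xi = x[2 * i + 1];

        /* Four paired multiply-adds reusing the loaded pair. */
        a0r += c0 * xr;  a0i += c0 * xi;
        a1r += c1 * xr;  a1i += c1 * xi;
        a2r += c2 * xr;  a2i += c2 * xi;
        a3r += c3 * xr;  a3i += c3 * xi;
    }

    /* Write the accumulators back once, outside the loop. */
    acc[0][0] = a0r;  acc[0][1] = a0i;
    acc[1][0] = a1r;  acc[1][1] = a1i;
    acc[2][0] = a2r;  acc[2][1] = a2i;
    acc[3][0] = a3r;  acc[3][1] = a3i;
}
```

If loads and fmas really overlap, the load in a loop like this should be hidden behind the four fmas, so running in-cache it should approach peak; dropping to two accumulators or one would give the 1/2 and 1/1 ratios mentioned above.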

- --------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.