[bgl-discuss] overlapping load/fma
This may be an obvious question, but I can't find it addressed anywhere:
does anyone know whether the BGL processor can overlap a quadword load with
a complex (vectorized) fma? I had assumed so, since that overlap is the key
to getting peak on POWERx systems, but all of my performance data suggests
otherwise. For example, for an in-cache kernel with a load/fma
ratio of 1/4 I still get only about 80% of peak, whereas on the POWER2 I
can get near 100% with much less unrolling (1/1 ratio), and on the POWER3
it takes about a 1/2 ratio to achieve peak.
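
To make the ratio concrete, here is a minimal sketch of what an in-cache
loop with a load/fma ratio of 1/4 might look like. It is not the kernel
measured above: the function name, the arguments, and the use of scalar
doubles rather than quadword loads and complex fmas are all assumptions made
for illustration. Each iteration issues one load of x[i] and four
independent multiply-adds into accumulators held in registers. Whether a
compiler turns each multiply-add into a fused instruction depends on the
target and flags.

```c
#include <stddef.h>

/* Hypothetical kernel (illustration only): 1 load feeds 4 multiply-adds.
 * The coefficients and accumulators are copied into locals so that the
 * loop body touches memory only through the single load of x[i]. */
void kernel_1_to_4(const double *x, size_t n,
                   const double c[4], double acc[4])
{
    double c0 = c[0], c1 = c[1], c2 = c[2], c3 = c[3];
    double a0 = acc[0], a1 = acc[1], a2 = acc[2], a3 = acc[3];

    for (size_t i = 0; i < n; ++i) {
        double xi = x[i];   /* the one load per iteration */
        a0 += c0 * xi;      /* four independent multiply-adds */
        a1 += c1 * xi;
        a2 += c2 * xi;
        a3 += c3 * xi;
    }

    /* Write the accumulators back once, outside the loop. */
    acc[0] = a0;
    acc[1] = a1;
    acc[2] = a2;
    acc[3] = a3;
}
```

If the load can issue in the same cycle as an fma, a loop of this shape
should be limited by the fma unit alone. If it cannot, each load costs an
issue slot and the best case is 4 useful slots out of 5, i.e. about 80% of
peak.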