Hi,

For my next project I'm going to try a Mandelbrot viewer, much like http://markbowers.org/home/fpga-mandelbrot

Because I have a Xilinx board I've chosen 36-bit unsigned plus a sign bit, to make the most of the 18x18 multipliers. I'm also going to try to squeeze as much as possible out of my poor wee chip.

Is my thinking straight that an optimally pipelined x^2 is one stage shorter than x*y?

My thinking is that if you are multiplying a:b x c:d (where a, b, c and d are 18-bit vectors) you decompose it into bd + (ad + bc)*2^18 + ac*2^36, whereas a:b x a:b is bb + 2ab*2^18 + aa*2^36. The square has only three distinct partial products instead of four, and its middle term is a single product shifted left by one bit, so it avoids the adder stage that combines ad and bc.
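To sanity-check the decomposition before committing it to HDL, here is a small Python sketch (my own, not from the project above) that splits 36-bit operands into 18-bit halves and verifies both identities against the built-in multiply:

```python
import random

W = 18
MASK = (1 << W) - 1

def split(x):
    """Split a 36-bit value into (high, low) 18-bit halves a:b."""
    return x >> W, x & MASK

def mul_decomposed(x, y):
    """x*y via four partial products: bd + (ad+bc)*2^18 + ac*2^36.
    The two middle products need an extra addition before merging."""
    a, b = split(x)
    c, d = split(y)
    return b*d + ((a*d + b*c) << W) + ((a*c) << (2 * W))

def square_decomposed(x):
    """x*x via three partial products: bb + 2ab*2^18 + aa*2^36.
    The middle term is one product shifted left by 1 -- no extra adder."""
    a, b = split(x)
    return b*b + ((a*b) << (W + 1)) + ((a*a) << (2 * W))

# Randomised check over 36-bit operands.
for _ in range(1000):
    x = random.getrandbits(36)
    y = random.getrandbits(36)
    assert mul_decomposed(x, y) == x * y
    assert square_decomposed(x) == x * x
```

In hardware terms, the square uses three 18x18 multipliers instead of four, and the shift-by-one replaces an adder, which is where the one-stage saving in the pipeline would come from.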

Does that sound reasonable?