solar wrote:
fpga4fun -
Oh, one more thought on the required FPGA size. In the software implementation of DES, I use a technique known as "bitslicing". Basically, the DES S-boxes are implemented as sets of boolean expressions rather than as lookup tables:
http://www.darkside.com.au/bitslice/
It takes on average around 55 "gates" to implement one S-box. If the same is done in an FPGA, would it perhaps consume less space than a 64x4-bit ROM does? Would the signal propagation delays be any higher? I am thinking that it might be possible to fit a fully-pipelined (but possibly slower?) DES on the current Dragon board in this way.
As the previous reply mentions, gates and LUTs are not directly comparable - LUTs are nowhere near as efficient (in terms of speed, delay, power, and area). As you said though, by pipelining, you can trade some extra delay, power, and area to get the speed back up. You can almost always trade one at the expense of the others (within some limits).
You might consider adjusting your optimization algorithm to create intermediate steps that have no more than 4 inputs (for Xilinx... Altera can do a bit more). This would allow you to pipeline, and then share, the output of each of those intermediate steps in order to keep the speed up and area to a relative minimum.
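To make the "no more than 4 inputs per step" idea concrete, here is a rough Python model, not HDL and not the output of any optimizer. It splits DES S-box S1 by hand: four 4-input LUTs per output bit (one per table row, all fed by the same four column bits), followed by two levels of 2:1 muxes on the row bits. The names `lut4`, `MUX2`, `build_luts` and `sbox_lut4` are made up for this sketch. Each stage boundary is a place where a pipeline register could go. This naive split uses 7 LUTs per output bit and shares nothing between bits, so a smarter algorithm should do better.

```python
# DES S-box S1 as given in the DES standard: 4 rows x 16 columns.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def sbox_reference(x):
    """Plain table lookup: outer bits of the 6-bit input pick the row,
    the middle four bits pick the column."""
    row = ((x >> 4) & 2) | (x & 1)
    col = (x >> 1) & 0xF
    return S1[row][col]


def lut4(init, a, b, c, d):
    """Model of a 4-input LUT; `init` is its 16-bit truth table."""
    return (init >> (a | (b << 1) | (c << 2) | (d << 3))) & 1


# Truth table for a 2:1 mux on inputs (a, b, sel, unused): out = b if sel else a.
MUX2 = sum(
    ((((i >> 1) & 1) if (i >> 2) & 1 else (i & 1)) << i) for i in range(16)
)


def build_luts():
    """One 16-bit truth table per (output bit, row): luts[bit][row]."""
    return [
        [sum(((S1[r][c] >> k) & 1) << c for c in range(16)) for r in range(4)]
        for k in range(4)
    ]


def sbox_lut4(x, luts):
    """Evaluate S1 using only 4-input LUTs, in three stages."""
    col = [(x >> (1 + i)) & 1 for i in range(4)]  # column bits, LSB first
    row_lo, row_hi = x & 1, (x >> 5) & 1
    out = 0
    for k in range(4):
        # Stage 1: four LUTs, one per row, sharing the same column inputs.
        r = [lut4(luts[k][row], *col) for row in range(4)]
        # Stage 2: pick between row pairs using the low row bit.
        m0 = lut4(MUX2, r[0], r[1], row_lo, 0)
        m1 = lut4(MUX2, r[2], r[3], row_lo, 0)
        # Stage 3: final pick using the high row bit.
        out |= lut4(MUX2, m0, m1, row_hi, 0) << k
    return out


if __name__ == "__main__":
    luts = build_luts()
    assert all(sbox_lut4(x, luts) == sbox_reference(x) for x in range(64))
    print("LUT4 decomposition matches the table for all 64 inputs")
```

The point of the check at the bottom is only that the staged version agrees with the table; it says nothing about the timing or area on a real part.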
Having said all of this, the RAM blocks in most modern FPGAs can run extremely fast - you'd have all your lookups done in one clock cycle with no wasted LUTs. So that's probably where the S-box belongs for most real-world applications. Nothing wrong with continuing your research though - you never know where or when something like this might be useful.
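If you go the RAM-block route, the one thing to watch is address order: the published S-box tables are laid out by row and column, while a ROM wants to be indexed directly by the raw 6-bit input. A small Python sketch of building that 64x4-bit image for S1 (the helper names `rom_image` and `sbox_rom` are just for illustration, and how you load the image into a RAM block depends on your tools):

```python
# DES S-box S1 as given in the DES standard: 4 rows x 16 columns.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def rom_image(table):
    """Reorder a row/column S-box table into 64 entries addressed
    by the raw 6-bit input (outer bits = row, middle bits = column)."""
    rom = [0] * 64
    for x in range(64):
        row = ((x >> 4) & 2) | (x & 1)
        col = (x >> 1) & 0xF
        rom[x] = table[row][col]
    return rom


ROM = rom_image(S1)


def sbox_rom(x):
    """One read per lookup, which is what the RAM block does each clock."""
    return ROM[x & 0x3F]


if __name__ == "__main__":
    # Dump the image as 64 hex digits, address 0 first.
    print("".join(format(v, "x") for v in ROM))
```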
Have fun,
Marc