bigger FPGA and a faster clock?

Dragon

Postby fpga4fun » Tue Jan 24, 2006 2:27 am

1. We can't just put a bigger FPGA on Dragon; we would need to create a new board.

2. The maximum clock speed is not simply a matter of adding a faster oscillator; it very much depends on what logic you put in the FPGA. Since the design is already pipelined, your best bet for faster performance is to use a newer FPGA in a faster speed grade.

3. Dragon source code shows how to make a PCI slave controller. You need to get a PCI master controller in the FPGA to get good transfer rates.
fpga4fun
Site Admin
 
Posts: 837
Joined: Thu Sep 18, 2003 6:47 am

Postby Kristallo » Tue Jan 24, 2006 9:26 am

You can often overclock an FPGA significantly without any problems. You just need to keep it in mind so that you are not tripped up by false results.

Sometimes you can get a Virtex development kit that can fit 10-50 DES cores for reasonable money. They also have built-in, fairly slow PowerPC processors.
Kristallo
 
Posts: 203
Joined: Mon Sep 20, 2004 3:25 am

Postby Kristallo » Tue Jan 24, 2006 10:59 am

http://www.digilentinc.com/Products/Detail.cfm?Nav1=Products&Nav2=Programmable&Prod=XUPV2P
Sometimes you can pick up those kinds of boards at academic prices.

They can fit designs 10-50 times larger than a single Xilinx Spartan IIe-200 chip can, depending on the size of the Virtex.

Ethernet: 10 MB/s
USB: 40 MB/s
SATA: 100 MB/s ++

Which FPGA is most suitable depends on the result you need and the price you need it at. For an FPGA solution to make sense you need to be aiming for very high performance, since a CPU implementation will be up and running long before the FPGA version is.

Postby fpga4fun » Tue Jan 24, 2006 10:08 pm

1. Let me know what you'd need (FPGA size, external memory requirements, ...).

2. Yes, Dragon can certainly do 100MHz (I use FlashyD at 100MHz on Dragon). Dragon seems too small though.

3. If you need fast transfers from PC memory to/from FPGA, you need PCI master. If you can alleviate transfers by using memory local to the FPGA, then maybe PCI slave is enough.

Postby fpga4fun » Fri Jan 27, 2006 5:22 am

Ok, thanks for the board suggestion.

About the gates in FPGAs: there aren't gates like there would be in an ASIC. Instead, the logic expressions are mapped to 4-input LUTs (LUT = look-up table). To see how your 55 gates would fit, the best approach is to run the design through FPGA software such as Xilinx ISE and see what result it gives.
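To illustrate why gate counts and LUT counts do not line up, here is a minimal Python sketch that models a 4-input LUT as a 16-bit truth table. The names `make_lut4`, `eval_lut4` and `expr` are made up for this example, and the model ignores routing and timing. It only shows that any function of four inputs, however many gates it would take in an ASIC, occupies exactly one LUT.

```python
def make_lut4(func):
    """Build the 16-bit truth table for a boolean function of four inputs."""
    table = 0
    for index in range(16):
        a = index & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        d = (index >> 3) & 1
        if func(a, b, c, d):
            table |= 1 << index
    return table


def eval_lut4(table, a, b, c, d):
    """Look up one output bit; the four inputs form the table address."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> index) & 1


def expr(a, b, c, d):
    """Three 2-input gates (AND, OR, XOR) that share one 4-input LUT."""
    return (a & b) ^ (c | d)


LUT = make_lut4(expr)

# The LUT reproduces the gate network for every input combination.
for index in range(16):
    bits = [(index >> i) & 1 for i in range(4)]
    assert eval_lut4(LUT, *bits) == expr(*bits)
```

Three gates or thirty, the cost is the same as long as the expression depends on at most four signals. That is why only the synthesis tool can say how many LUTs a 55-gate S-box really needs.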

Postby mrand » Wed Feb 01, 2006 1:20 pm

solar wrote: fpga4fun -
Oh, one more thought on the required FPGA size. In the software implementation of DES, I use a technique known as "bitslicing". Basically, the DES S-boxes are implemented as sets of boolean expressions rather than as lookup tables:

http://www.darkside.com.au/bitslice/

It takes on average around 55 "gates" to implement one S-box. If the same is done in an FPGA, would it perhaps consume less space than a 64x4-bit ROM does? Would the signal propagation delays be any higher? I am thinking that it might be possible to fit a fully-pipelined (but possibly slower?) DES on the current Dragon board in this way.


As the previous reply mentions, gates and LUTs are not directly comparable - LUTs are nowhere near as efficient (in terms of speed, delay, power, and area). As you said though, by pipelining, you can trade some extra delay, power, and area to get the speed back up. You can almost always improve one at the expense of the others (within some limits).

You might consider adjusting your optimization algorithm to create intermediate steps that have no more than 4 inputs (for Xilinx... Altera can do a bit more). This would allow you to pipeline, and then share, the output of each of those intermediate steps in order to keep the speed up and area to a relative minimum.
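As a rough illustration of "intermediate steps with no more than 4 inputs", here is a Python sketch that splits one 6-input output bit (one bit of an S-box, say) into four 4-input tables plus three 2-to-1 selects, each of which depends on only three signals. This is a generic Shannon-style decomposition chosen for the example, not the output of any particular optimizer, and the function names and the random truth table are placeholders rather than real DES data.

```python
import random


def shannon_decompose(truth64):
    """Split a 64-entry truth table (6 inputs) into four 16-entry tables.

    Inputs x0..x3 address a table; x4 and x5 choose which table is used.
    """
    return [truth64[16 * k:16 * (k + 1)] for k in range(4)]


def mux2(sel, lo, hi):
    """A 2-to-1 select: a 3-input function, so it fits in one 4-input LUT."""
    return hi if sel else lo


def eval_decomposed(luts, x):
    """Evaluate the 6-input function using only nodes with <= 4 inputs."""
    low = x & 0xF
    x4 = (x >> 4) & 1
    x5 = (x >> 5) & 1
    leaf = [lut[low] for lut in luts]   # four 4-input nodes
    m0 = mux2(x4, leaf[0], leaf[1])     # 3-input node
    m1 = mux2(x4, leaf[2], leaf[3])     # 3-input node
    return mux2(x5, m0, m1)             # 3-input node


NODE_COUNT = 4 + 3  # naive upper bound per output bit

rng = random.Random(0)
truth = [rng.randint(0, 1) for _ in range(64)]  # placeholder function
luts = shannon_decompose(truth)
assert all(eval_decomposed(luts, x) == truth[x] for x in range(64))
```

Each node's output is a natural place for a pipeline register. Seven nodes per output bit is only a naive bound; a real tool may share nodes between output bits or use dedicated multiplexers, so treat the count as illustrative.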

Having said all of this, the RAM blocks in most modern FPGAs can run extremely fast - you'd have all your lookups done in one clock cycle with no wasted LUTs. So that's probably where the S-box belongs for most real-world applications. Nothing wrong with continuing your research though - you never know where or when something like this might be useful.
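For anyone following along who has not seen bitslicing, here is a small Python sketch contrasting the two approaches: a table lookup (what a ROM or RAM block does) and the same function written as boolean expressions applied to 64 independent instances at once. It uses a toy 3-input majority function, not a real DES S-box, and all the names are invented for the example.

```python
import random

WIDTH = 64  # number of independent instances packed into one word

# Table form: index = a | b << 1 | c << 2, output = majority of (a, b, c).
TABLE = [0, 0, 0, 1, 0, 1, 1, 1]


def lookup(a, b, c):
    """One instance per call, like reading one ROM address."""
    return TABLE[a | (b << 1) | (c << 2)]


def bitsliced(a_word, b_word, c_word):
    """Same function as gates; bit i of each word belongs to instance i."""
    return (a_word & b_word) | (a_word & c_word) | (b_word & c_word)


def pack(bits):
    """Pack a list of 0/1 values into an integer, instance i at bit i."""
    word = 0
    for i, bit in enumerate(bits):
        word |= bit << i
    return word


def unpack(word):
    """Inverse of pack for WIDTH instances."""
    return [(word >> i) & 1 for i in range(WIDTH)]


rng = random.Random(0)
a_bits = [rng.randint(0, 1) for _ in range(WIDTH)]
b_bits = [rng.randint(0, 1) for _ in range(WIDTH)]
c_bits = [rng.randint(0, 1) for _ in range(WIDTH)]

sliced = unpack(bitsliced(pack(a_bits), pack(b_bits), pack(c_bits)))
table = [lookup(a, b, c) for a, b, c in zip(a_bits, b_bits, c_bits)]
assert sliced == table
```

In software the gate form wins because one instruction processes 64 instances. In an FPGA each gate network is built for a single instance, so that advantage disappears and the comparison comes down to LUT count versus one block RAM read.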

Have fun,

Marc
mrand
 
Posts: 91
Joined: Fri Mar 18, 2005 3:04 am

