If you look at my example post a bit further down for SDRAM block writes, you'll see that I used fixed-length USB packets. I started by increasing the packet length in the example SDRAM code.
Question: can you double the size of a fixed-length read and still have it work? I found that I had to take baby steps to fully understand the example code.
What I found in my tests was that the kernel transitions (calls to DeviceIoControl) on the host were dominating the run time. You may find that fixed-length packets with padding give you all of the performance you need. If so, the development path is easy: first double the packet size and get that working, then repeat as needed.
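To make the point about kernel transitions concrete, here is a rough sketch of the arithmetic, not code from the SDRAM example. It assumes the host makes one call per fixed-length packet, so doubling the packet length roughly halves the number of calls. The function names, the zero-byte padding, and the sizes are all made up for illustration.

```python
from typing import List


def pad_to_packets(payload: bytes, packet_len: int, fill: bytes = b"\x00") -> List[bytes]:
    """Split payload into fixed-length packets, padding the last one with fill.

    Hypothetical helper: the real firmware/host code may pad differently.
    """
    if packet_len <= 0:
        raise ValueError("packet_len must be positive")
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    packets = []
    for offset in range(0, len(payload), packet_len):
        chunk = payload[offset:offset + packet_len]
        # ljust pads the short final chunk up to the fixed packet length.
        packets.append(chunk.ljust(packet_len, fill))
    return packets


def transfer_count(total_bytes: int, packet_len: int) -> int:
    """Number of host calls needed, assuming one call per packet."""
    return -(-total_bytes // packet_len)  # ceiling division


if __name__ == "__main__":
    data = bytes(1000)  # illustrative payload size
    packet_len = 64     # illustrative starting packet length
    # Double the packet length step by step and watch the call count drop.
    while packet_len <= 512:
        packets = pad_to_packets(data, packet_len)
        print(packet_len, len(packets))
        packet_len *= 2
```

The cost is the padding in the last packet, which is usually small next to the per-call overhead you save.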
Generally, the thing I found most useful for cutting down the time I spent debugging was a USB-based logic analyser. I used the LogicPort (http://www.pctestinstruments.com/).
Finally, if you get stuck, post your Xilinx ISE 9.1.01i compilable project, and I'll take a close look.
-Reed