Eng-Tips is the largest engineering community on the Internet

FFT in FPGA?

Status
Not open for further replies.

walker1

Industrial
Dec 27, 2001
117
This was 1st posted under the PLC forum.


I just had a request yesterday: could I create a small FFT, say 4 inputs of 12 bits each, in an FPGA?

We use the Altera Cyclone series, and Altera does have a MegaCore FFT. However, it starts at 64 inputs and uses far too many resources in the Cyclone. A posting to Altera's support revealed that the function cannot be scaled down.

In other words, I have to create something myself.

Does anybody know where to find an algorithm that can be implemented without using a DSP core or similar?
I do have a Pascal module (with source!) that I can try to figure out, but whether it is of any use for hardware is unknown.
 
Indeed yes!

I was told that coherently adding the complex results of the FFT would give a much better result than just doing it with the 4 raw I/Q samples.

A little too much math for me here and now, but the guy has proved VERY competent before. And his MatLab simulations did show a significant improvement.

How it might work, when everything is limited to 12 bit resolution integers instead of 64 bit double precision floating point, that is another story.
 
I don't understand what you think you can get with a 4 sample FFT. The resolution is too poor to get much of anything.

TTFN



 
No, the resolution of a 4 point FFT will be coarse to say the least, but that is not what we are looking for.

As I have mentioned, some of the math is a little above me at the moment, but I will try to describe what we are doing.

We have a scientific radar, where each range channel is being recorded at 400 ks/s. I and Q.

Up until now we have integrated over a minimum of four samples by summing the I and Q values and transmitting the resulting integrated I and Q. The next result contains the sum of the following 4 input samples, with no pre-history. Integrate & Dump, as we call it.
We could simply sample at 100 kHz and be done with it, but with this we are more flexible with the resulting sample rate. We also gain some in signal / noise.
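
For readers following along, the Integrate & Dump step described above can be sketched in a few lines. This is only a minimal illustration, assuming the I/Q pairs are held as Python complex numbers and that any incomplete trailing group is dropped; the function name `integrate_and_dump` is made up for this example and is not the FPGA implementation.

```python
def integrate_and_dump(samples, n=4):
    """Sum consecutive, non-overlapping groups of n complex I/Q samples.

    Each output depends only on its own n inputs (no pre-history).
    A trailing group shorter than n is dropped.
    """
    out = []
    for start in range(0, len(samples) - n + 1, n):
        out.append(sum(samples[start:start + n]))
    return out


# Eight input samples give two integrated outputs.
iq = [complex(1, 2), complex(3, 4), complex(5, 6), complex(7, 8),
      complex(1, 0), complex(0, 1), complex(-1, 0), complex(0, -1)]
print(integrate_and_dump(iq))
```

The second group is one full rotation in four samples, so its sum cancels to zero, which is the worst case for this kind of integration.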

Now, on aircraft I believe it mainly was, it would give better results if we first did a small FFT and then integrated over the results of that.

As far as I can recall from the simulations: with the old method we lose up to 10 dB in the integrated signal at certain Dopplers, whereas the new one should give a loss well below 1 dB. That is with a 4 point FFT; better still with a longer one.
 
Something fishy is going on here. You are taking time domain samples and either summing to produce a filtered answer or FFT'ing. Summing as a filter is probably quite poor. You could therefore implement a better time domain filter using weighted sums of the samples. To get an equivalent filter using an FFT you would FFT, remove some artefacts, then inverse FFT to get back a time domain result.

This filtering in the FFT domain sounds difficult with so few points.
 
For a four sample frequency transform, a Discrete Fourier Transform is probably as efficient as an FFT. You could implement your own DFT from information in the DSP Guide.
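
As a rough illustration of the direct DFT mentioned above, here is a minimal sketch written straight from the textbook definition. It is not taken from the DSP Guide; the function name `dft` and the unscaled (no 1/N) convention are choices made for this example.

```python
import cmath


def dft(x):
    """Direct DFT: X[k] = sum over t of x[t] * exp(-2j*pi*k*t/N), unscaled."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


# A constant input puts everything in bin 0; an impulse spreads evenly.
print(dft([1, 1, 1, 1]))
print(dft([1, 0, 0, 0]))
```

For N = 4 this costs 16 complex multiply-accumulates on paper, but every factor is 1, -1, j or -j, so no real multiplications are actually needed.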


I have similar doubts as do other responders. For noise reduction, you might consider a Savitzky-Golay Filter. This filter does a Least Squares Polynomial Fit to data samples and is implemented as a FIR filter structure. See

 
Well, I'm just a mere mortal engineer who has been asked if I could create such an FFT.

And we have to do some limited filtering on the input samples, because the next set will come from the same hardware channel, but the RF source may have changed. (Frequency, polarization, range or even monopulse)

It is of course known how many samples will be received in each set and for each RF source.

The described Integrate & Dump is performed in our FPGA, and reduces the amount of data to transfer to the PC. 100 ks/s is just about what we can handle. (Each burst contains 35 32 bit words.)
 
I recall that one is supposed to choose a 'windowing function' (Hamming, etc.) to smoothly enter and exit the data set. With only 4 samples, what weights would you assign that could possibly be described as smooth? Without such smooth windowing, the discontinuities will mean that you're simply creating plenty of frequency-domain artifacts from your 4-point square wave pulse.

Are we missing something here?

 
Well, the MatLab guy brought windowing up himself, but said something like:
"Forget that. It will only have minor influence on what we are going to do."

I have now partly constructed (paper drawing only!) the 4 input butterfly diagram that is the main part of the FFT.
If I am correct, the 0 output bin (DC ?) ends up with exactly the average of the 4 inputs, which does not sound too surprising to me.
This at least implies that some of the information we up until now have lost during the Integrate & Dump can be reclaimed.
The intention, as described earlier, is to sum the four I's and the four Q's on the output and use that I/Q set further on in the process.

Still, the math behind all this is a little bit on the hairy side to me, so I may be wrong.
 
Windowing affects the amount of spectral leakage. If your frequencies of interest are not exact multiples of the frequency quantization (the bin spacing), you'll get leakage, and the choice of window affects how much.


As for averaging, neither frequency nor time domain samples should be added together willy-nilly. Registration prior to averaging must be investigated, particularly if you have no other synchronization.

TTFN



 
If I were in your shoes, I'd run a numerical experiment with actual 12-bit input data to see what is really going on. A quick and dirty experiment could be set up in MS-Excel, or your favorite language, fairly quickly (manually feeding it with 12-bit data).

Of course you'll want to process with more bits internally to avoid truncating the information away, but using 12-bit limited input data would be critical to the sim.
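
A quick experiment along these lines might look like the sketch below. It assumes signed 12-bit samples (codes -2048 to 2047), a full-scale complex test tone, and plain Python integers standing in for the wider internal accumulator; the helper names `quantize_12bit`, `make_iq` and `integrate_and_dump_int` are hypothetical and not from any tool mentioned in the thread.

```python
import math


def quantize_12bit(value, full_scale=1.0):
    """Round a value in [-full_scale, full_scale] to a signed 12-bit code."""
    code = round(value / full_scale * 2047)
    return max(-2048, min(2047, code))  # clip anything out of range


def make_iq(period, count, amplitude=1.0):
    """Complex test tone as (I, Q) pairs of 12-bit integers."""
    return [(quantize_12bit(amplitude * math.cos(2 * math.pi * t / period)),
             quantize_12bit(amplitude * math.sin(2 * math.pi * t / period)))
            for t in range(count)]


def integrate_and_dump_int(iq, n=4):
    """Sum groups of n (I, Q) pairs; Python ints act as a wide accumulator."""
    return [(sum(i for i, _ in iq[s:s + n]), sum(q for _, q in iq[s:s + n]))
            for s in range(0, len(iq) - n + 1, n)]


iq = make_iq(period=100, count=8)
print(iq[:4])
print(integrate_and_dump_int(iq))
```

Summing four 12-bit values needs 14 bits to be safe, so the accumulator width, not the input width, is what limits the internal precision here.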

One general rule of thumb seems to be that when the volume of processing vastly exceeds the volume of input data, be very suspicious that somebody is fooling themselves. We've all met the math long-hairs that want to process a binary '1' to see if they can extract any further information...

I still don't see how many different frequencies you think can be hidden inside four samples... This is where simulating in math (without considering the actual bits) can be very misleading.


Given the number of high-performance systems (SW radios, modems) that are recently starting to use the sea-of-gates FPGAs (instead of DSPs) to perform signal processing, there must be SW tools available to translate from one domain (DSP) into the other (FPGA). No way people are doing that by hand.

 
Whatever you plan to do with the data in the Frequency Domain can always be done in the Time Domain. The choice is usually based on simplicity or speed of execution. It would appear that you want to preserve more information while improving the S/N ratio. Instead of your "Integrate and Dump" (which might be called "Average four data points"), perhaps something as simple as a Moving Average Filter followed by the 4:1 compression would improve things adequately. And it's easy to implement.
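
A minimal sketch of that suggestion follows, assuming a plain unweighted moving average and simple keep-every-4th decimation. The window width is a free parameter here, and the names `moving_average` and `decimate` are made up for this example.

```python
def moving_average(samples, width=4):
    """Unweighted moving average; one output per full window."""
    out = []
    for end in range(width, len(samples) + 1):
        out.append(sum(samples[end - width:end]) / width)
    return out


def decimate(samples, factor=4):
    """Keep every factor-th sample (the 4:1 compression)."""
    return samples[::factor]


data = list(range(16))
print(decimate(moving_average(data, width=4)))
print(decimate(moving_average(data, width=8)))
```

One thing worth noticing: when the window width equals the decimation factor, the kept outputs are exactly the averages of non-overlapping blocks, i.e. Integrate & Dump divided by 4. The moving average only behaves differently once the window is longer than the decimation factor.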

 
Well, in the system under development we can integrate over a minimum of 4 all the way up to 64 samples. So far 16 has been max, giving each result up to 16 bits. (Beyond that I start down-shifting in the new system)

I have now tried to simulate a 4 point DFT in Excel, and summing after that actually does give a better power response than what we are used to. All 0.02 dB of it :)
That with a very slow moving input. I will try something faster tomorrow.

But unless I am wrong, it can be implemented without any multiplier. Sums and differences only! With only 4 points the sine and cosine factors must of course be -1, 0 or 1.
I guess that will reduce the number of cells I need in the FPGA.
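
To make the sums-and-differences claim concrete, here is a minimal sketch of a 4 point DFT on integer (I, Q) pairs that uses only additions and subtractions, checked against the defining formula. The function names are hypothetical, the output is unscaled, and this is a software model rather than FPGA code.

```python
import cmath


def dft4_add_sub(x):
    """4 point DFT of four (I, Q) integer pairs using only adds/subtracts."""
    (i0, q0), (i1, q1), (i2, q2), (i3, q3) = x
    ai, aq = i0 + i2, q0 + q2   # a = x0 + x2
    bi, bq = i0 - i2, q0 - q2   # b = x0 - x2
    ci, cq = i1 + i3, q1 + q3   # c = x1 + x3
    di, dq = i1 - i3, q1 - q3   # d = x1 - x3
    return [(ai + ci, aq + cq),   # X0 = a + c
            (bi + dq, bq - di),   # X1 = b - j*d (swap I/Q, one sign flip)
            (ai - ci, aq - cq),   # X2 = a - c
            (bi - dq, bq + di)]   # X3 = b + j*d


def reference_dft4(x):
    """Same transform from the textbook definition, for comparison."""
    z = [complex(i, q) for i, q in x]
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / 4) for t in range(4))
            for k in range(4)]


x = [(100, -5), (2047, 7), (-2048, 300), (12, -9)]
print(dft4_add_sub(x))
print(reference_dft4(x))
```

Multiplying by j or -j is just a swap of I and Q with one sign change, which is why no multiplier appears. Bin 0 comes out as the plain sum of the four inputs; it equals the average only after a divide by 4 (a 2-bit right shift). With 12-bit inputs each output needs up to 14 bits.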

I have seen a page on the web, which used two four point converters (plus some adding and multiplication) to create an eight point converter.
Eight or 16 points will of course with optional zero padding give us a much more flexible system.
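
I cannot say what that particular web page showed, but the standard way to build an 8 point transform from two 4 point ones is the radix-2 decimation-in-time split, sketched below under that assumption. The names `dft` and `dft8_from_two_dft4` are made up for this example.

```python
import cmath


def dft(x):
    """Direct, unscaled DFT used for the two 4 point halves and as a check."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft8_from_two_dft4(x):
    """8 point DFT from two 4 point DFTs (radix-2 decimation in time)."""
    even = dft(x[0::2])   # 4 point DFT of samples 0, 2, 4, 6
    odd = dft(x[1::2])    # 4 point DFT of samples 1, 3, 5, 7
    out = [0j] * 8
    for k in range(4):
        twiddle = cmath.exp(-2j * cmath.pi * k / 8)
        out[k] = even[k] + twiddle * odd[k]
        out[k + 4] = even[k] - twiddle * odd[k]
    return out


x = [complex(t, -t * t) for t in range(8)]
print(dft8_from_two_dft4(x))
```

Of the four twiddle factors, two are trivial (1 and -j). The other two are (1 - j)/sqrt(2) and (-1 - j)/sqrt(2), so the only real multiplications left are by the constant 1/sqrt(2).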

And the moving average stuff?
I am familiar with the general algorithm, but I am not sure we can use it.
In order to get the best correlation we must integrate over an entire code length, which is very close to 20 samples.
So I guess we need all the data on equal terms.
 
No. No rounding error it seems.

My sine and cosine input had a period of 100 samples.

If I went to a 15 times higher frequency (6 2/3 sample per period) the difference was 5 dB.
And of course, if my period matched my 4 point DFT the integration gave out zero.

The power on the frequency side, however, stayed exactly at 16, as always.
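
The experiment above can be approximated in a few lines for anyone who wants to repeat it. This sketch assumes a unit-amplitude complex tone, floating point rather than 12-bit data, and that "power on the frequency side" means the sum of the squared magnitudes of the four DFT bins; the function names are hypothetical.

```python
import cmath
import math


def tone(period, count=4):
    """Unit-amplitude complex tone, one I/Q sample per index."""
    return [cmath.exp(2j * cmath.pi * t / period) for t in range(count)]


def integrate_dump_loss_db(period, n=4):
    """Power of the n-sample sum relative to the ideal n**2, in dB."""
    magnitude = abs(sum(tone(period, n)))
    if magnitude < 1e-9:          # full cancellation, avoid log10(0)
        return float("-inf")
    return 20 * math.log10(magnitude / n)


def dft_total_power(x):
    """Sum of |X[k]|**2 over all bins of an unscaled DFT."""
    n = len(x)
    bins = [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
    return sum(abs(b) ** 2 for b in bins)


for period in (100, 20 / 3, 4):
    print(period, integrate_dump_loss_db(period), dft_total_power(tone(period)))
```

Under these assumptions the Integrate & Dump loss works out to roughly 0.02 dB at a period of 100 samples and about 5.6 dB at 6 2/3 samples, with complete cancellation at a period of 4. The total bin power staying at 16 is Parseval's relation: for an unscaled N point DFT it is N times the input energy, here 4 x 4, whatever the tone frequency.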
 