Deciding on length of FFT

So, I am confused about how to decide the length of the FFT so that my results are good in each case.

I guess the length should depend on the number of samples and the sampling rate. Please give your inputs on this.

asked Nov 15, 2011 at 6:14

There are 100 things that you have to learn before you attempt something like this, to name a few: FFT windowing, sound resampling (sample rate changing), and current techniques for audio fingerprinting. Determining the correct ALGORITHM for sound matching should come FIRST, and it should follow from the REQUIREMENTS of your sound matching.

Commented Nov 15, 2011 at 9:12

@DanielMošmondor Right now, I am not worried about different sampling rates, as one of my input files is made from the other one. So, sampling rate is the least of my concerns. I am just comparing the waveforms and trying to find how similar they are by taking the FFT and comparing the magnitudes of the frequency components. The problem I am facing is deciding on the length of the FFT: whether to keep it fixed or make it depend on the sampling rate and the total number of samples (i.e. the length of the wav file).

Commented Nov 15, 2011 at 9:32

2 Answers

The length of the FFT, N, will determine the resolution in the frequency domain:

resolution (Hz) = sample_rate (Hz) / N 

So for example in case (1) you have resolution = 960 / 128 = 7.5 Hz. So each bin in the resulting FFT (or presumably the power spectrum derived from this) will be 7.5 Hz wide, and you will be able to differentiate between frequency components which are at least this far apart.
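As a rough illustration of the formula above, here is a minimal Python sketch. The function names are made up for this example, and rounding the FFT length up to a power of two is an assumption (a common convention, not a requirement of the formula).

```python
import math


def fft_resolution_hz(sample_rate_hz, n_fft):
    """Width of one FFT bin in Hz: sample_rate / N."""
    return sample_rate_hz / n_fft


def fft_length_for_resolution(sample_rate_hz, resolution_hz):
    """Smallest power-of-two N whose bin width is <= resolution_hz.

    Rounding up to a power of two is an assumption made for this sketch.
    """
    n_min = math.ceil(sample_rate_hz / resolution_hz)
    return 1 << (n_min - 1).bit_length()


# The case discussed above: 960 Hz sample rate, 128-point FFT.
print(fft_resolution_hz(960, 128))

# Going the other way: which N gives bins no wider than 10 Hz at 44.1 kHz?
print(fft_length_for_resolution(44100, 10))
```

So once you know the resolution you need, the FFT length follows from the sample rate rather than from the total length of the file.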

Since you don't say what kind of waveforms these are, or what the purpose of your application is, it's hard to know what kind of resolution you need.

One important further point - many people using FFT for the first time are unaware that in general you need to apply a window function prior to the FFT to avoid spectral leakage.
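To make the leakage point concrete, here is a small sketch in plain Python. It uses a naive O(N^2) DFT purely for demonstration (a real application would use an FFT library), and the test tone, N = 64 and the choice of bin 25 are all arbitrary assumptions for this example.

```python
import cmath
import math


def dft_magnitudes(x):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for a small demo)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def hann(n):
    """Periodic Hann (von Hann) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


N = 64
fs = 960.0
f0 = 10.5 * fs / N  # deliberately halfway between bins 10 and 11

signal = [math.sin(2 * math.pi * f0 * t / fs) for t in range(N)]

rect = dft_magnitudes(signal)  # no window (rectangular)
windowed = dft_magnitudes([s * w for s, w in zip(signal, hann(N))])

# Compare how much energy shows up far away from the tone, relative to the peak.
far_bin = 25
leak_rect = rect[far_bin] / max(rect)
leak_hann = windowed[far_bin] / max(windowed)
print(leak_rect, leak_hann)
```

The tone does not fall exactly on a bin, so without a window its energy smears across the whole spectrum; with the Hann window the relative level at bins far from the tone is much lower.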

answered Nov 15, 2011 at 9:04

Thanks for the reply. I am not sure what resolution I should go for. The waveforms are simple wave files, for example an audio clip converted to wav format. In my case I think the frequency resolution is not very important. Regarding the window function: the data I am using for the FFT is static, extracted from the wav file, so do I really need to apply a window function? If I do, then I would also need the length of the window, which will be the same as the FFT length, and I am back to square one.

Commented Nov 15, 2011 at 9:16

960 Hz is a very low sample rate for audio, but I assume you know what you're doing. Can you give more info about the application, as this will most likely determine the resolution you need? (I'm guessing maybe it's something like seismic data?) Anyway, yes, you definitely need a window function. Use something simple like von Hann; the window size is the same as the FFT size, as you say.

Commented Nov 15, 2011 at 10:47

Yes, 960 Hz is too low a sampling rate, but it's just one of the test files I use to test my app. I am just comparing the waveforms and trying to find how similar they are by taking the FFT and comparing the magnitudes of the frequency components. The problem I am facing is deciding on the length of the FFT: whether to keep it fixed or make it depend on the sampling rate and the total number of samples (i.e. the length of the wav file). For now I am assuming the sampling rate is the same for both files and that there is no time shift. As you say, I will use a window function. Still, the length is a problem for me.

Commented Nov 15, 2011 at 11:13

Well, if you explain what kind of files they are and what it is that you are trying to match, then it should be possible to specify a frequency resolution, which will then determine the required FFT size as a function of sample rate.

Commented Nov 15, 2011 at 12:25

They can be any wave files. For example, the 3rd file in my question is a default notification sound from the Windows OS. The 2nd file in my question is a 2-minute movie clip which I converted to wav using converter software. Like this, I can get different wave files with different lengths and sampling rates. The one common point is that they are all wave files, and the comparison is done between files where one is created from the other (as I explained in my question). I am comparing the waveforms and trying to find the time period in which I made the modifications stated in the question. Hope this helps.

Commented Nov 16, 2011 at 5:46

I have to say I have found your question very cryptic. I think you should look into the short-time Fourier transform (STFT). The reason I say this is that you are looking at quite a large number of samples if you use a sampling frequency of 44.1 kHz over 2 minutes with 2 channels. One FFT across the entire signal will take quite a while, not to mention that the estimate will be biased, as the signal's mean and variance will change drastically over the whole duration. To avoid this you want to frame the time-domain signal first. These frames can be as short as 20-40 ms (commonly used for speech) and are often overlapping (as in Welch's method of spectral estimation). Then you apply a window function such as a Hamming or Hann ("Hanning") window to each frame to reduce spectral leakage, and calculate an N-point FFT for each frame, where N is the next power of two above the number of samples in that frame. For example:

  1. Fs = 8 kHz, single channel;
  2. time = 120 s;
  3. no_samples = time * Fs = 960000;
  4. frame length T_length = 20 ms;
  5. frame length in samples N_length = 160;
  6. frame overlap T_overlap = 10 ms;
  7. frame overlap in samples N_overlap = 80;
  8. number of frames N_frames = (no_samples - (N_length - N_overlap)) / N_overlap = 11999;
  9. FFT length = 256;

So you will be processing 11999 frames in total, but your FFT length will be small. You will only need an FFT length of 256 (the next power of two above the frame length of 160). Most algorithms that implement the FFT require the signal length and the FFT length to be the same, so all you have to do is append zeros to each windowed frame until it is 256 samples long. In other words, pad each frame with x zeros, where x = FFT_length - N_length. My latest Android app does this on recorded speech and uses the short-time FFT data to display the spectrogram of the speech; it also performs various spectral modification and filtering. It's called Speech Enhancement for Android.
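The framing, windowing and zero-padding steps described above can be sketched roughly as follows in plain Python. This is only an illustration, not the code from the app: all function names are hypothetical, the recursive radix-2 FFT stands in for whatever FFT library you actually use, and the 1 kHz test tone is an assumption. The hop between frames is 80 samples here, which equals N_overlap in the example because the overlap is 50%.

```python
import cmath
import math


def next_pow2(n):
    """Smallest power of two >= n."""
    return 1 << (n - 1).bit_length()


def count_frames(n_samples, frame_len, hop):
    """Number of complete frames of frame_len samples, advancing by hop samples."""
    return (n_samples - frame_len) // hop + 1


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def stft_magnitudes(samples, frame_len, hop):
    """Magnitude spectrum (bins 0..n_fft/2) of each Hann-windowed, zero-padded frame."""
    n_fft = next_pow2(frame_len)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    spectra = []
    for f in range(count_frames(len(samples), frame_len, hop)):
        start = f * hop
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        frame += [0.0] * (n_fft - frame_len)  # zero-pad 160 -> 256
        spectra.append([abs(c) for c in fft(frame)[: n_fft // 2 + 1]])
    return spectra


fs = 8000
frame_len, hop = 160, 80  # 20 ms frames, 10 ms hop (50% overlap)

# Frame count for the 120 s example above.
print(count_frames(120 * fs, frame_len, hop))

# One second of a 1 kHz test tone (kept short so pure Python stays fast).
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]
spectra = stft_magnitudes(tone, frame_len, hop)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
print(len(spectra), peak_bin, peak_bin * fs / next_pow2(frame_len))
```

To compare two files you would then compare the per-frame spectra rather than one huge FFT, which also tells you roughly where in time the files differ.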