The definition of the power spectrum in the usual terms is in (3.5).  I give a simplified version of where it comes from in (3.7) to (3.8).  Press calls this a periodogram and goes on about the definition problems, but in general, he sets the standard.  This document ends with (4.7) which gives the mean average power at a set of points k/T as determined from a transform of LT points.

Chapter 12
 Press et al, Numerical Recipes in C


The one-sided power spectral density (PSD) of the function h is


When h is real, the two terms are equal.

                If h(t) goes endlessly form - to , then its total power and power spectral density will, in general, be infinite.  Of interest then is the power spectral density per unit time.  This is (1.2) divided by T.

Discrete data




The Nyquist frequency is


All values fall between ±fc.  The fact that nature may have some higher values results in aliasing where these are mapped into lower values.  The discrete form of Parseval’s theorem relating the total power in the time domain to that in the Fourier is    

Bob’s derivation


Then noting that Δt=T/N and Δf=1/T, equating the two power expressions gives


yielding Press’s


Chapter 13.4
 Press et al, Numerical Recipes in C- Power Spectrum Estimation using the FFT

                Suppose that our function c(t) is sampled at N points to produce c0 . . . cN-1 and that these points span a range of time T, that is

T=(N-1)Δ, where Δ is the sampling interval.  Then here are several different descriptions of the total power.





                PSD estimators have an even greater variety.  Even if it is agreed always to relate the PSD normalization to a particular description of the function normalization (e.g. (3.2)), there are al least four possibilities listed.

                If we take an N-point sample of c(t) and use the FFT to compute its discrete Fourier transform


Then the periodogram estimate of the power spectrum is defined at N/2+1 frequencies as


where fk is defined only for zero and the positive frequencies


This is normalized such that the sum of the N/2+1 values of P is equal to the “mean squared amplitude of the function cj.

                The variance as the interval becomes longer is always equal to 100%.  This is skirted by averaging over many data segments.

Bob’s derivation

                The mean squared amplitude is


Then noting that Δt=T/N and Δf=1/T


Bob a few more thoughts. 

                Note that the mean squared amplitude is


                The square root of P(fk)/Δf has the desired dimensions of per root Hz.

This correlates with the text in the paragraphs following Press’s derivation in section 13.4.  He recommends taking the transform with a finer frequency spacing, and then adding up the smaller steps so that …

Power averaging






So that with a single spacing


This morphs into the definitions (3.5) when a one sided power density is desired.  For n=Lk, the two frequencies have the same value





Using trapezoidal rule to evaluate these integrals




In general, a single component of Hhatn dominates the sum.  Its value increases as L and its square as L2 , canceling the L2 in the denominator.  This averaging prevents are knowing the exact location of the peak.  Of interest is a 60.125 Hz signal.  If it is calculated with 16384 points, its aliases spread way outside the region.  When it is calculated over 8 x 16384 seconds, it is a single peak at n=2k+1 with no spread.  The Phat (60) will appear larger than P(60) by the amount of the spread at 60.


                In general, the averages are of P.  A number of segments of the data are taken and P is formed for each and then averaged over the number of segments.