The definition of the power spectrum in the usual terms is in (3.5). I give a simplified version of where it comes from in (3.7) to (3.8). Press calls this a periodogram and goes on about the definition problems, but in general, he sets the standard. This document ends with (4.7) which gives the mean average power at a set of points k/T as determined from a transform of LT points.

(1.1)

The *one-sided power spectral density *(PSD) of the
function h is

When h is real, the two terms are equal.

If h(t) goes endlessly form -∞ to ∞, then its total power and power spectral density will, in general, be infinite. Of interest then is the power spectral density per unit time. This is (1.2) divided by T.

(2.1)

(2.2)

(2.3)

The Nyquist frequency is

(2.4)

All values fall between ±f_{c}. The fact that nature may have some higher
values results in aliasing where these are mapped into lower values. The discrete form of Parseval’s theorem
relating the total power in the time domain to that in the Fourier is

*(2.5)*

*Then noting that Δ _{t}=T/N and Δ_{f}=1/T, equating the two power
expressions gives*

* **(2.6)*

*yielding Press’s*

(2.7)

Suppose
that our function c(t) is sampled at N points to produce c_{0} . . . c_{N-1}
and that these points span a range of time T, that is

T=(N-1)Δ, where Δ is the sampling interval. Then here are several different descriptions of the total power.

(3.1)

(3.3)

PSD estimators have an even greater variety. Even if it is agreed always to relate the PSD normalization to a particular description of the function normalization (e.g. (3.2)), there are al least four possibilities listed.

…

If we take an N-point sample of c(t) and use the FFT to compute its discrete Fourier transform

(3.4)

Then the periodogram estimate of the power spectrum is defined at N/2+1 frequencies as

where f_{k} is defined only for zero and the
positive frequencies

(3.6)

This is normalized such that the sum of the N/2+1 values of
P is equal to the “mean squared amplitude of the function c_{j}.

The variance as the interval becomes longer is always equal to 100%. This is skirted by averaging over many data segments.

*The
mean squared amplitude is*

*Then noting that Δ _{t}=T/N and Δ_{f}=1/T*

Bob a few more thoughts.

*Note
that the mean squared amplitude is*

*(3.9)*

* The
square root of P(f _{k})/Δ_{f}
has the desired dimensions of per root Hz. *

This correlates with the text in the paragraphs following Press’s derivation in section 13.4. He recommends taking the transform with a finer frequency spacing, and then adding up the smaller steps so that …

(4.1)

Where

(4.2)

So that with a single spacing

(4.3)

This morphs into the definitions (3.5) when a one sided power density is desired. For n=Lk, the two frequencies have the same value

(4.4)

Where

(4.5)

Using trapezoidal rule to evaluate these integrals

(4.6)

Simplifying

In general, a single component of
Hhat_{n} dominates the sum. Its
value increases as L and its square as L^{2} , canceling the L^{2}
in the denominator. This averaging
prevents are knowing the exact location of the peak. Of interest is a 60.125 Hz signal. If it is calculated with 16384 points, its aliases spread way
outside the region. When it is
calculated over 8 x 16384 seconds, it is a single peak at n=2k+1 with no
spread. The Phat (60) will appear
larger than P(60) by the amount of the spread at 60.

In general, the averages are of P. A number of segments of the data are taken and P is formed for each and then averaged over the number of segments.