Robust Fitting of
Spectra to Splines with Variable Knots.
R. L.
Coldwell, University of Florida and Constellation Technology
Abstract
Spectra consist of continuum features that
vary over many channels, and peaks that vary over few channels. In a fit to the
continuum the peaks appear as outliers. Robust methods in which the weights
associated with the peaks are reduced allow the continuum to be fitted almost
independently of the peaks. This requires a very smooth background. Cubic
splines, which are continuous with continuous first and second derivatives, are
a good choice for this task when the locations of the discontinuities in the
third derivatives, the knots, are included in the parameters optimized in the
fitting process. An extension to the Marquardt method allows a Newton-Raphson
method to be used in this optimization.
Introduction
Spectra consist of continuum and peaks. It
seems intuitive that the continuum should be smooth and as close to the bottom
of the peaks as possible. The code RobWin fits spectra by finding the
continuum, then the peaks. The physics needed to determine the continuum is the
fact that it and its first two derivatives are continuous and that the
locations in the second derivatives are optimally placed. When outlier
regression is added to this the continuum can be fitted with no reference to
the peaks. The code used in this work is RobWin

The
first few sections give a sketch of the mathematics of the non-linear
regression methods that allow the continuum to be fitted with respect to the
knot locations. Then an easy example of the continuum for a nuclear spectrum
taken with an intrinsic Germanium detector is shown. Finally an 152Eu spectrum
take with a thin HgI detector in which the separation of peaks from continuum
is not quite so easy is fitted to a continuum, a nuclide response function
including 5 efficiency constants, and a few unidentified peaks.
Naïve Newton Raphson
Minimization
Define
(1)
The object is to minimize c 2with respect to
. This requires finding a vector
such that for all M components of ![]()
(2)
First calculating c 2 and its first derivatives at a value
. Use these for a Taylor series
expansion about
such
that near this point the first derivatives at
are given by
(3)
The
Newton-Raphson method for finding the minimum is to truncate the above
expansion to the terms shown and solve for the value of
such that the first derivatives
are all zero.
The first derivative vector at
required for this expansion is
given by
(4)
While the second derivative array at
is given by
(5).
The second sum contains the term
, which as the solution
is approached oscillates with alternating signs about zero. The first term
contains no such oscillating small term. The minimization of c 2 requires that its partials be set exactly to zero, which
means equation 2 needs to be exact. The expansion of this partial as given by
equation 3 is an approximation in which cubic and higher terms are already
neglected so that very little is lost in equation 5 by dropping the small term
is the second partial to yield
(6)
Inverting this matrix allows us to
immediately find ![]()
(7)
The Marquardt Parameter
For
the Newton Raphson method to find a minimum in the unadulterated form given
above it is necessary that
is less than the radius of convergence of equation 3. When
this is not true the quantity minimized is
+c 2, where
is a
multidimensional version of the Marquardt parameter. For sufficiently large
values of
, this
added term dominates and the minimum is found at
, while in the limit, the Newton Raphson
method is recovered.
Two traditional problems of the Newton
Raphson method are that of a constant cM which has no effect on c 2 and that of two constants, cL and cL'
which are in, effect multiples of each other. Both of these cause the second
derivative matrix to have no inverse. The addition of an extremely small l M makes the desired value of cM become uniquely
zero while extremely small values of l L and l L' make the desired cL=, cL'. In both cases any
very small values of
make the matrix invertable, solving the problem and giving the
false impression the exact value of
is unimportant . For non-linear system with constants in the
exponential, the problem in general is that the quadratic approximation in
equation 2 predicts the minimum at a
far enough from
that it is beyond the radius of convergence of the expansion.
It is then not only not a minimum, but frequently its calculation causes a
computer overflow error. The solution is to first find the cK that
are causing this and then to make l K the
proper size for optimal convergence. The value of cK predicted for
the next step as a function of l K is typically of
the form shown to the right with the arrow marking the desired value. For large
values of l K the new value of cK is equal
to the old value, while for small values the predicted cK causes
machine overflow.
The Robmin method used in this work solves
iteratively for the decrease in c 2 that
can be achieved in each step. It uses the difference between the vector of
first derivatives and the array of second derivatives for the predicted changes
in
to find the ratio
of the l K's to each other. Finally it uses a one
dimensional Newton's method to find the final
for which the predicted c 2 is as much lower than the present one as can be achieved.
The method is working properly far from the minimum when every third prediction
results in a lower c 2. In the event that convergence appears
to be established, but is very slow Aitkin's extrapolation is used in the early
steps. At the extrema in c 2,
. This method and the code NLFIT which is
one of its simplest implementations has been drug out in a series of class
lectures starting in ..\..\class/CROBM96.html
The Standard Deviation
in ![]()
The
difference between any function of N statistically independent points and its
true value is due to differences introduced by these points.
. (8)
The difference squared is
(9)
Its expectation value when averaged over an
ensemble of equivalent data sets is
(10)
Assume that the changes are sufficiently
small that equation 7 is accurate enough to represent c 2 when one value of xi is changed within its
standard deviation. Further assume that one last iteration with l =0 has been made to evaluate c 2
and its derivatives at
. Note that the presence of a Marquardt value l ¹ 0 makes the
matrix artificially stiff, resulting in an underestimation of the standard
deviations. The derivative of
with respect to
xi is then given by
(11)
The coefficient of the partial of the second
derivative array is exactly zero at the minimum of c 2.
(12)
Taking the derivative of equation 4 with
respect to fI
(13)
Thus
(14)
Specializing equation 10 for a single
component of and inserting equation 14 yields
(15)
At this point, two assumptions have been
made. Firstly it has been assumed that there are sufficient data that the
change of a single data point gives a new
close enough to the old
that the new is within
the radius of convergence of a quadratic expansion about the old. Secondly it
has been assumed that the minimum is sufficiently small and that there are
enough points that
(16)
Both of these assumptions are usually valid,
a third more questionable assumption greatly simplifies equation 15. Assume
that for all i
, then the sum over i in equation 15 is exactly the same as
the sum in equation 6 leading to
(17)
This justifies the name error matrix name of
associated with the inverse of the second derivative matrix. The factor of two
is different from that found in the usual textbooks which comes from the fact
that the derivatives in equations 2 and 5 contain a factor of two usually
cancelled out. The factor a is determined by comparing the
minimum c 2 to the expected value and also
correcting for the biasing effect of the m constants used in fitting the data.
(18)
Thus finally the standard deviation estimate
for each constant at the end of the fit is
(19)
This standard deviation represents the
probability of getting the same value of cK if a statistically
equivalent set of data is fitted, stating from this same set of constants.
Spectra
frequently have peaks containing 1 million counts at one energy and 10 counts
at another energy. The relative error on a million-count peak obeying Poisson
statistics is
while those on a 100- count peak is
. This means that a constant that
lowers the relative error of the million count peak is contributes 100 times as
much to lowering the weighted c 2 as does a constant
with the same relative effect in the 100 count region. Furthermore since the
spectrum is viewed on a logarithmic scale, so that both peaks can be seen at
the same time, these improvements in the high count peaks are frequently not
visible on the plots, while the errors in the low count region are all too
obvious. The presence of this highly accurate data introduces a large number of
local minima as the constants fit well one or another feature almost invisible
graphically.
Local minima are especially a problem for the
Newton Raphson method, owing to the fact that it finds extrema, rather than the
global minimum.
The
solution is to make a weighted least squares fit in which the error estimates on
the point are determined by the curve fit problem as much as by Poisson
statistics. In particular the error estimate for each data point is taken to be
. The first
part of the maximum is the Poisson error of the data point and the second is
the curve fitting error representing the fact that with a given number of
constants, the data should only be fitted to a limited percentage accuracy.
In practice the constant b can be determined iteratively. The value of c 2 is given by

While the desired value is

The
approximations are accurate in the case that a few f's with very small Poisson
standard deviations dominate the sums. The c 2
becomes independent of b as the fit improves. The value of L works well between 1 and 10 for complete fits and works best at about
100 when outlier removal is desired.
Converging on the
continuum
The error estimates of data points above the continuum
are made equal to

Those below the continuum are made equal to

The values of A, B and a are adjusted so that the presence of a single outlier for every 16
data points leaves the continuum unmoved. [Robfit pp 50-54] The
exact values are relatively unimportant. This part of fitting the continuum
works because the errors associated with the peaks become very large, and those
associated with low points become very small. It is possible for the method of
the last section to make the error estimates of this section so large as to
have little effect. In this case the continuum fit relies entirely on the
smoothing properties of splines. As a practical matter a value of L on the order of 100 allows the advantage of avoiding the numerous
local minima at the beginning while also systematically discriminating against
peaks and in favor of low lying points.
Splines
Under normal circumstances a spline simply cannot be made to fit peaks.
A spline can be
defined as

With c1=1 and x 1 this is

The spline is the function representing the
data with the smoothest possible second derivatives.
The continuum is fitted either to this spline
or to its exponential. The hard part of fitting splines to data is the fitting
of the knots. A spline, and especially the exponential of a spline, is
extremely non-linear in the location of these knots. It has been out experience
which will be supported by some of the curves below, that this, however, is
essential, in order to avoid oscillations through the data.
Consider
a peak like bump which is 1 at x=0, extending from -1 to 1 and is zero
elsewhere, which we want to fit to a spline. The fact that we want the fit to
be zero for large x, immediately implies that all of terms in the above
expression are zero. The conditions on s are
s(-1)=0, s'(-1)=0, s''(-1)=0, s'''(-1)=0, and s(0)=1. Details1.doc
Call each of this a condition , form
and minimize
this with respect to the constants in the splines. If the knots are selected at
-1,-1/3,0,1/3 and 1 on the basis of symmetry, the constants for the bspline are
easily found yielding.
![]()

Five knots from -1 to +1, approximately twice
the fwhm of the peak are required to produce this. It is interesting to attempt
to produce this result with only three splines. The knots are not linear
coefficients and in this case it is not possible to exactly satisfy all five
conditions. The procedure essentially constructs a cubic beginning just below
the knot at 1 so that


The
practical value of this is that a cubic spline will be able to reproduce a peak
only if it has approximately 5 knots within a few half widths of the peak. In
addition, the natural tendency of the splines defined here is to become large
for small values of x. This is also the natural tendency for spectral data.
Introduction of Knots
The
continuum spline contains knots. The final location of these knots is
determined by minimizing c 2 with respect to them. In the above
section, however, we have seen that placing five of these knots close together
will allow us to fit a peak. This is not desired so in practice each new knot
is introduced at the largest residual separated from the present knots by a
spacing given by the number of channels divided by twice the number of knots.
Example with narrow
peaks
The following is an almost ideal example. It is the
longest count time spectra from a test devised by George P. Lasche, of
Constellation Technology. Kenneth W. Neufeld and Jan A. Nobel at Constellation
Technology's Pinellas Plant just north of St. Petersburg Florida took the
spectrum, which is composed of 16,382 channels over a nominal energy range from
0 to 3 MeV, and was counted for 128 hours.
It was taken with a set of 10 certified
laboratory calibration radioactive sources, each of about 10 microcuries,
mounted together at a distance 20.48 meters (67.2 feet) with a 100%-efficient
high-purity germanium detector.


The white line is a 60 constant approximation to the
continuum, fitted as the exponential of a spline to the entire spectrum with no
outlier suppression. Note that while most of the residuals are positive, there
are also a substantial number of negative ones, indicating regions where the
background is too high. Of most interest is the region around 10000 channels
where the continuum appears to have gone up into a peak. Which on closer
inspection looks quite nice.

The region near the origin is more
disappointing.

Starting
from the above fit, but now biasing, the negative residuals have mostly
disappeard.

The
fit near the origin, however, can be improved.

The full featured fit adds suppression only after half
the requested number of knots have been inserted and is biased to put the first
group of knots at low channel numbers

A
small price for this is paid in the high energy region


On a linear scale.
Example with wide peaks
Source was 462 ± 0.5% kBq of
Eu-152 held at 16.8 ± 0.5 cm from crystal. The detector, an HgI crystal
about 1 in2 x 2mm, was set for 0 to 3000 ± 4 KeV. Amanda M. Gerrish of Constellation Technology made the settings
and took the 16 hour spectrum in March 1998.
48constants fitted from channel 63 to 8060

The nuclides were fitted with absorption a
linear exponential (1 constant) starting at 200 KeV, then this plus a single
spline down to 60 KeV. Then the background was squeezed towards the fit by
using the FALL option.

I then refitted the nuclides allowing
quadratic energy and fwhm. A detailed look at the high-energy peak shows.


The 121 KeV peak was also made into a standard. At
some point during this I allowed the introduction of peaks with cutoff's above
10 and standard tolerances of 200. The background then became unstable with
respect to the nuclides and started down at one point. I restarted the
background fit without the nuclides, but with the peaks left in, then added the
nuclides back in as they had been fitted. The result.





Analysis Report for
C:\hgi\Gleu152.sp
Date: 9/21/98
RobWin Version 0.8
licensed exclusively to Bob Coldwell
Statistics
Error report for
analysis run: Code HUU FAILURE
Chi-square for this
run is 11066.5
Minimum bound of
chi-square for this run is 3559
Calibrations
Energy calibration
in units of keV, x = channel number:
Energy(keV) =
4.747823 +0.3658035*x -1.576291E-7*x^2
FWHM calibration in
units of channels^2, x = channel number:
FWHM^2 = 873.395
-1.116093*x +0.001046525*x^2
Evaluation of
spectrum for presence of isotopes:
Absolute Activity
Normalization: 1
Nuclide Activity
Counts Uncertainty Significance
EU-152,A 17855.60
3506910.00 19050.00 184.00
Significant
unidentified peaks found:
Peak sensitivity
(lowest significance level): 5.00
Channel Energy
+/-Energy Counts +/-Counts
80.14 34.06 <
0.01 1547840 725840.00
84.15 35.53 0.04
650577.00 55410.00
85.56 36.05 0.15
226616.00 57600.00
105.21 43.23 0.05
4776290 775220.00
120.04 48.66 0.09
728373.00 36360.00
173.69 68.28 0.20
308908.00 7603.00
365.58 138.46 0.76
107133.00 4656.00
741.48 275.90 0.21
25277.10 909.60
FIT TO
C:\HGI\GLEU152.SP
IP= 3
8 PEAKS 7998 CHAN,
CHIS= .1024E+05 BCHIS= 1089. NNB,NITB= 256 48 48
CUTOFF= 5.00
CIS(1,2,3)
4.74782319203 .365803512340 -.157629111092E-06
80.136 34.061 .008
1.301 .013 .154784E+07 .2584E+05 3
84.146 35.528 .043
1.449 .021 650577. .5541E+05 1
85.564 36.046 .146
1.779 .051 226616. .5760E+05 1
105.209 43.232 .049
8.049 .148 .477629E+07 .7522E+05 2
120.040 48.657 .095
3.671 .147 728373. .3636E+05 3
173.694 68.281 .205
16.976 .483 308908. 7603. 3
365.585 138.459 .763
34.671 1.436 107133. 4656. 3
741.479 275.897 .206
13.522 .493 25277.1 909.6 2

