Robust Fitting of Spectra to Splines with Variable Knots.

R. L. Coldwell, University of Florida and Constellation Technology

Abstract

Spectra consist of continuum features that vary over many channels, and peaks that vary over few channels. In a fit to the continuum the peaks appear as outliers. Robust methods in which the weights associated with the peaks are reduced allow the continuum to be fitted almost independently of the peaks. This requires a very smooth background. Cubic splines, which are continuous with continuous first and second derivatives, are a good choice for this task when the locations of the discontinuities in the third derivatives, the knots, are included in the parameters optimized in the fitting process. An extension to the Marquardt method allows a Newton-Raphson method to be used in this optimization.

Introduction

Spectra consist of continuum and peaks. It seems intuitive that the continuum should be smooth and as close to the bottom of the peaks as possible. The code RobWin fits spectra by finding the continuum, then the peaks. The physics needed to determine the continuum is the fact that it and its first two derivatives are continuous and that the locations in the second derivatives are optimally placed. When outlier regression is added to this the continuum can be fitted with no reference to the peaks. The code used in this work is RobWin

                The first few sections give a sketch of the mathematics of the non-linear regression methods that allow the continuum to be fitted with respect to the knot locations. Then an easy example of the continuum for a nuclear spectrum taken with an intrinsic Germanium detector is shown. Finally an 152Eu spectrum take with a thin HgI detector in which the separation of peaks from continuum is not quite so easy is fitted to a continuum, a nuclide response function including 5 efficiency constants, and a few unidentified peaks.

Naïve Newton Raphson Minimization

Define

                 (1)

The object is to minimize c 2with respect to . This requires finding a vector such that for all M components of

                 (2)

First calculating c 2 and its first derivatives at a value . Use these for a Taylor series expansion about such that near this point the first derivatives at are given by

                 (3)

                The Newton-Raphson method for finding the minimum is to truncate the above expansion to the terms shown and solve for the value of such that the first derivatives are all zero.

The first derivative vector at required for this expansion is given by

                            (4)

While the second derivative array at is given by

               (5).

The second sum contains the term , which as the solution is approached oscillates with alternating signs about zero. The first term contains no such oscillating small term. The minimization of c 2 requires that its partials be set exactly to zero, which means equation 2 needs to be exact. The expansion of this partial as given by equation 3 is an approximation in which cubic and higher terms are already neglected so that very little is lost in equation 5 by dropping the small term is the second partial to yield

                 (6)

Inverting this matrix allows us to immediately find

                 (7)

The Marquardt Parameter

                For the Newton Raphson method to find a minimum in the unadulterated form given above it is necessary that is less than the radius of convergence of equation 3. When this is not true the quantity minimized is +c 2, where is a multidimensional version of the Marquardt parameter. For sufficiently large values of , this added term dominates and the minimum is found at , while in the limit, the Newton Raphson method is recovered.

Two traditional problems of the Newton Raphson method are that of a constant cM which has no effect on c 2 and that of two constants, cL and cL' which are in, effect multiples of each other. Both of these cause the second derivative matrix to have no inverse. The addition of an extremely small l M makes the desired value of cM become uniquely zero while extremely small values of l L and l L' make the desired cL=, cL'. In both cases any very small values of make the matrix invertable, solving the problem and giving the false impression the exact value of is unimportant . For non-linear system with constants in the exponential, the problem in general is that the quadratic approximation in equation 2 predicts the minimum at a far enough from that it is beyond the radius of convergence of the expansion. It is then not only not a minimum, but frequently its calculation causes a computer overflow error. The solution is to first find the cK that are causing this and then to make l K the proper size for optimal convergence. The value of cK predicted for the next step as a function of l K is typically of the form shown to the right with the arrow marking the desired value. For large values of l K the new value of cK is equal to the old value, while for small values the predicted cK causes machine overflow.

The Robmin method used in this work solves iteratively for the decrease in c 2 that can be achieved in each step. It uses the difference between the vector of first derivatives and the array of second derivatives for the predicted changes in to find the ratio of the l K's to each other. Finally it uses a one dimensional Newton's method to find the final for which the predicted c 2 is as much lower than the present one as can be achieved. The method is working properly far from the minimum when every third prediction results in a lower c 2. In the event that convergence appears to be established, but is very slow Aitkin's extrapolation is used in the early steps. At the extrema in c 2, . This method and the code NLFIT which is one of its simplest implementations has been drug out in a series of class lectures starting in ..\..\class/CROBM96.html

std dev

The Standard Deviation in

                The difference between any function of N statistically independent points and its true value is due to differences introduced by these points.

.         (8)      

The difference squared is

    (9)                    

Its expectation value when averaged over an ensemble of equivalent data sets is

              (10)

 

Assume that the changes are sufficiently small that equation 7 is accurate enough to represent c 2 when one value of xi is changed within its standard deviation. Further assume that one last iteration with l =0 has been made to evaluate c 2 and its derivatives at . Note that the presence of a Marquardt value l ¹ 0 makes the matrix artificially stiff, resulting in an underestimation of the standard deviations. The derivative of with respect to xi is then given by

              (11)

The coefficient of the partial of the second derivative array is exactly zero at the minimum of c 2.

                                (12)

Taking the derivative of equation 4 with respect to fI

     (13)

Thus

                        (14)

 

Specializing equation 10 for a single component of and inserting equation 14 yields

(15)

At this point, two assumptions have been made. Firstly it has been assumed that there are sufficient data that the change of a single data point gives a new close enough to the old that the new is within the radius of convergence of a quadratic expansion about the old. Secondly it has been assumed that the minimum is sufficiently small and that there are enough points that

             (16)

Both of these assumptions are usually valid, a third more questionable assumption greatly simplifies equation 15. Assume that for all i , then the sum over i in equation 15 is exactly the same as the sum in equation 6 leading to

(17)

 

This justifies the name error matrix name of associated with the inverse of the second derivative matrix. The factor of two is different from that found in the usual textbooks which comes from the fact that the derivatives in equations 2 and 5 contain a factor of two usually cancelled out. The factor a is determined by comparing the minimum c 2 to the expected value and also correcting for the biasing effect of the m constants used in fitting the data.

     (18)

Thus finally the standard deviation estimate for each constant at the end of the fit is

        (19)

This standard deviation represents the probability of getting the same value of cK if a statistically equivalent set of data is fitted, stating from this same set of constants.

Practical Curve Fitting

                Spectra frequently have peaks containing 1 million counts at one energy and 10 counts at another energy. The relative error on a million-count peak obeying Poisson statistics is while those on a 100- count peak is. This means that a constant that lowers the relative error of the million count peak is contributes 100 times as much to lowering the weighted c 2 as does a constant with the same relative effect in the 100 count region. Furthermore since the spectrum is viewed on a logarithmic scale, so that both peaks can be seen at the same time, these improvements in the high count peaks are frequently not visible on the plots, while the errors in the low count region are all too obvious. The presence of this highly accurate data introduces a large number of local minima as the constants fit well one or another feature almost invisible graphically.

Local minima are especially a problem for the Newton Raphson method, owing to the fact that it finds extrema, rather than the global minimum.

                The solution is to make a weighted least squares fit in which the error estimates on the point are determined by the curve fit problem as much as by Poisson statistics. In particular the error estimate for each data point is taken to be . The first part of the maximum is the Poisson error of the data point and the second is the curve fitting error representing the fact that with a given number of constants, the data should only be fitted to a limited percentage accuracy.

In practice the constant b can be determined iteratively. The value of c 2 is given by

While the desired value is

                The approximations are accurate in the case that a few f's with very small Poisson standard deviations dominate the sums. The c 2 becomes independent of b as the fit improves. The value of L works well between 1 and 10 for complete fits and works best at about 100 when outlier removal is desired.

Converging on the continuum

          The error estimates of data points above the continuum are made equal to

Those below the continuum are made equal to

               

The values of A, B and a are adjusted so that the presence of a single outlier for every 16 data points leaves the continuum unmoved. [Robfit pp 50-54] The exact values are relatively unimportant. This part of fitting the continuum works because the errors associated with the peaks become very large, and those associated with low points become very small. It is possible for the method of the last section to make the error estimates of this section so large as to have little effect. In this case the continuum fit relies entirely on the smoothing properties of splines. As a practical matter a value of L on the order of 100 allows the advantage of avoiding the numerous local minima at the beginning while also systematically discriminating against peaks and in favor of low lying points.

Splines

Under normal circumstances a spline simply cannot be made to fit peaks.

A spline can be defined as

With c1=1 and x 1 this is

The spline is the function representing the data with the smoothest possible second derivatives.

The continuum is fitted either to this spline or to its exponential. The hard part of fitting splines to data is the fitting of the knots. A spline, and especially the exponential of a spline, is extremely non-linear in the location of these knots. It has been out experience which will be supported by some of the curves below, that this, however, is essential, in order to avoid oscillations through the data.

 

                Consider a peak like bump which is 1 at x=0, extending from -1 to 1 and is zero elsewhere, which we want to fit to a spline. The fact that we want the fit to be zero for large x, immediately implies that all of terms in the above expression are zero. The conditions on s are

                s(-1)=0,   s'(-1)=0,  s''(-1)=0, s'''(-1)=0, and s(0)=1. Details1.doc

Call each of this a condition , form and minimize this with respect to the constants in the splines. If the knots are selected at -1,-1/3,0,1/3 and 1 on the basis of symmetry, the constants for the bspline are easily found yielding.

Five knots from -1 to +1, approximately twice the fwhm of the peak are required to produce this. It is interesting to attempt to produce this result with only three splines. The knots are not linear coefficients and in this case it is not possible to exactly satisfy all five conditions. The procedure essentially constructs a cubic beginning just below the knot at 1 so that

 

 

                The practical value of this is that a cubic spline will be able to reproduce a peak only if it has approximately 5 knots within a few half widths of the peak. In addition, the natural tendency of the splines defined here is to become large for small values of x. This is also the natural tendency for spectral data.

Introduction of Knots

                The continuum spline contains knots. The final location of these knots is determined by minimizing c 2 with respect to them. In the above section, however, we have seen that placing five of these knots close together will allow us to fit a peak. This is not desired so in practice each new knot is introduced at the largest residual separated from the present knots by a spacing given by the number of channels divided by twice the number of knots.

Example with narrow peaks

 

          The following is an almost ideal example. It is the longest count time spectra from a test devised by George P. Lasche, of Constellation Technology. Kenneth W. Neufeld and Jan A. Nobel at Constellation Technology's Pinellas Plant just north of St. Petersburg Florida took the spectrum, which is composed of 16,382 channels over a nominal energy range from 0 to 3 MeV, and was counted for 128 hours.

It was taken with a set of 10 certified laboratory calibration radioactive sources, each of about 10 microcuries, mounted together at a distance 20.48 meters (67.2 feet) with a 100%-efficient high-purity germanium detector.

          The white line is a 60 constant approximation to the continuum, fitted as the exponential of a spline to the entire spectrum with no outlier suppression. Note that while most of the residuals are positive, there are also a substantial number of negative ones, indicating regions where the background is too high. Of most interest is the region around 10000 channels where the continuum appears to have gone up into a peak. Which on closer inspection looks quite nice.

 

The region near the origin is more disappointing.

                Starting from the above fit, but now biasing, the negative residuals have mostly disappeard.

                The fit near the origin, however, can be improved.

          The full featured fit adds suppression only after half the requested number of knots have been inserted and is biased to put the first group of knots at low channel numbers

                A small price for this is paid in the high energy region

 

On a linear scale.

Example with wide peaks

Source was 462 ± 0.5% kBq of Eu-152 held at 16.8 ± 0.5 cm from crystal. The detector, an HgI crystal about 1 in2 x 2mm, was set for 0 to 3000 ± 4 KeV. Amanda M. Gerrish of Constellation Technology made the settings and took the 16 hour spectrum in March 1998.

 

48constants fitted from channel 63 to 8060

The nuclides were fitted with absorption a linear exponential (1 constant) starting at 200 KeV, then this plus a single spline down to 60 KeV. Then the background was squeezed towards the fit by using the FALL option.

I then refitted the nuclides allowing quadratic energy and fwhm. A detailed look at the high-energy peak shows.

 

 

          The 121 KeV peak was also made into a standard. At some point during this I allowed the introduction of peaks with cutoff's above 10 and standard tolerances of 200. The background then became unstable with respect to the nuclides and started down at one point. I restarted the background fit without the nuclides, but with the peaks left in, then added the nuclides back in as they had been fitted. The result.

 

Analysis Report for C:\hgi\Gleu152.sp

Date: 9/21/98

RobWin Version 0.8 licensed exclusively to Bob Coldwell

 

Statistics

Error report for analysis run: Code HUU FAILURE

Chi-square for this run is 11066.5

Minimum bound of chi-square for this run is 3559

 

Calibrations

Energy calibration in units of keV, x = channel number:

Energy(keV) = 4.747823 +0.3658035*x -1.576291E-7*x^2

FWHM calibration in units of channels^2, x = channel number:

FWHM^2 = 873.395 -1.116093*x +0.001046525*x^2

 

Evaluation of spectrum for presence of isotopes:

Absolute Activity Normalization: 1

Nuclide Activity Counts Uncertainty Significance

EU-152,A 17855.60 3506910.00 19050.00 184.00

 

Significant unidentified peaks found:

Peak sensitivity (lowest significance level): 5.00

Channel Energy +/-Energy Counts +/-Counts

80.14 34.06 < 0.01 1547840 725840.00

84.15 35.53 0.04 650577.00 55410.00

85.56 36.05 0.15 226616.00 57600.00

105.21 43.23 0.05 4776290 775220.00

120.04 48.66 0.09 728373.00 36360.00

173.69 68.28 0.20 308908.00 7603.00

365.58 138.46 0.76 107133.00 4656.00

741.48 275.90 0.21 25277.10 909.60

FIT TO C:\HGI\GLEU152.SP

IP= 3

8 PEAKS 7998 CHAN, CHIS= .1024E+05 BCHIS= 1089. NNB,NITB= 256 48 48

CUTOFF= 5.00

CIS(1,2,3) 4.74782319203 .365803512340 -.157629111092E-06

80.136 34.061 .008 1.301 .013 .154784E+07 .2584E+05 3

84.146 35.528 .043 1.449 .021 650577. .5541E+05 1

85.564 36.046 .146 1.779 .051 226616. .5760E+05 1

105.209 43.232 .049 8.049 .148 .477629E+07 .7522E+05 2

120.040 48.657 .095 3.671 .147 728373. .3636E+05 3

173.694 68.281 .205 16.976 .483 308908. 7603. 3

365.585 138.459 .763 34.671 1.436 107133. 4656. 3

741.479 275.897 .206 13.522 .493 25277.1 909.6 2