Robust minimization
for\robmin.for for\mpbob.for – (files needed for robust minimization)
CALL ROBMIN(CHI,NCON,FR,IEND,INIT)
Chi is the value of χ2(c0) as calculated with the current set Cons(1,…,NCON). Fr is the fractional decrease in χ2 desired. Fr frequently starts as 0.9999 and is updated in Robmin. Iend is the remaining number of iterations. Robmin sets this number to 0 when no further minimization is possible. Init is set to 1 before the first call to Robmin. Robmin sets init to zero after the first call. When Robmin can find no lower values it sets init to -5 if the reason for this is an Fr within 10-10 of 1, to -7 if iend has been decremented to 1, to -6 if the difference between the last χ2, the predicted χ2. And the current χ2 has become less than χ2×10-9.
PARAMETER(MAXCON=5,IT=63)
Maxcon is the maximum number of constants to be varied. It needs to be changed everywhere in the file for\robmin.for. It has been made 5 in the master file for test purposes. It is the number of integers used by the multiple precision calls. The value 63 is the largest natural value for multiplications and is the recommended size. It corresponds to approximately 250 digits of accuracy. Robmin works well with values as small as It=15[1], the cpu time to invert a matrix in multiple precision scales as It2. Most other multiple precision operations require little or no extra time as It is increased. The value of IT, if changed, needs to be changed in both mpbob.for and robmin.for.
COMMON/PASSRM/CONS(MAXCON),PC(MAXCON),MPPC(IT+1,MAXCON),
2 PPCC(MAXCON*(MAXCON+1)/2),SM(MAXCON),BBEST(MAXCON),CHB,
3 MPPPCC(IT+1,MAXCON*(MAXCON+1)/2),MP
The above common needs to be in the calling routine.
Cons begins as the set of c0 values that were used to calculate the value of χ2(c0) which is chi in the call to Robmin. After the call to Robmin the values of Cons are such that χ2(c) = Fr × χ2(c0). PC(J) is . MPPC(1..IT+1,J) is the multiple precision version of PC. PPCC(KIJ(I,J)), with KIJ an integer function in the robmin file, is . MPPPCC(1...IT+1,KIJ(I,J)) is the multiple precision version of PPCC(KIJ(I,J). PC, MPPC, PPCC, and MPPPCC need to be calculated in the calling code. The final integer MP in /PASSRM/ tells Robmin how much multiple precision to use. MP = 0, means that MPPC, and MPPPCC do not need to be calculated by the calling routine. MP=1 means that MPPC and MPPCC have been calculated but that the time saving half multiple matrix inverter should be used when the double precision Sminv sets fails – a failure flag is set by Sminv. MP=2, 3 or 4 means that the full multiple precision matrix inverter should be used if this flag is set. MP = 5 means that the full multiple precision matrix inverter should always be used.
SM(J) multiplies the Marquardt parameter in the J’th dimension. It is calculated by routines inside the robmin file. BBEST(J) is an internal value for the best estimate of cJ.
Iprint is in a data statement at the top of the subroutine. It is normally set to 1 leading to the output below. The letter to the far right is N or Y if sminv is effectively making the matrix inversion. It becomes O or Z if MPHsminv or MPsminv is making the final inversion.
Multiple precision in calling code
PC and PPCC are double precision MPPC and MPPPC are multiple precision. All four are in common /PASSRM/
– initialization
DO I=1,NCON
PC(I)=0D0
IF(MP.NE.0)CALL MPINIT(MPPC(1,I))
ENDDO
NVD=NCON*(NCON+1)/2
DO I=1,NVD
IF(MP.NE.0)CALL MPINIT(MPPPCC(1,I))
PPCC(I)=0D0
ENDDO
Incrementing MPPC and MPPPCC
PC(K)=PC(K)+TEMP
IF(MP.EQ.1.OR.MP.EQ.2)CALL DplMP(TEMP,MPPC(1,K))
IF(MP.GE.3)THEN
C MPP is the multiple precision version of PC – which is calculted by some polys
CALL DXMP(WPC,MPP(1,I),MPTEMP)
CALL MPPLMP(MPTEMP,MPPC(1,K),MPPC(1,K))
ENDIF
Just before calling robmin to get maximum doubple precsion accuracy
IF(MP.NE.0)THEN
DO I=1,NCON
PC(I)=DPMP(MPPC(1,I))
ENDDO
DO I=1,NVD
PPCC(I)=DPMP(MPPPCC(1,I))
ENDDO
ENDIF
R,CHI,CHB,CHL .100000000 .229264167E+34 .100000000+307 .500000000+307N
R,CHI,CHB,CHL .100000000 .221778921E+37 .229250224E+33 .229264167E+34Y[2]
REV= 1 IEND= 399 ALAM 78.07
R,CHI,CHB,CHL .493892807 .110416078E+34 .113230854E+34 .229264167E+34Y
R,CHI,CHB,CHL .243930105 .591287732E+33 .269335620E+33 .110416078E+34Y
R,CHI,CHB,CHL .595018962E-01 .140206881E+38 .351827503E+32 .591287732E+33Y
REV= 1 IEND= 397 ALAM 244.14
R,CHI,CHB,CHL .471119050 .255704526E+33 .278565547E+33 .591287732E+33Y
R,CHI,CHB,CHL .221953159 .837714416E+32 .567540142E+32 .255704526E+33Y
R,CHI,CHB,CHL .500000000E-01 .602749019E+32 .418885933E+31 .837714416E+32Y
Chi-square is a function of M parameters c = {c1, …,cM}. It can be expanded about any set of parameters c0 as
The derivatives of (1.1) with respect to the parameters are
At the minimum
An accurate set of the first derivatives is needed to be sure that the minimum has been reached, but an approximate set of second derivatives can be used to iterate c towards its minimum value. Multiply (1.2) by the inverse of the second derivative matrix and sum on I.
(1.4)
The last sum on I is a Kronecker delta
(1.5)
This is easily summed to yield
(1.6)
Then rearranging the terms gives the iterative form
..\solving\Newton-Raphson-Marquardt.doc
The Newton Raphson method has quadratic convergence when it works. That is the error on each succeeding step squares the error. 1D-1à1D-2à1D-4à1D-8à1D-16 à truncation limit. This seems to almost never happen. What usually happens in physics is

Figure 1 Lennard Jones potential between two neutral atoms

Figure 2 Cartoon of χ2 versus parameter set
A very common situation is cartooned in Figure 2. The parameters are at c0 as marked by the arrow on the right. Most set of constant changes imply the bump to the left of this arrow making it a local minimum. One particular set of changes will result in the arrow to the far left which raises the value as lot. The best change is in the middle which only slightly lowers the value of χ2 but sets the stage for subsequent steps to find the local minimum in the center that is lower than the one on the right.
Add a limiting term to (1.1) so that the chi-square to be minimized on a given step is (2.1)
The exponential form for the coefficient of the added term is to emphasize that this term is always greater than 0. Equation (2.1) can also be written as
Equation (2.2) differs from (1.1) only in the addition of the term eλSJ2 to the diagonal elements of the second derivative matrix. The blue curve in Figure 2 shows χLim2 versus c for a single value of λ, the red curve shows χLim2 versus c for a value of λ less than that in the blue curve. As in the equations following (1.2) it is possible to go directly to the arrows marking the minimum of these curves in figure 2. The second derivative array is
(2.3)
All other steps leading to (1.7) are the same leading to
Using cmin in (2.2) to gives the value for . This is the value at the arrows in figure 2.
With l a very large negative value and all SJ’s the same, eλ is a Marquardt parameter that requests the smallest coefficients possible for both linearly dependent combinations. This solves the problem of linear dependencies and makes the matrices in principle invertible. For sufficiently large λ the value at the minimum remains χ02. The cJ’s for different values of J can be very different in their non-linearity. The relative sizes of the SJ’s can be found from the way the first derivatives relate to the second derivatives in the minimization process. Smvals.doc .htm.
Define
This function is defined in the Robmin file as
FUNCTION FOFLAM(MPALAM,IMB)
The function used by Robmin is that defined in (3.1) minus Fr, the fractional lowering desired.
The parameter λ is multiple precision owing to the need to very precisely find a “best” value for F(λ). IMB is an integer used to pass the type of matrix inversion needed back to Robmin.
IMPLICIT REAL*8 (A-H,O-Z)
PARAMETER (MAXCON=5,IT=63)
COMMON/PTOF/CHI,FR,NCON
COMMON/PASSRM/CONS(MAXCON),PC(MAXCON),MPPC(IT+1,MAXCON),
2 PPCC(MAXCON*(MAXCON+1)/2),SM(MAXCON),B(MAXCON),AJP,
3 MPPPCC(IT+1,MAXCON*(MAXCON+1)/2),MP
DIMENSION MPALAM(*)
SAVE
CALL GSOLVE(CHI,PC,MPPC,PPCC,MPPPCC,SM,MPALAM,NCON,AJP,B,MP,IMB)
FOFLAM=AJP/CHI-FR
RETURN
END
The subroutine Gsolve inverts the matrix, finds the constant changes needed to lower χ2 and stores then in B(maxcon). The matrix inversion with MP=0 always uses Sminv, if the matrix cannot be inverted, the changes stored in B Are set to zero so that F(λ) becomes 1. If MP=1, the routine MPHsminv is used to invert the matrix. This routine uses multiple precision to avoid the zero that stopped sminv, but then reverts to double precision to conserve cpu time. If MP = 2, 3, or 4, the routine used when Sminv fails is MPSminv which has double precision for the entire inversion. If MP = 5, MPSminv is always used.

Figure 3 Values of F(λ) from brack in the event that 0 is not crossed. (Minimization with 12 constants and 400,1 in aigau.dir)
The data in figure 3 is from the bracketing/bli section of Robmin in a case where F was never able to get to the low value requested. It comes from an attempt to fit the complementary error function to 16 digit precision with a Pade approximate[3]. The Matrix inversion is multiple precision with enough digits to avoid the Pade approximate linear dependencies. The value of F(λ) is not allowed to become bigger than 1 in figure 3, but it really is possibly as large as 1025 on the far left of figure 3. Values on the nearly vertical point sets can extend from -1025 to 1025. As lambda increases the right side of figure 3 looks much more like the function expected.
For large values of lambda the new constants are constrained by the Marquardt parameter to be close to the old ones. This means that the local minimum around the far left arrow in #Figure_2 is being explored. Then as the value of lambda decreases, the regions around the middle and right arrow are explored. These are large projections of a quadratic expansion of χ2. They are not accurate and in fact frequently represent local maxima and saddle points rather than minima. The very low values should not be ignored. Usually they lead to a set of constants with a much higher actual χ2 as shown graphically by the far left arrow in #Figure_2, but occasionally they lead to a complete change in c0 such that a χ2 that was slowly converging to 1015 will have a predicted value of -1016 and an actual value of 1012. for\robmin.for
The equation to solve is (3.1) – Fr for λ and at the same time for c(λ)
Figure 4 Cartoon of F versus l
The thin blue line in figure 4 is the desired value of Fr. The figure shows that there are usually at least two solutions to (4.1). The one on the left is the most desirable. It can happe, however, that the blue line is below the bottom of the dip in Figure 4. In this case there is no solution, and the desired return is the value at the bottom of the dip. The general problem of finding two values of (4.1) such that one is >0 and one < 0 is discussed in
Bracketing.doc .htm BracketBli.doc htm
The code for this in Robmin is
SUBROUTINE BRACK(FEXT,BEG,MPX,ENDC,MPX1,MPX2,F1,F2,SUCCES)
Fext is the function called which in this case is Foflam. This is a function of a multiple precision MPX. Beg is the beginning location and Endc the ending location for the search. MPX is a multiple precision location at which the search starts. On a successful return F1=Fext(MPX1) < 0 and F2=Fext(MPX2) > 0. If this happens and Sminv did the last inversion, succes = Y, if MPHsminv or MPsminv did the last inversion success = Z, if a value is found equal to 0 using Sminv success = P, using MPHsminv or MPsminv, success = Q.
To stay on the right side of figure 4, the search starts at the far right adding up to 41 terms at unequal distances moving left. The positions of these terms cluster close together near the initial MPX and become further spaced towards the edges. It can happen that none of these is less than zero. In this case a weighted BLI is used to add up to NP=512 points at the positions most likely to produce a value less than 0. If none of these is less than zero, the lowest value found is returned with success = N.
In one dimension
(4.2)
for the c which makes f(c) equal to zero.
(4.3)
the actual derivatives as shown in the graph
would
shoot the function all over the map and in wrong directions in general. The value of bracketing is that the actual
derivatives are replaced by a linear interpolation, which becomes a numerical
derivative in the limit that the spacing becomes small. That is by first finding a value above zero
and one below zero one can replace
(4.4)
Then use Newton’s method as before always replacing c+ or c- by the next better estimate. --- Note that it is essential to keep both one value above zero and one above. This requires testing each estimate.
SUBROUTINE NEWT(MPX1,F1,MPX2,F2,FUN1,NCALL)
C THIS ROUTINE USES (FUNCP-FUNCM)/(XP-XM) TO ESTIMATE THE
C DERIVATIVE IN THE VICINITY OF FUNC = 0. FUNCP*FUNCM < 0.
C FINDS ABS(X1-X2) < PRECX OR MAX(ABS(F1),ABS(F2))<PRECF
DATA PRECX/1D-120/,NCALLM/64/
The Newton’s method used here normally squares the error on each iteration so only a very few extra iterations are needed to reach the 10-120 specified. This rather extreme accuracy is due to the nearly vertical lines in figure 3. It frequently happens that F(λ-) is orders of magnitude below Fr while F(λ+) is orders of magnitude high even though the difference between these two values of λ is less than 10-120. The c for the lower one is always passed back to the code. This happens when c is quite different from c0. Usually the value represents an incorrect quadratic extrapolation of χ2. The large positive F is due to a maximum value projected for a very large step in c, while the large negative value is due to a minimum projected for a very large step in c. Usually either c is such that χ2(c) is greater than χ2(c0). In this case Robmin sets Fr closer to 1 making the next step occur in the smoother region to the right in figure 3. Occasionally the c associated with the minimum corresponds to the middle arrow in figure 3. The projected χLim2 of -1025 will not be found for the positive definite χ2(c), but a drop of one to two orders of magnitude followed by a quick minimization in the new region of c happens rather frequently.
[2] The failure to find a term here can make the routine stop right at the beginning. This value is below the current chi, so it keeps trying. The 0.1 is used at IREV = 5 to get below the numerical noise caused by adding multiple precision changes to double precision constants. It could be as high as 0.99 and probably cause no problem.
[3] Pade approximates are nearly linearly dependent ..\..\Fittery\nlfit\LinDep.doc.