Robust minimization

  1. #Coding Details
    1. #COMMON_PASSRM
    2. #Multiple_precision_in_calling_code
    3. #Robmin_output
  2. #Introduction
  3. #Marquardt parameter
  4. #F(λ)
  5. #Bracketing
  6. #Newton’s Method

Coding Details

for\robmin.for for\mpbob.for – (files needed for robust minimization)

      CALL ROBMIN(CHI,NCON,FR,IEND,INIT)

Chi is the value of χ2(c0) as calculated with the current set Cons(1,…,NCON).  Fr is the fractional decrease in χ2 desired.  Fr frequently starts as 0.9999 and is updated in Robmin.  Iend is the remaining number of iterations.  Robmin sets this number to 0 when no further minimization is possible.  Init is set to 1 before the first call to Robmin.  Robmin sets init to zero after the first call.  When Robmin can find no lower values it sets init to -5 if the reason for this is an Fr within 10-10 of 1, to -7 if iend has been decremented to 1, to -6 if the difference between the last χ2, the predicted χ2. And the current χ2 has become less than χ2×10-9.

      PARAMETER(MAXCON=5,IT=63)

Maxcon is the maximum number of constants to be varied.  It needs to be changed everywhere in the file for\robmin.for.  It has been made 5 in the master file for test purposes.  It is the number of integers used by the multiple precision calls.  The value 63 is the largest natural value for multiplications and is the recommended size.  It corresponds to approximately 250 digits of accuracy.  Robmin works well with values as small as It=15[1], the cpu time to invert a matrix in multiple precision scales as It2.  Most other multiple precision operations require little or no extra time as It is increased.  The value of IT, if changed, needs to be changed in both mpbob.for and robmin.for.

      COMMON/PASSRM/CONS(MAXCON),PC(MAXCON),MPPC(IT+1,MAXCON),

     2 PPCC(MAXCON*(MAXCON+1)/2),SM(MAXCON),BBEST(MAXCON),CHB,

     3 MPPPCC(IT+1,MAXCON*(MAXCON+1)/2),MP

The above common needs to be in the calling routine. 

Cons begins as the set of c0 values that were used to calculate the value of χ2(c0) which is chi in the call to Robmin.  After the call to Robmin the values of Cons are such that χ2(c) = Fr × χ2(c0).  PC(J) is . MPPC(1..IT+1,J) is the multiple precision version of PC.  PPCC(KIJ(I,J)), with KIJ an integer function in the robmin file, is .  MPPPCC(1...IT+1,KIJ(I,J)) is the multiple precision version of  PPCC(KIJ(I,J).  PC, MPPC, PPCC, and MPPPCC need to be calculated in the calling code.  The final integer MP in /PASSRM/ tells Robmin how much multiple precision to use.  MP = 0, means that MPPC, and MPPPCC do not need to be calculated by the calling routine.  MP=1 means that MPPC and MPPCC have been calculated but that the time saving half multiple matrix inverter should be used when the double precision Sminv sets fails – a failure flag is set by Sminv.  MP=2, 3 or 4 means that the full multiple precision matrix inverter should be used if this flag is set.  MP = 5 means that the full multiple precision matrix inverter should always be used.

            SM(J) multiplies the Marquardt parameter in the J’th dimension.  It is calculated by routines inside the robmin file.  BBEST(J) is an internal value for the best estimate of cJ

            Iprint is in a data statement at the top of the subroutine.  It is normally set to 1 leading to the output below.  The letter to the far right is N or Y if sminv is effectively making the matrix inversion.  It becomes O or Z if MPHsminv or MPsminv is making the final inversion.

Multiple precision in calling code

            PC and PPCC are double precision MPPC and MPPPC are multiple precision.  All four are in common /PASSRM/

 – initialization

      DO  I=1,NCON

        PC(I)=0D0

        IF(MP.NE.0)CALL MPINIT(MPPC(1,I))

      ENDDO

      NVD=NCON*(NCON+1)/2

      DO I=1,NVD

        IF(MP.NE.0)CALL MPINIT(MPPPCC(1,I))

        PPCC(I)=0D0

      ENDDO

Incrementing MPPC and MPPPCC

          PC(K)=PC(K)+TEMP

          IF(MP.EQ.1.OR.MP.EQ.2)CALL DplMP(TEMP,MPPC(1,K))

          IF(MP.GE.3)THEN

C MPP is the multiple precision version of PC – which is calculted by some polys

            CALL DXMP(WPC,MPP(1,I),MPTEMP)

            CALL MPPLMP(MPTEMP,MPPC(1,K),MPPC(1,K))

          ENDIF

Just before calling robmin to get maximum doubple precsion accuracy

      IF(MP.NE.0)THEN

        DO I=1,NCON

          PC(I)=DPMP(MPPC(1,I))

        ENDDO

        DO I=1,NVD

          PPCC(I)=DPMP(MPPPCC(1,I))

        ENDDO

      ENDIF

 

 

 

Robmin output

R,CHI,CHB,CHL  .100000000      .229264167E+34  .100000000+307  .500000000+307N

R,CHI,CHB,CHL  .100000000      .221778921E+37  .229250224E+33  .229264167E+34Y[2]

REV=    1 IEND=  399 ALAM   78.07

R,CHI,CHB,CHL  .493892807      .110416078E+34  .113230854E+34  .229264167E+34Y

R,CHI,CHB,CHL  .243930105      .591287732E+33  .269335620E+33  .110416078E+34Y

R,CHI,CHB,CHL  .595018962E-01  .140206881E+38  .351827503E+32  .591287732E+33Y

REV=    1 IEND=  397 ALAM  244.14

R,CHI,CHB,CHL  .471119050      .255704526E+33  .278565547E+33  .591287732E+33Y

R,CHI,CHB,CHL  .221953159      .837714416E+32  .567540142E+32  .255704526E+33Y

R,CHI,CHB,CHL  .500000000E-01  .602749019E+32  .418885933E+31  .837714416E+32Y

 

Introduction

            Chi-square is a function of M parameters c = {c1, …,cM}.  It can be expanded about any set of parameters c0 as

  (1.1)

The derivatives of (1.1) with respect to the parameters are

                                                                    (1.2)

At the minimum

                                                                                                      (1.3)

An accurate set of the first derivatives is needed to be sure that the minimum has been reached, but an approximate set of second derivatives can be used to iterate c towards its minimum value.  Multiply (1.2) by the inverse of the second derivative matrix and sum on I.

                                (1.4)

The last sum on I is a Kronecker delta

                                                                (1.5)

This is easily summed to yield

                                                                               (1.6)

Then rearranging the terms gives the iterative form

                                                                                    (1.7)

..\solving\Newton-Raphson-Marquardt.doc

            The Newton Raphson method has quadratic convergence when it works.  That is the error on each succeeding step squares the error.  1D-1à1D-2à1D-4à1D-8à1D-16 à truncation limit.  This seems to almost never happen.  What usually happens in physics is

Figure 1  Lennard Jones potential between two neutral atoms

 

  1. A function that is not quadratic at all as in Figure 1  
  2. A saddle point -  (1.3) is satisfied for every for every saddle point.
  3. Linear dependencies – Two combinations of parameters have exactly the same lowering effect on chi-square.    When this happens there is no unique inverse and naïve matrix inversion methods fail.
  4. A local minimum -  (1.3) is satisfied for every local minimum.
  5. A local or global maximum -  (1.3) is satisfied for every local maximum.

 

 

 

Figure 2 Cartoon of χ2 versus parameter set

            A very common situation is cartooned in Figure 2.  The parameters are at c0 as marked by the arrow on the right.  Most set of constant changes imply the bump to the left of this arrow making it a local minimum.  One particular set of changes will result in the arrow to the far left which raises the value as lot.  The best change is in the middle which only slightly lowers the value of χ2 but sets the stage for subsequent steps to find the local minimum in the center that is lower than the one on the right.  

Marquardt parameter

            Add a limiting term to (1.1) so that the chi-square to be minimized on a given step is (2.1)

The exponential form for the coefficient of the added term is to emphasize that this term is always greater than 0.  Equation (2.1) can  also be written as

(2.2)

            Equation (2.2) differs from (1.1) only in the addition of the term eλSJ2 to the diagonal elements of the second derivative matrix.  The blue curve in Figure 2 shows χLim2 versus c for a single value of λ, the red curve shows χLim2 versus c for a value of λ less than that in the blue curve.  As in the equations following (1.2) it is possible to go directly to the arrows marking the minimum of these curves in figure 2.  The second derivative array is

          (2.3)

All other steps leading to (1.7) are the same leading to

          (2.4)

Using cmin in (2.2) to gives the value for This is the value at the arrows in figure 2. 

With l a very large negative value and all SJ’s the same, eλ is a Marquardt parameter that requests the smallest coefficients possible for both linearly dependent combinations.  This solves the problem of linear dependencies and makes the matrices in principle invertible.   For sufficiently large λ the value at the minimum remains χ02.  The cJ’s for different values of J can be very different in their non-linearity. The relative sizes of the SJ’s can be found from the way the first derivatives relate to the second derivatives in the minimization process.  Smvals.doc .htm.

F(λ)

Define

(3.1)

This function is defined in the Robmin file as

      FUNCTION FOFLAM(MPALAM,IMB)

The function used by Robmin is that defined in (3.1) minus Fr, the fractional lowering desired.

The parameter λ is multiple precision owing to the need to very precisely find a “best” value for F(λ).  IMB is an integer used to pass the type of matrix inversion needed back to Robmin. 

      IMPLICIT REAL*8 (A-H,O-Z)

      PARAMETER (MAXCON=5,IT=63)

      COMMON/PTOF/CHI,FR,NCON

      COMMON/PASSRM/CONS(MAXCON),PC(MAXCON),MPPC(IT+1,MAXCON),

     2 PPCC(MAXCON*(MAXCON+1)/2),SM(MAXCON),B(MAXCON),AJP,

     3 MPPPCC(IT+1,MAXCON*(MAXCON+1)/2),MP

      DIMENSION MPALAM(*)

      SAVE

      CALL GSOLVE(CHI,PC,MPPC,PPCC,MPPPCC,SM,MPALAM,NCON,AJP,B,MP,IMB)

      FOFLAM=AJP/CHI-FR

      RETURN

      END

The subroutine Gsolve inverts the matrix, finds the constant changes needed to lower χ2 and stores then in B(maxcon).  The matrix inversion with MP=0 always uses Sminv, if the matrix cannot be inverted, the changes stored in B Are set to zero so that F(λ) becomes 1.  If MP=1, the routine MPHsminv is used to invert the matrix.  This routine uses multiple precision to avoid the zero that stopped sminv, but then reverts to double precision to conserve cpu time.  If MP = 2, 3, or 4, the routine used when Sminv fails is MPSminv which has double precision for the entire inversion.  If MP = 5, MPSminv is always used.

 

Figure 3 Values of F(λ) from brack in the event that 0 is not crossed.  (Minimization with 12 constants and 400,1 in aigau.dir)

            The data in figure 3 is from the bracketing/bli section of Robmin in a case where F was never able to get to the low value requested.  It comes from an attempt to fit the complementary error function to 16 digit precision with a Pade approximate[3].  The Matrix inversion is multiple precision with enough digits to avoid the Pade approximate linear dependencies.  The value of F(λ) is not allowed to become bigger than 1 in figure 3, but it really is possibly as large as 1025 on the far left of figure 3.  Values on the nearly vertical point sets can extend from -1025 to 1025.  As lambda increases the right side of figure 3 looks much more like the function expected.

            For large values of lambda the new constants are constrained by the Marquardt parameter to be close to the old ones.  This means that the local minimum around the far left arrow in #Figure_2 is being explored.  Then as the value of lambda decreases, the regions around the middle and right arrow are explored.  These are large projections of a quadratic expansion of χ2.  They are not accurate and in fact frequently represent local maxima and saddle points rather than minima.  The very low values should not be ignored.  Usually they lead to a set of constants with a much higher actual χ2 as shown graphically by the far left arrow in #Figure_2, but occasionally they lead to a complete change in c0 such that a χ2 that was slowly converging to 1015 will have a predicted value of -1016 and an actual value of 1012for\robmin.for

Bracketing

            The equation to solve is (3.1) – Fr for λ and at the same time for c(λ)

(4.1)

Figure 4  Cartoon of F versus l

 

            The thin blue line in figure 4 is the desired value of Fr.  The figure shows that there are usually at least two solutions to (4.1).  The one on the left is the most desirable.  It can happe, however, that the blue line is below the bottom of the dip in Figure 4.  In this case there is no solution, and the desired return is the value at the bottom of the dip.  The general problem of finding two values of (4.1) such that one is >0 and one < 0  is discussed in

Bracketing.doc .htm    BracketBli.doc htm

The code for this in Robmin is

      SUBROUTINE BRACK(FEXT,BEG,MPX,ENDC,MPX1,MPX2,F1,F2,SUCCES)

Fext is the function called which in this case is Foflam.  This is a function of a multiple precision MPX.  Beg is the beginning location and Endc the ending location for the search.  MPX is a multiple precision location at which the search starts.  On a successful return F1=Fext(MPX1) < 0 and F2=Fext(MPX2) > 0.  If this happens and Sminv did the last inversion, succes = Y, if MPHsminv or MPsminv did the last inversion success = Z, if a value is found equal to 0 using Sminv success = P, using MPHsminv or MPsminv, success = Q.

            To stay on the right side of figure 4, the search starts at the far right adding up to 41 terms at unequal distances moving left.  The positions of these terms cluster close together near the initial MPX and become further spaced towards the edges.  It can happen that none of these is less than zero.  In this case a weighted BLI is used to add up to NP=512 points at the positions most likely to produce a value less than 0.  If none of these is less than zero, the lowest value found is returned with success = N.

Newton’s Method ( Newton.doc .htm Complex Newton.doc .htm)

In one dimension Newton’s method is to solve the linear equation

            (4.2)

for the c which makes f(c) equal to zero.

            (4.3)

the actual derivatives as shown in the graph

would shoot the function all over the map and in wrong directions in general.  The value of bracketing is that the actual derivatives are replaced by a linear interpolation, which becomes a numerical derivative in the limit that the spacing becomes small.  That is by first finding a value above zero and one below zero one can replace

            (4.4)

Then use Newton’s method as before always replacing c+ or c- by the next better estimate.  --- Note that it is essential to keep both one value above zero and one above.  This requires testing each estimate. 

      SUBROUTINE NEWT(MPX1,F1,MPX2,F2,FUN1,NCALL)

C THIS ROUTINE USES (FUNCP-FUNCM)/(XP-XM) TO ESTIMATE THE

C DERIVATIVE IN THE VICINITY OF FUNC = 0.  FUNCP*FUNCM < 0.

C FINDS ABS(X1-X2) < PRECX OR MAX(ABS(F1),ABS(F2))<PRECF

      DATA PRECX/1D-120/,NCALLM/64/

The Newton’s method used here normally squares the error on each iteration so only a very few extra iterations are needed to reach the 10-120 specified.  This rather extreme accuracy is due to the nearly vertical lines in figure 3.  It frequently happens that F(λ-) is orders of magnitude below Fr while F(λ+) is orders of magnitude high even though the difference between these two values of λ is less than 10-120.   The c for the lower one is always passed back to the code.  This happens when c is quite different from c0.  Usually the value represents an incorrect quadratic extrapolation of χ2.  The large positive F is due to a maximum value projected for a very large step in c, while the large negative value is due to a minimum projected for a very large step in c.  Usually either c is such that χ2(c) is greater than χ2(c0).  In this case Robmin sets Fr closer to 1 making the next step occur in the smoother region to the right in figure 3.  Occasionally the c associated with the minimum corresponds to the middle arrow in figure 3.  The projected χLim2 of -1025 will not be found for the positive definite χ2(c), but a drop of one to two orders of magnitude followed by a quick minimization in the new region of c happens rather frequently.

 

 

 



[1] ..\..\Fittery\nlfit\LogAigau\The number of Multiple Precision integers.htm

[2] The failure to find a term here can make the routine stop right at the beginning.  This value is below the current chi, so it keeps trying.  The 0.1 is used at IREV = 5 to get below the numerical noise caused by adding multiple precision changes to double precision constants.  It could be as high as 0.99 and probably cause no problem.

[3] Pade approximates are nearly linearly dependent ..\..\Fittery\nlfit\LinDep.doc.