More Accurate Linear Interpolation for Table Lookup Algorithms

Copyright © 1991-2004, John Forkosh Associates, Inc.
email:
john@forkosh.com         www.forkosh.com

John Forkosh
285 Stegman Parkway #309
Jersey City, NJ 07305

Introduction

Scientific applications often require the very frequent evaluation of complicated functions. To reduce computational overhead, such a function can be defined by a table of its values at various fixed points. Some form of interpolation is then required to estimate values of the function at intermediate points. Linear interpolation is the quickest procedure; however, it is usually the least accurate.

A lookup table for a function is a sequence of numbers Using linear interpolation, is approximated by straight line segments connecting the Consider the very simple function Because this function is concave (its second derviative is always positive) the straight line segments will always lie above the curve, i.e., linearly interpolated values will always be too large.

To avoid such systematic errors, we suggest line segments connecting a sequence of "pseudo-values" that are determined by minimizing the sum of the integrated square errors attributable to each segment. A least-squares technique is used to determine the appropriate The result is that the table does not contain exact values for the function at the points However, the mean square error due to sampling random points in the function's domain is minimized. For instance, for between with table values at each integer, the mean square error is reduced by a factor of 6 (the rms error by a factor of 2.45).

Our discussion is pitched at about the level of "Numerical Recipes in C," Press et.al., Cambridge University Press 1988. It illustrates not only interpolation (Chapter 3), but also least squares (Chapter 14) and the solution of linear algebraic equations (Chapter 2). The latter arises because, for a table with values, our analysis expresses the as simultaneous equations in terms of the Complete code is provided so that the reader can apply the algorithm to any function whatsoever.

Functions of a Single Variable

Interpolation techniques represent a function by a table of values Between and standard linear interpolation approximates by a line segment, which we'll denote by with and chosen so that and This provides exact values of at the but does not constrain the mean square errors

In addition, it gives rise to systematic errors when the sign of doesn't change between points. If (concave function) then for all in the interval (and similarly for a convex function). This situation is illustrated in Figure 1a.

   Figure 1a       Figure 1b   
Usual linear interpolation approximates with straight line segments between points and . When is concave as illustrated, the line segments always lie above the curve, so the interpolated values are always too large. Linear interpolation approximating by line segments between points and determined by the procedure developed in this article. The systematic error is removed and the mean square error is minimized.

Figure 1b illustrates the procedure to be developed. In this case the line segments connect points and The are determined by a least squares technique that minimizes the sum of the error terms given in Eq.(1), so that

This gives equations for the unknown

The work between here and Eq.(8) contains the algebra necessary to formulate these equations so that we can program their solution. The reader who doesn't care to follow this derivation can skip to Eq.(8) which contains the required result. If you do skip there, note that denotes the middle of each interval.

To keep the analysis tractable we assume that is constant. Then the line segments can be written

and the sum of the error terms given by Eqs.(1) and (2) is

At the interior points two terms from the sum in Eq.(3) contain subscript so that

while at the boundaries and only one term contributes, so that

These derivatives can be evaluated directly as follows:

Substitute and to obtain:

Now separate the term from each integral and substitute on the top line, on the bottom line:

The left-hand integrals can be evaluated directly:

So far the analysis is exact. To finish the problem we approximate the right-hand integrals using Simpson's rule applied at the midpoint of each interval. Let so that
from which we can evaluate

Any desired degree of accuracy can be obtained by using better approximations in Eqs.(6). The results are then substituted back into Eqs.(5), and in this case

Eqs.(2), (4) and (7) now let us write down equations for the unknown in terms of the known  :

These results are most easily written in matrix form:

Eq.(8) specifies how to construct a table of , for linearly interpolating any function , that minimizes (modulo Simpson's rule) the mean square error due to randomly sampling points in the function's domain. However, must be a function of a single variable. The application for which this alogorithm was originally designed requires interpolation of a function of two variables. This more general situation is conceptually quite similar to the one-dimensional case we discussed; however, the required algebra becomes noticeably more complicated. I therefore decided to stop the formal discussion of the algorithm at this point.

Discussion of Code

The implementation of Eq.(8) is illustrated in the listings below. They contain a "C" function lint1d(x0,dx,n,f,y) that returns a table of , suitable for one-dimensional linear interpolation, as prescribed by Eq.(8). It simply malloc's a temporary array in which the coefficient matrix is set up, then calls the user-supplied function to initialize array with the vector of constants, and finally solves the simultaneous equations, returning the , solutions to the caller in the same array .

The listings contain a test driver for lint1d(), and some sample output for as discussed in the introduction. They also contain the simultaneous equation solver used by lint1d(). It's a Fortran-to-C port of an old IBM Scientific Subroutine Package routine, simq(), as noted in the code. As you can see, Eq.(8) specifies a very straightforward procedure, and the resulting code for lint1d() is correspondingly simple. The code for simq() is a bit dense, but can be treated as a black box for the purposes of this application.

By supplying the address of your own function to lint1d()'s f-argument, an interpolation table can very easily be constructed to suit any purpose whatsoever. Thus, almost any application that employs one-dimensional linear interpolation at fixed intervals should be able to realize noticeable improvements in accuracy. All it takes is a simple one-line call to lint1d(). Afterwards, interpolation on the improved table is performed totally transparently, so absolutely no application-level recoding is required.

lint1d() contains an embedded driver (compiled when TESTDRIVE is #define'd) to facilitate testing and to illustrate usage. I've also tried to carefully document the code (except for some of the more elaborate indexing arithmetic required near the bottom of simq()). My personal experience with scientific code is that it frequently tends to be under-documented and also under-robust. The latter problem can be particularly pernicious since the code usually won't blow up, but will simply supply the wrong result due to an inappropriate numerical method or due to some more trivial problem. (For instance, I once worked on a model where the value of had been inadvertantly entered incorrectly.) Since numerical results are usually not analytically verifiable, and since feedback mechanisms frequently prevent small errors from growing uncontrollably, it may take quite some time (if ever) before a researcher realizes that his results are little more than massaged noise. The only effective procedure to guard against such difficulties is a design that carefully decomposes a complicated calculation into easily describable functional components that can be individually tested for known correct results.

Listings

lint1d

/**********************************************************
 *
 * Copyright (c) 1991, John Forkosh.  All rights reserved.
 * Released to the public domain only for use that is both
 * (a) by an individual, and (b) not for profit.
 * --------------------------------------------------------
 *
 * Function:    lint1d ( x0,dx,n, f,y )
 *
 * Purpose:     Creates a table containing n values,
 *              starting with y[0]=f(x0), y[1]=f(x0+dx),
 *              through y[n-1]=f(x0+(n-1)*dx).
 *              However, the value in y[i] won't
 *              necessarily exactly equal f(x0+i*dx)
 *              as indicated above.  Instead, the table
 *              is constucted to minimize the mean square
 *              error due to sampling random points in
 *              f(x)'s domain.
 *
 * Arguments:   x0 (I)  Double containing the first point
 *                      in the function's domain.
 *              dx (I)  Double containing x-interval
 *                      between points in the table.
 *              n (I)   Int containing the number of
 *                      points in the table.
 *              f (I)   Address of function returning
 *                      double whose values are to be
 *                      represented in the table.
 *              y (O)   Address of double returning the
 *                      table of n values, as discussed.
 *
 * Returns:     (int)   0 for a normal solution, or
 *                      1 for any error (e.g., a singular
 *                      set of equations).
 *
 * Notes:     o See the accompanying article for a complete
 *              description of lint1d() and simq().
 *            o -DTESTDRIVE  to compile a sample main()
 *              (see below).
 *
 * --------------------------------------------------------
 * Revision History:
 * 01/07/91     J.Forkosh       Installation
 *
 *********************************************************/
/* --- standard headers --- */
#include <stdio.h>              /* need NULL ptr value */
#include <stdlib.h>             /*for malloc() prototype*/

/* --------------------------------------------------------
Entry point
-------------------------------------------------------- */
int     lint1d ( x0,dx,n, f,y ) /* returns 0=ok, 1=error */
double  x0,dx;                  /*1st point and interval*/
int     n;                      /* number of points */
double  (*f)();                 /*func to be interpolated*/
double  *y;                     /*table for interpolation*/
{
/* --- local allocations and declarations --- */
int     iserror = 0;            /* no error detected yet */
int     i,j;                    /* row,col indexes */
double  x;                      /* arg for f(x) */
/* --- need temporary array for coefficient matrix --- */
double  *a = (double *)malloc((n*n+1)*sizeof(double));

/* --------------------------------------------------------
Set up coefficient matrix as per discussion in article
(since the matrix is symmetric, the columnwise requirement
for simq() input is irrelevant).
-------------------------------------------------------- */
/* --- first check that malloc() was successful --- */
if ( a == NULL )                /*failed to malloc matrix*/
  return ( iserror=1 );         /*return error to caller*/
for (i=0;i<n*n;i++) a[i]=0.0;   /* init array to zeros */
/* --- 4's on diag (2's at extremes) and 1's offdiag --- */
for ( i=1; i<n; i++ ) {         /* skip 0,0 element */
  j = n*i + i;                  /* index of diag element */
  a[j] = 4.0;                   /* set diagonal element */
  a[j-1] = a[j+1] = 1.0;        /* and off-diag elements */
  } /* --- end-of-for(i=1...n-1) --- */
a[0] = a[n*n-1] = 2.0;          /* set 1st,last diag */
a[1] = a[n*n-2] = 1.0;          /*and remaining off-diags*/

/* --------------------------------------------------------
Set up vector of constants as per discussion in article
-------------------------------------------------------- */
/* --- interior points --- */
for ( i=1; i<n; i++ ) {         /* loop skips 1st point */
  x = x0 + dx*i;                /* next arg to be tabled */
  y[i] = 2.0*(f(x-0.5*dx)+      /*const vector calculated*/
                f(x)+f(x+0.5*dx)); /* as per article */
  } /* --- end-of-for(i=1...n-1) --- */
/* --- boundary points --- */
y[0]   = f(x0) + 2.0*f(x0+0.5*dx); /* first x in domain */
y[n-1] = f(x)  + 2.0*f(x -0.5*dx); /*use last x from loop*/

/* --------------------------------------------------------
Solve the simultaneous equations
-------------------------------------------------------- */
iserror = simq(a,y,n);          /* y[] returns solution */
free ( (void *)a );             /*don't need coeff matrix*/
return ( iserror );             /*back to caller with y[]*/
} /* --- end-of-function lint1d() --- */

simq

/**********************************************************
 *
 * Copyright (c) 1991, John Forkosh.  All rights reserved.
 * Released to the public domain only for use that is both
 * (a) by an individual, and (b) not for profit.
 * --------------------------------------------------------
 *
 * Function:    simq ( a, b, n )
 *
 * Purpose:     Solves the set of n simultaneous linear
 *              equations ax=b.
 *
 * Arguments:   a (I)   Address of n-by-n array of doubles
 *                      stored columnwise (see notes)
 *                      containing the matrix of
 *                      coefficients (which are destroyed
 *                      during the computation).
 *              b (I/O) Address of vector of n doubles,
 *                      containing the original constants,
 *                      and returning the final solution.
 *              n (I)   Int containing the number of
 *                      equations to be solved (must be >1)
 *
 * Returns:     (int)   0 for a normal solution, or
 *                      1 for a singular set of equations.
 *
 * Notes:     o Simq() is derived from the Fortran routine
 *              of the same name in IBM's System/360
 *              Scientific Subroutine Package, Publication
 *              GH20-0205-4, Fifth Edition (August 1970),
 *              Page 120.  See the discussion of Gauss-
 *              Jordan elimination in Chapter 2 of 
 *              Numerical Recipes in C for a similar
 *              treatment.
 *            o The matrix of coefficients, a[], must be
 *              stored columnwise in a singly-dimensioned
 *              array, e.g., for a 3x3 matrix the nine
 *              elements of a[] are
 *                      a[0]  a[3]  a[6]
 *                      a[1]  a[4]  a[7]
 *                      a[2]  a[5]  a[8]
 *              In other words, the index for row i and
 *              column j (i,j=0,...,n-1) is a[n*j+i].
 *              Note that many index calculations are
 *              buried in initialization steps of loops.
 *            o The method of solution is by elimination
 *              using largest pivotal divisor.  Each stage
 *              of elimination consists of interchanging
 *              rows when necessary to avoid division by
 *              zero or small elements.  First the forward
 *              solution to obtain the n-th variable is
 *              done in n stages.  Then the back solution
 *              for the other variables is done by
 *              successive substitution.  Final solution
 *              values are returned in vector b[], with
 *              variable 1 in b[0], variable 2 in b[1],
 *              etc.  If no pivot can be found exceeding
 *              the #defined TOLerance, the matrix is
 *              considered singular and an error returned.
 *
 * --------------------------------------------------------
 * Revision History:
 * 01/07/91     J.Forkosh       Installation
 *
 *********************************************************/
/* --- macros --- */
#define dabs(x) ((x)>=0.0?(x):(-(x))) /* absolute value */
/* --- user-adjustable tolerance --- */
#define TOL 0.0

/* --------------------------------------------------------
Entry point
-------------------------------------------------------- */
int     simq ( a, b, n )        /* returns 0=ok, 1=error */
double  *a;                     /* coefficient matrix */
double  *b;                     /* constant vector */
int     n;                      /* number of equations */
{
/* --- local allocations and declarations --- */
int     i,j, ix,jx;             /* row,col indexes */
int     it,k;                   /* temp indexes */

/* --------------------------------------------------------
Forward Solution
-------------------------------------------------------- */
for ( j=0; j<n; j++ )           /* cols */
  {
  /* --- local allocations and declarations --- */
  int   imax;                   /* pivot row */
  double biga=0.0;              /*largest element in col*/
  double save;                  /*save value during pivot*/

  /* ------------------------------------------------------
  Find maximum coefficient (pivot row) in column
  ------------------------------------------------------ */
  for ( it=j*n,i=j; i<n; i++ )  /*rows in lower diagonal*/
    if ( dabs(a[it+i]) > dabs(biga) ) /* got bigger biga */
      { biga = a[it+i];         /* so store biggest */
        imax = i; }             /* and its col offset */
  /* --- error if pivot less than tolerance --- */
  if ( dabs(biga) <= TOL )      /* pivot too small so... */
    return ( 1 );               /*back to caller with err*/

  /* ------------------------------------------------------
  Interchange rows as necessary and divide by leading coeff
  ------------------------------------------------------ */
  /* --- easy to interchange constant vector --- */
  save = b[imax];               /* save b[imax] */
  b[imax] = b[j];               /* replace it with b[j] */
  b[j] = save/biga;             /* swap and rescale */
  /* --- must interchange row one col at a time --- */
  for ( i=j*(n+1),it=imax-j,k=j; k<n; k++,i+=n ) {
    save = a[it+i];             /* save swap value */
    a[it+i] = a[i];             /* replace it with a[i] */
    a[i] = save/biga;           /* swap and rescale */
    } /* --- end-of-for(k=j...n-1) --- */

  /* ------------------------------------------------------
  Eliminate next variable (Note: the index calculations
  required beyond this point get much more complicated.)
  ------------------------------------------------------ */
  if ( j < (n-1) )              /* except on last j */
   for ( ix=j+1; ix<n; ix++ ) { /* lower diagonal */
    int ixj = (n*j)+ix;         /*lowr diag rows in col j*/
    for( it=j-ix,jx=j+1; jx<n; jx++ )
      { int ixjx = (n*jx)+ix;
        a[ixjx] -= a[ixj]*a[it+ixjx]; }
    b[ix] -= b[j]*a[ixj];
    } /* --- end-of-for(ix=j+1...n-1) --- */
  } /* --- end-of-for(j=0...n-1) --- */

/* --------------------------------------------------------
Back Solution
-------------------------------------------------------- */
for ( it=n*n,i=2; i<=n; i++ ) { /* all rows except last */
  int   ia=it-i, ib=n-i, ic=n-1;
  for ( k=1; k<i; k++,ia-=n,ic-- )
    b[ib] -= a[ia]*b[ic];
  } /* --- end-of-for(i=2...n) --- */
return ( 0 );                   /*back to caller with b[]*/
} /* --- end-of-function simq() --- */

driver

#ifdef TESTDRIVE
/**********************************************************
 *
 * Purpose:     Test driver for lint1d() (and simq).
 *              Prepares a table of values for
 *              interpolation of the function f(x)=x*x,
 *              and compares the resulting accuracy
 *              with usual linear interpolation.
 *
 * Command-Line Arguments:
 *              x0 (I)  Double containing the first point
 *                      in the function's domain
 *                      (default=-10.0).
 *              dx (I)  Double containing x-interval
 *                      between points in the table
 *                      (default=1.0).
 *              n (I)   Int containing the number of
 *                      points in the table
 *                      (default=21).
 *              K (I)   Int containing the exponent for
 *                      the test function f(x)=x**K
 *                      (default=2).
 *
 *********************************************************/
/* --- program defaults (x0,dx,n,K from cmdline) --- */
#define X0 (-10.0)              /* 1st point in table */
#define DX 1.0                  /* table interval */
#define N 21                    /*number pts in table*/
static  int K=2;                /* test f(x)=x**K */
#define NMAX 99                 /* easier than malloc */
#define NRMS 101                /*pts per dx for rms calc*/

/* --------------------------------------------------------
Entry point
-------------------------------------------------------- */
main    ( argc, argv )
int     argc;
char    *argv[];
{
/* --- local allocations and declarations --- */
double  x0=X0, dx=DX; int n=N;  /* defaults */
int     i,j;                    /* loop indexes */
double  x;                      /* arg for f() */
double  f();                    /* test function */
double  y[NMAX];                /* table from lint1d() */
int     iserror=0;              /*return flag from lint1d*/
double  frms=0.0,yrms=0.0;      /* mean square errors */
double  xa,xb;                  /*interval bounds for rms*/
double  sqrarg;                 /*dummy arg for sqr macro*/
/* --- linear interpolation (u(x=xa)=ya, u(x=xb)=yb) --- */
#define u(x,xa,ya,xb,yb) (ya*(xb-x)+yb*(x-xa))/(xb-xa)
/* --- square a double --- */
#define sqr(x) (sqrarg=(x),sqrarg*sqrarg)

/* --------------------------------------------------------
Pick up command-line arguments
-------------------------------------------------------- */
/* --- printf("Usage: lint1d x0 dx n expon\n"); --- */
if ( argc > 1 ) x0 = atof(argv[1]); /* user supplied x0 */
if ( argc > 2 ) dx = atof(argv[2]); /* user supplied dx */
if ( argc > 3 ) n  = atoi(argv[3]); /* user supplied n */
if ( argc > 4 ) K  = atoi(argv[4]); /* user supplied K */
if ( n<3 || n>NMAX ) exit(0);   /* sorry, Charlie */

/* --------------------------------------------------------
Call lint1d() to create interpolation table.  Note: After
this call, a normal application (what mainframe IBM
likes to call a "problem program") would simply go about
its business, using y[i=0...n-1] as a table for linear
interpolation of f(x) in the usual way.  The special
algorithm by which our table was generated is completely
transparent to further processing.
-------------------------------------------------------- */
iserror = lint1d(x0,dx,n,f,y);  /*very easy to use lint1d*/
if ( iserror )                  /* unless it fails */
  { printf("lint1d() failed.\n"); /* then just print msg */
    exit(0); }                  /* and quit */

/* --------------------------------------------------------
Print the exact and tabled values of f(x) at each x point
-------------------------------------------------------- */
/* --- stub header --- */
printf("        Table Returned By Lint1d()\n");
printf(" i     x[i]       f(x[i])        y*[i]\n");
printf("---  --------  ------------  ------------\n");
/* --- table generated by lint1d() in right column --- */
for ( i=0; i<n; i++ ) {         /*for each value in table*/
  /* --- f(x[i]) is what a normal table would contain
              y*[i] is our table to minimize errors --- */
  x = x0 + dx*i;                /* need x to print f(x) */
  printf("%3d  %8.2lf  %12.3lf  %12.6lf\n",
    i+1,x,f(x),y[i]);           /* f(x) and y[i] */
  } /* --- end-of-for(i=0...n-1) --- */

/* --------------------------------------------------------
Determine mean square error for both f(x[i]) and for y[i]
-------------------------------------------------------- */
for ( i=1; i<n; i++ ) {         /*each interval in table*/
  double fxa, fxb, fx;          /* f(xa), f(xb), f(x) */
  xa = x0+dx*(i-1); fxa=f(xa);  /*lower bound of interval*/
  xb = x0+dx*i;     fxb=f(xb);  /* upper bound */
  for ( j=0; j<NRMS; j++ ) {    /*accum err for NRMS pts*/
    x = xa + (j*dx)/(NRMS-1);   /* xa<=x<=xb */
    fx = f(x);                  /* actual value at x */
    frms += sqr(u(x,xa,fxa,xb,fxb)-fx); /* usual error */
    yrms += sqr(u(x,xa,y[i-1],xb,y[i])-fx); /* our error */
    } /* --- end-of-for(j=0...NRMS-1) --- */
  } /* --- end-of-for(i=1...n-1) --- */
/* --- print usual error and our (smaller) error --- */
printf("Mean sq err for\n%d pts/intvl  %12.8lf  %12.8lf\n",
  NRMS,frms/(NRMS*(N-1)),yrms/(NRMS*(N-1)));
} /* --- end-of-main() --- */

/* --------------------------------------------------------
Entry point (dummy function f(x)=x**K can be replaced
by any function of the user's choice)
-------------------------------------------------------- */
double  f(x)
double  x;
{
/* --- replace this with any function of your choice --- */
int     i=K;                    /* countdown index */
double  y=x;                    /* start with x */
while ( --i > 0 ) y *= x;       /*additional factors of x*/
return ( y );                   /* f(x)=x**expon */
} /* --- end-of-function f() --- */
#endif
/* -------------- END-OF-FILE LINT1D.C ----------------- */

sample output for f(x)=x^2

        Table Returned By lint1d()
 i     x[i]       f(x[i])        y*[i]
---  --------  ------------  ------------
  1    -10.00       100.000     99.833333
  2     -9.00        81.000     80.833333
  3     -8.00        64.000     63.833333
  4     -7.00        49.000     48.833333
  5     -6.00        36.000     35.833333
  6     -5.00        25.000     24.833333
  7     -4.00        16.000     15.833333
  8     -3.00         9.000      8.833333
  9     -2.00         4.000      3.833333
 10     -1.00         1.000      0.833333
 11      0.00         0.000     -0.166667
 12      1.00         1.000      0.833333
 13      2.00         4.000      3.833333
 14      3.00         9.000      8.833333
 15      4.00        16.000     15.833333
 16      5.00        25.000     24.833333
 17      6.00        36.000     35.833333
 18      7.00        49.000     48.833333
 19      8.00        64.000     63.833333
 20      9.00        81.000     80.833333
 21     10.00       100.000     99.833333
-------------  ------------  ------------
Mean sq error    0.03300330    0.00578108

Copyright © 1991-2004, John Forkosh Associates, Inc.
email: john@forkosh.com