Scientific applications often require the very frequent evaluation of complicated functions. To reduce computational overhead, such a function can be defined by a table of its values at various fixed points. Some form of interpolation is then required to estimate values of the function at intermediate points. Linear interpolation is the quickest procedure; however, it is usually the least accurate.
A lookup table for a function
is a sequence of numbers
Using linear interpolation,
is approximated by straight line segments connecting the
Consider the very simple function
Because this function is concave (its second derviative
is always positive) the straight line segments will always
lie above the curve, i.e., linearly interpolated values will always
be too large.
To avoid such systematic errors, we suggest line segments connecting
a sequence of "pseudo-values"
that are determined by minimizing the sum of the integrated square errors
attributable to each segment. A least-squares technique is used to determine
the appropriate
The result is that the table does not contain exact values for the function
at the points
However, the mean square error due to sampling random points in the
function's domain is minimized. For instance, for
between
with table values at each integer, the mean square error is reduced
by a factor of 6 (the rms error by a factor of 2.45).
Our discussion is pitched at about the level of "Numerical Recipes
in C," Press et.al., Cambridge University Press 1988. It illustrates
not only interpolation (Chapter 3), but also least squares (Chapter 14) and
the solution of linear algebraic equations (Chapter 2). The latter arises
because, for a table with
values, our analysis expresses the
as
simultaneous equations in terms of the
Complete code is provided so that the reader can apply the algorithm to any
function
whatsoever.
Interpolation techniques represent a function
by a table of values
Between
and
standard linear interpolation approximates
by a line segment, which we'll denote by
with
and
chosen so that
and
This provides exact values of
at the
but does not constrain the mean square errors
In addition, it gives rise to systematic errors when the sign of
doesn't change between points. If
(concave function) then
for all
in the interval (and similarly for a convex function). This situation
is illustrated in Figure 1a.
| Figure 1a | Figure 1b |
|
|
|
Usual linear interpolation approximates
|
Linear interpolation approximating
|
Figure 1b illustrates the procedure to be developed. In this case the
line segments
connect points
and
The
are determined by a least squares technique that minimizes the sum of
the error terms given in Eq.(1), so that
This gives
equations for the
unknown
The work between here and Eq.(8) contains the algebra necessary to
formulate these
equations so that we can program their solution.
The reader who doesn't care to follow this derivation can skip to Eq.(8)
which contains the required result. If you do skip there, note that
denotes the middle of each
interval.
To keep the analysis tractable we assume that
is constant. Then the line segments can be written
and the sum of the error terms given by Eqs.(1) and (2) is
At the interior points
two terms from the sum in Eq.(3) contain subscript
so that
while at the boundaries
and
only one term contributes, so that
These derivatives can be evaluated directly as follows:
Substitute
and
to obtain:
Now separate the
term from each integral and substitute
on the top line,
on the bottom line:
The left-hand integrals can be evaluated directly:
So far the analysis is exact. To finish the problem we approximate the
right-hand integrals using Simpson's rule applied at the midpoint of each
interval. Let
so that
from which we can evaluate
Any desired degree of accuracy can be obtained by using better
approximations in Eqs.(6). The results are then substituted back into
Eqs.(5), and in this case
Eqs.(2), (4) and (7) now let us write down
equations for the
unknown
in terms of the known
:
These results are most easily written in matrix form:
Eq.(8) specifies how to construct a table of
,
for linearly interpolating any function
,
that minimizes (modulo Simpson's rule) the mean square error due to
randomly sampling points in the function's domain. However,
must be a function of a single variable. The application for
which this alogorithm was originally designed requires interpolation of a
function
of two variables. This more general situation is conceptually quite
similar to the one-dimensional case we discussed; however,
the required algebra becomes noticeably more complicated.
I therefore decided to stop the formal discussion of the algorithm
at this point.
The implementation of Eq.(8) is illustrated in the listings below.
They contain a "C" function lint1d(x0,dx,n,f,y)
that returns a table of
,
suitable for one-dimensional linear interpolation, as prescribed by Eq.(8).
It simply malloc's a temporary
array in which the coefficient matrix is set up, then calls the
user-supplied function
to initialize array
with the vector of constants, and finally solves the
simultaneous equations, returning the
,
solutions to the caller in the same array
.
The listings contain a test driver for lint1d(),
and some sample output for
as discussed in the introduction. They also contain the simultaneous
equation solver used by lint1d(). It's a Fortran-to-C port
of an old IBM Scientific Subroutine Package routine, simq(),
as noted in the code. As you can see, Eq.(8) specifies a very
straightforward procedure, and the resulting code for lint1d()
is correspondingly simple. The code for simq() is a bit dense,
but can be treated as a black box for the purposes of this application.
By supplying the address of your own function to lint1d()'s f-argument, an interpolation table can very easily be constructed to suit any purpose whatsoever. Thus, almost any application that employs one-dimensional linear interpolation at fixed intervals should be able to realize noticeable improvements in accuracy. All it takes is a simple one-line call to lint1d(). Afterwards, interpolation on the improved table is performed totally transparently, so absolutely no application-level recoding is required.
lint1d() contains an embedded driver
(compiled when TESTDRIVE is #define'd) to facilitate testing and to
illustrate usage. I've also tried to carefully document the code (except
for some of the more elaborate indexing arithmetic required near the bottom
of simq()). My personal experience with scientific code
is that it frequently tends to be under-documented and also under-robust.
The latter problem can be particularly pernicious since the code usually
won't blow up, but will simply supply the wrong result due to an
inappropriate numerical method or due to some more trivial problem.
(For instance, I once worked on a model where the value of
had been inadvertantly entered incorrectly.) Since numerical results
are usually not analytically verifiable, and since feedback mechanisms
frequently prevent small errors from growing uncontrollably,
it may take quite some time (if ever) before a researcher realizes
that his results are little more than massaged noise. The only effective
procedure to guard against such difficulties is a design that carefully
decomposes a complicated calculation into easily describable functional
components that can be individually tested for known correct results.
/**********************************************************
*
* Copyright (c) 1991, John Forkosh. All rights reserved.
* Released to the public domain only for use that is both
* (a) by an individual, and (b) not for profit.
* --------------------------------------------------------
*
* Function: lint1d ( x0,dx,n, f,y )
*
* Purpose: Creates a table containing n values,
* starting with y[0]=f(x0), y[1]=f(x0+dx),
* through y[n-1]=f(x0+(n-1)*dx).
* However, the value in y[i] won't
* necessarily exactly equal f(x0+i*dx)
* as indicated above. Instead, the table
* is constucted to minimize the mean square
* error due to sampling random points in
* f(x)'s domain.
*
* Arguments: x0 (I) Double containing the first point
* in the function's domain.
* dx (I) Double containing x-interval
* between points in the table.
* n (I) Int containing the number of
* points in the table.
* f (I) Address of function returning
* double whose values are to be
* represented in the table.
* y (O) Address of double returning the
* table of n values, as discussed.
*
* Returns: (int) 0 for a normal solution, or
* 1 for any error (e.g., a singular
* set of equations).
*
* Notes: o See the accompanying article for a complete
* description of lint1d() and simq().
* o -DTESTDRIVE to compile a sample main()
* (see below).
*
* --------------------------------------------------------
* Revision History:
* 01/07/91 J.Forkosh Installation
*
*********************************************************/
/* --- standard headers --- */
#include <stdio.h> /* need NULL ptr value */
#include <stdlib.h> /*for malloc() prototype*/
/* --------------------------------------------------------
Entry point
-------------------------------------------------------- */
int lint1d ( x0,dx,n, f,y ) /* returns 0=ok, 1=error */
double x0,dx; /*1st point and interval*/
int n; /* number of points */
double (*f)(); /*func to be interpolated*/
double *y; /*table for interpolation*/
{
/* --- local allocations and declarations --- */
int iserror = 0; /* no error detected yet */
int i,j; /* row,col indexes */
double x; /* arg for f(x) */
/* --- need temporary array for coefficient matrix --- */
double *a = (double *)malloc((n*n+1)*sizeof(double));
/* --------------------------------------------------------
Set up coefficient matrix as per discussion in article
(since the matrix is symmetric, the columnwise requirement
for simq() input is irrelevant).
-------------------------------------------------------- */
/* --- first check that malloc() was successful --- */
if ( a == NULL ) /*failed to malloc matrix*/
return ( iserror=1 ); /*return error to caller*/
for (i=0;i<n*n;i++) a[i]=0.0; /* init array to zeros */
/* --- 4's on diag (2's at extremes) and 1's offdiag --- */
for ( i=1; i<n; i++ ) { /* skip 0,0 element */
j = n*i + i; /* index of diag element */
a[j] = 4.0; /* set diagonal element */
a[j-1] = a[j+1] = 1.0; /* and off-diag elements */
} /* --- end-of-for(i=1...n-1) --- */
a[0] = a[n*n-1] = 2.0; /* set 1st,last diag */
a[1] = a[n*n-2] = 1.0; /*and remaining off-diags*/
/* --------------------------------------------------------
Set up vector of constants as per discussion in article
-------------------------------------------------------- */
/* --- interior points --- */
for ( i=1; i<n; i++ ) { /* loop skips 1st point */
x = x0 + dx*i; /* next arg to be tabled */
y[i] = 2.0*(f(x-0.5*dx)+ /*const vector calculated*/
f(x)+f(x+0.5*dx)); /* as per article */
} /* --- end-of-for(i=1...n-1) --- */
/* --- boundary points --- */
y[0] = f(x0) + 2.0*f(x0+0.5*dx); /* first x in domain */
y[n-1] = f(x) + 2.0*f(x -0.5*dx); /*use last x from loop*/
/* --------------------------------------------------------
Solve the simultaneous equations
-------------------------------------------------------- */
iserror = simq(a,y,n); /* y[] returns solution */
free ( (void *)a ); /*don't need coeff matrix*/
return ( iserror ); /*back to caller with y[]*/
} /* --- end-of-function lint1d() --- */
/**********************************************************
*
* Copyright (c) 1991, John Forkosh. All rights reserved.
* Released to the public domain only for use that is both
* (a) by an individual, and (b) not for profit.
* --------------------------------------------------------
*
* Function: simq ( a, b, n )
*
* Purpose: Solves the set of n simultaneous linear
* equations ax=b.
*
* Arguments: a (I) Address of n-by-n array of doubles
* stored columnwise (see notes)
* containing the matrix of
* coefficients (which are destroyed
* during the computation).
* b (I/O) Address of vector of n doubles,
* containing the original constants,
* and returning the final solution.
* n (I) Int containing the number of
* equations to be solved (must be >1)
*
* Returns: (int) 0 for a normal solution, or
* 1 for a singular set of equations.
*
* Notes: o Simq() is derived from the Fortran routine
* of the same name in IBM's System/360
* Scientific Subroutine Package, Publication
* GH20-0205-4, Fifth Edition (August 1970),
* Page 120. See the discussion of Gauss-
* Jordan elimination in Chapter 2 of
* Numerical Recipes in C for a similar
* treatment.
* o The matrix of coefficients, a[], must be
* stored columnwise in a singly-dimensioned
* array, e.g., for a 3x3 matrix the nine
* elements of a[] are
* a[0] a[3] a[6]
* a[1] a[4] a[7]
* a[2] a[5] a[8]
* In other words, the index for row i and
* column j (i,j=0,...,n-1) is a[n*j+i].
* Note that many index calculations are
* buried in initialization steps of loops.
* o The method of solution is by elimination
* using largest pivotal divisor. Each stage
* of elimination consists of interchanging
* rows when necessary to avoid division by
* zero or small elements. First the forward
* solution to obtain the n-th variable is
* done in n stages. Then the back solution
* for the other variables is done by
* successive substitution. Final solution
* values are returned in vector b[], with
* variable 1 in b[0], variable 2 in b[1],
* etc. If no pivot can be found exceeding
* the #defined TOLerance, the matrix is
* considered singular and an error returned.
*
* --------------------------------------------------------
* Revision History:
* 01/07/91 J.Forkosh Installation
*
*********************************************************/
/* --- macros --- */
#define dabs(x) ((x)>=0.0?(x):(-(x))) /* absolute value */
/* --- user-adjustable tolerance --- */
#define TOL 0.0
/* --------------------------------------------------------
Entry point
-------------------------------------------------------- */
int simq ( a, b, n ) /* returns 0=ok, 1=error */
double *a; /* coefficient matrix */
double *b; /* constant vector */
int n; /* number of equations */
{
/* --- local allocations and declarations --- */
int i,j, ix,jx; /* row,col indexes */
int it,k; /* temp indexes */
/* --------------------------------------------------------
Forward Solution
-------------------------------------------------------- */
for ( j=0; j<n; j++ ) /* cols */
{
/* --- local allocations and declarations --- */
int imax; /* pivot row */
double biga=0.0; /*largest element in col*/
double save; /*save value during pivot*/
/* ------------------------------------------------------
Find maximum coefficient (pivot row) in column
------------------------------------------------------ */
for ( it=j*n,i=j; i<n; i++ ) /*rows in lower diagonal*/
if ( dabs(a[it+i]) > dabs(biga) ) /* got bigger biga */
{ biga = a[it+i]; /* so store biggest */
imax = i; } /* and its col offset */
/* --- error if pivot less than tolerance --- */
if ( dabs(biga) <= TOL ) /* pivot too small so... */
return ( 1 ); /*back to caller with err*/
/* ------------------------------------------------------
Interchange rows as necessary and divide by leading coeff
------------------------------------------------------ */
/* --- easy to interchange constant vector --- */
save = b[imax]; /* save b[imax] */
b[imax] = b[j]; /* replace it with b[j] */
b[j] = save/biga; /* swap and rescale */
/* --- must interchange row one col at a time --- */
for ( i=j*(n+1),it=imax-j,k=j; k<n; k++,i+=n ) {
save = a[it+i]; /* save swap value */
a[it+i] = a[i]; /* replace it with a[i] */
a[i] = save/biga; /* swap and rescale */
} /* --- end-of-for(k=j...n-1) --- */
/* ------------------------------------------------------
Eliminate next variable (Note: the index calculations
required beyond this point get much more complicated.)
------------------------------------------------------ */
if ( j < (n-1) ) /* except on last j */
for ( ix=j+1; ix<n; ix++ ) { /* lower diagonal */
int ixj = (n*j)+ix; /*lowr diag rows in col j*/
for( it=j-ix,jx=j+1; jx<n; jx++ )
{ int ixjx = (n*jx)+ix;
a[ixjx] -= a[ixj]*a[it+ixjx]; }
b[ix] -= b[j]*a[ixj];
} /* --- end-of-for(ix=j+1...n-1) --- */
} /* --- end-of-for(j=0...n-1) --- */
/* --------------------------------------------------------
Back Solution
-------------------------------------------------------- */
for ( it=n*n,i=2; i<=n; i++ ) { /* all rows except last */
int ia=it-i, ib=n-i, ic=n-1;
for ( k=1; k<i; k++,ia-=n,ic-- )
b[ib] -= a[ia]*b[ic];
} /* --- end-of-for(i=2...n) --- */
return ( 0 ); /*back to caller with b[]*/
} /* --- end-of-function simq() --- */
#ifdef TESTDRIVE
/**********************************************************
*
* Purpose: Test driver for lint1d() (and simq).
* Prepares a table of values for
* interpolation of the function f(x)=x*x,
* and compares the resulting accuracy
* with usual linear interpolation.
*
* Command-Line Arguments:
* x0 (I) Double containing the first point
* in the function's domain
* (default=-10.0).
* dx (I) Double containing x-interval
* between points in the table
* (default=1.0).
* n (I) Int containing the number of
* points in the table
* (default=21).
* K (I) Int containing the exponent for
* the test function f(x)=x**K
* (default=2).
*
*********************************************************/
/* --- program defaults (x0,dx,n,K from cmdline) --- */
#define X0 (-10.0) /* 1st point in table */
#define DX 1.0 /* table interval */
#define N 21 /*number pts in table*/
static int K=2; /* test f(x)=x**K */
#define NMAX 99 /* easier than malloc */
#define NRMS 101 /*pts per dx for rms calc*/
/* --------------------------------------------------------
Entry point
-------------------------------------------------------- */
main ( argc, argv )
int argc;
char *argv[];
{
/* --- local allocations and declarations --- */
double x0=X0, dx=DX; int n=N; /* defaults */
int i,j; /* loop indexes */
double x; /* arg for f() */
double f(); /* test function */
double y[NMAX]; /* table from lint1d() */
int iserror=0; /*return flag from lint1d*/
double frms=0.0,yrms=0.0; /* mean square errors */
double xa,xb; /*interval bounds for rms*/
double sqrarg; /*dummy arg for sqr macro*/
/* --- linear interpolation (u(x=xa)=ya, u(x=xb)=yb) --- */
#define u(x,xa,ya,xb,yb) (ya*(xb-x)+yb*(x-xa))/(xb-xa)
/* --- square a double --- */
#define sqr(x) (sqrarg=(x),sqrarg*sqrarg)
/* --------------------------------------------------------
Pick up command-line arguments
-------------------------------------------------------- */
/* --- printf("Usage: lint1d x0 dx n expon\n"); --- */
if ( argc > 1 ) x0 = atof(argv[1]); /* user supplied x0 */
if ( argc > 2 ) dx = atof(argv[2]); /* user supplied dx */
if ( argc > 3 ) n = atoi(argv[3]); /* user supplied n */
if ( argc > 4 ) K = atoi(argv[4]); /* user supplied K */
if ( n<3 || n>NMAX ) exit(0); /* sorry, Charlie */
/* --------------------------------------------------------
Call lint1d() to create interpolation table. Note: After
this call, a normal application (what mainframe IBM
likes to call a "problem program") would simply go about
its business, using y[i=0...n-1] as a table for linear
interpolation of f(x) in the usual way. The special
algorithm by which our table was generated is completely
transparent to further processing.
-------------------------------------------------------- */
iserror = lint1d(x0,dx,n,f,y); /*very easy to use lint1d*/
if ( iserror ) /* unless it fails */
{ printf("lint1d() failed.\n"); /* then just print msg */
exit(0); } /* and quit */
/* --------------------------------------------------------
Print the exact and tabled values of f(x) at each x point
-------------------------------------------------------- */
/* --- stub header --- */
printf(" Table Returned By Lint1d()\n");
printf(" i x[i] f(x[i]) y*[i]\n");
printf("--- -------- ------------ ------------\n");
/* --- table generated by lint1d() in right column --- */
for ( i=0; i<n; i++ ) { /*for each value in table*/
/* --- f(x[i]) is what a normal table would contain
y*[i] is our table to minimize errors --- */
x = x0 + dx*i; /* need x to print f(x) */
printf("%3d %8.2lf %12.3lf %12.6lf\n",
i+1,x,f(x),y[i]); /* f(x) and y[i] */
} /* --- end-of-for(i=0...n-1) --- */
/* --------------------------------------------------------
Determine mean square error for both f(x[i]) and for y[i]
-------------------------------------------------------- */
for ( i=1; i<n; i++ ) { /*each interval in table*/
double fxa, fxb, fx; /* f(xa), f(xb), f(x) */
xa = x0+dx*(i-1); fxa=f(xa); /*lower bound of interval*/
xb = x0+dx*i; fxb=f(xb); /* upper bound */
for ( j=0; j<NRMS; j++ ) { /*accum err for NRMS pts*/
x = xa + (j*dx)/(NRMS-1); /* xa<=x<=xb */
fx = f(x); /* actual value at x */
frms += sqr(u(x,xa,fxa,xb,fxb)-fx); /* usual error */
yrms += sqr(u(x,xa,y[i-1],xb,y[i])-fx); /* our error */
} /* --- end-of-for(j=0...NRMS-1) --- */
} /* --- end-of-for(i=1...n-1) --- */
/* --- print usual error and our (smaller) error --- */
printf("Mean sq err for\n%d pts/intvl %12.8lf %12.8lf\n",
NRMS,frms/(NRMS*(N-1)),yrms/(NRMS*(N-1)));
} /* --- end-of-main() --- */
/* --------------------------------------------------------
Entry point (dummy function f(x)=x**K can be replaced
by any function of the user's choice)
-------------------------------------------------------- */
double f(x)
double x;
{
/* --- replace this with any function of your choice --- */
int i=K; /* countdown index */
double y=x; /* start with x */
while ( --i > 0 ) y *= x; /*additional factors of x*/
return ( y ); /* f(x)=x**expon */
} /* --- end-of-function f() --- */
#endif
/* -------------- END-OF-FILE LINT1D.C ----------------- */
Table Returned By lint1d()
i x[i] f(x[i]) y*[i]
--- -------- ------------ ------------
1 -10.00 100.000 99.833333
2 -9.00 81.000 80.833333
3 -8.00 64.000 63.833333
4 -7.00 49.000 48.833333
5 -6.00 36.000 35.833333
6 -5.00 25.000 24.833333
7 -4.00 16.000 15.833333
8 -3.00 9.000 8.833333
9 -2.00 4.000 3.833333
10 -1.00 1.000 0.833333
11 0.00 0.000 -0.166667
12 1.00 1.000 0.833333
13 2.00 4.000 3.833333
14 3.00 9.000 8.833333
15 4.00 16.000 15.833333
16 5.00 25.000 24.833333
17 6.00 36.000 35.833333
18 7.00 49.000 48.833333
19 8.00 64.000 63.833333
20 9.00 81.000 80.833333
21 10.00 100.000 99.833333
------------- ------------ ------------
Mean sq error 0.03300330 0.00578108
|
email: john@forkosh.com |
|
|