Extended Linear Models

Polynomial splines and their tensor products are natural building blocks for constructing finite-dimensional estimates of infinite-dimensional main effects and low-order interactions, and the resulting ANOVA decompositions provide an insightful tool for data analysis. To synthesize the theoretical properties of these estimates, we have developed the notion of an extended linear model (ELM). Many statistical problems of theoretical and practical importance can be treated effectively within this framework: regression, generalized regression, polychotomous regression, hazard regression, censored regression, density estimation and conditional density estimation are all examples of extended linear models. These theoretical investigations provide valuable guidance for developing sound statistical methodology.
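
To make the ANOVA idea concrete, here is a minimal Python sketch, assuming NumPy, linear truncated-power spline bases and ordinary least-squares regression (the simplest extended linear model). It builds main-effect bases for two predictors plus their tensor-product interaction and returns one fitted vector per component. The helper names (`linear_spline_basis`, `anova_design`, `fit_components`) and the knot choices are hypothetical, and the split into components is not unique here because the centering constraints a proper ANOVA decomposition would impose are omitted.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Linear truncated-power basis without intercept: x and (x - k)_+ for each knot."""
    return np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])

def anova_design(x1, x2, knots1, knots2):
    """Intercept, two spline main effects, and their tensor-product interaction."""
    B1 = linear_spline_basis(x1, knots1)
    B2 = linear_spline_basis(x2, knots2)
    # Tensor product: all pairwise products of the main-effect basis columns.
    B12 = np.einsum("ni,nj->nij", B1, B2).reshape(len(x1), -1)
    X = np.hstack([np.ones((len(x1), 1)), B1, B2, B12])
    p1, p2 = B1.shape[1], B2.shape[1]
    # Column ranges identifying each ANOVA component.
    blocks = {
        "const": slice(0, 1),
        "x1": slice(1, 1 + p1),
        "x2": slice(1 + p1, 1 + p1 + p2),
        "x1:x2": slice(1 + p1 + p2, X.shape[1]),
    }
    return X, blocks

def fit_components(x1, x2, y, knots1, knots2):
    """Least-squares fit, returned as one fitted vector per ANOVA component."""
    X, blocks = anova_design(x1, x2, knots1, knots2)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {name: X[:, s] @ beta[s] for name, s in blocks.items()}

rng = np.random.default_rng(0)
n = 400
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = np.sin(2 * np.pi * x1) + x2**2 + 0.1 * rng.standard_normal(n)
comps = fit_components(x1, x2, y, np.linspace(0.1, 0.9, 9), [0.5])
for name, f in comps.items():
    print(name, round(float(np.std(f)), 3))
```

The components sum to the overall least-squares fit, so a component whose fitted vector stays small (here, presumably the interaction) is a candidate for removal.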

The general asymptotics for polynomial splines in the context of an extended linear model were derived for uniform grids that are gradually refined as the amount of data increases. In practice, such fixed-knot splines are rarely adequate, so adaptive knot placement procedures have been developed that alternate between adding knots in regions where the unknown function being estimated exhibits significant features and deleting knots in regions where, subject to noise considerations, the function is relatively smooth. Our standard approach to this problem is a greedy search that alternates between stepwise addition (made efficient through the use of Rao statistics) and stepwise deletion (employing Wald statistics). Both the theory and the standard ELM methodology are discussed in a lengthy discussion paper in the Annals of Statistics.
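
A minimal sketch of the add-then-delete search, in Python with NumPy, for the simplest extended linear model (Gaussian regression) with a linear truncated-power basis. Assumptions not taken from the text: the candidate knot grid, the cap `max_knots`, and the use of BIC to choose the final model along the path; all function names are hypothetical and this is not the authors' implementation. For least squares, the Rao statistic for a candidate column z is (z'r)^2 / (sigma^2 z'(I - H)z), computed from the current fit only, which is what makes addition cheap; the Wald statistic used for deletion is beta_j^2 / Var(beta_j).

```python
import numpy as np

def design(x, knots):
    """Linear truncated-power spline basis: 1, x, and (x - k)_+ for each knot."""
    return np.column_stack([np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots])

def ls_fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid, resid @ resid / len(y)  # MLE of sigma^2

def rao_stats(X, y, Z):
    """Score (Rao) statistic for adding each column of Z, without refitting the augmented model."""
    _, resid, sigma2 = ls_fit(X, y)
    Zr = Z - X @ np.linalg.lstsq(X, Z, rcond=None)[0]  # (I - H) Z
    den = sigma2 * np.sum(Zr**2, axis=0)
    return np.where(den > 1e-12, (Z.T @ resid) ** 2 / np.maximum(den, 1e-12), 0.0)

def wald_stats(X, y):
    """Wald statistic beta_j^2 / Var(beta_j) for every column of X."""
    beta, resid, _ = ls_fit(X, y)
    n, p = X.shape
    s2 = resid @ resid / (n - p)
    return beta**2 / (s2 * np.diag(np.linalg.pinv(X.T @ X)))

def bic(X, y):
    n = len(y)
    return n * np.log(ls_fit(X, y)[2]) + np.log(n) * X.shape[1]

def greedy_knots(x, y, candidates, max_knots=8):
    knots, path = [], [[]]
    # Stepwise addition: add the candidate with the largest Rao statistic.
    while len(knots) < max_knots:
        free = [c for c in candidates if c not in knots]
        if not free:
            break
        Z = np.column_stack([np.maximum(x - c, 0.0) for c in free])
        knots.append(free[int(np.argmax(rao_stats(design(x, knots), y, Z)))])
        path.append(list(knots))
    # Stepwise deletion: drop the knot with the smallest Wald statistic.
    while knots:
        w = wald_stats(design(x, knots), y)[2:]  # knot columns start at index 2
        knots.pop(int(np.argmin(w)))
        path.append(list(knots))
    # Return the knot set on the path with the smallest BIC.
    return min(path, key=lambda k: bic(design(x, k), y))

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(size=300))
y = np.where(x < 0.5, x, 1.0 - x) + 0.05 * rng.standard_normal(300)  # kink at 0.5
print(greedy_knots(x, y, list(np.linspace(0.05, 0.95, 19))))
```

Because the Rao statistic for least squares equals the drop in residual sum of squares divided by the current variance estimate, the first addition step on this tent-shaped signal should pick the candidate at 0.5.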

Together with Robert Kohn at the University of New South Wales, we have begun to explore Bayesian computational procedures for knot placement. By casting the problem as (essentially) one of model selection, we can employ newly developed tools for model averaging to produce estimates that perform dramatically better than the greedy search in some contexts. Our first application mixes estimates built from the reproducing kernels that arise in the solution of classical smoothing spline problems. Despite these roots, our procedure exhibits a great deal of spatial adaptability, and it is computationally efficient thanks to a new partial Gibbs sampler we have designed for this problem. Another interesting application of this machinery involves Triogram modeling. In this case, averaging smooths out the sharp features of ordinary greedy Triogram fits while preserving significant ridges. We are pursuing further Bayesian approaches with Charles Kooperberg, now at the Fred Hutchinson Cancer Research Center, in the context of logspline density estimation. We presented some of this work at the recent ASA meeting, and slides from the talk are available in PDF.
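
The model-averaging idea can be illustrated with a plain Gibbs sampler over knot-inclusion indicators, sketched below in Python with NumPy. This is a swapped-in, textbook-style Bayesian variable-selection sampler, not the partial Gibbs sampler or the reproducing-kernel basis described above. It assumes a linear truncated-power basis, a Zellner g-prior on the knot coefficients (g = n by default), a flat prior on the intercept and slope, p(sigma^2) proportional to 1/sigma^2, and independent Bernoulli(`prior_incl`) inclusion priors. Under these assumptions the coefficients and sigma^2 integrate out in closed form, so each indicator is drawn from its exact full conditional, and the averaged fit is the mean of E[f | gamma, y] over the retained draws. All names are hypothetical.

```python
import numpy as np

def log_marginal(yr, Zr, gamma, g, df):
    """log p(y | gamma) up to a constant: g-prior on included knot coefficients,
    flat prior on intercept/slope, p(sigma^2) proportional to 1/sigma^2."""
    Q = yr @ yr
    q = int(gamma.sum())
    if q > 0:
        Zg = Zr[:, gamma]
        fitted = Zg @ np.linalg.lstsq(Zg, yr, rcond=None)[0]
        Q -= g / (1.0 + g) * (yr @ fitted)
    return -0.5 * q * np.log1p(g) - 0.5 * df * np.log(Q)

def residualize(A, X0):
    """Project out the always-included columns X0."""
    return A - X0 @ np.linalg.lstsq(X0, A, rcond=None)[0]

def gibbs_knots(x, y, candidates, g=None, prior_incl=0.2, n_iter=200, burn=50, seed=0):
    rng = np.random.default_rng(seed)
    n, m = len(y), len(candidates)
    g = float(n) if g is None else g  # unit-information choice of g (an assumption)
    X0 = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.maximum(x - c, 0.0) for c in candidates])
    yr, Zr = residualize(y, X0), residualize(Z, X0)
    df = n - X0.shape[1]
    prior_logodds = np.log(prior_incl / (1.0 - prior_incl))
    gamma = np.zeros(m, dtype=bool)
    fit_sum, incl_sum, kept = np.zeros(n), np.zeros(m), 0
    for it in range(n_iter):
        # One sweep: redraw each indicator from its full conditional.
        for j in range(m):
            gamma[j] = True
            l1 = log_marginal(yr, Zr, gamma, g, df)
            gamma[j] = False
            l0 = log_marginal(yr, Zr, gamma, g, df)
            z = np.clip(l1 - l0 + prior_logodds, -50.0, 50.0)
            gamma[j] = rng.uniform() < 1.0 / (1.0 + np.exp(-z))
        if it >= burn:
            # E[f | gamma, y]: fit of the fixed part plus the g/(1+g)-shrunken knot fit.
            fit = y - yr
            if gamma.any():
                Zg = Zr[:, gamma]
                fit = fit + g / (1.0 + g) * (Zg @ np.linalg.lstsq(Zg, yr, rcond=None)[0])
            fit_sum += fit
            incl_sum += gamma
            kept += 1
    # Model-averaged fit and posterior inclusion probability of each candidate knot.
    return fit_sum / kept, incl_sum / kept

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(size=200))
truth = np.where(x < 0.5, x, 1.0 - x)
y = truth + 0.05 * rng.standard_normal(200)
cands = np.linspace(0.05, 0.95, 19)
fit, incl = gibbs_knots(x, y, cands)
print(np.round(incl, 2))
```

Averaging over the sampled knot sets, rather than committing to the single best one, is what smooths the final estimate in the sense described above.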

Recently, several forms of the minimum description length (MDL) principle have been applied in this context. Together with Bin Yu, we have considered three mixture MDL criteria, differentiated by the prior chosen for the vector of spline coefficients:

  1. the classic ridge or identity prior,
  2. Zellner's g-prior, and
  3. the familiar smoothing spline prior of Silverman.
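
All three criteria share one computation: the code length of y under a Gaussian mixture, that is, the negative log marginal density obtained by integrating the coefficients against the prior. The Python sketch below (NumPy) assumes beta ~ N(0, sigma^2 Sigma) and treats sigma^2 and the hyperparameters c, g and lambda as known; the actual criteria must also handle these quantities (for instance by estimating them and accounting for their cost), which is not shown. For the smoothing spline prior, `Omega` stands for a roughness penalty matrix; the one in the example is an arbitrary positive definite placeholder. Function names are hypothetical.

```python
import numpy as np

def mixture_code_length(y, X, Sigma, sigma2):
    """Code length of y in nats under the Gaussian mixture
    y | beta ~ N(X beta, sigma2 I) with beta ~ N(0, sigma2 * Sigma)."""
    n = len(y)
    C = np.eye(n) + X @ Sigma @ X.T  # Cov(y) / sigma2 after integrating out beta
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) + 0.5 * logdet + 0.5 * quad / sigma2

def ridge_prior(X, c):
    """1. Ridge / identity prior: Sigma = c I."""
    return c * np.eye(X.shape[1])

def g_prior(X, g):
    """2. Zellner's g-prior: Sigma = g (X'X)^{-1}."""
    return g * np.linalg.inv(X.T @ X)

def smoothing_prior(Omega, lam):
    """3. Smoothing-spline-type prior: precision proportional to a roughness
    penalty matrix Omega (assumed symmetric positive definite)."""
    return np.linalg.inv(lam * Omega)

rng = np.random.default_rng(2)
n, q = 50, 4
X = rng.standard_normal((n, q))
y = X @ np.array([1.0, 0.0, -0.5, 0.0]) + rng.standard_normal(n)
Omega = np.eye(q) + 0.5 * np.ones((q, q))  # placeholder SPD matrix, not a real spline penalty
priors = [("ridge", ridge_prior(X, 1.0)), ("g", g_prior(X, n)), ("smooth", smoothing_prior(Omega, 1.0))]
for name, S in priors:
    print(name, round(float(mixture_code_length(y, X, S, 1.0)), 2))
```

For the g-prior the log-determinant term reduces to (q/2) log(1 + g) and the quadratic form to y'y - g/(1 + g) y'Hy, with H the hat matrix of X, which gives a convenient check on the general formula.
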
The MDL approach adapts nicely to the present model selection problem, in which the number of candidate predictors (knots) is essentially n, the sample size. Preliminary simulation results suggest that this framework outperforms both the greedy and the simple Bayesian approaches mentioned above. By using reproducing kernels as our underlying basis, we have derived a new algorithm for fitting smoothing splines that is spatially adaptive (due to the knot selection) while still incorporating a smoothing parameter; in this case, the smoothing parameter is optimized by a simple iterative procedure that does not involve generalized cross-validation (GCV).
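
One standard way to use reproducing kernels as a knot-based basis is sketched below in Python with NumPy, assuming x has been scaled to [0, 1]. With the cubic smoothing spline kernel R(s, t) = min(s,t)^2 (3 max(s,t) - min(s,t)) / 6, a fit of the form a0 + a1 x + sum_j b_j R(x, t_j) over selected knots t_j has roughness penalty integral (f'')^2 = b' K_tt b exactly, so knot selection and a smoothing parameter coexist naturally. The knot set and `lam` are supplied by the caller; neither the selection step nor the iterative update of the smoothing parameter mentioned above is reproduced, and the function names are hypothetical.

```python
import numpy as np

def cubic_kernel(s, t):
    """Reproducing kernel R(s, t) = min^2 (3 max - min) / 6 of the space
    {f on [0, 1]: f(0) = f'(0) = 0} with inner product integral f'' g''."""
    s = np.asarray(s, dtype=float)[:, None]
    t = np.asarray(t, dtype=float)[None, :]
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo**2 * (3.0 * hi - lo) / 6.0

def fit_kernel_spline(x, y, knots, lam):
    """Minimize ||y - a0 - a1 x - K b||^2 + lam * b' K_tt b, where
    K[i, j] = R(x_i, knot_j) and K_tt[j, k] = R(knot_j, knot_k)."""
    T = np.column_stack([np.ones_like(x), x])  # unpenalized null space {1, x}
    B = np.hstack([T, cubic_kernel(x, knots)])
    P = np.zeros((B.shape[1], B.shape[1]))
    P[2:, 2:] = cubic_kernel(knots, knots)  # equals integral (f'')^2 of the kernel part
    coef = np.linalg.solve(B.T @ B + lam * P, B.T @ y)

    def predict(x_new):
        x_new = np.asarray(x_new, dtype=float)
        T_new = np.column_stack([np.ones_like(x_new), x_new])
        return T_new @ coef[:2] + cubic_kernel(x_new, knots) @ coef[2:]

    return predict

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(size=200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
knots = np.linspace(0.05, 0.95, 10)  # stands in for a selected knot set
f = fit_kernel_spline(x, y, knots, lam=1e-4)
print(np.round(f([0.25, 0.5, 0.75]), 2))
```

A very large `lam` shrinks the kernel coefficients toward zero and recovers the straight-line least-squares fit, while small values approach an unpenalized regression on the selected kernel basis.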

For more information, contact Mark Hansen.