BAYESIAN REGRESSION ANALYSIS IN THE LARGE P, SMALL N PARADIGM
Mike West
Duke University
September 2000
Statistical modelling and inference problems in which sample sizes are substantially
smaller than the number of available and potentially interesting predictors
abound in applied statistics. These "large p, small n"
problems pose challenges to standard statistical methods and demand new concepts and
models for regression and classification.
We discuss issues of regression modelling utilising singular-value decompositions
of design matrices that are massively rank deficient, and the development of
informative prior specifications on high-dimensional regression parameters. We
introduce new classes of structured prior distributions for this problem, and
develop computational methods and modes of posterior inference for regression
estimation and predictive inference for out-of-sample prediction and validation.
We describe an underlying latent factor modelling framework that provides a
formal interpretation and justification for the new prior structure and resulting
analysis. Extensions to binary regression are discussed, as are connections with non-parametric
regression, and an example illustrates the new methodology.
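The singular-value decomposition idea summarised above can be illustrated with a minimal sketch. It is not the prior structure developed in the manuscript: it assumes an n x p design matrix with p much larger than n, a Gaussian error model, and a plain independent normal prior on the coefficients of the singular factors. The function name and the parameters prior_var and noise_var are hypothetical, chosen only for illustration.

```python
import numpy as np

def svd_factor_regression(X, y, prior_var=1.0, noise_var=1.0):
    """Posterior means for a rank-deficient regression via the SVD.

    Assumed model (illustrative only):
        X = U diag(d) V^T,  F = U diag(d)   (n x k factor scores)
        y = F gamma + eps,  eps ~ N(0, noise_var * I)
        gamma ~ N(0, prior_var * I)
    The implied coefficient vector on the original predictors is
    beta = V gamma.
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Drop numerically zero singular values.
    keep = d > 1e-10 * d.max()
    U, d, Vt = U[:, keep], d[keep], Vt[keep]
    # F^T F = diag(d^2), so the posterior for gamma factorises
    # across components and needs no matrix inversion.
    post_prec = d**2 / noise_var + 1.0 / prior_var
    gamma_mean = (d * (U.T @ y) / noise_var) / post_prec
    beta_mean = Vt.T @ gamma_mean
    return beta_mean, gamma_mean

# Simulated example: 10 observations, 200 predictors.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 200))
y = rng.standard_normal(10)
beta_mean, gamma_mean = svd_factor_regression(X, y)
```

Under these assumptions the posterior mean of beta coincides with a ridge estimate with penalty noise_var / prior_var, but it is computed from at most n factor coefficients rather than p regression parameters.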
Keywords: Bayesian regression analysis, dimension reduction, high-dimensional covariates,
latent factor models, regression prediction, singular value decompositions, smoothness priors
The manuscript is available in PostScript and PDF formats.