BAYESIAN REGRESSION ANALYSIS IN THE LARGE P, SMALL N PARADIGM

Mike West

Duke University

September 2000

Statistical modelling and inference problems in which the sample size is substantially smaller than the number of available, and potentially interesting, predictors abound in applied statistics. These "Large p, Small n" problems challenge standard statistical methods and demand new concepts and models for regression and classification. We discuss regression modelling based on singular value decompositions of design matrices that are massively rank deficient, and the development of informative prior specifications for high-dimensional regression parameters. We introduce new classes of structured prior distributions for this problem, and develop computational methods and modes of posterior inference for regression estimation and for out-of-sample prediction and validation. We describe an underlying latent factor modelling framework that provides a formal interpretation and justification for the new prior structure and the resulting analysis. Extensions to binary regression are discussed, as are connections with non-parametric regression, and an example illustrates the new methodology.
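The abstract gives no formulas, so the following is only a minimal sketch of the generic singular value decomposition idea it refers to, not the structured priors introduced in the paper. With an n x p design matrix X (p much larger than n) and thin SVD X = U D V', the regression X beta can be rewritten as F theta, where F = U D holds n "factor" columns and theta = V' beta has at most n elements. Assumptions made purely for illustration: a known noise variance sigma2, an independent normal prior theta ~ N(0, tau2 I), and the hypothetical helper names svd_factor_regression and predict. Because F'F = D^2 is diagonal, the posterior for theta is available in closed form and is diagonal too.

```python
import numpy as np


def svd_factor_regression(X, y, sigma2=1.0, tau2=10.0, tol=1e-10):
    """Illustrative sketch: posterior for factor coefficients theta in
    y = F theta + eps, with F = U D from the thin SVD of X (n x p, p >> n).

    Assumes (hypothetically) a known noise variance sigma2 and an
    independent normal prior theta ~ N(0, tau2 * I)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)  # U: n x r, d: (r,), Vt: r x p
    keep = d > tol * d.max()                         # drop numerically zero singular values
    U, d, Vt = U[:, keep], d[keep], Vt[keep, :]
    # F'F = D^2 is diagonal, so the posterior precision is diagonal as well
    post_prec = d**2 / sigma2 + 1.0 / tau2
    theta_mean = (d * (U.T @ y) / sigma2) / post_prec
    theta_var = 1.0 / post_prec
    # Map back to the p-dimensional coefficient vector: beta = V theta
    beta_mean = Vt.T @ theta_mean
    return theta_mean, theta_var, beta_mean, (U, d, Vt)


def predict(x_new, beta_mean, theta_var, Vt, sigma2=1.0):
    """Predictive mean and variance for one new covariate vector x_new (length p)."""
    f_new = Vt @ x_new                  # factor scores: projection of x_new onto V
    mean = x_new @ beta_mean            # equals f_new @ theta_mean
    var = sigma2 + np.sum(f_new**2 * theta_var)
    return mean, var


if __name__ == "__main__":
    # Simulated toy data: 20 observations, 500 predictors
    rng = np.random.default_rng(0)
    n, p = 20, 500
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:5] = 2.0
    y = X @ beta_true + rng.standard_normal(n)
    theta_mean, theta_var, beta_mean, (U, d, Vt) = svd_factor_regression(X, y)
    mean, var = predict(rng.standard_normal(p), beta_mean, theta_var, Vt)
    print(mean, var)
```

Under this particular prior, beta = V theta places all prior mass on the row space of X, so the posterior mean coincides with tau2 X'(tau2 X X' + sigma2 I)^{-1} y, and any component of a new covariate vector orthogonal to that row space does not affect the prediction. Richer prior structures on theta change exactly this behaviour.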

Keywords: Bayesian regression analysis, dimension reduction, high-dimensional covariates, latent factor models, regression prediction, singular value decompositions, smoothness priors


The manuscript is available in PostScript and PDF formats.