Bayesian semi-parametric multiple shrinkage

Rich MacLehose and David B. Dunson

Biostatistics Branch, NIEHS

September, 2007

High-dimensional and highly correlated data leading to non- or weakly-identified effects are commonplace. Maximum likelihood will typically fail in such situations, and a variety of shrinkage methods have been proposed. Standard techniques, such as ridge regression or the lasso, shrink estimates towards zero, with some approaches allowing coefficients to be selected out of the model by achieving a value of zero. When substantive information is available, estimates can be shrunk to non-null values; however, such information may not be available. We propose a Bayesian semi-parametric approach that allows shrinkage to multiple locations. Coefficients are given a mixture of heavy-tailed double exponential priors, with location and scale parameters assigned Dirichlet process hyperpriors to allow groups of coefficients to be shrunk toward the same, possibly non-zero, mean. Our approach favors a sparse but flexible structure by shrinking towards a small number of random locations. The methods are illustrated using a study of genetic polymorphisms and multiple myeloma.
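As a rough illustration of the prior structure described in the abstract, the following sketch draws coefficients from a mixture of double exponential (Laplace) distributions whose location and scale pairs come from a truncated stick-breaking approximation to a Dirichlet process. It is not the authors' implementation: the function names, the truncation level, the concentration parameter `alpha`, and the normal and gamma base distributions are all assumptions chosen for illustration, and the code covers prior simulation only, not posterior computation by MCMC.

```python
import numpy as np


def stick_breaking_weights(alpha, num_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    alpha and num_atoms are illustrative choices, not values from the paper.
    """
    v = rng.beta(1.0, alpha, size=num_atoms)
    v[-1] = 1.0  # close the final stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_coefficients(num_coefs, alpha=1.0, num_atoms=20, seed=0):
    """Draw coefficients from a DP mixture of double exponential priors."""
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, num_atoms, rng)

    # Assumed base measure: normal locations and gamma-distributed scales.
    locations = rng.normal(0.0, 1.0, size=num_atoms)
    scales = rng.gamma(shape=2.0, scale=0.5, size=num_atoms)

    # Each coefficient picks one (location, scale) atom; coefficients that
    # share an atom are shrunk toward the same, possibly non-zero, location.
    labels = rng.choice(num_atoms, size=num_coefs, p=weights)
    coefs = rng.laplace(loc=locations[labels], scale=scales[labels])
    return coefs, labels, locations


if __name__ == "__main__":
    coefs, labels, locations = sample_coefficients(num_coefs=50)
    print("distinct shrinkage locations used:", len(np.unique(labels)))
```

Because the stick-breaking weights decay quickly for small `alpha`, most coefficients in a draw tend to share a few atoms, which is one way to picture the "small number of random locations" the abstract refers to.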

Keywords: Dirichlet process, Hierarchical model, Lasso, MCMC, Mixture model, Nonparametric, Regularization, Shrinkage prior.

The manuscript is available in PDF format.