Sparse Bayesian analysis & data synthesis in gene expression experiments

Joe Lucas, Quanli Wang, Andrea Bild, Joe Nevins & Mike West

Revision for release forthcoming in November 2006

Analysis of high-dimensional DNA microarray gene expression data from designed experiments aims to: (a) isolate real effects of experimental intervention on changes in gene expression, and guard against false discovery; (b) model the impact on changes in variation of gene expression as well as changes in level; and (c) address critical issues of gene-sample specific variation that impact multiple genes but that represent artifacts and experimental biases.

Multivariate regression models that use sparsity-inducing priors on regression parameters, and that allow for changes in variability as well as level, address these key practical issues. Utilising normalisation control data provides a mechanism for within-model analysis and calibration to correct gene-sample specific biases and allow synthesis across samples within a study, and also across studies. These issues, and the dangers of ignoring them in expression experiments, are highlighted in analysis of oncogene intervention experiments in cancer genomics.
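
As an illustration only, and not the model specification of the report, one common form of sparsity-inducing prior is a point-mass mixture: each regression coefficient is exactly zero with high probability and is otherwise drawn from a normal distribution. The sketch below simulates coefficients from such a prior. The function name, the inclusion probability `pi` and the scale `tau` are hypothetical choices made for this example.

```python
import numpy as np

def sample_sparse_coefficients(n_genes, n_effects, pi=0.1, tau=1.0, seed=0):
    """Draw a gene-by-effect coefficient matrix from a point-mass mixture prior.

    Assumed form (illustrative): each coefficient is 0 with probability 1 - pi,
    and N(0, tau^2) with probability pi.
    """
    rng = np.random.default_rng(seed)
    # Indicator of a non-zero effect for each gene-by-effect pair.
    nonzero = rng.random((n_genes, n_effects)) < pi
    # Normal component with standard deviation tau; zeros elsewhere.
    slab = rng.normal(0.0, tau, size=(n_genes, n_effects))
    return np.where(nonzero, slab, 0.0)

beta = sample_sparse_coefficients(n_genes=1000, n_effects=3)
print("fraction of non-zero coefficients:", np.mean(beta != 0))
```

Under a prior of this kind most gene-by-intervention effects are exactly zero a priori, which is one way a model can guard against false discovery when thousands of genes are examined at once.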


The revised version of the original 2005 report will be available in November 2006, together with a companion paper on sparse factor models and the BFRM software release. Come back in mid-November for full details and manuscripts.