Paper Abstract

Integrative Analysis of Cancer Gene Expression Studies using Bayesian Latent Factor Modelling

Dan Merl, Julia Lin-Yu Chen, Jen-Tsan Chi & Mike West

Annals of Applied Statistics

Original Manuscript: November 2008

We discuss an applied study in cancer genomics that integrates data and inferences from laboratory experiments on cancer cell lines with observational data from human breast cancers. The biological focus is on improving understanding of transcriptional responses of cells to changes in acidity in the cellular environment, and our integrative analysis aims to connect experimentally defined biomarkers of such responses to clinical outcomes in breast cancer. The analysis is a case study that exemplifies a general strategy for this kind of integration: connecting patterns of biological response linked to specific experimental interventions to observational studies in which such responses may be evidenced via variation in gene expression across samples, with the potential to define biomarkers of clinically relevant physiological states and outcomes. The statistical methods use Bayesian analysis with sparse latent factor regression models to identify, explore and relate signatures of aggregate gene expression changes between laboratory and observational studies. Identifying potentially clinically useful prognostic factors can help to direct future laboratory studies as well as generate potential for therapeutic advances.

Data and input/output files from the analyses in this case study are available here.

Research partially supported by the National Science Foundation (DMS-0342172) and the National Institutes of Health (NCI U54-CA-112952). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF or NIH.