Paper Abstract

Trans-Study Projection of Genomic Biomarkers
in Analysis of Oncogene Deregulation and Breast Cancer

Dan Merl, Joe Lucas, Joe Nevins, Haige Shen & Mike West

June 2008

To appear in: The Handbook of Applied Bayesian Analysis
Eds: Tony O'Hagan & Mike West

In cancer studies, as in many areas of human disease research, gene expression microarray technology has been central to the emergent field of genomic medicine. Expression profiles of physiological states and clinical outcomes play increasing roles as biomarkers in both experimental and human observational studies. Central challenges in moving towards clinical applications include hard questions of how to link and combine such measures across contexts: from laboratory experiments with cultured cells, to animal model experiments, to human outcome studies and clinical trials. The question of how to translate and transfer experimental laboratory findings to the context of human observational studies sits at the core of current translational research agendas. This case study focuses on precisely this question in cancer genomics. The in vitro laboratory results are gene expression signatures of changes in human cells in response to a set of interventions on cancer-related genes (the oncogene intervention experiments); the in vivo context comprises gene expression studies with data generated from human breast tumours.

The analyses involve a series of applications of sparse Bayesian latent factor regression models, and are illustrative of the use of these models for large-scale multivariate data arising from both designed experiments and observational studies.

We exploit a range of posterior summaries from analysis of the oncogene intervention data to project the resulting in vitro-defined signatures of biological responses to the interventions into the in vivo data from breast cancers. Bayesian latent factor analysis of gene expression linked to these signatures in the breast cancer data then reveals the greater complexity of patterns of expression evidenced in vivo, and evolutionary model search links the oncogene signatures to a number of cancer-relevant biological pathways not initially represented in the experimental context. We follow this with a study of how latent factors estimated in the breast data project back to the oncogene experimental context, highlighting and providing interpretation of some of the inferred factors. Further, using posterior estimates of the latent factors as covariates, we examine Bayesian survival models for recurrence of breast cancer; these identify several key latent factors that clearly have value as clinical biomarkers with respect to recurrence. Bayesian pathway annotation analysis provides clear evidence of the biological relevance of these factors by linking them to known biological pathways of relevance in cancer progression and development. Beyond immediate interpretation, this has led to follow-on biological investigations.

This case study in integrative, trans-study Bayesian analysis of gene expression data sets is illustrative of the use of the overall strategy and approach -- enabled by relevant Bayesian concepts, models and computational tools -- in a number of other studies in genomics.


Data and input/output files from the analyses in this case study are available here.


We are most grateful to Carlos Carvalho and Quanli Wang for discussion and past collaborations on ideas, models and methods utilised in the study discussed here. We acknowledge support of the National Science Foundation (grant DMS-0342172) and the National Institutes of Health (NCI U54-CA-112952-01 under the Integrative Cancer Biology program). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF or NIH.