Paper Abstract

Bayesian modeling for biological pathway annotation of genomic signatures

Haige Shen & Mike West

April 2008

We present Bayesian models and computational methods for the problem of matching predictions from molecular studies with known biological pathway databases: the problem of pathway annotation of summary results of an experiment or observational study. In areas such as cancer genomics, linking quantified, experimentally defined gene expression signatures with known biological pathway gene sets is essential to understanding the complex molecular pathways related to outcome. Our new models address this key challenge. Our focus and examples are on studies using gene expression microarrays, though the theory and methods are quite general. Our models for probabilistic pathway annotation (PROPA) address the problem formally and statistically, and deliver probabilities over pathways for any experimental signature. This allows quantitative assessment and ranking of pathways putatively linked to an experimental or observational phenotype. The models integrate qualitative biological information into the analysis and generate coherent inference on uncertainties about gene pathway membership that can inform the revision of pathway databases.

Our analysis relies on simulation-based computation in high-dimensional models, and introduces a novel extension of variational methods for computing model evidence, or marginal likelihood functions, which are central to the comparison of multiple biological pathways. Three examples illustrate the methodology using both simulated and real data, and we develop detailed analyses in two case studies in breast cancer genomics. The first study involves breast cancer estrogen-receptor and ErbB2 phenotypes. The second concerns pathway activities underlying the cellular response to lactic acidosis in breast cancer, and involves analyses of both in vitro and in vivo data; this last example demonstrates how the method can decompose the complexity of gene expression-based predictions about interacting biological pathway activation.
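
The abstract links two ideas: marginal likelihoods are central to comparing pathways, and the models deliver probabilities over pathways. The generic bridge between them is Bayes' rule over a set of candidate pathway models, p(k | data) ∝ p(data | M_k) p(k). The sketch below is a minimal illustration of that step only, not the PROPA model itself. It assumes each pathway's log marginal likelihood has already been computed (in the paper by Monte Carlo variational approximation; here they are simply inputs) and defaults to a uniform prior over pathways. The function name and the example pathway names and values are hypothetical.

```python
import math
from typing import Dict, Optional


def pathway_posterior_probs(log_evidence: Dict[str, float],
                            prior: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Turn per-pathway log marginal likelihoods into posterior probabilities.

    Hypothetical helper illustrating Bayes' rule:
        p(k | data) is proportional to p(data | M_k) * p(k).
    Works on the log scale and subtracts the maximum (log-sum-exp trick),
    because raw evidences such as exp(-1200) underflow to 0.0 in floating point.
    """
    names = list(log_evidence)
    if prior is None:
        # Assumption: uniform prior probability over the candidate pathways.
        prior = {k: 1.0 / len(names) for k in names}
    # Unnormalized log posterior for each pathway.
    log_post = {k: log_evidence[k] + math.log(prior[k]) for k in names}
    # Shift by the maximum so the largest term becomes exp(0) = 1.
    m = max(log_post.values())
    weights = {k: math.exp(v - m) for k, v in log_post.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Usage with made-up log evidences for three hypothetical pathways:
probs = pathway_posterior_probs({"pathway_A": -1200.0,
                                 "pathway_B": -1202.3,
                                 "pathway_C": -1210.0})
ranking = sorted(probs, key=probs.get, reverse=True)
```

Ranking pathways by these probabilities is one way to read "quantitative assessment and ranking of pathways"; note that a difference of a few units in log evidence already translates into a large ratio of posterior probabilities.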

Keywords: biological pathway analysis, cancer genomics, factor regression models, gene expression signatures, gene set enrichment, marginal likelihood computation, Monte Carlo variational approximation, sparse factor analysis
We are grateful to Ashley Chi, Joe Lucas and Chunlin Ji of Duke University for discussions and important input. We acknowledge support of the National Science Foundation (grants DMS-0102227 and DMS-0342172) and the National Institutes of Health (grants NHLBI P01-HL-73042-02 and NCI U54-CA-112952-01). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF or NIH.