Paper Abstract

A Bayesian Analysis Strategy for Cross-Study Translation of Gene Expression Biomarkers

Joe E. Lucas, Carlos M. Carvalho & Mike West

On-line published version

We describe a strategy for analysis of experimentally derived gene expression signatures and their translation to human observational data. Sparse multivariate regression models are used to identify expression signature gene sets representing "downstream" biological pathway events following interventions in designed experiments. When translated into in vivo human observational data, analysis using sparse latent factor models can yield multiple quantitative factors characterizing expression patterns that are often more complex than in the controlled, in vitro setting. The estimation of common patterns in expression that reflect all aspects of covariation evident in vivo offers an enhanced, modular view of the complexity of biological associations of signature genes. This can reveal substructure in the biological process under experimental investigation and yield improved biomarkers of clinical outcomes. We illustrate the approach in a detailed study from an oncogene intervention experiment where in vivo factor profiling of an in vitro signature generates biological insights related to underlying pathway activities and chromosomal structure, and leads to refinements of cancer recurrence risk stratification across several cancer studies.


Data and input/output files from the analyses are available here.


We acknowledge the support of the National Science Foundation (grant DMS-0342172) and the National Institutes of Health (NCI U54-CA-112952-01 under the Integrative Cancer Biology program). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF or NIH.