Paper Abstract
Integrative Analysis of Cancer Gene Expression Studies using Bayesian Latent Factor Modelling
Dan Merl, Julia Lin-Yu Chen, Jen-Tsan Chi & Mike West
Annals of Applied Statistics
Original Manuscript: November 2008
We discuss an applied study in cancer genomics that integrates
data and inferences from laboratory experiments on cancer cell lines with
observational data from human breast cancers. The biological focus is
on improving understanding of transcriptional responses of cells to
changes in acidity in the cellular environment, and our integrative analysis
aims to connect experimentally defined biomarkers of such responses
to clinical outcomes in breast cancer. The analysis is a case
study that exemplifies a general strategy for this kind of integration: connecting
patterns of biological response linked to specific experimental interventions into
observational studies where such responses may be evidenced via variation in
gene expression across samples, with potential to define biomarkers of clinically
relevant physiological states and outcomes. Statistical methods use Bayesian
analysis with sparse latent factor regression models to identify, explore and
relate signatures of aggregate gene expression changes between laboratory
and observational studies. Identifying potential clinically useful prognostic
factors can help to direct future laboratory studies as well as generate potential
for therapeutic advances.
Data and input/output files from the analyses in this case study are available here.
- In Vitro signature analysis:
Text files containing the data, BFRM input and parameter setting files
and all BFRM output files from the analysis of the
neutralisation/lactic acidosis intervention data. Further information
on running BFRM and using output files is available at the BFRM software page.
- In Vivo factor analysis details:
Text files containing the data, BFRM input and parameter setting files
and all BFRM output files from the sparse latent factor analysis of
the Miller breast tumor data.
- Survival analysis details:
Text files containing the data, SSS survival analysis input and parameter setting files
and all output files for the survival analysis of the Miller breast
tumor data, using latent factors, signatures, and clinical variables
as possible covariates. Further information
on running SSS and using output files is available at the SSS software page.
- Matlab Scripts: Matlab scripts
for generating the figures presented in the manuscript.
- Validation Data: Matlab
workspaces containing the Pawitan and Sotiriou breast tumor data sets,
along with the imputed latent factor scores associated with these
samples.
Research partially supported by National Science Foundation
(DMS-0342172) and National Institutes of Health (NCI U54-CA-112952).
Any opinions, findings and conclusions or recommendations expressed
in this work are those of the authors and do not necessarily reflect
the views of the NSF or NIH.