Journal of Computational & Graphical Statistics (JCGS), 19, 419-438, 2010
This paper initially appeared via advance JCGS e-publication (DOI: 10.1198/jcgs.2010.10016) on the JCGS fast-track for work " ... on a hot topic of significant interest to the JCGS readership and computational statistics community".
Original manuscript - January 2010
We describe advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via GPU (graphics processing unit) programming. The developments are partly motivated by computational challenges arising in increasingly prevalent biological studies using high-throughput flow cytometry methods, which generate many very large data sets and require increasingly high-dimensional mixture models with large numbers of mixture components. The paper describes the strategies and process for GPU computation in Bayesian simulation and optimization approaches, and gives examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large data sets. It also provides a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models.
Keywords: Bayesian computation; Desktop parallel computing; Flow cytometry; GPU programming; Large data sets; Mixture models
We are grateful to Fernando Bonassi of Duke University for assistance with web page development for the supporting web site (below). Research reported here was partially supported by grants from the U.S. National Science Foundation (DMS-0342172) and National Institutes of Health (U54-CA-112952, P50-GM081883 and RC1 AI086032). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF.
Computer code, tutorial development and examples are available at the GPU Statistical Science resource site: www.stat.duke.edu/gpustatsci