January, 2005
We present advances in Bayesian modelling and computation for CART (classification and regression tree) models. The modelling innovations include a formal prior distributional structure for tree generation, the pinball prior, that allows an explicit prior distribution to be specified for both the tree size and the tree shape. The core computational innovations involve a novel Metropolis–Hastings method that can dramatically improve the convergence and mixing properties of MCMC methods for Bayesian CART analysis. Earlier MCMC methods have simulated Bayesian CART models using very local moves, proposing only small changes to a "current" CART model. Our new Metropolis–Hastings move makes large changes to the CART tree, but is at the same time local in that it leaves unchanged the partition of observations into terminal nodes. We evaluate the effectiveness of the proposed algorithm in two examples, one with a constructed data set and one concerning analysis of a published breast cancer data set.
The authors acknowledge the support of grants from the NSF (DMS 0102227 and 0342172), and also the support of SAMSI and Duke University (H.T.) during 2003-04. Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF or NIH.
This paper is published in the Journal of Computational and Graphical Statistics, 2007, 16, 44-66, and is available here. The published paper is a much revised version of this original (2005) working paper.
Software implementing the tree model search and analysis approaches in this paper is also freely available here.