This work is primarily interested in the problem of selecting a single decision (or classification) tree given the observed data. Although a single decision tree runs a high risk of overfitting, the induced tree is easily interpreted. Researchers have devised various methods, such as tree pruning and tree averaging, to prevent the induced tree from overfitting (and from underfitting) the data. In this paper, instead of using those conventional approaches, we apply the Bayesian evidence framework of Gull, Skilling and MacKay to the process of selecting a decision tree. We derive a formal function that measures the ‘fitness’ of each decision tree given a set of observed data. Our method is, in fact, analogous to a well-known Bayesian model selection method for interpolating noisy continuous-valued data. As in regression problems, given reasonable assumptions, the derived score function automatically quantifies the principle of Ockham’s razor, and hence deals reasonably with the underfitting-overfitting tradeoff.
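The abstract does not give the paper's actual score function, but the flavour of an evidence-based tree score can be illustrated with a minimal sketch. Assuming Bernoulli leaves with Beta priors (a common conjugate choice, not necessarily the paper's), the closed-form marginal likelihood lets us compare keeping a node as one leaf against splitting it into two children; the function names and the toy counts below are hypothetical:

```python
from math import lgamma

def log_evidence(n1, n0, a=1.0, b=1.0):
    """Log marginal likelihood of n1 positive and n0 negative labels
    under a Bernoulli leaf with a Beta(a, b) prior on its parameter."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + n1) + lgamma(b + n0)
            - lgamma(a + b + n1 + n0))

def prefer_split(left, right):
    """Compare the evidence of one pooled leaf against splitting the
    node into the two given (positives, negatives) children."""
    leaf = log_evidence(left[0] + right[0], left[1] + right[1])
    split = log_evidence(*left) + log_evidence(*right)
    return split > leaf

print(prefer_split((9, 1), (1, 9)))   # informative split: True
print(prefer_split((5, 5), (5, 5)))   # uninformative split: False
```

The second call shows the Ockham's razor effect the abstract alludes to: when the split does not change the label distribution, the extra parameters of the two-leaf model lower its evidence, so the simpler single leaf is preferred without any explicit complexity penalty.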
