Today I read a paper titled “The Bayesian Decision Tree Technique with a Sweeping Strategy”.
The abstract is:
The uncertainty of classification outcomes is of crucial importance for many safety critical applications including, for example, medical diagnostics.
In such applications the uncertainty of classification can be reliably estimated within a Bayesian model averaging technique that allows the use of prior information.
Decision Tree (DT) classification models used within such a technique give experts additional information by making this classification scheme observable.
The use of the Markov Chain Monte Carlo (MCMC) methodology of stochastic sampling makes the Bayesian DT technique feasible to perform.
However, in practice, the MCMC technique may become stuck in a particular DT which is far away from a region with a maximal posterior.
Sampling such DTs causes bias in the posterior estimates, and as a result the evaluation of classification uncertainty may be incorrect.
In a particular case, the negative effect of such sampling may be reduced by giving additional prior information on the shape of DTs.
In this paper we describe a new approach based on sweeping the DTs without additional priors on the favorite shape of DTs.
The performance of the Bayesian DT technique with the standard and sweeping strategies is compared on synthetic data as well as on real datasets.
Quantitatively evaluating the uncertainty in terms of entropy of class posterior probabilities, we found that the sweeping strategy is superior to the standard strategy.
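
To make the entropy measure concrete, here is a minimal sketch of how uncertainty could be scored from a set of sampled trees. It is not the paper's code. It assumes each sampled DT outputs a class probability vector for one test point, that the samples are averaged with equal weight (the usual way to average MCMC draws), and that entropy is measured in bits. The function names and the probability values are made up for illustration.

```python
import math


def average_class_posterior(per_tree_probs):
    """Average class probabilities over sampled trees with equal weights."""
    n_trees = len(per_tree_probs)
    n_classes = len(per_tree_probs[0])
    return [
        sum(probs[c] for probs in per_tree_probs) / n_trees
        for c in range(n_classes)
    ]


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# Hypothetical outputs of three sampled trees for a single test point.
trees_agree = [[0.90, 0.10], [0.95, 0.05], [0.85, 0.15]]
trees_disagree = [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]]

avg_agree = average_class_posterior(trees_agree)
avg_disagree = average_class_posterior(trees_disagree)

# Lower entropy means a more confident averaged prediction.
print("agree:   ", avg_agree, entropy_bits(avg_agree))
print("disagree:", avg_disagree, entropy_bits(avg_disagree))
```

When the sampled trees agree, the averaged posterior stays peaked and its entropy is low. When they disagree, the average flattens toward uniform and the entropy approaches 1 bit for two classes. If the sampler gets stuck in an unrepresentative tree, the set of samples is biased, and so is this entropy estimate.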