The introduction of real-valued degrees of membership [1] has been carried out with success in the area of unsupervised learning, in particular in the clustering problem [2]. The principal advantage obtained by eliminating the Boolean restriction on the cluster memberships is that the combinatorial problem of grouping elements into crisp clusters is transformed into an analytic problem that can be tackled with the tools of analysis.
In the area of supervised learning there is a sharp separation between connectionist approaches, such as neural networks, and symbolic approaches, such as decision trees. Analyticity is exploited in neural networks in order to devise powerful learning algorithms, such as back-propagation, that in principle lead to globally optimal solutions in terms of a collection of parameters (weights, thresholds) that characterize the operation of the basic network units and their connections. The resulting representation is flexible and robust, but not easily amenable to interpretation. On the other hand, decision trees are generated by combinatorial methods that lead to representations that are transparent and easily interpretable, but which suffer from rigidity, especially for regression problems, and from sensitivity to the data used in their construction.
It would be desirable to maintain the ease of interpretation of a decision tree while increasing the flexibility of its representation, so that analytic tools could be used in its construction. By including fuzzy tests in the decision tree generation, we increase its expressive capability and incorporate into its construction characteristics of connectionist methods that are useful for generating a learning device that is flexible, robust and capable of generalizing. Furthermore, the continuous nature of the representation allows us to make use of analytic calculus to design learning algorithms, similar to the back-propagation algorithm, for the construction of globally optimal trees.
A decision tree consists of a collection of nodes arranged in a hierarchical manner. At the top of the hierarchy stands the root node, which encompasses all of the data in the training set. Each of the nodes composing a tree corresponds to a subset of examples of the training data. The membership of a particular data point in a given node is determined by a test that is made on the node immediately superior in the hierarchy, the parent node. We assume for simplicity that there are D continuous (ordinal) predictor variables, which can be collected in a vector x = {x1, x2, ..., xD}. Nominal variables can be included without difficulty. In a binary tree, a given parent node generates two child nodes characterized by complementary sigmoidal membership functions, which depend on three types of parameters: the vector c defines the linear combination of attributes c·x that is involved in the test; the parameter a specifies the center of the split; finally, the quantity w determines the width of the split. Sigmoidal fuzzy splits have been chosen because in the limit w → 0 the usual multivariate crisp splits (c·x < a) are recovered. Thus only one extra parameter per split is introduced by fuzzification.
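The explicit form of the membership functions is not written out above; the sketch below assumes the standard sigmoidal parameterization consistent with the parameters c, a and w just described (the function name `fuzzy_split` is ours).

```python
import numpy as np

def fuzzy_split(x, c, a, w):
    """Complementary sigmoidal memberships of the two children of a split.

    x : (D,) attribute vector
    c : (D,) weights of the linear combination c.x tested at the node
    a : center of the split
    w : width of the split
    As w -> 0 the left membership tends to the indicator of c.x < a,
    so the usual crisp multivariate split is recovered.
    """
    mu_left = 1.0 / (1.0 + np.exp((np.dot(c, x) - a) / w))
    return mu_left, 1.0 - mu_left

# A narrow split behaves almost crisply:
x = np.array([0.2, 0.7])
c = np.array([1.0, 0.0])                       # test on x1 alone
mu_l, mu_r = fuzzy_split(x, c, a=0.5, w=0.01)
print(mu_l, mu_r)                              # mu_l close to 1, since x1 = 0.2 < 0.5
```

Note that the two memberships always sum to one, so fuzzification redistributes, rather than creates, membership mass.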
The basic architecture of the tree is fixed by growing a crisp tree by means of a traditional decision tree generator [3, 4]. The fuzzy splits are initially chosen to have their center at the threshold value of the corresponding crisp split, but with a non-zero width. These initial values are refined by means of a learning algorithm analogous to back-propagation in neural networks.
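As a concrete illustration of this refinement step, the following sketch tunes the center and width of a single fuzzy split (a "stump") by gradient descent on the squared error. Finite-difference gradients stand in for the analytic back-propagation updates, whose exact form is not given here; all names and settings are illustrative.

```python
import numpy as np

def stump_predict(X, c, a, w, y_left, y_right):
    """Membership-weighted output of a single fuzzy split."""
    m = 1.0 / (1.0 + np.exp((X @ c - a) / w))   # left membership per example
    return m * y_left + (1.0 - m) * y_right

def refine(X, y, c, a, w, y_left, y_right, lr=0.02, steps=100):
    """Refine the split center a and width w by gradient descent on the
    mean squared error, using finite-difference gradients as a stand-in
    for the analytic updates."""
    def mse(a_, w_):
        r = stump_predict(X, c, a_, w_, y_left, y_right) - y
        return np.mean(r * r)
    eps = 1e-5
    for _ in range(steps):
        g_a = (mse(a + eps, w) - mse(a - eps, w)) / (2 * eps)
        g_w = (mse(a, w + eps) - mse(a, w - eps)) / (2 * eps)
        a -= lr * g_a
        w = max(w - lr * g_w, 1e-2)             # keep the width positive
    return a, w

# Noisy step data: the true transition is at x1 = 0.4, the initial guess at 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = (X[:, 0] < 0.4).astype(float) + 0.05 * rng.standard_normal(200)
c = np.array([1.0])
a, w = refine(X, y, c, 0.5, 0.3, 1.0, 0.0)
print(a, w)                                     # the refined center and width
```

In a full tree the same idea applies jointly to all splits, since every split parameter influences the prediction error.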
All the terminal nodes in a fuzzy tree are involved in a given prediction. This stands in contrast with the crisp case, where only one terminal node is active during a single prediction. This global character of the prediction, reminiscent of the global character of connectionist methods, is the key to the improvement of the tree's performance by fuzzification.
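To make this global character concrete, here is a sketch of a hypothetical depth-2 fuzzy regression tree: each terminal node contributes to the output with a weight given by the product of the sigmoidal memberships along its path, and the weights sum to one. The tree structure, parameter values and leaf values are all illustrative.

```python
import numpy as np

def mu_left(x, c, a, w):
    """Sigmoidal left membership of a split; the right one is 1 - mu_left."""
    return 1.0 / (1.0 + np.exp((np.dot(c, x) - a) / w))

def fuzzy_tree_predict(x, root, left_child, leaf_values):
    """Prediction of a depth-2 fuzzy tree in which the root's left child
    is itself split and its right child is a leaf. A leaf's weight is the
    product of the memberships along its path, so every terminal node
    contributes to the output."""
    m_root = mu_left(x, *root)
    m_child = mu_left(x, *left_child)
    weights = np.array([m_root * m_child,         # left-left leaf
                        m_root * (1.0 - m_child), # left-right leaf
                        1.0 - m_root])            # right leaf
    return float(np.dot(weights, leaf_values))    # weights sum to one

x = np.array([0.3, 0.8])
root = (np.array([1.0, 0.0]), 0.5, 0.1)           # test on x1, centered at 0.5
left_child = (np.array([0.0, 1.0]), 0.6, 0.1)     # test on x2, centered at 0.6
print(fuzzy_tree_predict(x, root, left_child, np.array([1.0, 2.0, 3.0])))
```

Because the path weights form a partition of unity, the output is always a convex combination of the leaf values, which is what makes the prediction smooth in x.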
In order to illustrate the fuzzification methodology described above, let us examine the results obtained in a regression problem proposed in Ref. [5]. The problem has two predictor variables and a dependent variable containing some normally distributed noise (σ = 0.1).
[1] L. Zadeh. Fuzzy sets. Information and
Control, 8:338-353, 1965.
[2] J. Bezdek and S. Pal, eds. Fuzzy Models for Pattern Recognition. IEEE Press, NY, 1992.
[3] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman & Hall, NY, 1984.
[4] J.R. Quinlan. Induction of decision trees. Machine
Learning, 1(1):81-106, 1986.
[5] V. Cherkassky and F. Mulier. Statistical and neural network techniques for nonparametric regression. In Selecting Models from Data, P. Cheeseman and R. W. Oldford, eds., pages 383-392. Springer-Verlag, NY, 1994.