Address: Instituto de Automática Industrial, Ctra Campo Real, Km. 0,200. La Poveda.

28500 Arganda del Rey. Madrid. Spain

Phone: (91) 8711900

Fax: (91) 8717050


Abstract. Many sectors of society now need to analyze the massive amounts of data collected in the course of their activities. In general terms, the main goal of this analysis is to learn from past experience. This paper describes an inductive learning strategy for knowledge extraction in an industrial environment where product batches are manufactured. The data consist of the transformation steps carried out on the product during the manufacturing process and of the result of the final quality check. Applying a suitable learning technique may explain what has happened and thus make it possible to establish future plans that improve the performance of these activities. The data associated with the products manufactured over a given period make up the examples used by the inductive learning strategy to generate new knowledge.

Although data analysis has historically been performed by statistical means, it is now possible to apply new learning and knowledge representation methods in several domains and to obtain results of great interest. Their successful application has led to the emergence of commercial tools that incorporate inductive learning strategies and represent knowledge as decision trees or rules. The main goal of using these tools is to detect relations among the variables contained in the collected examples. Because they are generic, these tools usually rely on rigid learning techniques, which sometimes makes them difficult to apply to more than one learning task, even within the same environment.

Implementing a learning method in a non-experimental environment, such as an industrial manufacturing environment, requires the method to be flexible if it is to be effective. In such an environment there are many variables with complex relations among them. Some of these relations are known to the domain expert and others are implicit. The existence of specific domain knowledge can ease the search task of the learning strategy, and the flexibility of the strategy lies in its ability to use this knowledge. However, domain knowledge is often not available. For this case, a procedure that turns implicit relations into explicit ones has been developed. This procedure uses stored manufacturing data to build a knowledge base that the learning strategy consults. This knowledge is represented by generalization hierarchies among the values of the variables involved in the selected sample.
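As an illustration of what a generalization hierarchy among the values of one variable might look like, the following minimal sketch stores each value together with its more general parent. The variable ("furnace"), its values, and the names `PARENT`, `ancestors`, and `covers` are hypothetical and chosen only for this example; the paper does not specify a data structure.

```python
# Hypothetical generalization hierarchy for a single variable ("furnace").
# Each value maps to its more general parent; the root has no parent.
PARENT = {
    "furnace_A": "gas_furnace",
    "furnace_B": "gas_furnace",
    "furnace_C": "electric_furnace",
    "gas_furnace": "any_furnace",
    "electric_furnace": "any_furnace",
}


def ancestors(value, parent=PARENT):
    """Return `value` followed by all of its generalizations, most specific first."""
    chain = [value]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def covers(general, value, parent=PARENT):
    """True if `general` is `value` itself or one of its generalizations."""
    return general in ancestors(value, parent)
```

With such a structure, a learning strategy can ask whether an observed value falls under a more general one, for example `covers("gas_furnace", "furnace_B")`, without scanning the sample again.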

The strategy presented in this paper is based on a well-known inductive method, whose job is to induce the general description of a concept from examples and counterexamples of the target concept or class. According to this method, the procedure for generating a concept description begins with the expressions contained in a concept instance. Each expression has a value that measures how well it represents the instances of its class and none of the instances of the remaining classes. This value is calculated using a preference criterion. Every valid expression is specialized until consistency is achieved, that is, until it covers no instances belonging to other classes, and is then generalized so that it covers the maximum number of instances belonging to its class. The specialization procedure changes an expression by appending to it another expression of lower preference than its last term. The generalization procedure applies the generalization rules from the background knowledge of the strategy.
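The specialize-then-generalize cycle described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: an expression is modeled as a conjunction of attribute-value selectors, the preference criterion is assumed to be "positives covered minus negatives covered", generalization is assumed to climb a parent map, and all names and data (`induce`, `preference`, `matches`, the furnace example) are hypothetical.

```python
def ancestors(value, parent):
    """Return `value` followed by its generalizations, most specific first."""
    chain = [value]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def matches(expr, inst, parent):
    """An expression (attribute -> value) covers an instance if every selector
    equals the instance's value or one of that value's generalizations."""
    return all(v in ancestors(inst[a], parent) for a, v in expr.items())


def preference(selector, pos, neg, parent):
    """Assumed criterion: positives covered minus negatives covered."""
    expr = dict([selector])
    return (sum(matches(expr, p, parent) for p in pos)
            - sum(matches(expr, n, parent) for n in neg))


def induce(seed, pos, neg, parent):
    # Rank the seed's selectors from most to least preferred.
    ranked = sorted(seed.items(),
                    key=lambda s: preference(s, pos, neg, parent),
                    reverse=True)
    # Specialize: append lower-preference selectors until no negative is covered.
    expr = {}
    for attr, value in ranked:
        expr[attr] = value
        if not any(matches(expr, n, parent) for n in neg):
            break
    # Generalize: climb each selector's hierarchy while consistency still holds.
    for attr in list(expr):
        for general in ancestors(expr[attr], parent)[1:]:
            trial = {**expr, attr: general}
            if any(matches(trial, n, parent) for n in neg):
                break
            expr = trial
    return expr


# Hypothetical data: two good batches (examples) and one bad batch (counterexample).
parent = {"furnace_A": "gas", "furnace_B": "gas", "furnace_C": "electric"}
pos = [{"furnace": "furnace_A", "speed": "high"},
       {"furnace": "furnace_B", "speed": "low"}]
neg = [{"furnace": "furnace_C", "speed": "high"}]

result = induce(pos[0], pos, neg, parent)
print(result)  # {'furnace': 'gas'}
```

In this toy run the selector on "furnace" alone is already consistent, so no further specialization is needed, and generalizing it to "gas" extends coverage to both positive examples while still excluding the counterexample. If the seed's selectors cannot exclude every counterexample, this sketch returns an inconsistent expression; a fuller version would need to detect that case.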

The proposed strategy fulfills two tasks. The first is to induce a complete and correct description of a given concept from examples and counterexamples of that concept; the second is to construct the taxonomic description characterizing a set of unlabeled examples. The descriptions obtained are symbolic and can be easily verified by a domain expert.

This strategy introduces a new feature with respect to the method mentioned above. The inductive process starts not only from a seed instance chosen from the sample but also from the generalization hierarchies contained in the domain knowledge base. This change is a direct consequence of a careful study of how this learning method applies in a real environment. As a result, the specialization procedure leads to expressions that are valid not only in the model of the learning strategy but also in the real environment. Starting the inductive process with the generalization hierarchies improves the efficiency of the search procedure, since it reduces the number of candidate expressions for the general description of a concept before they are evaluated by the preference criterion.

When this strategy is used to find the taxonomic description of a set of instances, the procedure for finding the classes also makes use of the domain knowledge. In this case, the learning strategy uses violations of the consistency condition to generate the different classes.

This learning strategy is part of a larger project to develop a set of tools for data mining and knowledge discovery in a manufacturing environment.