Vector Quantization Techniques for Pattern Recognition

Batuhan Ulug, William E. Pierson, Stanley C. Ahalt

Department of Electrical Engineering
2015 Neil Avenue
The Ohio State University
Columbus, OH 43210
USA

email: ulugb@er4.eng.ohio-state.edu
Tel: 1-614-292-6502
Fax: 1-614-292-7596


ABSTRACT: In this paper we review existing techniques used to train Vector Quantizers (VQs) for supervised learning. We propose a new set of algorithms which can be used to train VQ classifiers and compare their performance to that of other prominent VQ classification algorithms. Additionally, we discuss generalizations of ordinary VQ classifiers to more complex, non-polytopal partitions, and discuss training algorithms for these generalized classifiers.

Pattern recognition theory is the field of applied mathematics in which principles and methods are developed and implemented to perform classification and identification of objects, phenomena, processes, and signals. Here we designate an object as a mathematical primitive that can be specified by a finite set of features, or properties, which characterize the object [Mathematical Encyclopedia, 1984].

There are two main approaches typically used to solve pattern recognition problems: the statistical approach and the syntactic approach. Statistical approaches, which are the focus of this research, treat the feature variables as random quantities, under the assumption that the data can be described by a probabilistic model. In contrast, syntactic approaches usually rely on fixed structural relationships between the constituent groups of the considered feature variables, although stochastic structures are possible [Fu and Booth, 1986].

Two broad classes of problems are considered in statistical pattern recognition [Hand, 1997]: supervised and unsupervised learning. In supervised learning (classification) the classes of the objects are known a priori (prespecified by a ``teacher'' or ``supervisor'') and the goal is to formulate ``good'' rules to assign objects to their appropriate classes. In contrast, the aim of unsupervised learning (clustering) is to formulate a class structure, i.e. to decide how many classes there are as well as to assign objects to their appropriate classes.

In this paper, we will investigate supervised learning and in particular the application of Vector Quantization (VQ) techniques to supervised learning. It should be noted that while VQ techniques are generally considered unsupervised, the work presented here is more closely related to recent work by Kohonen and Gray, and is a departure from strictly unsupervised techniques.

For supervised learning, Bayes decision theory develops the best (in terms of probability of error, Pe, or more generally Bayes risk, R) decision-making process under complete knowledge of the underlying probabilistic model. However, in applications, complete knowledge of the underlying statistics is seldom available. Instead, what is typically available is a training set, T = {(x1,y1), (x2,y2), ..., (xN,yN)}, consisting of N feature vectors xi and their class labels yi.
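Concretely, under the zero-one loss the Bayes risk R reduces to the probability of error Pe, and the optimal decision is the maximum a posteriori (MAP) rule:

\hat{w}(x) = \arg\max_{i \in \{1,\ldots,L\}} p(w_i \mid x), \qquad P_e^{*} = 1 - E_X\!\left[ \max_{i} \, p(w_i \mid X) \right],

where P_e^{*} denotes the resulting (Bayes) error, the smallest error achievable by any classifier.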

Under the umbrella of Bayes decision theory and Bayes' theorem, there are two main approaches to designing a classifier from the training set in supervised learning. The first approach is to estimate the class conditional probabilities, p(x|wi), and the a priori class probabilities, p(wi). Alternatively, one can estimate the posterior probabilities, p(wi|x), directly. The first approach can loosely be called informative learning, while the latter is discriminative in nature.
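The two approaches are connected by Bayes' theorem, which expresses the posteriors in terms of the class-conditional densities and the priors:

p(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{\sum_{j=1}^{L} p(x \mid w_j)\, p(w_j)}.

Thus informative learning estimates the quantities on the right-hand side and obtains the posteriors indirectly, while discriminative learning estimates the left-hand side directly.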

Furthermore, both the informative and the discriminative learning approaches can be implemented via parametric or nonparametric techniques. In parametric techniques, a form (or model) for the quantity being estimated, such as Gaussian, uniform, linear, or quadratic, is assumed (or sometimes known), and the parameters of the model are then estimated using the training set. In comparison, nonparametric techniques obtain the desired probability estimates from the training set without assuming a specific form for the estimated quantities.
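To make the distinction concrete, the following sketch (in Python with NumPy; the one-dimensional data and the kernel width h are hypothetical illustrative choices, not part of the methods studied here) contrasts a parametric Gaussian density estimate with a nonparametric Parzen window estimate:

import numpy as np

def gaussian_pdf(x, mu, var):
    # Parametric estimate: a Gaussian whose two parameters are fit to the data.
    return np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def parzen_pdf(x, samples, h):
    # Nonparametric (Parzen window) estimate: the average of Gaussian kernels
    # centered on the training samples; h is a smoothing parameter.
    kernels = np.exp(-(x - samples)**2 / (2 * h**2)) / np.sqrt(2 * np.pi * h**2)
    return kernels.mean()

samples = np.random.randn(100)            # hypothetical 1-D training data
mu, var = samples.mean(), samples.var()   # parametric: estimate two parameters
p_param = gaussian_pdf(0.5, mu, var)
p_nonparam = parzen_pdf(0.5, samples, h=0.3)  # nonparametric: choose h instead

Note that the parametric estimate is fixed once its two parameters are fit, whereas the behavior of the nonparametric estimate is governed by the smoothing parameter h, a point taken up below.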

For classifier design in supervised learning with a finite amount of training data, generalization is one of the most important considerations, if not the most important. Generalization is a measure of how well a classifier designed using the training set will perform on previously unobserved (test) data. To attain good generalization, the classifier complexity (as measured by, e.g., the Vapnik-Chervonenkis (VC) dimension or, more crudely, by the number of classifier parameters) must be neither so high that the classifier overfits (memorizes) the training set nor so low that it underfits it.

In parametric techniques the complexity of the classifier can typically be controlled by the choice of the model family and the model order (the number of parameters to be estimated). In contrast, in nonparametric classifiers complexity may be controlled in more indirect or nonintuitive ways, sometimes via so-called smoothing parameters.

The k-nearest neighbor classifier is the quintessential nonparametric discriminative classification algorithm. For k=1, the nearest neighbor classifier corresponds to a VQ classifier whose codebook is the whole training set. As such it is of high complexity and cannot achieve the Bayes error even asymptotically (as the size of the training set goes to infinity). Thus, clever selection of a smaller codebook for the VQ classifier is a way of reducing the complexity of this technique (for improved generalization) while also reducing storage requirements and increasing processing speed.
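The correspondence can be seen in a few lines of code. In the following sketch (Python/NumPy; the array names are hypothetical), the same decision function implements the 1-NN rule when the codebook is the full training set, and a reduced-complexity VQ classifier when a smaller labeled codebook is substituted:

import numpy as np

def vq_classify(x, codebook, labels):
    # Assign x the label of its nearest codeword under the L2 norm.
    j = np.argmin(np.linalg.norm(codebook - x, axis=1))
    return labels[j]

# 1-NN rule: the codebook is the entire training set.
#   y_hat = vq_classify(x, X_train, y_train)
# Reduced VQ classifier: a small labeled codebook replaces the training set.
#   y_hat = vq_classify(x, codewords, codeword_labels)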

VQ has its origins in signal processing, where it is used for compact, accurate representation of input signals or images. However, since VQ results in a partitioning of the input space, it can also be used in pattern recognition. While VQ's use in pattern recognition has mostly been confined to unsupervised learning, it can be, and has been, used for supervised learning as well, a use that has become more popular since the introduction of Kohonen's Learning Vector Quantization (LVQ) algorithm.

The VQ classification problem can be described as follows. A training set T = {(xi, yi), i = 1, 2, ..., N}, where the xi in Rn are feature vectors and the yi in {1, 2, ..., L} are class labels, is observed. The training set is a sample from a random process {(Xi, Yi), i = 1, 2, ...}, and the individual (Xi, Yi) are assumed to have a common but unknown distribution FXY(x, y). The goal is to design, based on the training set T, a VQ classifier, consisting of a labeled codebook, that for an observation X provides a good estimate of the class label Y associated with X.
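In this setting, a VQ classifier with codebook C = {c1, ..., cM} in Rn and codeword labels l1, ..., lM in {1, ..., L} implements the decision rule

\hat{y}(x) = l_{j^{*}(x)}, \qquad j^{*}(x) = \arg\min_{1 \le j \le M} \lVert x - c_j \rVert_2,

so that each class decision region is a union of Voronoi cells of the codewords.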

In this paper we review existing techniques for training VQs for supervised learning. We then propose new algorithms for training VQ classifiers and compare their performance to that of other prominent VQ classification algorithms, such as Xie's independent quantizer/classifier design, Kohonen's LVQ, and Gray's Bayes risk weighted VQ classifier. In addition, we consider generalizations of the ordinary VQ classifier that extend the polytopal Voronoi regions (induced by the L2 norm) to more complex forms.
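As a point of reference for the algorithms compared in later sections, the basic LVQ1 update can be sketched as follows (a minimal Python/NumPy illustration; the learning-rate schedule and the assumption of a pre-initialized labeled codebook are illustrative choices, not the specific configurations evaluated in this paper):

import numpy as np

def lvq1_train(X, y, codebook, codebook_labels, alpha=0.05, epochs=20):
    # LVQ1: move the winning codeword toward a training vector when
    # their labels agree, and away from it when they disagree.
    C = codebook.astype(float).copy()
    for epoch in range(epochs):
        lr = alpha * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for xi, yi in zip(X, y):
            j = np.argmin(np.linalg.norm(C - xi, axis=1))  # winning codeword
            if codebook_labels[j] == yi:
                C[j] += lr * (xi - C[j])   # attract: correct classification
            else:
                C[j] -= lr * (xi - C[j])   # repel: incorrect classification
    return C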