A SUPERVISED NEURAL ARCHITECTURE FOR RECOGNITION OF COLORED AND TEXTURED VISUAL STIMULI BASED ON MULTI-SCALE PROCESSING

Ángel T. Pérez-Camarero, Valentín Benavides, Francisco Díaz-Pernas

A neural architecture for the segmentation and recognition of colored and textured visual stimuli is presented. It borrows fundamental principles of biological visual processing in humans and animals, namely those that explain the segmentation of a visual scene and the recognition of the elements within it. Each stage of the architecture therefore has a well-defined analogy to the part of the human visual system whose functions it models. We propose an architecture composed of three main stages: a color opponent stage, a preattentive stage, and an attentive stage. The color opponent stage takes a colored and textured image as input and transforms the original chromatic information into simple and double color opponent signals. These signals are used by the preattentive stage to segment the scene: it extracts the contours, both real and illusory, and extracts the chromatic features of each contour-bounded region to activate featural filling-in processes. This stage is based on the Boundary Contour System and Feature Contour System (BCS/FCS) models of S. Grossberg and E. Mingolla, since they provide an excellent perceptual segmentation mechanism. The attentive stage performs recognition and is based on the Adaptive Resonance Theory (ART) of G. Carpenter and S. Grossberg.
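
As a purely structural sketch, the composition of the three stages can be written as the following Python skeleton; the function names are hypothetical placeholders, not part of the actual implementation, and each stage is elaborated only in the paragraphs that follow:

    def color_opponent_stage(rgb_image):
        # Transform the input chromatic information into simple and double
        # color opponent signals.
        raise NotImplementedError

    def preattentive_stage(opponent_signals):
        # Extract real and illusory contours (BCS) and the filled-in chromatic
        # features of each contour-bounded region (FCS).
        raise NotImplementedError

    def attentive_stage(segmentation):
        # Scan the segmented scene and categorize each region with a
        # supervised Fuzzy-ARTMAP network.
        raise NotImplementedError

    def recognize(rgb_image):
        # The three stages compose into a single recognition pipeline.
        return attentive_stage(preattentive_stage(color_opponent_stage(rgb_image)))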

Let us describe the main features of each stage. At the color opponent stage, on-center off-surround antagonistic interactions generate retinal color opponent processes and cortical double opponent processes. To model simple and double color opponency, a dynamic competitive model is proposed. This transforms the original chromatic information into signals that are much more useful for a good segmentation.
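
As an illustrative sketch, a simple (single-opponent) signal, such as a red-center/green-surround cell, can be computed at the equilibrium of an on-center off-surround shunting competition; the Gaussian widths and the constants A, B, and D below are assumed values chosen for illustration, not the parameters of the proposed model:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def simple_opponent(center_channel, surround_channel,
                        sigma_center=1.0, sigma_surround=3.0,
                        A=1.0, B=1.0, D=1.0):
        # Shunting equilibrium x = (B*C - D*E) / (A + C + E), with C the
        # Gaussian-weighted center input and E the Gaussian-weighted
        # surround input (assumed widths and constants).
        C = gaussian_filter(center_channel.astype(float), sigma_center)
        E = gaussian_filter(surround_channel.astype(float), sigma_surround)
        return (B * C - D * E) / (A + C + E)

    # Example: opposing red/green simple opponent signals from an RGB image.
    rgb = np.random.rand(64, 64, 3)            # stand-in for a colored, textured image
    r_plus_g_minus = simple_opponent(rgb[..., 0], rgb[..., 1])
    g_plus_r_minus = simple_opponent(rgb[..., 1], rgb[..., 0])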

The preattentive stage first contrast-enhances these signals through on-center off-surround competitive processes. Two spatial scales are used according to the type of processing. Small receptive fields, associated with high spatial resolution, enhance the boundary information used to extract edges; large receptive fields, associated with low spatial resolution and high temporal resolution, initiate the chromatic feature spreading of the filling-in processes. Once this preliminary processing is finished, the extraction of contours begins. First, simple and complex cell phases extract the real contours. Simple cells are sensitive to the orientation, magnitude, and position of textural and chromatic feature changes, and are modeled as Gabor filters; two odd-even receptive-field pairs of opposite polarity are used. Moreover, we propose a multi-scale model for the simple cell phase. Cells with lower spatial resolution respond only to strong feature contrasts, while cells with higher spatial resolution can detect the smallest featural variations within the scene. Using only the former risks losing contours that are too faint yet essential for a correct segmentation of the scene, while using only the latter may cause noise within the scene to be interpreted as real contours. A good solution is therefore to use cells with different spatial resolutions and to combine their responses adequately, obtaining an optimal signal-to-noise ratio. Since simple cells depend on the direction of contrast, a complex cell phase is needed to integrate the information coming from the simple cell phase, thus achieving contrast-direction independence. To extract the illusory contours, a feedback cycle composed of competitive and cooperative stages is used. Once all contours have been extracted, the filling-in processes begin: each chromatic feature spreads in all directions, except those in which it finds a strong contour signal.
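
The following sketch illustrates the multi-scale simple and complex cell phases on a single contrast-enhanced opponent channel, assuming Gabor filters for the simple cells as stated above; the orientations, kernel sizes, spatial frequencies, and the cross-scale combination rule are illustrative assumptions rather than the model's actual parameters, and the competitive-cooperative feedback cycle and the filling-in diffusion are omitted:

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_kernel(sigma, wavelength, theta, phase, size=None):
        size = size or int(6 * sigma) | 1               # odd kernel width
        r = np.arange(size) - size // 2
        y, x = np.meshgrid(r, r, indexing="ij")
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        env = np.exp(-(xr**2 + yr**2) / (2.0 * sigma**2))
        return env * np.cos(2.0 * np.pi * xr / wavelength + phase)

    def complex_cell(channel, theta, sigma, wavelength):
        # Rectified responses of two opposite-polarity odd-even pairs are
        # pooled, giving contrast-direction-independent contour energy.
        total = np.zeros_like(channel, dtype=float)
        for phase in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2):
            k = gabor_kernel(sigma, wavelength, theta, phase)
            total += np.maximum(fftconvolve(channel, k, mode="same"), 0.0)
        return total

    def multiscale_contours(channel, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4),
                            scales=((1.5, 4.0), (3.0, 8.0))):
        # Combine a fine scale (sensitive to small featural variations) with a
        # coarse scale (responding only to strong contrasts) across orientations.
        return sum(complex_cell(channel, t, s, w)
                   for t in thetas for (s, w) in scales)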

The preattentive stage outputs are fed into the attentive stage. The attentive recognition stage is composed of a pattern generator and a Fuzzy-ARTMAP neural network based on Adaptive Resonance Theory (ART). The pattern generator scans the image sequentially and, at every spatial position, generates a pattern from the signals produced by the preattentive stage. In the attentive stage, each point within the scene is assigned to a region according to Gaussian-weighted values of the chromatic segmentation. These patterns are fed to the recognition network, a supervised Fuzzy-ARTMAP model based on fuzzy logic, which categorizes the scene regions according to their attributes. ART models can learn new patterns without forgetting previously learned ones, which is the main reason for using them. In supervised models there is an expert who supervises the training of the neural network; the expert is essential in this type of model, since they must judge whether the system behaves correctly. This model can discern between similar patterns and can group different patterns into the same region, which makes our system very versatile.
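
As an illustrative sketch, the pattern generator can be thought of as building, at each scanned position, a Gaussian-weighted local feature vector from the filled-in chromatic maps and complement-coding it in the way Fuzzy-ARTMAP inputs require; the window size, Gaussian width, and feature layout below are assumptions, and the Fuzzy-ARTMAP network itself is only referenced in a comment:

    import numpy as np

    def gaussian_window(radius, sigma):
        r = np.arange(-radius, radius + 1)
        y, x = np.meshgrid(r, r, indexing="ij")
        w = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
        return w / w.sum()

    def local_pattern(filled_in, row, col, radius=3, sigma=1.5):
        # Build one complement-coded input vector from the filled-in chromatic
        # maps (height x width x n_features) at a scan position lying at least
        # `radius` pixels away from the image border.
        w = gaussian_window(radius, sigma)
        patch = filled_in[row - radius:row + radius + 1,
                          col - radius:col + radius + 1, :]
        feats = np.tensordot(w, patch, axes=([0, 1], [0, 1]))  # Gaussian-weighted average
        feats = np.clip(feats, 0.0, 1.0)                       # Fuzzy-ART inputs lie in [0, 1]
        return np.concatenate([feats, 1.0 - feats])            # complement coding

    # During training, the expert supplies the desired region label for each
    # generated pattern; during recognition, the Fuzzy-ARTMAP categories assign
    # every scanned point to a region (the network itself is not sketched here).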

To sum up, in the proposed architecture we can identify two stages for visual pattern recognition: a perceptual preattentive segmentation of the visual scene followed by a local attentive recognition within a particular visual context. Together they provide a mechanism for the segmentation, categorization, and recognition of images from different classes, based on principles of perception and pattern recognition.