ABSTRACT: Introducing biologically motivated features into models for learning usually has a double role: testing hypotheses about biological learning, and finding hints to be applied in developing learning procedures for artefacts. The latter point of view remains valid even in cases where, for the specific problem, algorithms developed without any biological leaning may be more effective: as the problems grow more complex, these efficient algorithms may soon become inefficient, while the biologically motivated hints may turn out to be fruitful.
In the work presented here we take this attitude and consider a number of learning aspects which appear biologically motivated, where ``biologically" is primarily understood functionally. Specifically, we shall consider a learning model called here, for short, ``statistical learning", involving the following features:
1. implementation by random action in a structured environment;
2. control by non-specific reinforcement;
3. reinforcement working via the normal, internal activity of the system (requires no ``external computation").
We provide results, obtained in various simulations, concerning the properties of ``statistical learning" as defined above. Notice that such a model corresponds neither to unsupervised algorithms (where only those patterns - or situations, experiences, etc. - are presented which should be positively learned) nor to supervised ones (where each answer is judged by a ``teacher"), but implies in fact a different point of view: the system performs complicated actions and finds out whether, as a result, it fulfills a certain task or not. An early work dealt with these questions in the simulation of a device moving on a board, noticing the spatial situations and taking its own moves on a probabilistic basis (Mlodinow and Stamatescu 1985). The device therefore realizes a biased random walk, the bias being obtained by trying to recognize situations and considering the ``goodness" (see below) of the corresponding moves taken previously. There is absolutely no structure presupposed in the behavior of the device beyond the sheer urge to move (completely at random to start with); the structure is fully hidden in the environment - see 1. above. The reinforcement (positive or negative) itself is global: it is associated with the results of long chains of moves (e.g., finding a certain place) and is assigned equally, indiscriminately, to all moves in the chain to modify their ``degree of goodness" - see 2. above.
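The mechanism just described - a random walk biased by stored ``goodness" values, with a single global reward spread indiscriminately over a whole chain of moves - can be sketched as follows. This is a minimal illustrative sketch, not the original device of Mlodinow and Stamatescu (1985); all names, the weight initialization, and the reinforcement increments are assumptions made for the example.

```python
import random

class StatisticalLearner:
    """Illustrative sketch of the ``statistical learning" device:
    no presupposed structure, only a bias built up from non-specific
    reinforcement of previously taken moves (names hypothetical)."""

    def __init__(self, actions):
        self.actions = actions
        # goodness[(situation, action)] -> weight; initially absent,
        # so behavior starts as a completely uniform random walk
        self.goodness = {}

    def choose(self, situation):
        # Biased random walk: sample an action with probability
        # proportional to its stored "degree of goodness"
        weights = [self.goodness.get((situation, a), 1.0) for a in self.actions]
        return random.choices(self.actions, weights=weights)[0]

    def reinforce(self, chain, success):
        # Non-specific reinforcement: the same global reward (or penalty)
        # is assigned equally to every move in the chain, regardless of
        # each move's actual contribution to the outcome
        delta = 0.2 if success else -0.1
        for situation, action in chain:
            key = (situation, action)
            new_w = self.goodness.get(key, 1.0) + delta
            self.goodness[key] = max(new_w, 0.05)  # keep residual exploration
```

The floor on the weights keeps every move available with small probability, which is one (assumed) way to obtain the fluctuation around a solution described below.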
Under these conditions the device shows a number of interesting, quantifiable features: ``flexible stability" (its behavior fluctuates around a solution - a path to the goal - without losing it, unless a better solution is found, usually as a result of the fluctuations); ``development" (in the course of training on harder and harder problems, solutions to the simpler problems are taken over and applied to subproblems of the complex case); ``alternatives handling" (in a continuously changing environment the device develops alternative solutions, which it distinguishes by simple cues found in the environment); ``learning from success and failure" (all experiences contribute to learning); etc.
This simulation therefore produced evidence for the realizability of the model of ``statistical learning" mentioned above. However, the last point (3.) remained less clear, since the reinforcement, although simple, did not proceed via the activity of the system itself. On the other hand, a neural network realization, which appears more natural, also has a special interest in itself. In this context many features of relevance for this model have already found substantiation (see, e.g., Watkins 1989, Bremermann and Anderson 1989, Chialvo and Bak 1997, to quote only a few examples beyond the general results on neural network learning). In preliminary analyses we have dealt with partial aspects, e.g. the effectiveness of learning rules derived from the Hebb rule for non-specific reinforcement in multi-layer perceptrons (Kühn and Stamatescu 1997).
The present work completes the analysis of the learning rules for non-specific reinforcement in neural networks and considers a realization of the full model, as specified by points 1.-3. above, in a neural network simulation. This program is achieved by first performing an analysis of the convergence of non-specific reinforcement learning rules for various networks and with acceptable statistics. We consider multi-layer, feed-forward lattices and connected lattices with nonlinear-response neurons. We study two types of non-specificity: the one corresponding to the layers in multi-layer perceptrons (without using back-propagation-type or hierarchical algorithms), and the one introduced by using only the average performance of a series of trials in defining the reward. For the different settings we study the convergence of the learning algorithms as a function of: the activity level; the temperature (amount of noise); the learning parameters; and the reinforcement parameters. Previous results (Stamatescu 1996, Kühn and Stamatescu 1997) suggested that it is useful to introduce a certain amount of ``positive habituation" at each step, without consideration of the final result. The interplay between this and the (non-specific) reinforcement is also studied. In a second part, a neural network simulation of a device behaving according to the ``statistical learning" model above is considered. Preliminary results show that a rather simple neuronal architecture can already account for many of our requirements. Although the tasks with which the device has to cope are rather simple, this simulation not only illustrates the model but also permits an analysis of the interplay between its different ingredients. The paper gives an account of this work.
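A Hebb-derived rule under non-specific reinforcement, with reward defined only by the average performance of a series of trials and with a small reward-independent ``positive habituation" term, might be sketched as below. The exact update form, the parameter values (eps, lam, beta), and all names are illustrative assumptions, not the rules analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def nonspecific_hebb_update(w, x, y, reward, eps=0.05, lam=0.01):
    """Sketch of a Hebb-derived rule: the pre/post coincidence x_i * y_j
    is modulated by a single global reward signal, plus a small
    reward-independent ``positive habituation" term applied at every step."""
    hebb = np.outer(y, x)                     # Hebbian coincidence matrix
    return w + eps * reward * hebb + lam * hebb

def trial(w, x, beta=2.0):
    # Nonlinear-response neurons: stochastic sigmoid units,
    # with "temperature" (noise level) 1/beta
    p = 1.0 / (1.0 + np.exp(-beta * (w @ x)))
    return (rng.random(p.shape) < p).astype(float)

# Non-specific reward from the average performance of a series of trials:
x = np.array([1.0, 0.0, 1.0])
target = np.array([1.0, 0.0])
w = rng.normal(scale=0.1, size=(2, 3))
ys = [trial(w, x) for _ in range(20)]
avg_perf = np.mean([float(np.all(y == target)) for y in ys])
reward = 2.0 * avg_perf - 1.0                 # scaled to [-1, 1]
w = nonspecific_hebb_update(w, x, np.mean(ys, axis=0), reward)
```

Note that the habituation term strengthens active pathways at every step, even when the averaged reward vanishes, which is one way to read the interplay studied in the paper.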
Bremermann, H.J. and Anderson, R.W. (1989): ``An alternative to back propagation", Technical report, U.C. Berkeley Center for Pure and Applied Mathematics PAM-483.
Chialvo, D.R. and Bak P. (1997): ``Learning from mistakes", adap-org/9707006.
Kühn, R. and Stamatescu, I.-O. (1997): ``Statistical Learning for Neural Networks", contribution to the workshop Fuzzy Logic, Fuzzy Control and Neural Networks, ZiF, Bielefeld, April 7-11, 1997 (unpublished).
Mlodinow, L. and Stamatescu, I.-O. (1985): ``An evolutionary procedure for machine learning", Int. Journal of Computer and Inform. Sciences, 14, 201.
Stamatescu, I.-O. (1996): ``The Neural Network Approach", in Proceedings of the International Conference on the Philosophy of Biology, Vigo, 1996.
Watkins, C.J.C.H. (1989): ``Learning from delayed rewards", Ph.D. Thesis.