*
ABSTRACT: Introducing biologically motivated features in models for learning usually has a double role: testing hypotheses about biological learning and finding hints to be applied in developing learning procedures for artefacts. The latter point of view remains valid even in cases where algorithms developed without any biological leaning are more effective for the specific problem: as the problems grow more complex, these efficient algorithms may soon become inefficient, while the biologically motivated hints may turn out to be fruitful.
*

*In the work presented here we take this attitude and consider a number of learning aspects which appear biologically motivated, the latter being understood primarily in a functional sense. Specifically, we shall consider a learning model, called here for short ``statistical learning", involving the following features:*

*1. implementation by random action in a structured environment;*

*2. control by non-specific reinforcement;*

*3. reinforcement working via the normal, internal activity
of the system (requires no ``external computation").*

*We provide results concerning the properties of ``statistical learning" as defined above, obtained in various simulations. Notice that such a model corresponds neither to unsupervised algorithms (where only those patterns - or situations, experiences, etc. - are presented which should be positively learned) nor to supervised ones (where each answer is judged by a ``teacher"), but implies in fact a different point of view: the system performs complicated actions and finds out whether, as a result, it fulfills a certain task or not.
An early work dealt with these questions in the simulation of a device moving on a board, noticing the spatial situations and making its moves on a probabilistic basis (Mlodinow and Stamatescu 1985). The device therefore realizes a biased random walk, the bias being obtained by trying to recognize situations and considering the ``goodness" (see below) of the corresponding, previously taken moves. There is absolutely no structure presupposed in the behavior of the device, beyond the sheer urge to move (completely at random to start with); the structure is fully hidden in the environment - see 1. above. The reinforcement (positive or negative) itself is global: it is associated with the results of long chains of moves (e.g., finding a certain place) and is assigned equally, indiscriminately, to all moves in the chain to modify their ``degree of goodness" (see 2. above). Under these conditions the device shows a number of interesting, quantifiable features: ``flexible stability" (its behavior fluctuates around a solution - a path to the goal - without losing it, unless a better solution is found, usually as a result of the fluctuations); ``development" (in the course of training on harder and harder problems, solutions to the simpler problems are taken over and applied to subproblems of the complex case); ``alternatives handling" (in a continuously changing environment the device develops alternative solutions, which it distinguishes by simple cues found in the environment); ``learning from success and failure" (all experiences contribute to learning); etc.
*
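
To fix ideas, the following is a minimal sketch of such a device in Python. The grid, the goal, and all parameter values are illustrative assumptions of ours, not the original program of Mlodinow and Stamatescu (1985).

```python
import random

# A "statistical learning" walker: moves are drawn at random, biased by
# the "goodness" of moves previously taken in the same situation; one
# non-specific reward is then assigned equally to all moves of the chain.

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def episode(goodness, size=8, goal=(7, 7), max_steps=200, eta=0.5):
    pos, chain = (0, 0), []
    for _ in range(max_steps):
        # biased random walk: weights are the current goodness values
        weights = [goodness.get((pos, m), 1.0) for m in MOVES]
        m = random.choices(MOVES, weights=weights)[0]
        chain.append((pos, m))
        pos = (min(size - 1, max(0, pos[0] + m[0])),
               min(size - 1, max(0, pos[1] + m[1])))
        if pos == goal:
            break
    # global reinforcement: equal, indiscriminate update of every move
    reward = 1.0 if pos == goal else -1.0
    for situation, move in chain:
        g = goodness.get((situation, move), 1.0) + eta * reward
        goodness[(situation, move)] = max(0.1, g)  # keep weights positive
    return pos == goal
```

Repeated calls of episode() on the same goodness dictionary let the walk drift from pure randomness towards a stable path to the goal, while the residual randomness keeps producing the fluctuations responsible for the ``flexible stability" noted above.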

*This simulation therefore produced evidence for the realizability of the model of ``statistical learning" described above. However, the last point (3.) remained less clear, since the reinforcement, although simple, did not proceed via the activity of the system itself. On the other hand, a neural network realization, which appears more natural, also has a special interest of its own. In this context many features of relevance for this model have already found substantiation (see, e.g., Watkins 1989, Bremermann and Anderson 1989, Chialvo and Bak 1997, to quote only a few examples beyond the general results on neural network learning). In preliminary analyses we have dealt with partial aspects, e.g. the effectiveness of learning rules derived from the Hebb rule for non-specific reinforcement in multi-layer perceptrons (Kühn and Stamatescu 1997).
*
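
As an illustration of the family of rules meant here (in our generic notation; the precise form used in the works cited may differ), a Hebbian update modulated by a global reinforcement signal can be written as

\[
\Delta w_{ij} \;=\; \eta \, r \, y_i \, x_j\,, \qquad r \in \{+1,-1\}\,,
\]

where $w_{ij}$ connects the presynaptic activity $x_j$ to the postsynaptic activity $y_i$, $\eta$ is the learning rate, and the reward $r$ is the same for every synapse: it carries no synapse- or layer-specific error information, in contrast to back-propagation.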

*The present work completes the analysis of the learning rules for non-specific reinforcement in neural networks and considers a realization of the full model, as specified by points 1.-3. above, in a neural network simulation. This program is carried out by first performing an analysis of the convergence of non-specific reinforcement learning rules for various networks and with acceptable statistics. We consider multi-layer, feed-forward lattices and connected lattices with non-linear response neurons. We study two types of non-specificity: the one corresponding to the layers in multi-layer perceptrons (without using back-propagation-type or hierarchical algorithms), and the one introduced by using only the average performance of a series of trials in defining the reward. For the different settings we study the convergence of the learning algorithms as a function of the activity level, the temperature (amount of noise), the learning parameters, and the reinforcement parameters. Previous results (Stamatescu 1996, Kühn and Stamatescu 1997) suggested the usefulness of introducing a certain amount of ``positive habituation" at each step, independently of the final result. The interplay between this and the (non-specific) reinforcement is also studied.
*
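
A minimal sketch of one such update step, combining the non-specific reward with a small ``positive habituation" term, might look as follows (the names, the tanh response, and all parameter values are our illustrative assumptions, not the rules studied in the simulations):

```python
import numpy as np

def trial(w, x, r, eta=0.05, eps=0.005, T=0.1, rng=None):
    """One trial of reward-modulated Hebbian learning with habituation.

    w: weight matrix (n_out, n_in); x: input pattern (n_in,);
    r: non-specific reward, e.g. +-1 or the average performance
       over a series of trials; T: "temperature" (noise level).
    """
    rng = rng or np.random.default_rng()
    # noisy non-linear response of the output neurons
    y = np.tanh(w @ x + T * rng.standard_normal(w.shape[0]))
    hebb = np.outer(y, x)  # pre/post coincidence term
    # non-specific reinforcement (the same r for all synapses) plus a
    # small, unconditional "positive habituation" of the step just taken
    w += eta * r * hebb + eps * hebb
    return y
```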

*In a second part, a neural network simulation of a device behaving according to the ``statistical learning" model above is considered. Preliminary results show that a rather simple neuronal architecture can already account for many of our requirements. Although the tasks with which the device has to cope are rather simple, this simulation not only illustrates the model but also permits an analysis of the interplay between its different ingredients. The paper gives an account of this work.
*

*
References
*

*
Bremermann, H.J. and Anderson, R.W. (1989): ``An alternative to back propagation", Technical Report PAM-483, U.C. Berkeley Center for Pure and Applied Mathematics.
*

*
Chialvo, D.R. and Bak, P. (1997): ``Learning from mistakes", adap-org/9707006.
*

*
Kühn, R. and Stamatescu, I.-O. (1997): ``Statistical Learning for Neural Networks", contribution to the workshop Fuzzy Logic, Fuzzy Control and Neural Networks, ZiF, Bielefeld, April 7-11, 1997 (unpublished).
*

*
Mlodinow, L. and Stamatescu, I.-O. (1985): ``An evolutionary procedure for machine learning", Int. Journal of Computer and Information Sciences 14, 201.
*

*
Stamatescu, I.-O. (1996): ``The Neural Network Approach", in Proceedings of the International Conference on the Philosophy of Biology, Vigo, 1996.
*

*
Watkins, C.J.C.H. (1989): ``Learning from delayed rewards", Ph.D. Thesis, University of Cambridge.
*