TRAINING FUNDAMENTAL PRINCIPLES AND BLOCK TRAINING OF LAYERED NEURAL NETWORKS

A. Navia-Vázquez and Aníbal R. Figueiras-Vidal

ATSC/DTC, Univ. Carlos III de Madrid

C/ Butarque, 15,

28911 Leganés, Madrid, Spain

Phone: + 34 1 624 99 03 / + 34 1 624 99 23

Fax: + 34 1 624 94 30

E-Mail: navia@ing.uc3m.es / arfv@ing.uc3m.es

**KEYWORDS** : Training, Block, Sensitivity, Selection, Generalization.

Extended Summary

The difficulty of artificial learning is universally accepted, both for rule-based and learning-by-example situations; however, some fundamental principles can be applied (at least conceptually) in all cases.

Among them, the principle that difficult cases require more attention or work has appeared in many different forms, from selecting the simpler rule in conflicts to concentrating on limit or difficult samples ([1] declares this principle in its title). Occam's Razor appears in the requirement that architectures be as simple, or parameters as few, as possible, which is the basis for selecting, growing or pruning; combining simple machines in modular schemes [2] or committees [3] follows analogous reasoning. No less important is keeping the machine parameters within reasonable margins, as suggested by generalization results [4] and enforced in some recent powerful approaches to inference machine design [5].

When considering the particular (but representative) case of layered neural networks, training presents additional difficulties due to the need to apply "chained" algorithms, such as the Backpropagation rule. An alternative is to proceed layerwise (i.e., the global problem is decomposed into minimizations at every layer): an example of this approach can be found in [6], where, after applying the inverse activation function to the output values, Least Squares minimizations are solved to obtain optimal weights and to propagate targets to the previous layer.
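The layerwise idea can be sketched for a single layer as follows. This is only an illustrative sketch, not the algorithm of [6] as published: it assumes tanh units, a bias handled as an extra constant input, and targets clipped slightly inside (-1, 1) so the inverse activation stays finite. The function names `fit_layer_ls` and `propagate_targets` are hypothetical.

```python
import numpy as np


def fit_layer_ls(inputs, targets, eps=1e-6):
    """Fit one tanh layer by linear least squares on the pre-activations.

    Assumption: tanh activation, so the inverse activation is arctanh.
    """
    # Clip targets so that arctanh stays finite for saturated values.
    clipped = np.clip(targets, -1.0 + eps, 1.0 - eps)
    pre_activation_targets = np.arctanh(clipped)
    # Append a constant column so the bias is the last row of the weights.
    augmented = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    # lstsq returns the minimum-norm solution if the system is rank deficient.
    weights, *_ = np.linalg.lstsq(augmented, pre_activation_targets, rcond=None)
    return weights, pre_activation_targets


def propagate_targets(weights, pre_activation_targets):
    """Least-squares estimate of the layer inputs that would yield the targets."""
    w, b = weights[:-1], weights[-1]
    # Solve h @ w + b = z for h, one row per pattern, via the pseudoinverse.
    return (pre_activation_targets - b) @ np.linalg.pinv(w)
```

The rows returned by `propagate_targets` would then play the role of desired outputs for the previous layer, where the same two steps can be repeated.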

Unfortunately, as analyzed in [7], this layerwise block implementation presents some problems concerning the convergence and generalization capability of the system. Nevertheless, the performance of this training algorithm can be improved if we reduce the influence of less meaningful patterns: this is the solution proposed in [7], which solves weighted minimizations at every layer, the weighting values being proportional to the sensitivity of every pattern.
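A minimal sketch of such a weighted minimization for one layer is given below. The summary does not state how [7] defines the sensitivity, so the code makes an explicit assumption: tanh units, with the sensitivity of a pattern taken as the activation slope at its target, 1 - t^2, so that saturated patterns count less. The function name `fit_layer_weighted_ls` is hypothetical.

```python
import numpy as np


def fit_layer_weighted_ls(inputs, targets, eps=1e-6):
    """Sensitivity-weighted least squares for one tanh layer (illustrative)."""
    # Clip targets so that arctanh stays finite for saturated values.
    clipped = np.clip(targets, -1.0 + eps, 1.0 - eps)
    pre_activation_targets = np.arctanh(clipped)
    # Assumed sensitivity: slope of tanh at the target, tanh'(z) = 1 - t**2.
    sensitivity = 1.0 - clipped ** 2
    # Append a constant column so the bias is the last row of the weights.
    augmented = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    weights = np.empty((augmented.shape[1], targets.shape[1]))
    for k in range(targets.shape[1]):
        s = sensitivity[:, [k]]  # column vector, one weight per pattern
        # Scale each row by its sensitivity; pinv yields the minimum-norm
        # solution of the resulting weighted least-squares problem.
        weights[:, k] = np.linalg.pinv(s * augmented) @ (
            s[:, 0] * pre_activation_targets[:, k]
        )
    return weights
```

Scaling the residuals by the slope is a first-order approximation of the error measured at the unit output rather than at its pre-activation, which is one way to motivate this particular weighting.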

With this reduced sensitivity approach, we are implicitly incorporating a sample selection strategy (which benefits the learning process), as well as obtaining weights of reasonable size by using minimum norm solutions (which improves generalization); additionally, a direct relationship between the reduced sensitivity algorithm and other efficient training approaches can also be established [8]. All these interesting properties encourage us to propose, as further work, extensions of the algorithm to other learning machines.

A more detailed description will be provided in the full-length paper, together with some simulation examples and further discussion of the results.

REFERENCES

[1] P.W. Munro: "Repeat Until Bored: A Pattern Selection Strategy"; in J.E. Moody et al. (eds.): *Advances in Neural Information Processing Systems 4*, pp. 1001-1008; San Mateo, CA: Morgan Kaufmann; 1992.

[2] M.I. Jordan, and R.A. Jacobs: "Hierarchical Mixtures of Experts and the EM Algorithm"; *Neural Computation*, vol. 6, pp. 181-214; 1994.

[3] A.J.C. Sharkey (ed.): "Combining Artificial Neural Nets: Ensemble Approaches"; *Special Issue of Connection Science*, vol. 8, no. 3 & 4; 1996.

[4] P.L. Bartlett: "For Valid Generalization, the Size of the Weights is More Important than the Size of the Network"; in M.C. Mozer et al. (eds.): *Advances in Neural Information Processing Systems 9*, pp. 134-140; San Mateo, CA: Morgan Kaufmann; 1997.

[5] B. Schölkopf, K.-K. Sung, C.J.C. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. Vapnik: "Comparing support vector machines with Gaussian kernels to radial basis function classifiers"; *IEEE Trans. on Signal Processing*, vol. 45, no. 11, pp. 2758-2765; 1997.

[6] F. Biegler-König, and F. Bärmann: "A Learning Algorithm for Multilayered Neural Networks based on Linear Least Squares Problems"; *Neural Networks*, vol. 6, pp. 127-131; 1993.

[7] A. Navia-Vázquez, and A.R. Figueiras-Vidal: "Block Training Methods for Perceptrons and Their Applications"; *Proc. SPIE Aerosense Intl. Conf.: Applications of Artificial Neural Networks II*, Orlando, Florida, USA, vol. 3077, pp. 600-610; 1997.

[8] S.-I. Amari: "Natural gradient works efficiently in learning"; *Neural Computation*, vol. 10, no. 2, pp. 251-276; 1998.