A Unified Perspective of Blind Source Separation Adaptive Algorithms

Sergio Cruces*
TSC Group, Escuela de Ingenieros
Universidad de Sevilla
Camino de los Descubrimientos s/n
41092 Sevilla, SPAIN
E-mail: sergio@viento.us.es
Tel/Fax: ++34 5 4487333.

Andrzej Cichocki
Brain Information Processing Group
Frontier Research Program, RIKEN
Wako-Shi, Saitama 351-01, JAPAN
E-mail: cia@hare.riken.go.jp
Tel: ++81 48 4679668
Fax: ++81 48 4679686.

Luis Castedo
Facultad de Informática
Universidad de La Coruña
Campus de Elviña s/n
15.071 A Coruña, SPAIN.
e-mail: luis@des.fi.udc.es
Tel: ++34 81 167150
Fax: ++34 81 167160.

Blind source separation is a fundamental problem in signal processing that consists of separating a linear mixture of non-Gaussian signals. The problem is termed blind because it is only assumed that the sources are statistically independent and that the mixing system is invertible. Blind source separation is strongly related to neural networks because, by the Darmois-Skitovich theorem [1], the sources are recovered if and only if the outputs of the separating system are statistically independent. Therefore, unsupervised learning rules that seek statistical independence between the neuron outputs are valid algorithms for source separation. Moreover, it has been shown [2] that information transfer in a single-layer neural network is maximized when the outputs become statistically independent.

Since the pioneering work of Jutten and Herault [3], many adaptive algorithms for blind source separation have been proposed. These algorithms are derived from a number of different points of view, such as contrast functions [1], non-linear principal component analysis [4], and information transfer maximization [5]. In this paper we present a unified approach to adaptive blind source separation in which most existing learning algorithms are obtained as particular cases of a more general algorithm.

Let us consider a linear neural network in which the output y and the input x are related through y = Wx, where W is the matrix of synaptic weights that will be interpreted as the separating system. We will start by stating an optimization problem: the minimization of the following cost function with respect to the inverse of the separation matrix, W^{-1},

C = || W_o^{-1} E[f(y) g^H(y)] - W^{-1} ||_F^2
(1)

Here f(y) and g(y) denote two non-linear functions of the output vector y, ||·||_F is the Frobenius norm of a matrix, and W_o is the exact separation matrix at which the sources are recovered. The reason for introducing this optimization problem is that, when source separation is achieved, E[f(y) g^H(y)] = I (where I denotes the identity matrix) and the cost function C vanishes. Note that although C involves the exact separation matrix W_o, this does not constitute a practical limitation, since in the learning algorithms it can be substituted by its current estimate.
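As an illustration, a sample version of the cost (1) can be evaluated on a toy two-source mixture. The choices f = tanh and g = identity below are only illustrative (the abstract leaves f and g generic), as are the source distribution and all variable names; also note that C vanishes only when the separating matrix is normalized so that E[f(y) g^H(y)] = I, so the cost at an arbitrarily scaled W_o need not be exactly zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: two independent non-Gaussian sources and a random mixture.
n, T = 2, 5000
s = rng.laplace(size=(n, T))          # independent super-Gaussian sources
A = rng.normal(size=(n, n))           # unknown mixing matrix
x = A @ s                             # observed mixtures
Wo = np.linalg.inv(A)                 # exact separating matrix (up to scale)

def cost(W, f=np.tanh, g=lambda y: y):
    """Sample estimate of C = || Wo^{-1} E[f(y) g^H(y)] - W^{-1} ||_F^2."""
    y = W @ x
    Rfg = f(y) @ g(y).conj().T / T    # sample estimate of E[f(y) g^H(y)]
    return np.linalg.norm(np.linalg.inv(Wo) @ Rfg - np.linalg.inv(W)) ** 2

print(cost(np.eye(n)), cost(Wo))
```

The default matrix norm in `np.linalg.norm` is precisely the Frobenius norm used in (1).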

In the paper we will show that, under some mild conditions (similar to those found in Bussgang techniques for blind equalization [11]), the resulting Gauss-Newton adaptive algorithm that minimizes C has the form

W_{n+1} = W_n ( (1-m) I + m f(y_n) g^H(y_n) )^{-1}
(2)

where m is the algorithm step-size. Note that this is a stochastic algorithm in which the expectations have been dropped and the exact separation matrix W_o has been replaced by its current estimate W_n.
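A direct mini-batch implementation of (2) can be sketched as follows. The nonlinearities (f = tanh, g = identity), the Laplacian sources, the mixing matrix, and the block size are all illustrative choices not fixed by the abstract; the block average stands in for the dropped expectation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy mixture: two independent super-Gaussian sources.
n, T, block = 2, 20000, 200
s = rng.laplace(size=(n, T))
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # illustrative mixing matrix
x = A @ s

W = np.eye(n)          # initial estimate of the separating matrix
m = 0.05               # step-size

for k in range(0, T, block):
    y = W @ x[:, k:k + block]
    # Block estimate of E[f(y) g^H(y)] with f = tanh, g = identity.
    F = np.tanh(y) @ y.T / block
    # Update (2): W <- W ((1 - m) I + m F)^{-1}.
    W = W @ np.linalg.inv((1 - m) * np.eye(n) + m * F)

G = W @ A              # global system; ideally a scaled permutation matrix
print(G)
```

After convergence the global system G = W A should approach a scaled permutation matrix, reflecting the usual scale and ordering indeterminacies of blind separation.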

The main limitation of the above algorithm is that it requires a matrix inversion at each iteration. However, this inversion can be avoided using matrix inversion lemmas such as the Sherman-Morrison and Woodbury formulas. Moreover, in the paper it will be shown that many existing blind source separation algorithms can be interpreted as particular cases of (2) when different inversion formulas and nonlinearities are selected. The algorithms that fit into our framework include the decorrelation algorithm independently proposed by Almeida et al. [6] and Cichocki et al. [7], the natural gradient algorithm for ICA [8], the equivariant adaptive algorithm [9,7], and the non-linear PCA algorithm [4]. It is also important to note that the learning algorithm (2) not only generalizes some of the existing ones, but also gives theoretical clues for the selection of an appropriate adaptation step-size. These clues closely correspond to those found empirically in practice.
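For instance, since f(y_n) g^H(y_n) is a rank-one matrix, the inverse in (2) admits a closed form via the Sherman-Morrison formula. The sketch below only verifies this identity numerically with arbitrary test vectors (the dimension and step-size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 4, 0.1                      # dimension and step-size (illustrative)
f = rng.normal(size=(n, 1))        # stands for f(y_n)
g = rng.normal(size=(n, 1))        # stands for g(y_n)

# Sherman-Morrison with A = (1-m) I, u = m f, v = g gives
#   ((1-m) I + m f g^H)^{-1}
#     = 1/(1-m) * ( I - m f g^H / ((1-m) + m g^H f) ).
M = (1 - m) * np.eye(n) + m * f @ g.T
M_inv = (np.eye(n) - m * f @ g.T / ((1 - m) + m * (g.T @ f))) / (1 - m)

print(np.allclose(M_inv, np.linalg.inv(M)))   # → True
```

The closed form replaces an O(n^3) inversion by an outer product and a scalar division, which is what makes the stochastic iteration (2) practical.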

Recently, Amari and Cardoso have shown in [12] that the best local estimating function (f(y) g^H(y)) is obtained when one of the two functions f(·) or g(·) is linear, i.e., when f(y) = y or g(y) = y. Nevertheless, they also point out that their results characterize only the local asymptotic behaviour, while other estimating functions with two non-linearities may have preferable global properties.

Finally, we will also present a stability analysis of the general algorithm (2). We will obtain three necessary and sufficient conditions that the nonlinearities f(·) and g(·) must satisfy to ensure the asymptotic stability of the proposed algorithm. The resulting conditions generalize those obtained in [9] and [10].

References

[1]
P. Comon, ``Independent Component Analysis, A New Concept?'', Signal Processing, vol. 36, pp. 287-314, April 1994.
[2]
J. P. Nadal, N. Parga, ``Non Linear Neurons in the Low Noise Limit: a Factorial Code Maximizes Information Transfer'', Network, vol. 5, pp. 565-581, 1994.
[3]
C. Jutten, J. Herault, ``Blind Separation of Sources, Part I: An Adaptive Algorithm Based on Neuromimetic Architectures'', Signal Processing, vol. 24, pp. 1-10, 1991.
[4]
J. Karhunen, J. Joutsensalo, ``Representation and Separation of Signals Using Nonlinear PCA Type Learning'', Neural Networks, vol. 7, no. 1, pp.113-127, 1994.
[5]
A. J. Bell, T. J. Sejnowski, ``An Information Maximization Approach to Blind Separation and Blind Deconvolution'', Neural Computation, 7, pp. 1129-1159, 1995.
[6]
L. B. Almeida, F. M. Silva, ``Adaptive Decorrelation'', Artificial Neural Networks (Elsevier), vol. 2, pp. 149-156, 1992.
[7]
A. Cichocki, R. Unbehauen, ``Robust Neural Networks with On-Line Learning for Blind Identification and Blind Separation of Sources'', IEEE Trans. on Circuits and Systems-I: Fundamental Theory and Applications, vol. 43, no. 11, pp. 894-906, 1996.
[8]
S. Amari, S. C. Douglas, A. Cichocki, H. H. Yang, ``Novel On-line Adaptive Learning Algorithms for Blind Deconvolution Using the Natural Gradient Approach'', in Proc. IFAC Symp. Syst. Identification, Kitakyushu-shi, pp. 1055-1062, 1997.
[9]
J. F. Cardoso, B. Laheld, ``Equivariant Adaptive Source Separation'', IEEE Transactions on Signal Processing, vol. 44, pp. 3017-3030, Dec. 1996.
[10]
S. Amari, T. Chen, A. Cichocki, ``Stability Analysis of Learning Algorithms for Blind Source Separation'', Neural Networks, vol. 10, pp. 1345-1351, Nov. 1997.
[11]
S. Haykin, Ed., ``Blind Deconvolution'', Prentice Hall, Third edition, 1994.
[12]
S. Amari, J. F. Cardoso, ``Blind Source Separation: Semiparametric Statistical Approach'', IEEE Trans. on Signal Processing, vol. 45, no. 11, pp. 2692-2697, Nov. 1997.


* Corresponding author.