Blind source separation is a fundamental problem in signal processing that consists of separating a linear mixture of non-Gaussian signals. The problem is termed blind because the only assumptions are that the sources are statistically independent and that the mixing system is invertible. Blind source separation is strongly related to neural networks because, by the Darmois-Skitovich theorem, the sources are recovered if and only if the outputs of the separating system are statistically independent. Therefore, unsupervised learning rules that seek statistical independence between the neuron outputs are valid algorithms for source separation. Moreover, it has been shown that information transfer in a single-layer neural network is maximized when the outputs become statistically independent.
Since the pioneering work of Jutten and Herault, many adaptive algorithms for blind source separation have been proposed. These algorithms have been derived from a number of different points of view, such as contrast functions, non-linear principal component analysis and information transfer maximization. In this paper we present a unified approach to adaptive blind source separation in which most existing learning algorithms are obtained as particular cases of a more general algorithm.
Let us consider a linear neural network in which the output y and the input x are related through y = Wx, where W is the matrix of synaptic weights, which will be interpreted as the separating system. We start by stating an optimization problem: the minimization of the following cost function with respect to the inverse of the separation matrix, W^{-1},
Here f(y) and g(y) denote two non-linear functions of the output vector y, ||·||_F is the Frobenius norm of a matrix, and W_o is the exact separation matrix at which the sources are recovered. The reason for introducing this optimization problem is that, when source separation is achieved, E[f(y)g^H(y)] = I (where I denotes the identity matrix) and the cost function C vanishes. Note that although C involves the exact separation matrix W_o, this does not constitute a practical limitation, since in the learning algorithms it can be replaced by its current estimate.
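The separation condition E[f(y)g^H(y)] = I can be checked numerically. The following is a minimal sketch, not taken from the paper, under several assumptions: real-valued signals (so the Hermitian transpose reduces to a transpose), uniformly distributed sources, the illustrative nonlinearities f(y) = y^3 and g(y) = y, and a hypothetical mixing matrix `A`. The sources are scaled so that E[s^4] = 1, which makes the diagonal entries of the matrix equal to one; the off-diagonal entries vanish by independence.

```python
import numpy as np

def estimating_matrix(y, f, g):
    """Sample estimate of E[f(y) g(y)^H]; y has shape (n_signals, n_samples)."""
    return f(y) @ g(y).conj().T / y.shape[1]

# Assumed nonlinearities (illustrative choice, not prescribed by the text).
f = lambda y: y ** 3
g = lambda y: y

rng = np.random.default_rng(0)
n_sources, n_samples = 3, 200_000

# Independent sources, uniform on [-a, a]; a is chosen so that
# E[s^4] = a^4 / 5 = 1, hence E[f(s_i) g(s_i)] = 1 on the diagonal.
a = 5 ** 0.25
s = rng.uniform(-a, a, size=(n_sources, n_samples))

# Hypothetical invertible mixing matrix and the exact separating matrix.
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])
x = A @ s
W_o = np.linalg.inv(A)

identity = np.eye(n_sources)
# Frobenius distance to the identity, at separation and without separation.
err_separated = np.linalg.norm(estimating_matrix(W_o @ x, f, g) - identity, "fro")
err_mixed = np.linalg.norm(estimating_matrix(x, f, g) - identity, "fro")

print(f"||E[f g^H] - I||_F at y = W_o x: {err_separated:.4f}")
print(f"||E[f g^H] - I||_F at y = x:     {err_mixed:.4f}")
```

With these assumptions the distance to the identity is close to zero (up to sampling error) when the outputs equal the sources, and clearly non-zero when the mixture is left unseparated.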
In the paper we will show that, under some mild conditions (similar to those found in Bussgang techniques for blind equalization), the resulting Gauss-Newton adaptive algorithm that minimizes C has the form
where μ is the algorithm step size. Note that this is a stochastic algorithm in which the expectations have been dropped and the exact separation matrix W_o has been replaced by its current estimate W_n.
The main limitation of the above algorithm is that it requires a matrix inversion at each iteration. However, this inversion can be avoided by using matrix inversion formulas such as the Sherman-Morrison and Woodbury formulas. Moreover, in the paper it will be shown that many existing blind source separation algorithms can be interpreted as particular cases of (2) when different inversion formulas and nonlinearities are selected. The algorithms that fit into this framework are the decorrelation formula independently proposed by Almeida et al. and Cichocki et al., the natural gradient algorithm for ICA, the equivariant adaptive algorithm [9,7] and the non-linear PCA algorithm. It is also important to note that the learning algorithm (2) not only generalizes some of the existing ones, but also gives some theoretical clues for the selection of an appropriate adaptation step size. These clues closely correspond to the choices found empirically in practice.
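To illustrate how the Sherman-Morrison formula removes the need for a general matrix inversion, the sketch below inverts a scaled identity plus a rank-one term in closed form and compares the result with a direct inversion. The specific matrix (1 - mu) I + mu f(y) g^H(y), the step size `mu`, the nonlinearities and the helper name `rank_one_inverse` are illustrative assumptions; they are not claimed to reproduce the paper's update (2).

```python
import numpy as np

def rank_one_inverse(alpha, beta, u, v):
    """Closed-form inverse of alpha*I + beta*u v^H via Sherman-Morrison:

        (alpha I + beta u v^H)^{-1}
            = (1/alpha) [I - beta u v^H / (alpha + beta v^H u)]

    Only an outer product and a scalar division are needed.
    """
    n = u.shape[0]
    denom = alpha + beta * np.vdot(v, u)  # np.vdot conjugates v: v^H u
    if abs(denom) < 1e-12:
        raise ValueError("matrix is (numerically) singular")
    return (np.eye(n) - beta * np.outer(u, v.conj()) / denom) / alpha

rng = np.random.default_rng(1)
n = 4
mu = 0.01                     # hypothetical step size
y = rng.standard_normal(n)    # hypothetical output vector
f_y, g_y = y ** 3, y          # assumed nonlinearities f(y) = y^3, g(y) = y

# Hypothetical rank-one-perturbed matrix of the kind a stochastic update
# built from f(y) g^H(y) could require inverting at every iteration.
M = (1 - mu) * np.eye(n) + mu * np.outer(f_y, g_y.conj())

M_inv_fast = rank_one_inverse(1 - mu, mu, f_y, g_y)
M_inv_direct = np.linalg.inv(M)
max_abs_diff = np.max(np.abs(M_inv_fast - M_inv_direct))
print("largest entrywise difference:", max_abs_diff)
```

The closed form costs O(n^2) operations per iteration instead of the O(n^3) of a general inversion, and the scalar denominator makes explicit when the step size would render the matrix singular.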
Recently, Amari and Cardoso have shown that the best local estimating function f(y)g^H(y) is obtained when one of the two functions f(·) or g(·) is linear, i.e., when f(y) = y or g(y) = y. Nevertheless, they also point out that their results characterize only the local asymptotic behaviour, while other estimating functions, with two non-linearities, may have preferable global properties.
Finally, we will also present a stability analysis of the general algorithm (2). We will obtain three necessary and sufficient conditions that the nonlinearities f(·) and g(·) must satisfy to ensure asymptotic stability of the proposed algorithm. The resulting conditions generalize previously reported stability results.