Adaptive Online Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information

There are two major approaches for blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the demixing matrix. The MI is the contrast function for blind separation; the entropy is not. To justify the ME, the relation between ME and MMI is first elucidated by calculating the first derivative of the entropy and proving that the mean subtraction is necessary in applying the ME and at the solution points determined by the MI, the ME will not update the demixing matrix in the directions of increasing the cross-talking. Second, the natural gradient instead of the ordinary gradient is introduced to obtain efficient algorithms, because the parameter space is a Riemannian space consisting of matrices. The mutual information is calculated by applying the Gram-Charlier expansion to approximate probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates with an adaptive method of estimating the unknown cumulants. It is shown by computer simulation that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.

Download Full-text

Adaptive Natural Gradient Method for Learning of Stochastic Neural Networks in Mini-Batch Mode

Applied Sciences ◽

10.3390/app9214568 ◽

2019 ◽

Vol 9 (21) ◽

pp. 4568

Author(s):

Hyeyoung Park ◽

Kwanyong Lee

Keyword(s):

Neural Networks ◽

Gradient Descent ◽

Learning Algorithm ◽

Descent Method ◽

Benchmark Problems ◽

Stochastic Neural Networks ◽

Gradient Descent Method ◽

Natural Gradient ◽

Convergence Properties ◽

Data Set

Gradient descent method is an essential algorithm for learning of neural networks. Among diverse variations of gradient descent method that have been developed for accelerating learning speed, the natural gradient learning is based on the theory of information geometry on stochastic neuromanifold, and is known to have ideal convergence properties. Despite its theoretical advantages, the pure natural gradient has some limitations that prevent its practical usage. In order to get the explicit value of the natural gradient, it is required to know true probability distribution of input variables, and to calculate inverse of a matrix with the square size of the number of parameters. Though an adaptive estimation of the natural gradient has been proposed as a solution, it was originally developed for online learning mode, which is computationally inefficient for the learning of large data set. In this paper, we propose a novel adaptive natural gradient estimation for mini-batch learning mode, which is commonly adopted for big data analysis. For two representative stochastic neural network models, we present explicit rules of parameter updates and learning algorithm. Through experiments on three benchmark problems, we confirm that the proposed method has superior convergence properties to the conventional methods.

Download Full-text

Natural Gradient Learning Algorithms for RBF Networks

Neural Computation ◽

10.1162/neco_a_00689 ◽

2015 ◽

Vol 27 (2) ◽

pp. 481-505 ◽

Cited By ~ 13

Author(s):

Junsheng Zhao ◽

Haikun Wei ◽

Chi Zhang ◽

Weiling Li ◽

Weili Guo ◽

...

Keyword(s):

Fisher Information ◽

Fisher Information Matrix ◽

Learning Algorithm ◽

Learning Algorithms ◽

Information Matrix ◽

Descent Method ◽

Gradient Descent Method ◽

Natural Gradient ◽

Rbf Networks ◽

Gradient Learning

Radial basis function (RBF) networks are one of the most widely used models for function approximation and classification. There are many strange behaviors in the learning process of RBF networks, such as slow learning speed and the existence of the plateaus. The natural gradient learning method can overcome these disadvantages effectively. It can accelerate the dynamics of learning and avoid plateaus. In this letter, we assume that the probability density function (pdf) of the input and the activation function are gaussian. First, we introduce natural gradient learning to the RBF networks and give the explicit forms of the Fisher information matrix and its inverse. Second, since it is difficult to calculate the Fisher information matrix and its inverse when the numbers of the hidden units and the dimensions of the input are large, we introduce the adaptive method to the natural gradient learning algorithms. Finally, we give an explicit form of the adaptive natural gradient learning algorithm and compare it to the conventional gradient descent method. Simulations show that the proposed adaptive natural gradient method, which can avoid the plateaus effectively, has a good performance when RBF networks are used for nonlinear functions approximation.

Download Full-text

Meteorological Data Forecast using RNN

Deep Learning and Neural Networks ◽

10.4018/978-1-7998-0414-7.ch050 ◽

2020 ◽

pp. 905-920

Author(s):

Stefan Balluff ◽

Jörg Bendfeld ◽

Stefan Krauter

Keyword(s):

Neural Networks ◽

Wind Speed ◽

Linear Prediction ◽

Learning Algorithm ◽

Meteorological Data ◽

System Modeling ◽

Descent Method ◽

Gradient Descent Method ◽

Earth System Modeling ◽

Set Up

Gathering knowledge not only of the current but also the upcoming wind speed is getting more and more important as the experience of operating and maintaining wind turbines is increasing. Not only with regards to operation and maintenance tasks such as gearbox and generator checks but moreover due to the fact that energy providers have to sell the right amount of their converted energy at the European energy markets, the knowledge of the wind and hence electrical power of the next day is of key importance. Selling more energy as has been offered is penalized as well as offering less energy as contractually promised. In addition to that the price per offered kWh decreases in case of a surplus of energy. Achieving a forecast there are various methods in computer science: fuzzy logic, linear prediction or neural networks. This paper presents current results of wind speed forecasts using recurrent neural networks (RNN) and the gradient descent method plus a backpropagation learning algorithm. Data used has been extracted from NASA's Modern Era-Retrospective analysis for Research and Applications (MERRA) which is calculated by a GEOS-5 Earth System Modeling and Data Assimilation system. The presented results show that wind speed data can be forecasted using historical data for training the RNN. Nevertheless, the current set up system lacks robustness and can be improved further with regards to accuracy.

Download Full-text

Fast identification of a human skeleton-marker model for motion capture system using stochastic gradient descent method

2020 8th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob) ◽

10.1109/biorob49111.2020.9224442 ◽

2020 ◽

Author(s):

Tianyi Zou ◽

Tomomichi Sugihara

Keyword(s):

Motion Capture ◽

Gradient Descent ◽

Descent Method ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Gradient Descent Method ◽

Motion Capture System ◽

Human Skeleton ◽

Fast Identification

Download Full-text

RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33011544 ◽

2019 ◽

Vol 33 ◽

pp. 1544-1551 ◽

Cited By ~ 10

Author(s):

Liping Li ◽

Wei Xu ◽

Tianyi Chen ◽

Georgios B. Giannakis ◽

Qing Ling

Keyword(s):

Optimal Solution ◽

Distributed Learning ◽

Learning Task ◽

Descent Method ◽

Stochastic Gradient Descent ◽

Subgradient Methods ◽

Gradient Descent Method ◽

Aggregation Methods ◽

Heterogeneous Datasets ◽

Byzantine Attacks

In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets at presence of an unknown number of Byzantine workers. The Byzantine workers, during the learning process, may send arbitrary incorrect messages to the master due to data corruptions, communication failures or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated with the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resultant subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most of the existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) on the workers, and hence fits for a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution with the learning error dependent on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method, which is free of Byzantine attacks. Numerically, experiments on real dataset corroborate the competitive performance of RSA and a complexity reduction compared to the state-of-the-art alternatives.

Download Full-text

Dynamics of the adaptive natural gradient descent method for soft committee machines

Physical Review E ◽

10.1103/physreve.69.056120 ◽

2004 ◽

Vol 69 (5) ◽

Cited By ~ 3

Author(s):

Masato Inoue ◽

Hyeyoung Park ◽

Masato Okada

Keyword(s):

Gradient Descent ◽

Descent Method ◽

Gradient Descent Method ◽

Natural Gradient

Download Full-text

FEATURE EXTRACTION BASED ON DIRECT CALCULATION OF MUTUAL INFORMATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001407005892 ◽

2007 ◽

Vol 21 (07) ◽

pp. 1213-1231 ◽

Cited By ~ 9

Author(s):

NOJUN KWAK

Keyword(s):

Feature Extraction ◽

Mutual Information ◽

Direct Calculation ◽

Extraction Methods ◽

Descent Method ◽

Gradient Descent Method ◽

Probability Density Estimation ◽

Classification Problems ◽

Feature Extraction Method ◽

Window Method

In many pattern recognition problems, it is desirable to reduce the number of input features by extracting important features related to the problems. By focusing on only the problem-relevant features, the dimension of features can be greatly reduced and thereby can result in a better generalization performance with less computational complexity. In this paper, we propose a feature extraction method for handling classification problems. The proposed algorithm is used to search for a set of linear combinations of the original features, whose mutual information with the output class can be maximized. The mutual information between the extracted features and the output class is calculated by using the probability density estimation based on the Parzen window method. A greedy algorithm using the gradient descent method is used to determine the new features. The computational load is proportional to the square of the number of samples. The proposed method was applied to several classification problems, which showed better or comparable performances than the conventional feature extraction methods.

Download Full-text

Hyperparameter-free optimizer of stochastic gradient descent that incorporates unit correction and moment estimation

10.1101/348557 ◽

2018 ◽

Author(s):

Kazunori D Yamada

Keyword(s):

Deep Learning ◽

Gradient Descent ◽

Mathematical Optimization ◽

Descent Method ◽

Learning Rate ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Gradient Descent Method ◽

Moment Estimation ◽

Estimation System

ABSTRACTIn the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, the gradient descent method is the most naive. Adjustment of learning rate is necessary for quick convergence, which is normally done manually with gradient descent. Many optimizers have been developed to control the learning rate and increase convergence speed. Generally, these optimizers adjust the learning rate automatically in response to learning status. These optimizers were gradually improved by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely a unit correction system, into YamAdam. According to benchmark tests on some common datasets, our optimizer showed similar or faster convergent performance compared to the existing methods. YamAdam is an option as an alternative optimizer for deep learning.

Download Full-text

SSGD: A Safe and Efficient Method of Gradient Descent

Security and Communication Networks ◽

10.1155/2021/5404061 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Jinhuan Duan ◽

Xianxian Li ◽

Shiqi Gao ◽

Zili Zhong ◽

Jinyan Wang

Keyword(s):

Gradient Descent ◽

Large Scale ◽

Optimization Problems ◽

Unit Vector ◽

Descent Method ◽

Stochastic Gradient ◽

Learning System ◽

Training Data ◽

Stochastic Gradient Descent ◽

Gradient Descent Method

With the vigorous development of artificial intelligence technology, various engineering technology applications have been implemented one after another. The gradient descent method plays an important role in solving various optimization problems, due to its simple structure, good stability, and easy implementation. However, in multinode machine learning system, the gradients usually need to be shared, which will cause privacy leakage, because attackers can infer training data with the gradient information. In this paper, to prevent gradient leakage while keeping the accuracy of the model, we propose the super stochastic gradient descent approach to update parameters by concealing the modulus length of gradient vectors and converting it or them into a unit vector. Furthermore, we analyze the security of super stochastic gradient descent approach and demonstrate that our algorithm can defend against the attacks on the gradient. Experiment results show that our approach is obviously superior to prevalent gradient descent approaches in terms of accuracy, robustness, and adaptability to large-scale batches. Interestingly, our algorithm can also resist model poisoning attacks to a certain extent.

Download Full-text

Canoeing Motion Tracking and Analysis via Multi-Sensors Fusion

Sensors ◽

10.3390/s20072110 ◽

2020 ◽

Vol 20 (7) ◽

pp. 2110 ◽

Cited By ~ 4

Author(s):

Long Liu ◽

Sen Qiu ◽

ZheLong Wang ◽

Jie Li ◽

JiaXin Wang

Keyword(s):

Motion Tracking ◽

Learning Algorithm ◽

Inertial Sensor ◽

Recovery Phase ◽

Descent Method ◽

Sensor Nodes ◽

Sensor Calibration ◽

Gradient Descent Method ◽

Whole Process ◽

Training Methodologies

Coaches and athletes are constantly seeking novel training methodologies in an attempt to improve athletic performance. This paper proposes a method of rowing sport capture and analysis based on Inertial Measurement Units (IMUs). A canoeist’s motion was collected by multiple miniature inertial sensor nodes. The gradient descent method was used to fuse data and obtain the canoeist’s attitude information after sensor calibration, and then the motions of canoeist’s actions were reconstructed. Stroke quality was performed based on the estimated joint angles. Machine learning algorithm was used as the classification method to divide the stroke cycle into different phases, including propulsion-phase and recovery-phase, a quantitative kinematic analysis was carried out. Experiments conducted in this paper demonstrated that our method possesses the capacity to reveal the similarities and differences between novice and coach, the whole process of canoeist’s motions can be analyzed with satisfactory accuracy validated by videography method. It can provide quantitative data for coaches or athletes, which can be used to improve the skills of rowers.

Download Full-text