Application and Need-Based Architecture Design of Deep Neural Networks

Author(s):  
Soniya ◽  
Sandeep Paul ◽  
Lotika Singh

This paper applies a hybrid evolutionary approach to convolutional neural networks (CNNs), determining the number of layers and filters based on the application and user needs. It integrates a compact genetic algorithm with stochastic gradient descent (SGD) to simultaneously evolve the structure and parameters of the CNN, and defines an effective string representation that combines the two. The compact genetic algorithm evolves the network structure by optimizing the number of convolutional layers and the number of filters in each convolutional layer, while an optimal set of weight parameters is obtained using SGD updates. The approach thus effectively combines exploration of the network space by the compact genetic algorithm with exploitation in the weight space by SGD. It also elegantly incorporates user-defined parameters into the cost function, which control the network structure, and hence its performance, according to the user's needs. The effectiveness of the proposed approach is demonstrated on four benchmark datasets: MNIST, COIL-100, CIFAR-10, and CIFAR-100. The results clearly demonstrate the potential of the proposed approach to evolve architectures suited to the nature of the application and the needs of the user.
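To make the evolutionary loop concrete, here is a minimal sketch (not the authors' code) of a compact genetic algorithm over a bitstring encoding of depth and filter counts; `train_and_score` is a hypothetical stand-in for SGD training plus a user-weighted structure penalty in the cost function, and the bit layout is illustrative only.

```python
import random

BITS = 8  # e.g., 2 bits for layer count, 6 bits for per-layer filter counts

def sample(prob):
    # Draw a candidate bitstring from the cGA probability vector.
    return [1 if random.random() < p else 0 for p in prob]

def train_and_score(bits, size_weight=0.01):
    # Hypothetical fitness: validation accuracy of the decoded CNN
    # (trained with SGD) minus a user-weighted penalty on network size.
    accuracy = random.random()  # placeholder for actual SGD training
    return accuracy - size_weight * sum(bits)

prob = [0.5] * BITS  # cGA probability vector
for _ in range(200):
    a, b = sample(prob), sample(prob)
    winner, loser = (a, b) if train_and_score(a) >= train_and_score(b) else (b, a)
    for i in range(BITS):  # shift probabilities toward the winner
        if winner[i] != loser[i]:
            prob[i] += (1 / 50) if winner[i] else -(1 / 50)
            prob[i] = min(max(prob[i], 0.0), 1.0)

print(sample(prob))  # a high-probability architecture encoding
```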

2019 ◽  
Vol 10 (1) ◽  
pp. 64
Author(s):  
Yi Lin ◽  
Honggang Zhang

In the era of Big Data, multi-instance learning, as a weakly supervised learning framework, has found a variety of applications because it helps reduce the cost of the data-labeling process. Due to this weakly supervised setting, learning effective instance representations/embeddings is challenging. To address this issue, we propose an instance-embedding regularizer that can boost the performance of both instance- and bag-embedding learning in a unified fashion. Specifically, the crux of the instance-embedding regularizer is to maximize the correlation between instance embeddings and the underlying instance-label similarities. The embedding-learning framework was implemented using a neural network and optimized in an end-to-end manner using stochastic gradient descent. In experiments on a variety of applications, the results show that the proposed instance-embedding regularization method is highly effective, achieving state-of-the-art performance.
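A hedged sketch of the core idea follows: penalize low correlation between pairwise instance-embedding similarities and pairwise label similarities. In multi-instance learning the true instance labels are unknown, so the `labels` argument here stands in for whatever proxy label-similarity signal is available; the paper's exact formulation may differ.

```python
import torch

def embedding_regularizer(emb, labels):
    # Cosine similarities between all pairs of instance embeddings.
    emb = torch.nn.functional.normalize(emb, dim=1)
    sim_emb = emb @ emb.t()
    # Binary label-similarity matrix from (proxy) instance labels.
    sim_lab = (labels[:, None] == labels[None, :]).float()
    # Pearson correlation between the two similarity matrices.
    x = sim_emb.flatten() - sim_emb.mean()
    y = sim_lab.flatten() - sim_lab.mean()
    corr = (x * y).sum() / (x.norm() * y.norm() + 1e-8)
    return -corr  # minimizing this term maximizes the correlation
```

Added to the task loss, this term can be minimized end-to-end with SGD alongside the bag-level objective.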


Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 213 ◽  
Author(s):  
Yiğit Uğur ◽  
George Arvanitakis ◽  
Abdellatif Zaidi

In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck and the Gaussian mixture model. Specifically, our approach uses the variational information bottleneck method and models the latent space as a mixture of Gaussians. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO) and provide a variational-inference-type algorithm for computing it. In the algorithm, the coders' mappings are parametrized using neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets demonstrate the effectiveness of our method.
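The following is a minimal sketch of such a Monte Carlo bound, assuming a diagonal-Gaussian encoder, a Gaussian mixture prior with component means `mu_c`, log-variances `logvar_c`, and log-weights `log_pi`, and a Bernoulli decoder; it illustrates the structure of the objective, not the authors' exact derivation.

```python
import math
import torch

def gaussian_logpdf(z, mu, logvar):
    # Diagonal Gaussian log-density, summed over the last dimension.
    return -0.5 * (((z - mu) ** 2) / logvar.exp() + logvar
                   + math.log(2 * math.pi)).sum(-1)

def vib_gmm_bound(x, x_recon_logits, mu, logvar, mu_c, logvar_c, log_pi, beta=1.0):
    # One Markov (reparameterized) sample from the encoder q(z|x).
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    log_qz = gaussian_logpdf(z, mu, logvar)
    # log p(z) under the Gaussian mixture prior: logsumexp over components.
    comp = gaussian_logpdf(z[:, None, :], mu_c[None], logvar_c[None]) + log_pi
    log_pz = torch.logsumexp(comp, dim=1)
    # Reconstruction term log p(x|z) for a Bernoulli decoder.
    recon = -torch.nn.functional.binary_cross_entropy_with_logits(
        x_recon_logits, x, reduction="none").sum(-1)
    return (recon - beta * (log_qz - log_pz)).mean()  # maximize with SGD
```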


Author(s):  
Beitong Zhou ◽  
Jun Liu ◽  
Weigao Sun ◽  
Ruijuan Chen ◽  
Claire Tomlin ◽  
...  

We propose a novel technique for improving the stochastic gradient descent (SGD) method to train deep networks, which we term pbSGD. The proposed pbSGD method simply raises the stochastic gradient to a certain power elementwise during iterations and introduces only one additional parameter, the power exponent (when it equals 1, pbSGD reduces to SGD). We further propose pbSGD with momentum, which we term pbSGDM. The main results of this paper are comprehensive experiments on popular deep learning models and benchmark datasets. Empirical results show that the proposed pbSGD and pbSGDM achieve faster initial training speed than adaptive gradient methods, generalization ability comparable to SGD, and improved robustness to hyper-parameter selection and vanishing gradients. pbSGD is essentially a gradient modifier via a nonlinear transformation; as such, it is orthogonal and complementary to other techniques for accelerating gradient-based optimization, such as learning rate schedules. Finally, we provide a convergence rate analysis for both pbSGD and pbSGDM; the theoretical rates of convergence match the best known rates for SGD and SGDM on nonconvex functions.
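A minimal sketch of the updates as described in the abstract: the gradient is raised to a power gamma elementwise with its sign preserved, and the momentum variant applies the usual heavy-ball update to the powered gradient. Variable names and hyper-parameter values are illustrative, not the paper's.

```python
import numpy as np

def pbsgd_step(w, grad, lr=0.01, gamma=0.7):
    # Sign-preserving elementwise power of the gradient; gamma = 1 is plain SGD.
    powered = np.sign(grad) * np.abs(grad) ** gamma
    return w - lr * powered

def pbsgdm_step(w, grad, velocity, lr=0.01, gamma=0.7, momentum=0.9):
    # pbSGDM: heavy-ball momentum applied to the powered gradient.
    powered = np.sign(grad) * np.abs(grad) ** gamma
    velocity = momentum * velocity + powered
    return w - lr * velocity, velocity
```

Because the transform only modifies the gradient before the step, it composes directly with learning rate schedules and other step-size techniques.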


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Gábor Danner ◽  
Árpád Berta ◽  
István Hegedűs ◽  
Márk Jelasity

Privacy and security are among the highest priorities in data mining approaches over data collected from mobile devices. Fully distributed machine learning is a promising direction in this context. However, it is a hard problem to design protocols that are efficient yet provide sufficient levels of privacy and security. In fully distributed environments, secure multiparty computation (MPC) is often applied to solve these problems. However, in our dynamic and unreliable application domain, known MPC algorithms are not scalable or robust enough. We propose a lightweight protocol to quickly and securely compute the sum query over a subset of participants, assuming a semi-honest adversary. During the computation the participants learn no individual values. We apply this protocol to efficiently calculate the sum of gradients as part of a fully distributed minibatch stochastic gradient descent algorithm. The protocol achieves scalability and robustness by exploiting the fact that in this application domain a “quick and dirty” sum computation is acceptable. We utilize the Paillier homomorphic cryptosystem as part of our solution, combined with extreme lossy gradient compression to make the cost of the cryptographic algorithms affordable. We demonstrate both theoretically and experimentally, based on churn statistics from a real smartphone trace, that the protocol is indeed practically viable.
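A simplified illustration (not the full protocol) of why Paillier fits secure sum: ciphertexts can be added, so an aggregator learns only the total, never individual gradient contributions. This sketch uses the python-paillier (`phe`) package; in the paper this primitive is combined with extreme lossy gradient compression to keep the cryptographic cost affordable.

```python
from phe import paillier

# Key generation (in the protocol, key material is managed among peers).
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Each participant encrypts its (compressed) gradient contribution.
contributions = [0.12, -0.40, 0.25]
ciphertexts = [public_key.encrypt(g) for g in contributions]

# Any party can sum the ciphertexts without learning individual values.
encrypted_sum = sum(ciphertexts[1:], ciphertexts[0])

# Only the private-key holder recovers the aggregate gradient sum.
print(private_key.decrypt(encrypted_sum))  # -> approximately -0.03
```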


2018 ◽  
Vol 4 (1) ◽  
pp. 3
Author(s):  
Rene Bidart ◽  
Alexander Wong

In this study, we explore the training of monolithic deep neural networks in an effective manner. One of the biggest challenges with training such networks to the desired level of accuracy is the difficulty in converging to a good solution using iterative optimization methods such as stochastic gradient descent, due to the enormous number of parameters that need to be learned. To achieve this, we introduce a partitioned training strategy, where proxy layers are connected to different partitions of a deep neural network to enable isolated training of a much smaller number of parameters to convergence. To illustrate the efficacy of this training strategy, we introduce MonolithNet, a massive residual deep neural network consisting of 437 million parameters. The trained MonolithNet was able to achieve a top-1 accuracy of 97% on the CIFAR10 image classification dataset, which demonstrates the feasibility of the proposed training strategy for training monolithic deep neural networks to high accuracies.
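A hedged sketch of the partitioned training idea: attach a small proxy head to one partition of a deep network so that partition's parameters can be trained to convergence in isolation. Layer sizes and the training loop are illustrative, not MonolithNet's.

```python
import torch
import torch.nn as nn

partition = nn.Sequential(   # one partition of the deep network
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
)
proxy_head = nn.Sequential(  # temporary proxy layers for isolated training
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
)
opt = torch.optim.SGD(
    list(partition.parameters()) + list(proxy_head.parameters()), lr=0.1)

def train_partition_step(x, y):
    # Only this partition's (and the proxy head's) parameters are updated.
    opt.zero_grad()
    loss = nn.functional.cross_entropy(proxy_head(partition(x)), y)
    loss.backward()
    opt.step()
    return loss.item()

# After convergence, the proxy head is discarded, this partition is
# frozen, and the next partition is trained the same way.
```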


2022 ◽  
Vol 17 ◽  
Author(s):  
Xinyi Liao ◽  
Xiaomei Gu ◽  
Dejun Peng

Background: Many malaria infections are caused by Plasmodium falciparum. Accurate classification of the proteins secreted by the malaria parasite is essential for the development of anti-malarial drugs. Objective: To accurately classify the proteins secreted by the malaria parasite. Methods: To improve the accuracy of predicting Plasmodium secreted proteins, we established a classification model, MGAP-SGD. MonodikGap features (k=7) of the secreted proteins were extracted, and the optimal features were then selected with the AdaBoost method. Finally, based on the optimal feature set, the model predicts secreted proteins using the stochastic gradient descent (SGD) algorithm. Results: We validated the model with a 10-fold cross-validation set and an independent test set using the SGD classifier, obtaining accuracies of 98.5859% and 97.973%, respectively. Conclusion: This demonstrates that the effectiveness and robustness of the MGAP-SGD model can meet the needs of predicting Plasmodium secreted proteins.
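A hedged sketch of such a pipeline, assuming scikit-learn: AdaBoost-driven feature selection over gap-feature vectors, followed by an SGD classifier evaluated with 10-fold cross-validation. `X` and `y` are placeholders for the extracted k=7 gap features and the secreted/non-secreted labels; this is an illustration, not the authors' released code.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((200, 400))    # placeholder gap-feature vectors
y = rng.integers(0, 2, 200)   # placeholder secreted-protein labels

model = make_pipeline(
    SelectFromModel(AdaBoostClassifier(n_estimators=100)),  # feature selection
    SGDClassifier(loss="log_loss", max_iter=1000),          # SGD classifier
)
scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
print(scores.mean())
```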


2022 ◽  
Vol 40 (4) ◽  
pp. 1-32
Author(s):  
Jinze Wang ◽  
Yongli Ren ◽  
Jie Li ◽  
Ke Deng

Factorization models have been successfully applied to recommendation problems and have had a significant impact on both academia and industry in the field of Collaborative Filtering (CF). However, the intermediate data generated in a factorization model's decision-making process (or training process, its footprint) has been overlooked, even though it may provide rich information to further improve recommendations. In this article, we introduce the concept of the Convergence Pattern, which records how ratings are learned step-by-step in factorization models for CF. We show that the Convergence Pattern exists from both the model perspective (e.g., classical Matrix Factorization (MF) and deep-learning factorization) and the training (learning) perspective (e.g., stochastic gradient descent (SGD), alternating least squares (ALS), and Markov Chain Monte Carlo (MCMC)). By utilizing the Convergence Pattern, we propose a prediction model to estimate the reliability of predicted missing ratings and then improve the quality of recommendations. Two applications have been investigated: (1) how to evaluate the reliability of predicted missing ratings and thus recommend those with high reliability; (2) how to use the estimated reliability to adjust the predicted ratings and further improve prediction accuracy. Extensive experiments have been conducted on several benchmark datasets across three recommendation tasks: decision-aware recommendation, rating prediction, and top-N recommendation. The experiment results verify the effectiveness of the proposed methods in various aspects.
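A hedged sketch of recording such a footprint: during SGD training of matrix factorization, log how each observed rating's prediction evolves epoch by epoch; these per-rating trajectories can then feed a downstream reliability model. Dimensions, hyper-parameters, and the synthetic data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 8
ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
           for _ in range(500)]
P = rng.normal(0, 0.1, (n_users, k))   # user factors
Q = rng.normal(0, 0.1, (n_items, k))   # item factors

trajectory = {}  # (user, item) -> list of predictions, one per epoch
for epoch in range(20):
    for u, i, r in ratings:            # one SGD pass with L2 regularization
        err = r - P[u] @ Q[i]
        p_old = P[u].copy()
        P[u] += 0.01 * (err * Q[i] - 0.02 * P[u])
        Q[i] += 0.01 * (err * p_old - 0.02 * Q[i])
    for u, i, _ in ratings:            # record this epoch's footprint
        trajectory.setdefault((u, i), []).append(P[u] @ Q[i])
```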

