Performance Comparison of CNN Models Using Gradient Flow Analysis

Informatics ◽  
2021 ◽  
Vol 8 (3) ◽  
pp. 53
Author(s):  
Seol-Hyun Noh

Convolutional neural networks (CNNs) are widely used among the various deep learning techniques available because of their superior performance in computer vision and natural language processing. CNNs can effectively extract the locality and correlation of input data using structures in which convolutional layers are successively applied to the input data. In general, the performance of a neural network improves as the depth of the CNN increases. However, an increase in the depth of a CNN is not always accompanied by an increase in accuracy, because the gradient vanishing problem may arise, causing the weights of the weighted layers to fail to converge. Accordingly, this study analyzes and compares the gradient flows of the VGGNet, ResNet, SENet, and DenseNet models and derives the reasons for the differences in their error rates.
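
The paper itself does not publish code; the following is a minimal PyTorch sketch of the kind of per-layer gradient inspection such an analysis relies on. The model, data, and layer choices are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not from the paper): inspecting per-layer gradient norms
# in a small CNN to see how gradients attenuate with depth.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

x = torch.randn(8, 3, 32, 32)          # dummy batch
y = torch.randint(0, 10, (8,))         # dummy labels
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()

# Gradient norm per weighted layer: a vanishing trend toward the early
# layers is the symptom that gradient flow analysis looks for.
for name, p in model.named_parameters():
    if p.dim() > 1:                    # weights only, skip biases
        print(f"{name}: grad L2 norm = {p.grad.norm().item():.4e}")
```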

2016 ◽  
pp. 89-112
Author(s):  
Pushpendu Kar ◽  
Anusua Das

The recent surge of interest in artificial neural networks has extended into neuroscience, pattern recognition, machine learning, and artificial intelligence. Theoretical neuroscience is converging on the view that the brain acts as a complex, decentralized computer that performs rigorous computation in a manner quite different from conventional digital computers. The motivation for studying neural networks lies in their structural similarity to the human central nervous system. The elementary processing unit of an artificial neural network (ANN) is called a 'neuron'. A large number of interconnected neurons mimic a biological neural network and form an ANN. Learning is the essential process by which an ANN is trained; knowledge can be transferred to the network only through a learning procedure. This chapter presents the concepts of artificial neural networks in detail, together with some significant aspects of current research.
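
As an illustration of the elementary unit the chapter describes (not code from the chapter itself), a single artificial neuron is just a weighted sum of its inputs passed through an activation function:

```python
# Minimal sketch (illustrative): a single artificial neuron computing a
# weighted sum of inputs followed by a sigmoid activation.
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, squashed through a sigmoid.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example: three inputs with hand-chosen weights.
print(neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.5], bias=0.1))
```

Learning, in this picture, amounts to adjusting the weights and bias so the neuron's outputs match the training targets.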


2019 ◽  
Vol 63 (4) ◽  
pp. 243-252 ◽  
Author(s):  
Jaret Hodges ◽  
Soumya Mohan

Machine learning algorithms are used in language processing, automated driving, and prediction. Though the theory of machine learning has existed since the 1950s, it was not until the advent of advanced computing that its potential began to be realized. Gifted education is a field where machine learning has yet to be utilized, even though one of its underlying problems is classification, an area where learning algorithms have become exceptionally accurate. We provide a brief overview of machine learning with a focus on neural networks and supervised learning, followed by a demonstration using simulated data and neural networks for classification, with a practical explanation of the mechanics of the neural network and the associated R code. Implications for gifted education are then discussed, followed by the limitations of supervised learning. Code used in this article can be found at https://osf.io/4pa3b/
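
The article's demonstration is in R (at the OSF link above); the following is an assumed Python analogue of the same idea, a small neural network classifier trained on simulated data:

```python
# Minimal sketch (assumed analogue, not the article's R code): a neural
# network classifier on simulated data standing in for student records.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Simulated two-class data with a few informative features.
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")
```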


Author(s):  
Raghuram Mandyam Annasamy ◽  
Katia Sycara

Deep reinforcement learning techniques have demonstrated superior performance in a wide variety of environments. While improvements in training algorithms continue at a brisk pace, theoretical and empirical studies of what these networks actually learn lag far behind. In this paper we propose an interpretable neural network architecture for Q-learning which provides a global explanation of the model's behavior using key-value memories, attention, and reconstructible embeddings. With a directed exploration strategy, our model reaches training rewards comparable to state-of-the-art deep Q-learning models. However, the results suggest that the features extracted by the neural network are extremely shallow, and subsequent testing on out-of-sample examples shows that the agent easily overfits to trajectories seen during training.
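
A heavily simplified sketch of the key-value idea (an assumption about the mechanism, not the paper's actual architecture): Q-values are produced by attending over a learned memory with a state query, so the attention weights expose which memory slots drive each decision.

```python
# Minimal sketch (assumed simplification): Q-values from key-value attention.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, slots, n_actions = 16, 8, 4

keys = rng.normal(size=(slots, d))             # learned memory keys
values = rng.normal(size=(slots, n_actions))   # per-slot action values
query = rng.normal(size=d)                     # embedding of current state

# The attention distribution over slots is what makes the model's
# behavior globally explainable.
attn = softmax(keys @ query / np.sqrt(d))
q_values = attn @ values
print("attention:", np.round(attn, 3))
print("Q-values:", np.round(q_values, 3))
```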


1995 ◽  
Vol 17 (1) ◽  
pp. 1-15 ◽  
Author(s):  
John F. Place ◽  
Alain Truchaud ◽  
Kyoichi Ozawa ◽  
Harry Pardue ◽  
Paul Schnipelsky

The incorporation of information-processing technology into analytical systems in the form of standard computing software has recently been advanced by the introduction of artificial intelligence (AI), both as expert systems and as neural networks.

This paper considers the role of software in system operation, control and automation, and attempts to define intelligence. AI is characterized by its ability to deal with incomplete and imprecise information and to accumulate knowledge. Expert systems, building on standard computing techniques, depend heavily on the domain experts and knowledge engineers that have programmed them to represent the real world. Neural networks are intended to emulate the pattern-recognition and parallel-processing capabilities of the human brain and are taught rather than programmed. The future may lie in a combination of the recognition ability of the neural network and the rationalization capability of the expert system.

In the second part of the paper, examples are given of applications of AI in stand-alone systems for knowledge engineering and medical diagnosis, and in embedded systems for failure detection, image analysis, user interfacing, natural language processing, robotics and machine learning, as related to clinical laboratories.

It is concluded that AI constitutes a collective form of intellectual property, and that there is a need for better documentation, evaluation and regulation of the systems already being used in clinical laboratories.


2019 ◽  
Vol 11 (22) ◽  
pp. 2608 ◽  
Author(s):  
Dong Wang ◽  
Ying Li ◽  
Li Ma ◽  
Zongwen Bai ◽  
Jonathan Chan

In recent years, convolutional neural networks (CNNs) have shown promising performance in the fusion of multispectral (MS) and panchromatic (PAN) images (MS pansharpening). However, small-scale data and the gradient vanishing problem have prevented existing CNN-based fusion approaches from leveraging deeper networks, which potentially have better representation ability to characterize the complex nonlinear mapping between the input (source) and target (fused) images. In this paper, we introduce a very deep network with dense blocks and residual learning to tackle these problems. The proposed network takes advantage of dense connections within dense blocks, which connect any two convolution layers, to facilitate gradient flow and implicit deep supervision during training. In addition, reusing feature maps reduces the number of parameters, which helps mitigate the overfitting that results from small-scale data. Residual learning is explored to reduce the difficulty of generating an MS image with high spatial resolution. The proposed network is evaluated via experiments on three datasets, achieving competitive or superior performance; e.g., the spectral angle mapper (SAM) is decreased by over 10% on GaoFen-2 when compared with other state-of-the-art methods.
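
The two ingredients combine as in the following PyTorch sketch (an assumed simplification of the paper's design, not its actual network): each convolution in a dense block sees the concatenation of all earlier feature maps, and a residual connection wraps the block.

```python
# Minimal sketch (assumed): a dense block with a residual connection.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, channels, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # Project concatenated features back to the input width so the
        # block output can be added residually to its input.
        self.fuse = nn.Conv2d(channels + n_layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # dense connectivity
        return x + self.fuse(torch.cat(feats, dim=1))     # residual learning

out = DenseBlock(channels=32, growth=16, n_layers=4)(torch.randn(1, 32, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```

The short paths created by concatenation are what let gradients reach early layers, addressing the vanishing-gradient problem the abstract describes.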


2018 ◽  
Vol 7 (2.32) ◽  
pp. 177 ◽  
Author(s):  
Dr. M. R. Narasinga Rao ◽ 
V Venkatesh Prasad ◽  
P Sai Teja ◽  
Md Zindavali ◽  
O Phanindra Reddy

Deep neural nets with a vast quantity of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, the effect of averaging the predictions of all these thinned networks can be approximated simply by using a single unthinned network with smaller weights. This significantly reduces overfitting and provides major improvements over other regularization techniques. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
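
A minimal sketch of the mechanism (illustrative, not the paper's code): with "inverted" dropout, surviving activations are rescaled at training time, so the single unthinned network can be used unchanged at test time.

```python
# Minimal sketch (illustrative): inverted dropout on a layer's activations.
import numpy as np

def dropout(activations, keep_prob, rng, training=True):
    if not training:
        return activations                    # test time: no-op
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob     # drop units, rescale survivors

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 5))
print(dropout(h, keep_prob=0.8, rng=rng))
```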


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Molham Al-Maleh ◽  
Said Desouki

Natural language processing has witnessed remarkable progress with the advent of deep learning techniques. Text summarization, along with other tasks like text translation and sentiment analysis, has used deep neural network models to enhance results. Recent methods of text summarization follow a sequence-to-sequence encoder–decoder framework, composed of neural networks trained jointly on both input and output. Deep neural networks take advantage of big datasets to improve their results. These networks are supported by the attention mechanism, which handles long texts more efficiently by identifying focus points in the text, and by the copy mechanism, which allows the model to copy words from the source directly into the summary. In this research, we re-implement the basic summarization model that applies the sequence-to-sequence framework to Arabic, a language to which this model had not previously been applied for text summarization. We first build an Arabic dataset of summarized article headlines, consisting of approximately 300 thousand entries, each comprising an article introduction and the corresponding headline. We then apply baseline summarization models to this dataset and compare the results using the ROUGE metric.
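
To make the attention mechanism concrete (an illustrative sketch, not the paper's model): at each decoding step the decoder state scores every encoder state, and the resulting weights pick out the "focus points" in the source text.

```python
# Minimal sketch (illustrative): dot-product attention over encoder states.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
src_len, d = 6, 8
encoder_states = rng.normal(size=(src_len, d))  # one vector per source token
decoder_state = rng.normal(size=d)              # current decoder state

weights = softmax(encoder_states @ decoder_state / np.sqrt(d))
context = weights @ encoder_states              # attention-weighted context
print("attention weights:", np.round(weights, 3))
```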


2013 ◽  
Vol 321-324 ◽  
pp. 1921-1924
Author(s):  
Yong Gang Xue ◽  
Ming Li Zhang

A methodology is proposed to forecast the daily SSE Composite Index based on artificial neural networks and wavelet analysis. The original Composite Index series is first decomposed into components using wavelet techniques. A neural network is then applied to model each component of the decomposed series. The final forecast is obtained by combining the forecasts of the component series. The empirical results show the superior performance of the proposed methodology compared to plain neural network forecasting models. In addition, the results show clear differences in forecasting performance among the different network types.
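
An assumed sketch of such a decompose-model-recombine pipeline (not the paper's exact setup; the wavelet, lag length, and network size are placeholders):

```python
# Minimal sketch (assumed): wavelet decomposition, one small neural model
# per component, and recombination of the component forecasts.
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=512))   # stand-in for the daily index

# Split the series into wavelet components: reconstruct each level with all
# other levels zeroed, so the components sum back to the original series.
coeffs = pywt.wavedec(series, "db4", level=3)
components = []
for i in range(len(coeffs)):
    kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
    components.append(pywt.waverec(kept, "db4")[:len(series)])

def forecast_next(x, lags=8):
    # One-step-ahead forecast of a component from its last `lags` values.
    X = np.array([x[i:i + lags] for i in range(len(x) - lags)])
    y = x[lags:]
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=0).fit(X, y)
    return model.predict(x[-lags:].reshape(1, -1))[0]

# The final forecast combines the per-component forecasts by summation.
print("forecast:", sum(forecast_next(c) for c in components))
```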


Author(s):  
Bubacarr Bah ◽  
Holger Rauhut ◽  
Ulrich Terstiege ◽  
Michael Westdickenberg

Abstract We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank $k$ matrices for some $k\leq r$.
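
A sketch of the setup as we understand it (notation assumed, not copied from the paper): with identity activations, a depth-$N$ network collapses to a matrix product, and gradient flow evolves each factor along the negative gradient of the loss.

```latex
% Sketch of the setup (notation assumed): a deep linear network composes to
% a single matrix product, and gradient flow acts on each factor.
\[
  f(x) = W_N W_{N-1} \cdots W_1 x, \qquad
  L(W_1,\dots,W_N) = \tfrac{1}{2}\,\lVert W_N \cdots W_1 X - Y \rVert_F^2 ,
\]
\[
  \dot{W}_j(t) = -\,\nabla_{W_j} L\bigl(W_1(t),\dots,W_N(t)\bigr),
  \qquad j = 1,\dots,N .
\]
% Viewing the product W = W_N \cdots W_1 as a single variable, this flow
% induces a Riemannian gradient flow on the manifold of rank-r matrices.
```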

