The Goldilocks Zone: Towards Better Understanding of Neural Network Loss Landscapes

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013574 ◽

2019 ◽

Vol 33 ◽

pp. 3574-3581

Author(s):

Stanislav Fort ◽

Adam Scherlis

Keyword(s):

Neural Network ◽

Neural Networks ◽

Configuration Space ◽

Loss Function ◽

Positive Curvature ◽

Local Convexity ◽

Convolutional Networks ◽

Hollow Spherical Shell ◽

Low Dimensional ◽

Fully Connected

We explore the loss landscape of fully-connected and convolutional neural networks using random, low-dimensional hyperplanes and hyperspheres. Evaluating the Hessian, H, of the loss function on these hypersurfaces, we observe 1) an unusual excess of the number of positive eigenvalues of H, and 2) a large value of Tr(H)/||H|| at a well defined range of configuration space radii, corresponding to a thick, hollow, spherical shell we refer to as the Goldilocks zone. We observe this effect for fully-connected neural networks over a range of network widths and depths on MNIST and CIFAR-10 datasets with the ReLU and tanh non-linearities, and a similar effect for convolutional networks. Using our observations, we demonstrate a close connection between the Goldilocks zone, measures of local convexity/prevalence of positive curvature, and the suitability of a network initialization. We show that the high and stable accuracy reached when optimizing on random, low-dimensional hypersurfaces is directly related to the overlap between the hypersurface and the Goldilocks zone, and as a corollary demonstrate that the notion of intrinsic dimension is initialization-dependent. We note that common initialization techniques initialize neural networks in this particular region of unusually high convexity/prevalence of positive curvature, and offer a geometric intuition for their success. Furthermore, we demonstrate that initializing a neural network at a number of points and selecting for high measures of local convexity such as Tr(H)/||H||, number of positive eigenvalues of H, or low initial loss, leads to statistically significantly faster training on MNIST. Based on our observations, we hypothesize that the Goldilocks zone contains an unusually high density of suitable initialization configurations.
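Both curvature measures named in this abstract can be computed directly from a Hessian. The sketch below is a minimal NumPy illustration, not the authors' code. It assumes that ||H|| is the Frobenius norm, so Tr(H)/||H|| lies between -√d and √d for a d×d Hessian. It uses a toy diagonal Hessian rather than one taken from a trained network. The helper names `curvature_stats` and `restrict_to_random_subspace` are hypothetical. Restricting the loss to a random d-dimensional hyperplane θ0 + Pz with orthonormal P gives the Hessian PᵀHP in the coordinates z.

```python
import numpy as np

def curvature_stats(hessian):
    """Return (Tr(H)/||H||_F, fraction of positive eigenvalues) for a symmetric matrix."""
    H = np.asarray(hessian, dtype=float)
    H = 0.5 * (H + H.T)  # guard against small asymmetries from numerical differentiation
    eigenvalues = np.linalg.eigvalsh(H)
    frob = np.linalg.norm(H, "fro")
    ratio = float(np.trace(H) / frob) if frob > 0 else 0.0
    frac_positive = float(np.mean(eigenvalues > 0))
    return ratio, frac_positive

def restrict_to_random_subspace(full_hessian, d, seed=0):
    """Hessian of the loss restricted to a random d-dim hyperplane: P^T H P."""
    H = np.asarray(full_hessian, dtype=float)
    rng = np.random.default_rng(seed)
    # QR of a Gaussian matrix yields D x d orthonormal columns spanning a random subspace
    P, _ = np.linalg.qr(rng.standard_normal((H.shape[0], d)))
    return P.T @ H @ P

# Toy example: three positive-curvature directions and one negative
H_full = np.diag([2.0, 1.0, 0.5, -0.5])
print(curvature_stats(H_full))
print(curvature_stats(restrict_to_random_subspace(H_full, d=2)))
```

A Hessian with only positive eigenvalues that are all equal reaches the upper bound √d. This is why a large Tr(H)/||H|| signals a prevalence of positive curvature.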

Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM

Sensors ◽

10.3390/s21082852 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2852

Author(s):

Parvathaneni Naga Srinivasu ◽

Jalluri Gnana SivaSai ◽

Muhammad Fazal Ijaz ◽

Akash Kumar Bhoi ◽

Wonjoon Kim ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Network ◽

Skin Disease ◽

Network Architecture ◽

Large Scale ◽

Short Term Memory ◽

Convolutional Networks ◽

Occurrence Matrix

Deep learning models are efficient at learning the features that help capture complex patterns precisely. This study proposes a computerized process for classifying skin diseases using deep learning based on MobileNet V2 and Long Short-Term Memory (LSTM). The MobileNet V2 model proved efficient, offering better accuracy while being able to run on lightweight computational devices. The proposed model is efficient at maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used to assess the progression of diseased growth. Performance was compared against other state-of-the-art models, including Fine-Tuned Neural Networks (FTNN), a Convolutional Neural Network (CNN), the Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture extended with a few changes. On the HAM10000 dataset, the proposed method outperformed the other methods with more than 85% accuracy. It recognizes the affected region considerably faster and needs almost half the computation of the conventional MobileNet model, keeping computational effort minimal. Furthermore, a mobile application is designed to support instant and appropriate action. It helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners diagnose skin conditions efficiently and effectively, thereby reducing further complications and morbidity.
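A grey-level co-occurrence matrix counts how often grey level i appears next to grey level j at a fixed pixel offset. The sketch below is a minimal NumPy version, assuming an image already quantized to `levels` integer grey levels. It also computes one common texture statistic, Haralick-style contrast. The function names and the toy "lesion" array are hypothetical, and the abstract does not say which offsets or GLCM statistics the authors used.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of grey level i at (r, c) and level j at (r + dy, c + dx)."""
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    counts = np.zeros((levels, levels), dtype=int)
    # Restrict (r, c) so that the neighbour (r + dy, c + dx) stays inside the image
    for r in range(max(0, -dy), min(h, h - dy)):
        for c in range(max(0, -dx), min(w, w - dx)):
            counts[img[r, c], img[r + dy, c + dx]] += 1
    return counts

def glcm_contrast(counts):
    """Contrast: sum of P(i, j) * (i - j)^2 over the normalized co-occurrence matrix."""
    p = counts / counts.sum()
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

lesion = [[0, 0, 1],
          [0, 1, 1]]
m = glcm(lesion, levels=2)   # horizontal neighbour offset (dx=1, dy=0)
print(m)                     # [[1 2]
                             #  [0 1]]
print(glcm_contrast(m))      # 0.5
```

Tracking such statistics over successive images of the same region is one way to quantify changes in texture as a lesion grows.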

Estimation of Site Amplification from Geotechnical Array Data Using Neural Networks

Bulletin of the Seismological Society of America ◽

10.1785/0120200346 ◽

2021 ◽

Author(s):

Daniel Roten ◽

Kim B. Olsen

Keyword(s):

Neural Network ◽

Neural Networks ◽

Site Response ◽

Site Amplification ◽

Data Driven ◽

Vertical Array ◽

The Neural Network ◽

Simplifying Assumptions ◽

The Mean ◽

Fully Connected

ABSTRACT We use deep learning to predict surface-to-borehole Fourier amplification functions (AFs) from discretized shear-wave velocity profiles. Specifically, we train a fully connected neural network and a convolutional neural network using mean AFs observed at ∼600 KiK-net vertical array sites. Compared with predictions based on theoretical SH 1D amplifications, the neural network (NN) yields up to a 50% reduction in the mean squared log error between predictions and observations at sites not used for training. In the future, NNs may enable a purely data-driven prediction of site response that is independent of proxies or simplifying assumptions.
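The error metric quoted in this abstract compares predicted and observed amplification in log space, so a factor-of-two miss counts the same whether the amplification is large or small. Below is a minimal pure-Python sketch. The abstract does not give the logarithm base or how frequencies and sites are averaged, so base 10 and a plain mean over paired values are assumptions. The function names are hypothetical.

```python
import math

def mean_squared_log_error(predicted, observed, base=10.0):
    """Mean of (log p - log o)^2 over paired, strictly positive amplification values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, o in zip(predicted, observed):
        if p <= 0 or o <= 0:
            raise ValueError("amplification factors must be positive")
        total += (math.log(p, base) - math.log(o, base)) ** 2
    return total / len(predicted)

def relative_reduction(baseline_error, new_error):
    """Fractional error reduction; 0.5 means the new error is half the baseline."""
    return 1.0 - new_error / baseline_error

print(mean_squared_log_error([10.0, 100.0], [10.0, 10.0]))  # one exact match, one 10x miss
print(relative_reduction(0.2, 0.1))                          # a "50% reduction"
```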

Classification of Riau Batik Using Convolutional Neural Networks (CNN)

Jurnal Ilmu Komputer ◽

10.33060/jik/2020/vol9.iss1.144 ◽

2020 ◽

Vol 9 (1) ◽

pp. 7-10

Author(s):

Hendry Fonda

Keyword(s):

Neural Network ◽

Neural Networks ◽

Artificial Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

18th Century ◽

The Public ◽

Artificial Neural ◽

The Difference ◽

Fully Connected

ABSTRACT Riau batik has been known since the 18th century and was worn by royalty. It is made with a stamp dipped in dye, which is then printed onto fabric, usually silk. As it developed, Riau batik was accepted by the public much more slowly than Javanese batik. Convolutional Neural Networks (CNNs) combine artificial neural networks with deep learning methods. A CNN consists of one or more convolutional layers, often with subsampling layers, followed by one or more fully connected layers as in a standard neural network. In this work, a CNN is trained and tested on Riau batik so that, from a collection of batik models classified by the characteristic features of Riau batik, images can be identified as Riau batik or non-Riau batik. The CNN classifies Riau and non-Riau batik with an accuracy of 65%. This accuracy is limited because Riau batik shares many motifs with other batik; the main difference lies in the color absorption of Riau batik. Keywords: Batik; Batik Riau; CNN; Image; Deep Learning
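The generic CNN layout described above (convolution, subsampling, then a fully connected layer) can be sketched as a single forward pass. Below is a minimal NumPy illustration. It assumes single-channel grayscale input, ReLU activations, 2×2 max pooling as the subsampling step, and hypothetical random weights with a two-class softmax output (Riau vs. non-Riau). It is not the authors' trained network or architecture.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of one channel, as CNN libraries implement 'convolution'."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (subsampling); an odd last row/column is dropped."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def tiny_cnn_forward(image, kernels, dense_w, dense_b):
    """conv -> ReLU -> pool for each kernel, then one fully connected layer and softmax."""
    maps = [max_pool2x2(np.maximum(conv2d_valid(image, k), 0.0)) for k in kernels]
    features = np.concatenate([m.ravel() for m in maps])
    logits = dense_w @ features + dense_b
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
patch = rng.random((8, 8))                     # hypothetical 8x8 grayscale batik patch
kernels = rng.standard_normal((2, 3, 3))       # 2 filters -> two 6x6 maps -> two 3x3 maps
dense_w = rng.standard_normal((2, 2 * 3 * 3))  # 18 pooled features -> 2 classes
dense_b = np.zeros(2)
print(tiny_cnn_forward(patch, kernels, dense_w, dense_b))  # two probabilities summing to 1
```

Training would adjust `kernels`, `dense_w` and `dense_b` by backpropagation. The sketch only shows how data flows through the layer types the abstract lists.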

Binary and Multiclass Text Classification by Means of Separable Convolutional Neural Network

Inventions ◽

10.3390/inventions6040070 ◽

2021 ◽

Vol 6 (4) ◽

pp. 70

Author(s):

Elena Solovyeva ◽

Ali Abdullah

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Network ◽

Recurrent Neural Networks ◽

Low Cost ◽

Computational Cost ◽

High Accuracy ◽

Activation Functions ◽

Fully Connected ◽

Fully Connected Networks

In this paper, a separable convolutional neural network structure consisting of an embedding layer, separable convolutional layers, a convolutional layer and global average pooling is presented for binary and multiclass text classification. The advantage of the proposed structure is the absence of multiple fully connected layers, which are used to increase classification accuracy but raise the computational cost. Combining low-cost separable convolutional layers with a convolutional layer is proposed to achieve high accuracy while simultaneously reducing the complexity of neural classifiers. The advantages are demonstrated on binary and multiclass classification of written texts using the proposed networks with sigmoid and softmax activation functions in the convolutional layer. In both binary and multiclass classification, the accuracy obtained by separable convolutional neural networks is higher than that of several investigated types of recurrent neural networks and fully connected networks.
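The cost saving claimed for separable convolutions comes from splitting one k × C_in × C_out filter bank into a depthwise step (one length-k filter per input channel) and a pointwise 1×1 step. The pure-Python sketch below counts weights for both, assuming 1D convolutions over token embeddings. It counts a single bias vector on the output, a common convention. The layer sizes in the example are hypothetical and not taken from the paper.

```python
def conv1d_params(kernel_size, in_channels, out_channels, bias=True):
    """Weights of a standard 1D convolution: one k x C_in filter per output channel."""
    return kernel_size * in_channels * out_channels + (out_channels if bias else 0)

def separable_conv1d_params(kernel_size, in_channels, out_channels,
                            depth_multiplier=1, bias=True):
    """Depthwise (k weights per input channel) followed by a pointwise 1x1 convolution."""
    depthwise = kernel_size * in_channels * depth_multiplier
    pointwise = in_channels * depth_multiplier * out_channels
    return depthwise + pointwise + (out_channels if bias else 0)

# Hypothetical layer: kernel size 3, 64 input channels, 128 output channels
print(conv1d_params(3, 64, 128))            # 3*64*128 + 128 = 24704
print(separable_conv1d_params(3, 64, 128))  # 3*64 + 64*128 + 128 = 8512
```

For large channel counts, the ratio approaches 1/k + 1/C_out. This is why the separable layer is the cheaper building block.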

Solving the motion planning problem by using neural networks

Robotica ◽

10.1017/s0263574700017343 ◽

1994 ◽

Vol 12 (4) ◽

pp. 323-333 ◽

Cited By ~ 4

Author(s):

R.H.T. Chan ◽

P.K.S. Tam ◽

D.N.K. Leung

Keyword(s):

Neural Network ◽

Neural Networks ◽

Motion Planning ◽

Configuration Space ◽

Logic Gates ◽

Moving Object ◽

Planning Problem ◽

Processing Unit ◽

Configuration Point ◽

Motion Planning Problem

SUMMARY This paper presents a new neural-network-based method for solving the motion planning problem, i.e. constructing a collision-free path for a moving object among fixed obstacles. Our ‘navigator’ consists of two neural networks. The first is a modified feed-forward neural network used to determine the configuration space, in which the moving object is modelled as a configuration point. The second is a modified bidirectional associative memory used to find a path for the configuration point through the configuration space while avoiding the configuration obstacles. The basic processing unit of the neural networks may be constructed from logic gates, including AND gates, OR gates, NOT gates and flip-flops. Examples of efficient solutions to difficult motion planning problems using the proposed techniques are presented.
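The paper builds on a *modified* bidirectional associative memory (BAM). For orientation, the sketch below implements the standard Kosko-style BAM, not the authors' modification. Bipolar pattern pairs are stored in a correlation matrix W, and recall bounces activity between the two layers until the pair stabilizes. The patterns and function names are hypothetical, and a zero net input is assumed to keep a neuron's previous state.

```python
import numpy as np

def bam_weights(pairs):
    """Hebbian correlation matrix W = sum_k x_k y_k^T for bipolar (+1/-1) pattern pairs."""
    return sum(np.outer(x, y) for x, y in pairs)

def _threshold(net, previous):
    """Bipolar threshold; a zero net input keeps the previous state."""
    return np.where(net > 0, 1, np.where(net < 0, -1, previous)).astype(int)

def bam_recall(W, x, max_iters=20):
    """Alternate x -> y -> x updates until the (x, y) pair stops changing."""
    x = np.asarray(x, dtype=int)
    y = _threshold(W.T @ x, np.ones(W.shape[1], dtype=int))
    for _ in range(max_iters):
        x_new = _threshold(W @ y, x)
        y_new = _threshold(W.T @ x_new, y)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

x1, y1 = np.array([1, -1, 1, -1, 1, -1]), np.array([1, 1, -1, -1])
x2, y2 = np.array([1, 1, 1, -1, -1, -1]), np.array([1, -1, 1, -1])
W = bam_weights([(x1, y1), (x2, y2)])
noisy = np.array([-1, -1, 1, -1, 1, -1])  # x1 with its first element flipped
print(bam_recall(W, noisy))               # recovers (x1, y1)
```

This error-correcting recall is the property a BAM-based navigator can exploit. How the authors map configuration points onto BAM patterns is not described in the summary.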

Convergence Behavior of DNNs with Mutual-Information-Based Regularization

Entropy ◽

10.3390/e22070727 ◽

2020 ◽

Vol 22 (7) ◽

pp. 727 ◽

Cited By ~ 1

Author(s):

Hlynur Jónsson ◽

Giovanni Cherubini ◽

Evangelos Eleftheriou

Keyword(s):

Neural Networks ◽

Mutual Information ◽

Low Complexity ◽

High Dimensional ◽

Test Accuracy ◽

Compression Phase ◽

Hidden Layer ◽

Low Dimensional ◽

Fully Connected ◽

Fully Connected Networks

Information theory concepts are leveraged with the goal of better understanding and improving Deep Neural Networks (DNNs). The information plane of a neural network describes how the mutual information between the input/output and hidden-layer variables at various depths evolves during training. Previous analyses revealed that, in some networks where the mutual information can be shown to be finite, most training epochs are spent compressing the input. However, estimating mutual information is nontrivial for high-dimensional continuous random variables. Therefore, computing the mutual information for DNNs and visualizing it on the information plane has mostly focused on low-complexity fully connected networks. In fact, even the existence of the compression phase in complex DNNs has been questioned and is viewed as an open problem. In this paper, we present the convergence of mutual information on the information plane for a high-dimensional VGG-16 Convolutional Neural Network (CNN) by resorting to Mutual Information Neural Estimation (MINE), thus confirming and extending the results obtained with low-dimensional fully connected networks. Furthermore, we demonstrate the benefits of regularizing a network, especially for a large number of training epochs, by adding mutual information estimates as extra terms in the network's loss function. Experimental results show that the regularization stabilizes the test accuracy and significantly reduces its variance.
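MINE estimates mutual information by maximizing the Donsker-Varadhan lower bound over a neural network T_θ, often called the statistics network. The bound is reproduced below for reference, with Z standing for a hidden-layer representation. The second line is only a schematic of "adding mutual information estimates to the loss". β is a hypothetical weight, and the abstract does not specify its sign or value.

```latex
I(X;Z) \;\ge\; \widehat{I}_\theta(X;Z) \;=\; \mathbb{E}_{P_{XZ}}\!\left[T_\theta(x,z)\right] \;-\; \log \mathbb{E}_{P_X \otimes P_Z}\!\left[e^{T_\theta(x,z)}\right]

\mathcal{L}_{\text{total}} \;=\; \mathcal{L}_{\text{task}} \;+\; \beta \, \widehat{I}_\theta(X;Z)
```

In practice, both expectations are replaced by minibatch averages. Samples from the product of marginals P_X ⊗ P_Z are obtained by shuffling z across the batch.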

Classification of prostate cancer based on clinical and omics data using neural networks techniques to improve prognostic power.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.e16569 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. e16569-e16569

Author(s):

Laura Marin ◽

Fanny Lys Casado ◽

Daniel Racoceanu ◽

Joseph A. Pinto

Keyword(s):

Neural Network ◽

Prostate Cancer ◽

Neural Networks ◽

Learning Networks ◽

Genomic Expression ◽

Gleason Grading ◽

Time To Recurrence ◽

The Neural Networks ◽

Interpretable Model ◽

Fully Connected

e16569 Background: In 2017, prostate cancer (PCa) was the second most common cancer in men after lung cancer. While there are different courses of action to treat the disease, its mortality in Peru is higher than 50%. Conventionally, PCa is diagnosed by evaluating tissue biopsies and classified according to the Gleason grading system. Novel molecular classifications of PCa have been proposed for diagnostic and prognostic purposes. The main goal of this work is to implement a tool that predicts a patient's disease-free time from genomic expression and highlights the genes that play an influential role in the prediction. Methods: Modern data classification techniques continue to become broader and more accurate, particularly since the introduction of neural networks (NN). We implement an automatic genomic classification strategy that combines an artificial neural network with the Local Interpretable Model-Agnostic Explanations (LIME) algorithm, because LIME allows the network to select the features of greatest discriminative significance. As a proof of concept, we selected a subset of 3530 recurrence-related genes from 499 PCa genomes to build the neural networks. Results: The resulting neural network, trained and tested on the Cancer Cell 2010 database and validated on the MSKCC data, can predict the time of recurrence within a three-month range from genomic expression, with an accuracy of 96.9% and a loss of less than 9%. Using the implemented LIME algorithm, our results indicate that this subset of genes is informative of recurrence and plays a substantial role in the prediction. Conclusions: Instead of using a classic fully connected layer, we implemented different types of deep learning networks, where the final network provides the predicted survival rate or time to recurrence. This information will allow doctors to propose the best course of treatment.
Our method generates an augmented score, enabling a more accurate evaluation of risk and a personalized treatment strategy.
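LIME explains one prediction by sampling perturbations around an input, querying the black-box model on them, and fitting a proximity-weighted linear surrogate. The surrogate's coefficients rank feature influence. The NumPy sketch below shows that core idea for continuous features. The Gaussian perturbation scale, exponential kernel and all function names are assumptions. The reference `lime` package differs in sampling, discretization and feature selection, and this is not the authors' pipeline.

```python
import numpy as np

def lime_explain(predict_fn, x, num_samples=500, scale=0.5, kernel_width=0.75, seed=0):
    """Coefficients of a locally weighted linear surrogate of predict_fn around x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    Z = x + scale * rng.standard_normal((num_samples, x.size))  # perturbed neighbours
    y = np.array([predict_fn(z) for z in Z])                    # black-box outputs
    d2 = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-d2 / kernel_width ** 2)                         # closer samples weigh more
    A = np.hstack([np.ones((num_samples, 1)), Z - x])           # intercept + local offsets
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)  # weighted least squares
    return coef[1:]                                             # drop the intercept

def black_box(z):  # hypothetical stand-in for a trained network's output
    return 2.0 * z[0] - 3.0 * z[1] + 1.0

coefs = lime_explain(black_box, [0.5, -1.0])
ranking = np.argsort(-np.abs(coefs))  # most influential feature first
print(coefs, ranking)
```

With gene-expression inputs, `ranking` would order genes by their local influence on the predicted recurrence time.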

A HYBRID MODEL USING THE PRETRAINED BERT AND DEEP NEURAL NETWORKS WITH RICH FEATURE FOR EXTRACTIVE TEXT SUMMARIZATION

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/37/2/15980 ◽

2021 ◽

Vol 37 (2) ◽

pp. 123-143

Author(s):

Tuan Minh Luu ◽

Huong Thanh Le ◽

Tan Minh Hoang

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

Text Summarization ◽

Training Dataset ◽

Extractive Summarization ◽

Input Text ◽

Summarization System ◽

Fully Connected

Deep neural networks have been applied successfully to extractive text summarization when large training datasets are available. However, when the training dataset is not large enough, these models show limitations that affect the quality of the system's summaries. In this paper, we propose an extractive summarization system based on a Convolutional Neural Network and a fully connected network for sentence selection. The pretrained multilingual BERT model is used to generate embedding vectors from the input text. These vectors are combined with TF-IDF values to form the input of the summarization system. Redundant sentences are eliminated from the output summary with the Maximal Marginal Relevance method. Our system is evaluated on English and Vietnamese using the CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results than existing work on the same datasets, confirming that the approach can be applied effectively to summarize both English and Vietnamese text.
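Maximal Marginal Relevance builds a summary greedily. At each step it picks the sentence that best trades off relevance against redundancy with the sentences already chosen, scoring λ·rel(s) − (1 − λ)·max over chosen t of sim(s, t). The pure-Python sketch below assumes that relevance and pairwise similarity scores have already been computed, for example as cosine similarities of sentence embeddings. The value λ = 0.7 and the toy scores are hypothetical.

```python
def _mmr_score(i, relevance, similarity, selected, lam):
    """Relevance of sentence i minus its worst-case redundancy with chosen sentences."""
    redundancy = max((similarity[i][j] for j in selected), default=0.0)
    return lam * relevance[i] - (1 - lam) * redundancy

def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedily pick up to k sentence indices by Maximal Marginal Relevance.

    relevance[i]: similarity of sentence i to the document (or query)
    similarity[i][j]: similarity between sentences i and j
    """
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda i: _mmr_score(i, relevance, similarity, selected, lam))
        selected.append(best)
        candidates.remove(best)
    return selected

relevance = [0.9, 0.85, 0.5]
similarity = [[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
print(mmr_select(relevance, similarity, k=2))  # sentence 1 is a near-duplicate of 0, so 2 wins
```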

Neural networks for classification of strokes in electrical impedance tomography on a 3D head model

Mathematics in Engineering ◽

10.3934/mine.2022029 ◽

2022 ◽

Vol 4 (4) ◽

pp. 1-22

Author(s):

Valentina Candiani ◽

◽

Matteo Santacesaria ◽

Keyword(s):

Neural Network ◽

Neural Networks ◽

Electrical Impedance Tomography ◽

Electrical Impedance ◽

Network Architectures ◽

Impedance Tomography ◽

Average Accuracy ◽

Neural Network Architectures ◽

Fully Connected

We consider the problem of detecting brain hemorrhages from three-dimensional (3D) electrical impedance tomography (EIT) measurements. This condition requires urgent treatment, for which EIT might provide a portable and quick diagnosis. We employ two neural network architectures, a fully connected one and a convolutional one, to classify hemorrhagic and ischemic strokes. The networks are trained on a dataset of 40,000 samples of synthetic electrode measurements generated with the complete electrode model on realistic heads with a 3-layer structure. We consider changes in head anatomy and layers, electrode position, measurement noise and conductivity values. We then test the networks on several datasets of unseen EIT data, with more complex stroke modeling (different shapes and volumes), higher levels of noise and different amounts of electrode misplacement. On most test datasets we achieve ≥ 90% average accuracy with fully connected neural networks, while the convolutional ones reach an average accuracy of ≥ 80%. Despite the simple neural network architectures used, the results are very promising and motivate applying EIT-based classification methods to real phantoms and, ultimately, to human patients.

Can Pre-Trained Convolutional Neural Networks be directly used as a Feature Extractor for Video-based Neonatal Sleep and Wake Classification?

10.21203/rs.3.rs-56693/v3 ◽

2020 ◽

Author(s):

Muhammad Awais ◽

Xi Long ◽

Bin Yin ◽

Chen Chen ◽

Saeed Akbarzadeh ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Networks ◽

Video Recording ◽

Principal Component ◽

Classification Performance ◽

Support Vector ◽

Video Frames ◽

Feature Extractor ◽

Fully Connected

Abstract Objective: In this paper, we evaluate the use of pre-trained convolutional neural networks (CNNs) as feature extractors, followed by Principal Component Analysis (PCA) to find the most discriminative features, for classifying neonatal sleep and wake states with a support vector machine (SVM) using Fluke® facial video frames. Using pre-trained CNNs as feature extractors would greatly reduce the effort of collecting new neonatal data to train a neural network, which could be computationally very expensive. The features are extracted after the fully connected layers (FCLs), and we compare several pre-trained CNNs, e.g., VGG16, VGG19, InceptionV3, GoogLeNet, ResNet, and AlexNet. Results: From approximately 2 h of Fluke® video recordings of seven neonates, we achieved a modest classification performance, with an accuracy, sensitivity, and specificity of 65.3%, 69.8%, and 61.0%, respectively, using AlexNet on Fluke® (RGB) video frames. This indicates that a pre-trained model used as a feature extractor may not suffice on its own for highly reliable sleep and wake classification in neonates. Therefore, future work requires a dedicated neural network trained on neonatal data or a transfer learning approach.
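The pipeline in this abstract reduces high-dimensional CNN features with PCA before an SVM, and it reports accuracy, sensitivity and specificity. The NumPy sketch below covers the PCA step (via SVD of the centred feature matrix) and the three metrics. It assumes label 1 is the "positive" class, and the choice of positive class between sleep and wake is hypothetical. The SVM step itself could use a library classifier such as scikit-learn's `sklearn.svm.SVC`, which is not shown.

```python
import numpy as np

def pca_fit_transform(features, n_components):
    """Project rows of `features` onto their top principal components via SVD."""
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)                       # PCA operates on centred data
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)           # fraction of variance per component
    return Xc @ vt[:n_components].T, explained[:n_components]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall of `positive`) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity

# Toy "features" lying on a line: one component captures all the variance
projected, explained = pca_fit_transform([[1, 2], [2, 4], [3, 6], [4, 8]], n_components=1)
print(projected.shape, explained)
print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.75, 0.5, 1.0)
```

Sensitivity and specificity are reported separately because a classifier can reach moderate accuracy while favouring one state. The gap between 69.8% and 61.0% in the abstract reflects exactly that kind of imbalance.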
