Theoretical analysis of skip connections and batch normalization from generalization and optimization perspectives

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2020.7 ◽

2020 ◽

Vol 9 ◽

Cited By ~ 1

Author(s):

Yasutaka Furusho ◽

Kazushi Ikeda

Keyword(s):

Deep Neural Networks ◽

Fisher Information Matrix ◽

Information Matrix ◽

The Other ◽

Expected Loss ◽

Paper Briefly ◽

Batch Normalization ◽

Algorithmic Stability ◽

The Difference ◽

Theoretical Analyses

Abstract Deep neural networks (DNNs) have the same structure as the neocognitron proposed in 1979 but have much better performance, which is because DNNs include many heuristic techniques such as pre-training, dropout, skip connections, batch normalization (BN), and stochastic depth. However, the reason why these techniques improve the performance is not fully understood. Recently, two tools for theoretical analyses have been proposed. One is to evaluate the generalization gap, defined as the difference between the expected loss and empirical loss, by calculating the algorithmic stability, and the other is to evaluate the convergence rate by calculating the eigenvalues of the Fisher information matrix of DNNs. This overview paper briefly introduces the tools and shows their usefulness by showing why the skip connections and BN improve the performance.
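The link between Fisher information eigenvalues and convergence can be illustrated on a toy case. The sketch below is not taken from the paper. It assumes a linear model with unit-variance Gaussian noise, for which the Fisher information matrix reduces to E[x xᵀ] and coincides with the Hessian of the squared loss, so gradient descent converges only when the learning rate is below 2/λ_max. The function names, data, and learning rates are hypothetical choices for illustration.

```python
import numpy as np

def empirical_fisher_linear(X):
    """Empirical FIM of y ~ N(w.x, 1): F = E[x x^T], estimated as X^T X / n."""
    return X.T @ X / X.shape[0]

def gd_final_loss(X, y, lr, steps=200):
    """Run plain gradient descent on the squared loss and return the final loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / X.shape[0]
        w -= lr * grad
    return 0.5 * np.mean((X @ w - y) ** 2)

rng = np.random.default_rng(0)
# Inputs with very different scales per feature give a spread-out spectrum.
X = rng.normal(size=(500, 5)) * np.array([3.0, 1.0, 1.0, 0.5, 0.1])
w_true = rng.normal(size=5)
y = X @ w_true  # noise-free targets (an assumption that keeps the sketch simple)

F = empirical_fisher_linear(X)
eigvals = np.linalg.eigvalsh(F)  # ascending order
lam_max = eigvals[-1]

initial_loss = 0.5 * np.mean(y ** 2)
stable_loss = gd_final_loss(X, y, lr=1.0 / lam_max)    # below 2 / lam_max
unstable_loss = gd_final_loss(X, y, lr=2.5 / lam_max)  # above 2 / lam_max

print("largest eigenvalue:", lam_max)
print("loss with lr = 1.0 / lam_max:", stable_loss)
print("loss with lr = 2.5 / lam_max:", unstable_loss)
```

In this toy setting the largest eigenvalue sets the usable learning rate, while the smaller eigenvalues govern how slowly the remaining directions converge.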


Effects of Skip-Connection in ResNet and Batch-Normalization on Fisher Information Matrix

Proceedings of the International Neural Networks Society - Recent Advances in Big Data and Deep Learning ◽

10.1007/978-3-030-16841-4_35 ◽

2019 ◽

pp. 341-348 ◽

Cited By ~ 1

Author(s):

Yasutaka Furusho ◽

Kazushi Ikeda

Keyword(s):

Fisher Information ◽

Fisher Information Matrix ◽

Information Matrix ◽

Batch Normalization


Approximate Fisher Information Matrix to Characterize the Training of Deep Neural Networks

IEEE Transactions on Pattern Analysis and Machine Intelligence ◽

10.1109/tpami.2018.2876413 ◽

2020 ◽

Vol 42 (1) ◽

pp. 15-26

Author(s):

Zhibin Liao ◽

Tom Drummond ◽

Ian Reid ◽

Gustavo Carneiro

Keyword(s):

Neural Networks ◽

Fisher Information ◽

Deep Neural Networks ◽

Fisher Information Matrix ◽

Information Matrix


The Current Research on STATCOM

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.940.333 ◽

2014 ◽

Vol 940 ◽

pp. 333-335

Author(s):

You Jie Ma ◽

De Xiang Wang ◽

Xue Song Zhou

Keyword(s):

Power Electronics ◽

Power Quality ◽

The Other ◽

Paper Briefly ◽

Industrial Control ◽

Research Status ◽

Voltage Quality ◽

Key Technologies ◽

The Future ◽

The Difference

Power electronics products are widely used in industrial control, and requirements for power quality have become more demanding. How to improve voltage quality and ensure system stability is therefore an important and urgent issue. This paper briefly discusses the evolution of STATCOM development, including how it differs from other compensation devices, the characteristics of STATCOM, the research status, the key technologies of STATCOM, and future trends.


EigenNet: Towards Fast and Structural Learning of Deep Neural Networks

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/338 ◽

2017 ◽

Author(s):

Ping Luo

Keyword(s):

Gradient Descent ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Fisher Information Matrix ◽

Information Matrix ◽

Stochastic Gradient Descent ◽

Information Flows ◽

Clock Time ◽

Generalization Capability ◽

Hidden Neurons

Deep neural networks (DNNs) are difficult to train and prone to overfitting during training. We address these two issues by introducing EigenNet, an architecture that not only accelerates training but also adjusts the number of hidden neurons to reduce overfitting. These goals are achieved by whitening the information flows of DNNs and removing those eigenvectors that may capture noise. The former improves the conditioning of the Fisher information matrix, whilst the latter increases generalization capability. These appealing properties of EigenNet can benefit many recent DNN structures, such as network in network and inception, by wrapping their hidden layers into the layers of EigenNet. The modeling capacities of the original networks are preserved. Both the training wall-clock time and the number of updates are reduced by using EigenNet, compared to stochastic gradient descent, on various datasets including MNIST, CIFAR-10, and CIFAR-100.
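The claim that whitening improves conditioning can be checked on a small example. The sketch below is not EigenNet itself. It applies ordinary PCA whitening to synthetic correlated data and compares the condition number of the covariance before and after. For a linear model with Gaussian output, that covariance plays the role of the Fisher information matrix, which is the assumption that connects this toy case to the abstract. The function name `whiten` and the data are hypothetical.

```python
import numpy as np

def whiten(X, eps=1e-8):
    """PCA whitening: center, rotate onto eigenvectors, rescale to unit variance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Dividing each projected column by sqrt(eigenvalue) equalizes the variances.
    return Xc @ eigvecs / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))          # random mixing matrix (illustrative)
X = rng.normal(size=(1000, 4)) @ A   # correlated, badly scaled features

cond_before = np.linalg.cond(np.cov(X, rowvar=False))
cond_after = np.linalg.cond(np.cov(whiten(X), rowvar=False))

print("condition number before whitening:", cond_before)
print("condition number after whitening:", cond_after)
```

After whitening, the covariance is close to a multiple of the identity, so all directions share the same curvature and a single learning rate suits them equally well.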


On the locality of the natural gradient for learning in deep Bayesian networks

Information Geometry ◽

10.1007/s41884-020-00038-y ◽

2020 ◽

Author(s):

Nihat Ay

Keyword(s):

Bayesian Networks ◽

Gradient Method ◽

Fisher Information Matrix ◽

Information Matrix ◽

Learning Systems ◽

The Other ◽

Natural Gradient ◽

Full System ◽

Auxiliary Model ◽

Deep Networks

Abstract We study the natural gradient method for learning in deep Bayesian networks, including neural networks. There are two natural geometries associated with such learning systems consisting of visible and hidden units. One geometry is related to the full system, the other one to the visible sub-system. These two geometries imply different natural gradients. In a first step, we demonstrate a great simplification of the natural gradient with respect to the first geometry, due to locality properties of the Fisher information matrix. This simplification does not directly translate to a corresponding simplification with respect to the second geometry. We develop the theory for studying the relation between the two versions of the natural gradient and outline a method for the simplification of the natural gradient with respect to the second geometry based on the first one. This method suggests incorporating a recognition model as an auxiliary model for the efficient application of the natural gradient method in deep networks.


Enhancement of ADP-induced Platelet Aggregation by Adrenaline in Vivo and Its Prevention

Thrombosis and Haemostasis ◽

10.1055/s-0038-1647788 ◽

1973 ◽

Vol 29 (02) ◽

pp. 490-498 ◽

Cited By ~ 6

Author(s):

Hiroh Yamazaki ◽

Itsuro Kobayashi ◽

Tadahiro Sano ◽

Takio Shimamoto

Keyword(s):

Platelet Count ◽

Platelet Aggregation ◽

Platelet Rich Plasma ◽

The Other ◽

Endothelial Surface ◽

Important Interaction ◽

Blood Coagulability ◽

The Difference

Summary The authors previously reported a transient decrease in adhesive platelet count and an enhancement of blood coagulability after administration of a small amount of adrenaline (0.1-1 µg per kg, i.v.) in man and rabbit. In such circumstances, the sensitivity of platelets to aggregation induced by ADP was studied by an optical density method. Five minutes after i.v. injection of 1 µg per kg of adrenaline in 10 rabbits, intensity of platelet aggregation increased to 115.1 ± 4.9% (mean ± S.E.) by 10⁻⁵ molar, 121.8 ± 7.8% by 3 × 10⁻⁶ molar and 129.4 ± 12.8% of the value before the injection by 10⁻⁶ molar ADP. The difference was statistically significant (P<0.01-0.05). The above change was not observed in each group of rabbits injected with saline, 1 µg per kg of l-noradrenaline or 0.1 and 10 µg per kg of adrenaline. Also, it was prevented by oral administration of 10 mg per kg of phenoxybenzamine or propranolol or aspirin or pyridinolcarbamate 3 hours before the challenge. On the other hand, the enhancement of ADP-induced platelet aggregation was not observed in vitro, when 10⁻⁵, 3 × 10⁻⁶ or 10⁻⁶ molar ADP was added to citrated platelet rich plasma (CPRP) of rabbit after incubation at 37°C for 30 seconds with 0.01, 0.1, 1, 10 or 100 µg per ml of adrenaline or noradrenaline. These results suggest an important interaction between endothelial surface and platelets in connection with the enhancement of ADP-induced platelet aggregation by adrenaline in vivo.


Energy Approximation

10.23943/princeton/9780691174822.003.0023 ◽

2017 ◽

Author(s):

Philip Isett

Keyword(s):

Main Lemma ◽

The Other ◽

Coarse Scale ◽

Continuous Solutions ◽

Material Derivative ◽

Energy Variation ◽

Energy Approximation ◽

The Difference ◽

Derivatives Of

This chapter presents the equations and calculations for energy approximation. It establishes the estimates (261) and (262) of the Main Lemma (10.1) for continuous solutions; these estimates state that we are able to accurately prescribe the energy that the correction adds to the solution, as well as bound the difference between the time derivatives of these two quantities. The chapter also introduces the proposition for prescribing energy, followed by the relevant computations. Each integral contributing to the other term can be estimated. Another proposition for estimating control over the rate of energy variation is given. Finally, the coarse scale material derivative is considered.


Pengaruh Unsur Budaya Lokal dalam Ungkapan Berbahasa Perancis

Metahumaniora ◽

10.24198/metahumaniora.v7i3.18859 ◽

2017 ◽

Vol 7 (3) ◽

pp. 378

Author(s):

Vincentia Tri Handayani

Keyword(s):

Mental Representation ◽

Cultural Context ◽

Oral Tradition ◽

Minimal Element ◽

The Other ◽

Local Culture ◽

Community Groups ◽

Cultural Community ◽

The Difference ◽

Complex Words

Abstract Folklore, which produces oral tradition, is a cultural manifestation born out of the experience of community groups. One form of oral tradition is the expression, which contains elements of local culture in its construction that other cultures do not have. Idiomatic expressions give color to a language through mental representation. In French, an expression can take the form of a locution or an expression. Differences in the motifs that an expression refers to can be seen in the influence of the culture of the language's users. A lexeme is not always defined through a minimal element, nor through words, whether basic or complex, but can be defined through frozen words whose meanings are fixed. The analogical connection of the additional meanings of a lexeme arises from the identification of the same sememe. That sememe leads to the associated terms, which are enriched through context (in expressions, the cultural context). Keywords: folklore, expressions, structure, idiomatic meaning, culture


Rhetoric and argumentation: the unity of the field

10.1093/oso/9780199691821.003.0003 ◽

2017 ◽

Author(s):

Michel Meyer

Keyword(s):

The Other ◽

Classic Problem ◽

False Dilemma ◽

The Difference ◽

As If

Rhetoric has always been torn between the rhetoric of figures and the rhetoric of conflicts or arguments, as if rhetoric were exclusively one or the other. This is a false dilemma. Both types of rhetoric hinge on the same structure. A common formula is provided in Chapter 3 which unifies rhetoric stricto sensu and rhetoric as argumentation as two distinct but related strategies adopted according to the level of problematicity of the questions at stake, thereby giving unity to the field called “Rhetoric.” Highly problematic questions require arguments to justify their answers; non-divisive ones can be treated rhetorically through their answers as if they were self-evident. Another classic problem is how to understand the difference between logic and rhetoric. The difference between the two is due to the presence of questions explicitly answered in the premises in logic and only suggested (or remaining indeterminate) in rhetoric.


The Respiration of some Planktonic copepods II. The Effect Of Temperature

Journal of the Marine Biological Association of the United Kingdom ◽

10.1017/s0025315400011607 ◽

1953 ◽

Vol 31 (3) ◽

pp. 447-460 ◽

Cited By ~ 28

Author(s):

D. T. Gauld ◽

J. E. G. Raymont

Keyword(s):

Respiratory Rate ◽

The Body ◽

The Other ◽

Effect Of Temperature ◽

Acartia Clausi ◽

Temora Longicornis ◽

Different Temperatures ◽

Planktonic Copepods ◽

The Difference ◽

The Relationship

The respiratory rates of three species of planktonic copepods, Acartia clausi, Centropages hamatus and Temora longicornis, were measured at four different temperatures. The relationship between respiratory rate and temperature was found to be similar to that previously found for Calanus, although the slope of the curves differed in the different species. The observations on Centropages at 13 and 17° C. can be divided into two groups and it is suggested that the differences are due to the use of copepods from two different generations. The relationship between the respiratory rates and lengths of Acartia and Centropages agreed very well with that previously found for other species. That for Temora was rather different: the difference is probably due to the distinct difference in the shape of the body of Temora from those of the other species. The application of these measurements to estimates of the food requirements of the copepods is discussed.
