Hausdorff dimension, heavy tails, and generalization in neural networks

2021 ◽  
Vol 2021 (12) ◽  
pp. 124014
Author(s):  
Umut Şimşekli ◽  
Ozan Sener ◽  
George Deligiannidis ◽  
Murat A Erdogdu

Abstract Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge. While modeling the trajectories of SGD via stochastic differential equations (SDEs) under heavy-tailed gradient noise has recently shed light on several peculiar characteristics of SGD, a rigorous treatment of the generalization properties of such SDEs in a learning-theoretic framework is still missing. Aiming to bridge this gap, in this paper we prove generalization bounds for SGD under the assumption that its trajectories can be well approximated by a Feller process, which defines a rich class of Markov processes that includes several recent SDE representations (both Brownian and heavy-tailed) as special cases. We show that the generalization error can be controlled by the Hausdorff dimension of the trajectories, which is intimately linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail index of the process can be used as a notion of ‘capacity metric’. We support our theory with experiments on deep neural networks illustrating that the proposed capacity metric accurately estimates the generalization error, and that, unlike existing capacity metrics in the literature, it does not necessarily grow with the number of parameters.
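
The tail index the abstract refers to can be estimated directly from samples of gradient noise. Below is a minimal sketch, not the authors' estimator: it applies a standard Hill estimator to synthetic alpha-stable noise standing in for SGD gradient noise, and the cutoff k = 200 is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats import levy_stable

def hill_tail_index(x, k=200):
    """Hill estimator of the tail index from the k largest |x|."""
    a = np.sort(np.abs(x))[::-1][:k + 1]
    return 1.0 / np.mean(np.log(a[:k] / a[k]))

rng = np.random.default_rng(0)
# Synthetic "gradient noise": alpha-stable with tail index 1.5 vs Gaussian
heavy = levy_stable.rvs(alpha=1.5, beta=0.0, size=50_000, random_state=rng)
light = rng.standard_normal(50_000)

print("Hill estimate, heavy (alpha = 1.5):", hill_tail_index(heavy))
print("Hill estimate, light (Gaussian):   ", hill_tail_index(light))  # large; Hill assumes a power-law tail
```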

2013 ◽  
Vol 52 (04) ◽  
pp. 351-359 ◽  
Author(s):  
M. O. Scheinhardt ◽  
A. Ziegler

Summary Background: Gene, protein, or metabolite expression levels are often non-normally distributed, heavy tailed, and contain outliers. Standard statistical approaches may fail as location tests in this situation. Objectives: In three Monte Carlo simulation studies, we aimed at comparing the type I error levels and empirical power of standard location tests and three adaptive tests [O’Gorman, Can J Stat 1997; 25: 269–279; Keselman et al., Brit J Math Stat Psychol 2007; 60: 267–293; Szymczak et al., Stat Med 2013; 32: 524–537] for a wide range of distributions. Methods: We simulated two-sample scenarios using the g-and-k distribution family to systematically vary tail length and skewness, with identical and varying variability between groups. Results: All tests kept the type I error level when groups did not vary in their variability. The standard non-parametric U-test performed well in all simulated scenarios. It was outperformed by the two non-parametric adaptive methods in the case of heavy tails or large skewness. Most tests did not keep the type I error level for skewed data in the case of heterogeneous variances. Conclusions: The standard U-test was a powerful and robust location test for most of the simulated scenarios except for very heavy-tailed or heavily skewed data, and it is thus to be recommended except in these cases. The non-parametric adaptive tests were powerful for both normal and non-normal distributions under sample variance homogeneity. But when sample variances differed, they did not keep the type I error level. The parametric adaptive test lacks power for skewed and heavy-tailed distributions.
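
For readers who want to reproduce a scenario of this kind, here is a hedged sketch of one simulation cell: samples are drawn from the g-and-k distribution via its quantile function (with the conventional c = 0.8) and compared with the Mann-Whitney U-test. The parameter values and the location shift are illustrative, not those of the study.

```python
import numpy as np
from scipy.stats import norm, mannwhitneyu

def g_and_k_sample(n, a=0.0, b=1.0, g=0.0, k=0.0, c=0.8, rng=None):
    """Draw n samples from the g-and-k distribution via its quantile function.
    g controls skewness, k controls tail heaviness (g = k = 0 is standard normal)."""
    rng = rng or np.random.default_rng()
    z = norm.ppf(rng.uniform(size=n))
    return a + b * (1 + c * np.tanh(g * z / 2)) * (1 + z**2) ** k * z

rng = np.random.default_rng(1)
x = g_and_k_sample(50, k=0.5, rng=rng)          # heavy-tailed group
y = g_and_k_sample(50, k=0.5, rng=rng) + 0.5    # same shape, shifted location

stat, p = mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```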


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Malte Seemann ◽  
Lennart Bargsten ◽  
Alexander Schlaefer

Abstract Deep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of the artery lumen in computed tomography angiography (CTA) data. However, to perform well, neural networks have to be trained on large amounts of high-quality annotated data. In the realm of medical imaging, annotations are not only quite scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step, moderately realistic images are generated in a purely numerical fashion. In the second step, these images are improved by applying neural domain adaptation. We evaluated the impact of synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing the resulting performances. Improvements of up to 5% in terms of the Dice coefficient and 20% for the Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep learning-based segmentation of the artery lumen in CTA images.
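
The two reported metrics are easy to compute for binary masks. The sketch below shows one common way to do so with NumPy and SciPy on a toy pair of disc-shaped masks; it illustrates the metrics only and is not the authors' evaluation code.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(pred, truth):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

def hausdorff(pred, truth):
    """Symmetric Hausdorff distance between the foreground pixel sets."""
    p, t = np.argwhere(pred), np.argwhere(truth)
    return max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])

# Toy example: two slightly offset discs standing in for lumen masks
yy, xx = np.mgrid[:64, :64]
truth = (xx - 32) ** 2 + (yy - 32) ** 2 < 100
pred = (xx - 34) ** 2 + (yy - 32) ** 2 < 100

print("Dice:", dice(pred, truth))
print("Hausdorff:", hausdorff(pred, truth))
```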


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Abdulkadir Canatar ◽  
Blake Bordelon ◽  
Cengiz Pehlevan

Abstract A theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.
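
As a concrete, hedged illustration of an empirical kernel-regression learning curve (not the paper's analytical theory), the sketch below fits kernel ridge regression to noisy samples of a simple target at increasing sample sizes; the RBF kernel, gamma, and ridge penalty are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def target(x):
    return np.sin(3 * x).ravel()

x_test = rng.uniform(-1, 1, (500, 1))
y_test = target(x_test)

# Empirical learning curve: held-out error versus training set size
for n in (10, 40, 160, 640):
    x_tr = rng.uniform(-1, 1, (n, 1))
    y_tr = target(x_tr) + 0.1 * rng.standard_normal(n)  # label noise
    model = KernelRidge(kernel="rbf", gamma=5.0, alpha=1e-3).fit(x_tr, y_tr)
    err = mean_squared_error(y_test, model.predict(x_test))
    print(f"n = {n:4d}  test MSE = {err:.4f}")
```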


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 854
Author(s):  
Nevena Rankovic ◽  
Dragica Rankovic ◽  
Mirjana Ivanovic ◽  
Ljubomir Lazic

Software estimation must satisfy a large number of different requirements, such as resource allocation, cost estimation, effort estimation, time estimation, and the changing demands of software product customers. Numerous estimation models try to address these problems. In our experiment, we used a clustering method on the input values to mitigate the heterogeneous nature of the selected projects. Additionally, homogeneity of the data was achieved with a fuzzification method, and we proposed two different activation functions inside a hidden layer during the construction of artificial neural networks (ANNs). We present an experiment that uses two different ANN architectures, based on Taguchi’s orthogonal vector plans, to satisfy the set conditions, together with additional methods and criteria for validating the proposed model. The aim of this paper is a comparative analysis of the resulting mean magnitude of relative error (MMRE) values. At the same time, our goal is to find a relatively simple architecture that minimizes the error value while covering a wide range of different software projects. For this purpose, six different datasets are divided into four clusters. The obtained results show that estimating diverse projects by dividing them into clusters can contribute to an efficient, reliable, and accurate software product assessment. The contribution of this paper is a solution that requires only a small number of iterations, which reduces execution time while achieving minimal error.
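
MMRE itself is straightforward to compute; the following sketch shows the usual definition on toy effort values (the numbers are illustrative, not from the paper's datasets).

```python
import numpy as np

def mmre(actual, predicted):
    """Mean magnitude of relative error: mean(|actual - predicted| / actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted) / actual)

# Toy effort values (person-hours): actual vs. model estimates
actual = [120, 300, 45, 800]
predicted = [100, 330, 50, 760]
print(f"MMRE = {mmre(actual, predicted):.3f}")  # 0.0 would be a perfect fit
```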


2012 ◽  
Vol 13 (2) ◽  
pp. 228-240 ◽  
Author(s):  
G. Bamberg ◽  
A. Neuhierl

Abstract The strategy to maximize the long-term growth rate of final wealth (maximum expected log strategy, maximum geometric mean strategy, Kelly criterion) is based on probability-theoretic underpinnings and has asymptotic optimality properties. This article reviews the allocation of wealth in a two-asset economy with one risky asset and a risk-free asset. It is also shown that the optimal fraction to be invested in the risky asset (i) depends on the length of the basic return period and (ii) is lower for heavy-tailed log returns than for light-tailed log returns.
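
Finding (ii) can be illustrated numerically. The sketch below grid-searches the growth-optimal (Kelly) fraction by Monte Carlo for Gaussian versus variance-matched Student-t log returns; the return parameters are illustrative assumptions, not values taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
r_f = 0.0  # risk-free log return per period

def kelly_fraction(log_returns, fractions=np.linspace(0, 1, 101)):
    """Grid-search the fraction f maximizing E[log((1 - f) * exp(r_f) + f * exp(R))]."""
    gross = np.exp(log_returns)
    growth = [np.mean(np.log((1 - f) * np.exp(r_f) + f * gross)) for f in fractions]
    return fractions[int(np.argmax(growth))]

n = 200_000
mu, sigma = 0.01, 0.20
light = rng.normal(mu, sigma, n)                                 # Gaussian log returns
heavy = mu + sigma * rng.standard_t(df=3, size=n) / np.sqrt(3)   # same variance, heavy tails

print("optimal fraction, light tails:", kelly_fraction(light))
print("optimal fraction, heavy tails:", kelly_fraction(heavy))   # typically lower
```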


2021 ◽  
pp. 99-116
Author(s):  
D.J. Balanev

An iterated version of the game "Prisoner's Dilemma" is used as a model of cooperation, largely due to the wide range of strategies available to the subjects. The problem of the effectiveness of strategies for solving the Iterated Prisoner's Dilemma (IPD) is most often considered from the point of view of information models, where strategies do not take into account the relationships that arise when real people play. Some of these strategies are obvious; others depend upon social context. In our paper, we pursue one of the promising directions in the study of IPD strategies: the use of artificial neural networks. We use neural networks both as a modeling tool and as part of the game environment. The main goal of our work is to build an information model that predicts the behavior of an individual person, as well as of a group of people, when solving a social dilemma. It takes into account social relationships, including those caused by experimental influence, gender differences, and individual differences in strategies for solving cognitive tasks. The model demonstrates the transition of individual actions into socially determined behavior. Evaluating the effect of socialization associated with the game procedure provides additional information about the effectiveness and characteristics of the experimental impact. The paper defines the minimum unit of analysis of an IPD player's strategy in a group, identity with which can be treated as a variable. It discusses the influence of experimentally formed group identity on the change of preferred strategies in social dilemmas. We use neural networks as a means of categorizing the results of the iterated prisoner's dilemma in terms of the strategy applied by the player, as well as social factors. We identify the patterns of change in an IPD player's strategy before and after socialization. The paper discusses whether real players tend to use IPD solution strategies in their pure form, and whether they use the same strategy before and after experimental interventions related to social identity formation. It is shown that experimentally induced socialization can be considered a mechanism for increasing the degree of certainty in the choice of strategies when solving the IPD task. We find that the neural-network-based models turn out to be more efficient after experimentally evoked social identity in a group of six people, and least effective when predicting a subject's membership in a gender group. When IPD problems are solved by real people, it becomes possible to speak of generalized strategies that take into account not only the evolutionary properties of «pure» strategies but also reflect various social factors.
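
For context, the payoff mechanics of the iterated game itself are simple to simulate. The sketch below plays two canonical strategies against each other under the standard payoff matrix; it illustrates the game environment only and is not the authors' neural-network model.

```python
T, R, P, S = 5, 3, 1, 0  # standard PD payoffs: temptation, reward, punishment, sucker

def tit_for_tat(my_hist, opp_hist):
    return opp_hist[-1] if opp_hist else "C"

def always_defect(my_hist, opp_hist):
    return "D"

def play(strat_a, strat_b, rounds=100):
    """Iterate the dilemma and return cumulative payoffs for both players."""
    payoff = {("C", "C"): (R, R), ("C", "D"): (S, T),
              ("D", "C"): (T, S), ("D", "D"): (P, P)}
    ha, hb, pa, pb = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(ha, hb), strat_b(hb, ha)
        da, db = payoff[(a, b)]
        pa, pb = pa + da, pb + db
        ha.append(a); hb.append(b)
    return pa, pb

print(play(tit_for_tat, tit_for_tat))    # mutual cooperation: (300, 300)
print(play(tit_for_tat, always_defect))  # exploited once, then mutual defection
```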


2020 ◽  
Vol 36 (2) ◽  
pp. 265-310 ◽  
Author(s):  
Morteza Asghari ◽  
Amir Dashti ◽  
Mashallah Rezakazemi ◽  
Ebrahim Jokar ◽  
Hadi Halakoei

Abstract Artificial neural networks (ANNs), as a powerful technique for solving complicated problems in membrane separation processes, have been employed in a wide range of chemical engineering applications. ANNs can be used to model different processes more easily than other modeling methods. Besides that, the computing time in the design of a membrane separation plant is shorter compared to many mass transfer models. The membrane separation field requires an alternative model that can work alone or in parallel with theoretical or numerical types, and that can be quicker and, often, much more reliable. ANNs are helpful in cases where scientists do not thoroughly know the physical and chemical rules that govern a system. In ANN modeling, there is no requirement for deep knowledge of the processes and the mathematical equations that govern them. Neural networks are commonly used for the estimation of membrane performance characteristics, such as the permeate flux and rejection, over the entire range of the process variables, such as pressure, solute concentration, temperature, superficial flow velocity, etc. This review investigates important aspects of ANNs, such as methods of development and training, and modeling strategies in correlation with different types of applications [microfiltration (MF), ultrafiltration (UF), nanofiltration (NF), reverse osmosis (RO), electrodialysis (ED), etc.]. It also deals with particular types of ANNs that have been confirmed to be effective in practical applications and points out the advantages and disadvantages of using them. The combination of an ANN with accurate model predictions and a mechanistic model with less accurate predictions that renders physical and chemical laws can provide a thorough understanding of a process.
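
As a hedged illustration of the kind of surrogate modeling described here (not a model from the review), the sketch below trains a small scikit-learn MLP to predict a hypothetical permeate flux from pressure, feed concentration, and temperature on synthetic data; the response function is an invented stand-in.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic operating conditions: pressure (bar), feed concentration (g/L), temperature (C)
n = 500
X = np.column_stack([rng.uniform(1, 10, n), rng.uniform(1, 50, n), rng.uniform(20, 60, n)])
# Hypothetical flux response: rises with pressure and temperature, falls with concentration
flux = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 1, n)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0),
)
model.fit(X[:400], flux[:400])
print("held-out R^2:", model.score(X[400:], flux[400:]))
```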


2018 ◽  
Vol 8 (8) ◽  
pp. 1239 ◽  
Author(s):  
Carlos Villaseñor ◽  
Nancy Arana-Daniel ◽  
Alma Alanis ◽  
Carlos Lopez-Franco ◽  
Javier Gomez-Avila

The robotic mapping problem, which consists in providing a spatial model of the environment to a robot, is a research topic with a wide range of applications. One important challenge of this problem is to obtain a map that is information-rich (i.e., a map that preserves main structures of the environment and object shapes) yet still has a low memory cost. Point clouds offer a highly descriptive and information-rich environmental representation; accordingly, many algorithms have been developed to approximate point clouds and lower the memory cost. In recent years, approaches using basic and “simple” (i.e., using only planes or spheres) geometric entities for approximating point clouds have been shown to provide accurate representations at low memory cost. However, a better approximation can be implemented if more complex geometric entities are used. In the present paper, a new object-mapping algorithm is introduced for approximating point clouds with multiple ellipsoids and other quadratic surfaces. We show that this algorithm creates maps that are rich in information yet low in memory cost and have features suitable for other robotics problems such as navigation and pose estimation.
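
A basic building block for such quadric-based approximation is a least-squares fit of a general quadric surface to points. The following sketch, an illustration rather than the paper's algorithm, fits the nine quadric coefficients to a noisy synthetic ellipsoid point cloud.

```python
import numpy as np

def fit_quadric(pts):
    """Least-squares fit of a general quadric Ax^2+By^2+Cz^2+Dxy+Exz+Fyz+Gx+Hy+Iz = 1."""
    x, y, z = pts.T
    D = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z, x, y, z])
    coeffs, *_ = np.linalg.lstsq(D, np.ones(len(pts)), rcond=None)
    return coeffs

# Toy point cloud on the ellipsoid (x/2)^2 + y^2 + (z/0.5)^2 = 1, with noise
rng = np.random.default_rng(0)
u, v = rng.uniform(0, np.pi, 2000), rng.uniform(0, 2 * np.pi, 2000)
pts = np.column_stack([2 * np.sin(u) * np.cos(v),
                       np.sin(u) * np.sin(v),
                       0.5 * np.cos(u)])
pts += 0.01 * rng.standard_normal(pts.shape)

c = fit_quadric(pts)
print("x^2, y^2, z^2 coefficients:", c[:3])  # should be near 0.25, 1.0, 4.0
```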


2009 ◽  
Vol 5 (H15) ◽  
pp. 468-469 ◽  
Author(s):  
Miguel A. de Avillez ◽  
Dieter Breitschwerdt

Abstract High-resolution non-ideal magnetohydrodynamical simulations of the turbulent magnetized ISM, powered by supernovae of types Ia and II at the Galactic rate, were carried out, including self-gravity and non-equilibrium ionization (NEI) and taking into account the time evolution of the ionization structure of H, He, C, N, O, Ne, Mg, Si, S, and Fe. These runs cover a wide range of scales (from kpc to sub-parsec), providing resolution-independent information on the injection scale, extended self-similarity, and the fractal dimension of the most dissipative structures.
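
The fractal dimension mentioned here is commonly estimated by box counting. As a loose 2D stand-in for the 3D dissipative structures in such simulations, the sketch below estimates the box-counting dimension of a binary mask; a diagonal line recovers a dimension of about 1.

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate the fractal dimension of a binary 2D structure by box counting."""
    counts = []
    for s in sizes:
        n = mask.shape[0] // s
        # Count boxes of side s that contain any occupied cell
        boxes = mask[:n * s, :n * s].reshape(n, s, n, s).any(axis=(1, 3)).sum()
        counts.append(boxes)
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Toy structure: a diagonal line, whose dimension should come out near 1
mask = np.zeros((256, 256), dtype=bool)
i = np.arange(256)
mask[i, i] = True
print("estimated dimension (diagonal line):", box_counting_dimension(mask))
```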


2017 ◽  
Vol 4 (2) ◽  
pp. 13 ◽  
Author(s):  
John Oden ◽  
Kevin Hurt ◽  
Susan Gentry

Germany is the fourth largest economy in the world, and its financial sector plays a key role in the global economy. The equity market is one of the most important components of that sector, so risk management of the German stock market is crucial for the welfare of its participants. To account for two stylized facts, volatility clustering and conditional heavy tails, we adopt the framework of Guo (2016) and examine the empirical performance of the GARCH model with the normal reciprocal inverse Gaussian (NRIG) distribution in fitting the German stock return series. Our results indicate that the NRIG distribution has superior performance in fitting stock market returns.
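
A hedged sketch of the model class follows. The NRIG innovation distribution is not available in common packages such as arch, so the example fits a GARCH(1,1) with Student-t innovations as a heavy-tailed stand-in, on synthetic returns rather than German stock data.

```python
import numpy as np
from arch import arch_model

# Synthetic daily returns standing in for DAX data (percent units help the optimizer)
rng = np.random.default_rng(0)
returns = 1.2 * rng.standard_t(df=5, size=2000)

# GARCH(1,1) with Student-t innovations; NRIG is not implemented in arch,
# so the t distribution serves as a heavy-tailed stand-in here
model = arch_model(returns, vol="Garch", p=1, q=1, dist="t")
res = model.fit(disp="off")
print(res.params)  # mu, omega, alpha[1], beta[1], nu (tail parameter)
```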

