Deep double descent: where bigger models and more data hurt*

2021 ◽  
Vol 2021 (12) ◽  
pp. 124003
Author(s):  
Preetum Nakkiran ◽  
Gal Kaplun ◽  
Yamini Bansal ◽  
Tristan Yang ◽  
Boaz Barak ◽  
...  

We show that a variety of modern deep learning tasks exhibit a ‘double-descent’ phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of train samples actually hurts test performance.
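
One way to see the model-wise double-descent curve outside the deep-learning setting of the paper is minimum-norm least squares on random Fourier features, where test error typically peaks as the number of features approaches the number of training samples and descends again past it. The sketch below is an illustrative toy experiment (hypothetical dataset sizes and noise level), not the authors' experimental protocol:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 200, 1000, 5

# Ground-truth linear function with label noise (the noise is what produces the peak).
w_true = rng.normal(size=d)
def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def random_features(X, W, b):
    # Random Fourier features: fixed random projection plus cosine nonlinearity.
    return np.cos(X @ W + b)

for n_feat in [10, 50, 100, 200, 400, 800, 2000]:
    W = rng.normal(size=(d, n_feat))
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    Phi_tr = random_features(X_tr, W, b)
    Phi_te = random_features(X_te, W, b)
    # Minimum-norm least-squares fit (interpolates the training set once n_feat >= n_train).
    coef, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    test_mse = np.mean((Phi_te @ coef - y_te) ** 2)
    print(f"features={n_feat:5d}  test MSE={test_mse:.3f}")
```

Printed test MSE across feature counts typically shows the characteristic rise around features ≈ training samples followed by a second descent.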

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5312
Author(s):  
Yanni Zhang ◽  
Yiming Liu ◽  
Qiang Li ◽  
Jianzhong Wang ◽  
Miao Qi ◽  
...  

Recently, deep learning-based image deblurring and deraining have been well developed. However, most of these methods fail to distill the useful features. Moreover, exploiting detailed image features in a deep learning framework usually requires a large number of parameters, which inevitably imposes a high computational burden on the network. To address these problems, we propose a lightweight fusion distillation network (LFDN) for image deblurring and deraining. The proposed LFDN is designed as an encoder–decoder architecture. In the encoding stage, image features are reduced to various small-scale spaces for multi-scale information extraction and fusion without much information loss. Then, a feature distillation normalization block is placed at the beginning of the decoding stage, which enables the network to continuously distill and screen valuable channel information from the feature maps. In addition, an information fusion strategy between distillation modules and feature channels is carried out via an attention mechanism. By fusing these different sources of information, our network achieves state-of-the-art image deblurring and deraining results with fewer parameters and outperforms existing methods in terms of model complexity.
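
As a rough illustration of the distill-and-screen idea, the sketch below implements a toy block that keeps part of the channels, refines the rest, and re-weights the result with a squeeze-and-excitation-style attention gate; the module name, layer sizes, and split ratio are assumptions, not the paper's exact feature distillation normalization block:

```python
import torch
import torch.nn as nn

class DistillAttentionBlock(nn.Module):
    """Toy feature-distillation block: keep ("distill") part of the channels,
    refine the rest, then re-weight all channels with a squeeze-and-excitation gate.
    Hypothetical layer sizes; not the paper's exact FDN block."""
    def __init__(self, channels, distill_ratio=0.5, reduction=4):
        super().__init__()
        self.n_keep = int(channels * distill_ratio)
        self.refine = nn.Sequential(
            nn.Conv2d(channels - self.n_keep, channels - self.n_keep, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Channel attention: global average pool -> bottleneck MLP -> sigmoid gate.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        kept, rest = x[:, :self.n_keep], x[:, self.n_keep:]
        fused = torch.cat([kept, self.refine(rest)], dim=1)
        return fused * self.gate(fused)   # screen channels by learned importance

# Usage: one block inside a lightweight encoder-decoder.
block = DistillAttentionBlock(channels=32)
out = block(torch.randn(1, 32, 64, 64))   # -> torch.Size([1, 32, 64, 64])
```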


Author(s):  
Qiang Yu ◽  
Feiqiang Liu ◽  
Long Xiao ◽  
Zitao Liu ◽  
Xiaomin Yang

Deep-learning (DL)-based methods are of growing importance in the field of single image super-resolution (SISR). However, the practical deployment of these DL-based models remains difficult because of their heavy computation and storage requirements. The powerful feature maps of hidden layers in convolutional neural networks (CNNs) help the model learn useful information, but there is redundancy among these feature maps that can be further exploited. To address these issues, this paper proposes a lightweight efficient feature generating network (EFGN) for SISR built from efficient feature generating blocks (EFGBs). Specifically, an EFGB applies cheap, plain operations to the original features to produce additional feature maps with only a slight increase in parameters. With the help of these extra feature maps, the network can extract more useful information from low-resolution (LR) images to reconstruct the desired high-resolution (HR) images. Experiments on benchmark datasets demonstrate that the proposed EFGN outperforms other deep-learning-based methods in most cases while having relatively lower model complexity. Additionally, running-time measurements indicate the feasibility of real-time monitoring.
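
The idea of producing extra feature maps from cheap operations can be sketched with a depthwise convolution as the cheap operation, in the spirit of ghost-module designs; the block name, channel split, and kernel sizes below are assumptions rather than the paper's EFGB:

```python
import torch
import torch.nn as nn

class EfficientFeatureBlock(nn.Module):
    """Generate `out_channels` feature maps while keeping most of them cheap:
    a small standard conv produces the "primary" maps, and an almost-free
    depthwise conv derives the remaining maps from them. Hypothetical sketch;
    works when the cheap channel count is a multiple of the primary count."""
    def __init__(self, in_channels, out_channels, cheap_ratio=0.5):
        super().__init__()
        primary = out_channels - int(out_channels * cheap_ratio)
        cheap = out_channels - primary
        self.primary_conv = nn.Conv2d(in_channels, primary, 3, padding=1)
        # Depthwise conv: one filter per input channel, so very few parameters.
        self.cheap_conv = nn.Conv2d(primary, cheap, 3, padding=1, groups=primary) if cheap else None

    def forward(self, x):
        y = self.primary_conv(x)
        if self.cheap_conv is None:
            return y
        return torch.cat([y, self.cheap_conv(y)], dim=1)

block = EfficientFeatureBlock(16, 32)
print(block(torch.randn(1, 16, 48, 48)).shape)   # torch.Size([1, 32, 48, 48])
```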


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are usually left unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations in the inputs is achieved by enforcing smooth variation of the geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
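
A minimal sketch of the geometry idea, under the assumption that the graph is a cosine-similarity matrix over a batch of flattened intermediate features: build one graph for the teacher and one for the student and penalize their difference as a distillation loss. The function names and the choice of MSE are illustrative, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def latent_geometry_graph(feats):
    """Cosine-similarity graph over a batch of intermediate representations.
    feats: (batch, dim) tensor; returns a (batch, batch) adjacency matrix."""
    z = F.normalize(feats.flatten(1), dim=1)
    return z @ z.t()

def geometry_distillation_loss(student_feats, teacher_feats):
    """Penalize differences between student and teacher latent geometries."""
    gs = latent_geometry_graph(student_feats)
    gt = latent_geometry_graph(teacher_feats)
    return F.mse_loss(gs, gt)

# Toy usage with random "intermediate representations" of a batch of 8 inputs.
student = torch.randn(8, 128)
teacher = torch.randn(8, 256)   # dimensions may differ; only the geometry is compared
print(geometry_distillation_loss(student, teacher).item())
```

Because only the batch-level similarity structure is compared, the student and teacher layers do not need matching widths.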


2021 ◽  
Author(s):  
Marco Luca Sbodio ◽  
Natasha Mulligan ◽  
Stefanie Speichert ◽  
Vanessa Lopez ◽  
Joao Bettencourt-Silva

There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient’s data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable, image-like structure suitable for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network. Our initial experiments show that the accuracy of the CNN on a prediction task is comparable to or better than that of other autoencoders trained on the same data, while requiring significantly fewer computational resources for training. We also assess the impact of the training dataset size on autoencoder performance. The source code for generating pathways from health records is provided as open source.
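
The abstract does not spell out the encoding, so the following sketch only illustrates one plausible image-like structure: a grid whose rows are clinical concepts and whose columns are time bins, with event counts as pixel values. The vocabulary, time horizon, and binning are hypothetical:

```python
import numpy as np

def pathway_to_image(events, vocab, n_time_bins=32, horizon_days=365):
    """Encode a patient's pathway as a (len(vocab), n_time_bins) count matrix.
    events: list of (day_offset, concept) pairs; concepts outside vocab are ignored.
    Hypothetical encoding, not the paper's exact image-like structure."""
    img = np.zeros((len(vocab), n_time_bins), dtype=np.float32)
    index = {c: i for i, c in enumerate(vocab)}
    for day, concept in events:
        if concept in index and 0 <= day < horizon_days:
            col = int(day / horizon_days * n_time_bins)
            img[index[concept], col] += 1.0
    return img

vocab = ["encounter", "diabetes_dx", "metformin_rx", "hba1c_lab"]
events = [(3, "encounter"), (3, "diabetes_dx"), (10, "metformin_rx"), (95, "hba1c_lab")]
image = pathway_to_image(events, vocab)
print(image.shape)   # (4, 32) -> such matrices can be stacked and fed to a 2-D CNN
```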


2021 ◽  
Author(s):  
Tong Guo

In industrial deep learning applications, manually labeled data inevitably contains a certain amount of noisy (mislabeled) examples. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method: find the noisy data and have humans re-label it, with the model predictions provided as references during labeling. In this paper, we illustrate the idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.
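
A minimal sketch of the find-then-relabel loop, assuming a simple classifier and cross-validated predictions to flag examples where a confident model disagrees with the given label; the classifier, threshold, and output format are illustrative choices, not the paper's procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def flag_suspect_labels(X, y, confidence=0.9):
    """Return (index, given label, model suggestion) for likely-noisy examples.
    Cross-validated predictions avoid the model simply memorizing its own labels.
    A sketch of the find-then-relabel idea, not the paper's exact procedure."""
    clf = LogisticRegression(max_iter=1000)
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
    pred = proba.argmax(axis=1)
    conf = proba.max(axis=1)
    suspects = np.where((pred != y) & (conf >= confidence))[0]
    return [(int(i), int(y[i]), int(pred[i])) for i in suspects]

# Toy data with a few flipped labels to be re-labeled by hand.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)
y[:10] = 1 - y[:10]                      # inject label noise
for idx, given, suggested in flag_suspect_labels(X, y):
    print(f"example {idx}: given={given}, model suggests={suggested}")
```

The flagged list, together with the model's suggestion, is what would be handed to human annotators for re-labeling before retraining.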


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 3929 ◽  
Author(s):  
Grigorios Tsagkatakis ◽  
Anastasia Aidini ◽  
Konstantina Fotiadou ◽  
Michalis Giannopoulos ◽  
Anastasia Pentari ◽  
...  

Deep Learning, and Deep Neural Networks in particular, have established themselves as the new norm in signal and data processing, achieving state-of-the-art performance in image, audio, and natural language understanding. In remote sensing, a large body of research has been devoted to the application of deep learning to typical supervised learning tasks such as classification. Less, yet equally important, effort has been allocated to addressing the challenges associated with the enhancement of low-quality observations from remote sensing platforms. Addressing such challenges is of paramount importance, both in its own right, since high-altitude imaging, environmental conditions, and imaging-system trade-offs lead to low-quality observations, and to facilitate subsequent analysis such as classification and detection. In this paper, we provide a comprehensive review of deep-learning methods for the enhancement of remote sensing observations, focusing on critical tasks including single- and multi-band super-resolution, denoising, restoration, pan-sharpening, and fusion, among others. In addition to a detailed analysis and comparison of recently presented approaches, different research avenues that could be explored in the future are also discussed.


2002 ◽  
Vol 124 (4) ◽  
pp. 697-705 ◽  
Author(s):  
Chul Kim ◽  
Paul I. Ro

In this study, an approach to obtaining an accurate yet simple model for full-vehicle ride analysis is proposed. The approach involves linearization of a full-car multibody dynamics (MBD) model to obtain a large-order vehicle model. The states of the model are divided into two groups depending on their effects on ride quality and handling performance. The singular perturbation method is then applied to reduce the model size. Comparing the responses of the proposed model and the original MBD model shows close agreement between the two systems. A set of identified parameters that makes the well-known seven degree-of-freedom model very close to the full-car MBD model is obtained. Finally, the benefits of the approach are illustrated through the design of an active suspension system. The identified model exhibits improved performance over the nominal models in the sense that the more accurate model leads to an appropriate selection of control gains. This study also provides an analytical method to investigate the effects of model complexity on model accuracy for vehicle suspension systems.
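
The reduction step can be illustrated with the standard quasi-steady-state singular perturbation formula: partition the linearized state matrix into slow and fast blocks and eliminate the fast states via A_r = A11 - A12 * inv(A22) * A21 (and likewise for the input matrix). The matrices below are random placeholders, not the full-car model:

```python
import numpy as np

def singular_perturbation_reduce(A, B, n_slow):
    """Reduce x_dot = A x + B u by eliminating the fast states (indices >= n_slow).
    Standard quasi-steady-state formulas: A_r = A11 - A12 A22^{-1} A21 and
    B_r = B1 - A12 A22^{-1} B2. Placeholder matrices, not the vehicle model."""
    A11, A12 = A[:n_slow, :n_slow], A[:n_slow, n_slow:]
    A21, A22 = A[n_slow:, :n_slow], A[n_slow:, n_slow:]
    B1, B2 = B[:n_slow], B[n_slow:]
    A22_inv = np.linalg.inv(A22)
    A_r = A11 - A12 @ A22_inv @ A21
    B_r = B1 - A12 @ A22_inv @ B2
    return A_r, B_r

# Toy 20-state linear model reduced to its 14 "slow" states (e.g. a 7-DOF ride model).
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 20)) - 5 * np.eye(20)   # keep the fast block comfortably invertible
B = rng.normal(size=(20, 2))
A_r, B_r = singular_perturbation_reduce(A, B, n_slow=14)
print(A_r.shape, B_r.shape)   # (14, 14) (14, 2)
```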


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Jun Jin Choong ◽  
Xin Liu ◽  
Tsuyoshi Murata

Discovering and modeling community structure remains a fundamentally challenging task. In domains such as biology, chemistry, and physics, researchers often rely on community detection algorithms to uncover community structures in complex systems, yet no unified definition of community structure exists. Furthermore, existing models tend to be oversimplified, neglecting richer information such as nodal features. Coupled with the surge of user-generated information on social networks, a demand for techniques beyond traditional approaches is inevitable. Deep learning techniques such as network representation learning have shown tremendous promise. More specifically, supervised and semi-supervised learning tasks such as link prediction and node classification have achieved remarkable results. However, unsupervised learning tasks such as community detection remain widely unexplored. In this paper, a novel deep generative model for community detection is proposed. Extensive experiments show that the proposed model, empowered with Bayesian deep learning, can provide insights in terms of uncertainty and exploit nonlinearities, resulting in better performance than state-of-the-art community detection methods. Additionally, unlike traditional methods, the proposed model is agnostic to the definition of community structure. Leveraging low-dimensional embeddings of both network topology and feature similarity, it automatically learns the best model configuration for describing similarities in a community.
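
As a toy stand-in for the idea of combining network topology and nodal features in a low-dimensional embedding, the sketch below propagates node features over a normalized adjacency matrix and clusters the resulting embeddings into communities; it is not the paper's Bayesian generative model, and all sizes are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def embed_and_cluster(adj, features, dim=8, n_communities=2, seed=0):
    """Toy topology-plus-feature embedding followed by clustering into communities.
    One propagation step (normalized adjacency x features x random projection)
    stands in for a learned encoder; not the paper's Bayesian generative model."""
    rng = np.random.default_rng(seed)
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features
    z = np.tanh(propagated @ rng.normal(size=(features.shape[1], dim)))
    return KMeans(n_clusters=n_communities, n_init=10, random_state=seed).fit_predict(z)

# Two planted communities: dense within blocks, sparse between them.
rng = np.random.default_rng(0)
blocks = np.block([[rng.random((10, 10)) < 0.6, rng.random((10, 10)) < 0.05],
                   [rng.random((10, 10)) < 0.05, rng.random((10, 10)) < 0.6]]).astype(float)
adj = np.triu(blocks, 1)
adj = adj + adj.T
feats = rng.normal(size=(20, 5))
print(embed_and_cluster(adj, feats))   # cluster labels for the 20 nodes
```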


2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Chi Man Vong ◽  
Weng Fai Ip ◽  
Pak Kin Wong

Accurate prediction models for air pollutants are crucial for forecasting and for health alerts to local inhabitants. In the recent literature, the discrete wavelet transform (DWT) was employed to decompose a series of air pollutant levels, followed by modeling using a support vector machine (SVM). This combination of DWT and SVM was reported to produce a more accurate prediction model for air pollutants by investigating different levels of frequency bands. However, DWT places a significant demand on model complexity, namely the training time and the model size of the prediction model. In this paper, a new method called variation-oriented filtering (VF) is proposed to remove data with low variation, which can be considered as noise to a prediction model. With VF, the noise and the size of the series of air pollutant levels can be reduced simultaneously, and hence so are the training time and model size. The SO2 (sulfur dioxide) level in Macau was selected as a test case. Experimental results show that VF can effectively and efficiently reduce the model complexity while improving predictive accuracy.
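
A minimal sketch of the VF idea, assuming local variation is measured as the standard deviation over a sliding window: drop low-variation samples, then fit an SVM regressor on lagged values of the retained series. The window length, threshold, and lag count are hypothetical, not the paper's settings:

```python
import numpy as np
from sklearn.svm import SVR

def variation_filter(series, window=5, threshold=0.05):
    """Keep only points whose local variation is at least `threshold`; low-variation
    samples add training cost without informing the model. A sketch of the VF idea
    with hypothetical window/threshold values, not the paper's exact rule."""
    keep = []
    for i in range(len(series)):
        lo, hi = max(0, i - window), min(len(series), i + window + 1)
        if np.std(series[lo:hi]) >= threshold:
            keep.append(i)
    return np.array(keep)

# Toy pollutant-level series: flat stretches interleaved with varying segments.
rng = np.random.default_rng(0)
t = np.arange(500)
series = np.where((t // 50) % 2 == 0, 0.3, 0.3 + 0.2 * np.sin(t / 5.0)) + 0.01 * rng.normal(size=t.size)
kept = variation_filter(series)
print(f"kept {kept.size} of {series.size} samples")

# Train an SVR on lagged values -> current value, using only the retained time steps.
lags = 3
idx = kept[kept >= lags]
X = np.column_stack([series[idx - k] for k in range(1, lags + 1)])
y = series[idx]
model = SVR(kernel="rbf").fit(X, y)
print("train R^2:", round(model.score(X, y), 3))
```

Filtering before training shrinks both the training set and the number of support vectors, which is the model-complexity reduction the abstract refers to.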

