Deep double descent: where bigger models and more data hurt*

2021 ◽  
Vol 2021 (12) ◽  
pp. 124003
Author(s):  
Preetum Nakkiran ◽  
Gal Kaplun ◽  
Yamini Bansal ◽  
Tristan Yang ◽  
Boaz Barak ◽  
...  

We show that a variety of modern deep learning tasks exhibit a ‘double-descent’ phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of train samples actually hurts test performance.
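
One way to see the model-wise double-descent curve outside the deep-learning setting of the paper is minimum-norm least squares on random Fourier features, where test error typically peaks as the number of features approaches the number of training samples and descends again past it. The sketch below is an illustrative toy experiment (hypothetical dataset sizes and noise level), not the authors' experimental protocol:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 200, 1000, 5

# Ground-truth linear function with label noise (the noise is what produces the peak).
w_true = rng.normal(size=d)
def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def random_features(X, W, b):
    # Random Fourier features: fixed random projection plus cosine nonlinearity.
    return np.cos(X @ W + b)

for n_feat in [10, 50, 100, 200, 400, 800, 2000]:
    W = rng.normal(size=(d, n_feat))
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    Phi_tr = random_features(X_tr, W, b)
    Phi_te = random_features(X_te, W, b)
    # Minimum-norm least-squares fit (interpolates the training set once n_feat >= n_train).
    coef, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    test_mse = np.mean((Phi_te @ coef - y_te) ** 2)
    print(f"features={n_feat:5d}  test MSE={test_mse:.3f}")
```

Printed test MSE across feature counts typically shows the characteristic rise around features ≈ training samples followed by a second descent.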

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5312
Author(s):  
Yanni Zhang ◽  
Yiming Liu ◽  
Qiang Li ◽  
Jianzhong Wang ◽  
Miao Qi ◽  
...  

Recently, deep learning-based image deblurring and deraining have been well developed. However, most of these methods fail to distill the useful features. Moreover, exploiting detailed image features in a deep learning framework usually requires a large number of parameters, which inevitably imposes a high computational burden on the network. To address these problems, we propose a lightweight fusion distillation network (LFDN) for image deblurring and deraining. The proposed LFDN is designed as an encoder–decoder architecture. In the encoding stage, image features are reduced to various small-scale spaces for multi-scale information extraction and fusion without much information loss. Then, a feature distillation normalization block is placed at the beginning of the decoding stage, which enables the network to continuously distill and screen valuable channel information from the feature maps. In addition, an information fusion strategy between distillation modules and feature channels is carried out via an attention mechanism. By fusing these different sources of information, our network achieves state-of-the-art image deblurring and deraining results with fewer parameters and outperforms existing methods in terms of model complexity.
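
As a rough illustration of the distill-and-screen idea, the sketch below implements a toy block that keeps part of the channels, refines the rest, and re-weights the result with a squeeze-and-excitation-style attention gate; the module name, layer sizes, and split ratio are assumptions, not the paper's exact feature distillation normalization block:

```python
import torch
import torch.nn as nn

class DistillAttentionBlock(nn.Module):
    """Toy feature-distillation block: keep ("distill") part of the channels,
    refine the rest, then re-weight all channels with a squeeze-and-excitation gate.
    Hypothetical layer sizes; not the paper's exact FDN block."""
    def __init__(self, channels, distill_ratio=0.5, reduction=4):
        super().__init__()
        self.n_keep = int(channels * distill_ratio)
        self.refine = nn.Sequential(
            nn.Conv2d(channels - self.n_keep, channels - self.n_keep, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Channel attention: global average pool -> bottleneck MLP -> sigmoid gate.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        kept, rest = x[:, :self.n_keep], x[:, self.n_keep:]
        fused = torch.cat([kept, self.refine(rest)], dim=1)
        return fused * self.gate(fused)   # screen channels by learned importance

# Usage: one block inside a lightweight encoder-decoder.
block = DistillAttentionBlock(channels=32)
out = block(torch.randn(1, 32, 64, 64))   # -> torch.Size([1, 32, 64, 64])
```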


Author(s):  
Qiang Yu ◽  
Feiqiang Liu ◽  
Long Xiao ◽  
Zitao Liu ◽  
Xiaomin Yang

Deep-learning (DL)-based methods are of growing importance in the field of single image super-resolution (SISR). However, the practical deployment of these DL-based models remains difficult because of their heavy computation and storage requirements. The powerful feature maps of hidden layers in convolutional neural networks (CNNs) help the model learn useful information, but there is redundancy among these feature maps that can be further exploited. To address these issues, this paper proposes a lightweight efficient feature generating network (EFGN) for SISR built from efficient feature generating blocks (EFGBs). Specifically, an EFGB applies cheap, plain operations to the original features to produce additional feature maps with only a slight increase in parameters. With the help of these extra feature maps, the network can extract more useful information from low-resolution (LR) images to reconstruct the desired high-resolution (HR) images. Experiments on benchmark datasets demonstrate that the proposed EFGN outperforms other deep-learning-based methods in most cases while having relatively lower model complexity. Additionally, running-time measurements indicate the feasibility of real-time monitoring.
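
The idea of producing extra feature maps from cheap operations can be sketched with a depthwise convolution as the cheap operation, in the spirit of ghost-module designs; the block name, channel split, and kernel sizes below are assumptions rather than the paper's EFGB:

```python
import torch
import torch.nn as nn

class EfficientFeatureBlock(nn.Module):
    """Generate `out_channels` feature maps while keeping most of them cheap:
    a small standard conv produces the "primary" maps, and an almost-free
    depthwise conv derives the remaining maps from them. Hypothetical sketch;
    works when the cheap channel count is a multiple of the primary count."""
    def __init__(self, in_channels, out_channels, cheap_ratio=0.5):
        super().__init__()
        primary = out_channels - int(out_channels * cheap_ratio)
        cheap = out_channels - primary
        self.primary_conv = nn.Conv2d(in_channels, primary, 3, padding=1)
        # Depthwise conv: one filter per input channel, so very few parameters.
        self.cheap_conv = nn.Conv2d(primary, cheap, 3, padding=1, groups=primary) if cheap else None

    def forward(self, x):
        y = self.primary_conv(x)
        if self.cheap_conv is None:
            return y
        return torch.cat([y, self.cheap_conv(y)], dim=1)

block = EfficientFeatureBlock(16, 32)
print(block(torch.randn(1, 16, 48, 48)).shape)   # torch.Size([1, 32, 48, 48])
```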


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are usually left unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations in the inputs is achieved by enforcing smooth variation of the geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
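
A minimal sketch of the geometry idea, under the assumption that the graph is a cosine-similarity matrix over a batch of flattened intermediate features: build one graph for the teacher and one for the student and penalize their difference as a distillation loss. The function names and the choice of MSE are illustrative, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def latent_geometry_graph(feats):
    """Cosine-similarity graph over a batch of intermediate representations.
    feats: (batch, dim) tensor; returns a (batch, batch) adjacency matrix."""
    z = F.normalize(feats.flatten(1), dim=1)
    return z @ z.t()

def geometry_distillation_loss(student_feats, teacher_feats):
    """Penalize differences between student and teacher latent geometries."""
    gs = latent_geometry_graph(student_feats)
    gt = latent_geometry_graph(teacher_feats)
    return F.mse_loss(gs, gt)

# Toy usage with random "intermediate representations" of a batch of 8 inputs.
student = torch.randn(8, 128)
teacher = torch.randn(8, 256)   # dimensions may differ; only the geometry is compared
print(geometry_distillation_loss(student, teacher).item())
```

Because only the batch-level similarity structure is compared, the student and teacher layers do not need matching widths.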


2021 ◽  
Author(s):  
Marco Luca Sbodio ◽  
Natasha Mulligan ◽  
Stefanie Speichert ◽  
Vanessa Lopez ◽  
Joao Bettencourt-Silva

There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient’s data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable, image-like structure suitable for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network. Our initial experiments show that the accuracy of the CNN on a prediction task is comparable to or better than that of other autoencoders trained on the same data, while requiring significantly fewer computational resources for training. We also assess the impact of the training dataset size on autoencoder performance. The source code for generating pathways from health records is provided as open source.
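
The abstract does not spell out the encoding, so the following sketch only illustrates one plausible image-like structure: a grid whose rows are clinical concepts and whose columns are time bins, with event counts as pixel values. The vocabulary, time horizon, and binning are hypothetical:

```python
import numpy as np

def pathway_to_image(events, vocab, n_time_bins=32, horizon_days=365):
    """Encode a patient's pathway as a (len(vocab), n_time_bins) count matrix.
    events: list of (day_offset, concept) pairs; concepts outside vocab are ignored.
    Hypothetical encoding, not the paper's exact image-like structure."""
    img = np.zeros((len(vocab), n_time_bins), dtype=np.float32)
    index = {c: i for i, c in enumerate(vocab)}
    for day, concept in events:
        if concept in index and 0 <= day < horizon_days:
            col = int(day / horizon_days * n_time_bins)
            img[index[concept], col] += 1.0
    return img

vocab = ["encounter", "diabetes_dx", "metformin_rx", "hba1c_lab"]
events = [(3, "encounter"), (3, "diabetes_dx"), (10, "metformin_rx"), (95, "hba1c_lab")]
image = pathway_to_image(events, vocab)
print(image.shape)   # (4, 32) -> such matrices can be stacked and fed to a 2-D CNN
```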


2021 ◽  
Author(s):  
Tong Guo

In industrial deep learning applications, manually labeled data inevitably contains a certain amount of noisy (mislabeled) examples. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method: find the noisy data and have humans re-label it, with the model predictions provided as references during labeling. In this paper, we illustrate the idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.
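
A minimal sketch of the find-then-relabel loop, assuming a simple classifier and cross-validated predictions to flag examples where a confident model disagrees with the given label; the classifier, threshold, and output format are illustrative choices, not the paper's procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def flag_suspect_labels(X, y, confidence=0.9):
    """Return (index, given label, model suggestion) for likely-noisy examples.
    Cross-validated predictions avoid the model simply memorizing its own labels.
    A sketch of the find-then-relabel idea, not the paper's exact procedure."""
    clf = LogisticRegression(max_iter=1000)
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
    pred = proba.argmax(axis=1)
    conf = proba.max(axis=1)
    suspects = np.where((pred != y) & (conf >= confidence))[0]
    return [(int(i), int(y[i]), int(pred[i])) for i in suspects]

# Toy data with a few flipped labels to be re-labeled by hand.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)
y[:10] = 1 - y[:10]                      # inject label noise
for idx, given, suggested in flag_suspect_labels(X, y):
    print(f"example {idx}: given={given}, model suggests={suggested}")
```

The flagged list, together with the model's suggestion, is what would be handed to human annotators for re-labeling before retraining.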


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 3929 ◽  
Author(s):  
Grigorios Tsagkatakis ◽  
Anastasia Aidini ◽  
Konstantina Fotiadou ◽  
Michalis Giannopoulos ◽  
Anastasia Pentari ◽  
...  

Deep Learning, and Deep Neural Networks in particular, have established themselves as the new norm in signal and data processing, achieving state-of-the-art performance in image, audio, and natural language understanding. In remote sensing, a large body of research has been devoted to the application of deep learning to typical supervised learning tasks such as classification. Less, yet equally important, effort has been allocated to addressing the challenges associated with the enhancement of low-quality observations from remote sensing platforms. Addressing such challenges is of paramount importance, both in its own right, since high-altitude imaging, environmental conditions, and imaging-system trade-offs lead to low-quality observations, and to facilitate subsequent analysis such as classification and detection. In this paper, we provide a comprehensive review of deep-learning methods for the enhancement of remote sensing observations, focusing on critical tasks including single- and multi-band super-resolution, denoising, restoration, pan-sharpening, and fusion, among others. In addition to a detailed analysis and comparison of recently presented approaches, different research avenues that could be explored in the future are also discussed.


2002 ◽  
Vol 124 (4) ◽  
pp. 697-705 ◽  
Author(s):  
Chul Kim ◽  
Paul I. Ro

In this study, an approach to obtaining an accurate yet simple model for full-vehicle ride analysis is proposed. The approach involves linearization of a full-car multibody dynamics (MBD) model to obtain a large-order vehicle model. The states of the model are divided into two groups depending on their effects on ride quality and handling performance. The singular perturbation method is then applied to reduce the model size. Comparing the responses of the proposed model and the original MBD model shows close agreement between the two systems. A set of identified parameters that makes the well-known seven degree-of-freedom model very close to the full-car MBD model is obtained. Finally, the benefits of the approach are illustrated through the design of an active suspension system. The identified model exhibits improved performance over the nominal models in the sense that the more accurate model leads to an appropriate selection of control gains. This study also provides an analytical method to investigate the effects of model complexity on model accuracy for vehicle suspension systems.
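
The reduction step can be illustrated with the standard quasi-steady-state singular perturbation formula: partition the linearized state matrix into slow and fast blocks and eliminate the fast states via A_r = A11 - A12 * inv(A22) * A21 (and likewise for the input matrix). The matrices below are random placeholders, not the full-car model:

```python
import numpy as np

def singular_perturbation_reduce(A, B, n_slow):
    """Reduce x_dot = A x + B u by eliminating the fast states (indices >= n_slow).
    Standard quasi-steady-state formulas: A_r = A11 - A12 A22^{-1} A21 and
    B_r = B1 - A12 A22^{-1} B2. Placeholder matrices, not the vehicle model."""
    A11, A12 = A[:n_slow, :n_slow], A[:n_slow, n_slow:]
    A21, A22 = A[n_slow:, :n_slow], A[n_slow:, n_slow:]
    B1, B2 = B[:n_slow], B[n_slow:]
    A22_inv = np.linalg.inv(A22)
    A_r = A11 - A12 @ A22_inv @ A21
    B_r = B1 - A12 @ A22_inv @ B2
    return A_r, B_r

# Toy 20-state linear model reduced to its 14 "slow" states (e.g. a 7-DOF ride model).
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 20)) - 5 * np.eye(20)   # keep the fast block comfortably invertible
B = rng.normal(size=(20, 2))
A_r, B_r = singular_perturbation_reduce(A, B, n_slow=14)
print(A_r.shape, B_r.shape)   # (14, 14) (14, 2)
```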


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Jun Jin Choong ◽  
Xin Liu ◽  
Tsuyoshi Murata

Discovering and modeling community structure remains a fundamentally challenging task. In domains such as biology, chemistry, and physics, researchers often rely on community detection algorithms to uncover community structures in complex systems, yet no unified definition of community structure exists. Furthermore, existing models tend to be oversimplified, neglecting richer information such as nodal features. Coupled with the surge of user-generated information on social networks, a demand for techniques beyond traditional approaches is inevitable. Deep learning techniques such as network representation learning have shown tremendous promise. More specifically, supervised and semi-supervised learning tasks such as link prediction and node classification have achieved remarkable results. However, unsupervised learning tasks such as community detection remain widely unexplored. In this paper, a novel deep generative model for community detection is proposed. Extensive experiments show that the proposed model, empowered with Bayesian deep learning, can provide insights in terms of uncertainty and exploit nonlinearities, resulting in better performance than state-of-the-art community detection methods. Additionally, unlike traditional methods, the proposed model is agnostic to the definition of community structure. Leveraging low-dimensional embeddings of both network topology and feature similarity, it automatically learns the best model configuration for describing similarities in a community.
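
As a toy stand-in for the idea of combining network topology and nodal features in a low-dimensional embedding, the sketch below propagates node features over a normalized adjacency matrix and clusters the resulting embeddings into communities; it is not the paper's Bayesian generative model, and all sizes are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def embed_and_cluster(adj, features, dim=8, n_communities=2, seed=0):
    """Toy topology-plus-feature embedding followed by clustering into communities.
    One propagation step (normalized adjacency x features x random projection)
    stands in for a learned encoder; not the paper's Bayesian generative model."""
    rng = np.random.default_rng(seed)
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features
    z = np.tanh(propagated @ rng.normal(size=(features.shape[1], dim)))
    return KMeans(n_clusters=n_communities, n_init=10, random_state=seed).fit_predict(z)

# Two planted communities: dense within blocks, sparse between them.
rng = np.random.default_rng(0)
blocks = np.block([[rng.random((10, 10)) < 0.6, rng.random((10, 10)) < 0.05],
                   [rng.random((10, 10)) < 0.05, rng.random((10, 10)) < 0.6]]).astype(float)
adj = np.triu(blocks, 1)
adj = adj + adj.T
feats = rng.normal(size=(20, 5))
print(embed_and_cluster(adj, feats))   # cluster labels for the 20 nodes
```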


2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Chi Man Vong ◽  
Weng Fai Ip ◽  
Pak Kin Wong

Accurate prediction models for air pollutants are crucial for forecasting and for health alerts to local inhabitants. In the recent literature, the discrete wavelet transform (DWT) was employed to decompose a series of air pollutant levels, followed by modeling using a support vector machine (SVM). This combination of DWT and SVM was reported to produce a more accurate prediction model for air pollutants by investigating different levels of frequency bands. However, DWT places a significant demand on model complexity, namely the training time and the model size of the prediction model. In this paper, a new method called variation-oriented filtering (VF) is proposed to remove data with low variation, which can be considered as noise to a prediction model. With VF, the noise and the size of the series of air pollutant levels can be reduced simultaneously, and hence so are the training time and model size. The SO2 (sulfur dioxide) level in Macau was selected as a test case. Experimental results show that VF can effectively and efficiently reduce the model complexity while improving predictive accuracy.
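
A minimal sketch of the VF idea, assuming local variation is measured as the standard deviation over a sliding window: drop low-variation samples, then fit an SVM regressor on lagged values of the retained series. The window length, threshold, and lag count are hypothetical, not the paper's settings:

```python
import numpy as np
from sklearn.svm import SVR

def variation_filter(series, window=5, threshold=0.05):
    """Keep only points whose local variation is at least `threshold`; low-variation
    samples add training cost without informing the model. A sketch of the VF idea
    with hypothetical window/threshold values, not the paper's exact rule."""
    keep = []
    for i in range(len(series)):
        lo, hi = max(0, i - window), min(len(series), i + window + 1)
        if np.std(series[lo:hi]) >= threshold:
            keep.append(i)
    return np.array(keep)

# Toy pollutant-level series: flat stretches interleaved with varying segments.
rng = np.random.default_rng(0)
t = np.arange(500)
series = np.where((t // 50) % 2 == 0, 0.3, 0.3 + 0.2 * np.sin(t / 5.0)) + 0.01 * rng.normal(size=t.size)
kept = variation_filter(series)
print(f"kept {kept.size} of {series.size} samples")

# Train an SVR on lagged values -> current value, using only the retained time steps.
lags = 3
idx = kept[kept >= lags]
X = np.column_stack([series[idx - k] for k in range(1, lags + 1)])
y = series[idx]
model = SVR(kernel="rbf").fit(X, y)
print("train R^2:", round(model.score(X, y), 3))
```

Filtering before training shrinks both the training set and the number of support vectors, which is the model-complexity reduction the abstract refers to.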

