Deep learning for brains?: Different linear and nonlinear scaling in UK Biobank brain images vs. machine-learning datasets

2019 ◽  
Author(s):  
Marc-Andre Schulz ◽  
B.T. Thomas Yeo ◽  
Joshua T. Vogelstein ◽  
Janaina Mourao-Miranada ◽  
Jakob N. Kather ◽  
...  

Abstract. In recent years, deep learning has unlocked unprecedented success in various domains, especially in image, text, and speech processing. These breakthroughs may hold promise for neuroscience, and especially for brain-imaging investigators who are beginning to analyze data from thousands of participants. However, deep learning is only beneficial if the data have nonlinear relationships and if they are exploitable at currently available sample sizes. We systematically profiled the performance of deep models, kernel models, and linear models as a function of sample size on UK Biobank brain images against established machine-learning references. On MNIST and Zalando Fashion, prediction accuracy consistently improved when escalating from linear models to shallow-nonlinear models, and further improved when switching to deep-nonlinear models. The more observations were available for model training, the greater the performance gain we saw. In contrast, using structural or functional brain scans, simple linear models performed on par with more complex, highly parameterized models in age/sex prediction across increasing sample sizes. In fact, linear models kept improving as the sample size approached ∼10,000 participants. Our results indicate that the increase in performance of linear models with additional data does not saturate at the limit of current feasibility. Yet, nonlinearities of common brain scans remain largely inaccessible to both kernel and deep learning methods at any examined scale.
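The core analysis here is a learning-curve comparison: linear, kernel, and deep models fit on progressively larger training sets and scored on held-out data. A minimal sketch of that idea follows, assuming scikit-learn and the public MNIST dataset from OpenML; the model settings and sample sizes are illustrative and are not the authors' pipeline.

```python
# Minimal sketch (not the authors' code) of a learning-curve comparison:
# linear, kernel, and deep models evaluated at increasing training-set sizes.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10_000, random_state=0)

models = {
    "linear": LogisticRegression(max_iter=1000),
    "kernel (RBF SVM)": SVC(kernel="rbf"),
    "deep (MLP)": MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=200),
}

for n in [100, 1_000, 10_000]:  # escalating sample sizes
    for name, model in models.items():
        model.fit(X_train[:n], y_train[:n])
        acc = model.score(X_test, y_test)
        print(f"n={n:>6}  {name:<18} accuracy={acc:.3f}")
```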

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Marc-Andre Schulz ◽  
B. T. Thomas Yeo ◽  
Joshua T. Vogelstein ◽  
Janaina Mourao-Miranada ◽  
Jakob N. Kather ◽  
...  

2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image objects that form the unit of classification, offers the potential benefit of incorporating additional variables, such as measures of object geometry and texture, thereby increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy (1.0%) when the training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes; NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, owing to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies with training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
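A minimal sketch of this training-size sensitivity experiment follows, assuming scikit-learn; synthetic features from make_classification stand in for the GEOBIA image-object variables, and the classifier settings are illustrative rather than those tuned in the study.

```python
# Sketch of comparing supervised classifiers across training-set sizes
# (synthetic stand-in for GEOBIA object-level geometry/texture/spectral features).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=40, n_informative=15,
                           n_classes=5, n_clusters_per_class=2, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=5_000, random_state=0)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=7),
    "GBM": GradientBoostingClassifier(random_state=0),
}

for n in [40, 315, 1_000, 10_000]:  # from very small to large training sets
    for name, clf in classifiers.items():
        clf.fit(X_pool[:n], y_pool[:n])
        print(f"n={n:>6}  {name:<5} overall accuracy = {clf.score(X_test, y_test):.3f}")
```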


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3777 ◽  
Author(s):  
Ataollah Shirzadi ◽  
Karim Soliamani ◽  
Mahmood Habibnejhad ◽  
Ataollah Kavian ◽  
Kamran Chapi ◽  
...  

The main objective of this research was to introduce the novel alternating decision tree (ADTree) machine-learning algorithm combined with the MultiBoost (MB), bagging (BA), rotation forest (RF) and random subspace (RS) ensemble algorithms, under two scenarios of different sample sizes and raster resolutions, for the spatial prediction of shallow landslides around Bijar City, Kurdistan Province, Iran. The modeling process was evaluated using several statistical measures and the area under the receiver operating characteristic curve (AUROC). Results show that the RS model obtained high goodness-of-fit and prediction accuracy for the 60%/40% and 70%/30% sample sizes at a raster resolution of 10 m, whereas the MB model did so for the 80%/20% and 90%/10% sample sizes at a resolution of 20 m. The RS-ADTree and MB-ADTree ensemble models outperformed the standalone ADTree model in both scenarios. Overall, MB-ADTree had the highest prediction accuracy with the 80%/20% sample size at a 20 m resolution (area under the curve (AUC) = 0.942) and the lowest with the 60%/40% sample size at a 10 m resolution (AUC = 0.845). The findings confirm that the newly proposed models are very promising alternative tools to assist planners and decision makers in the task of managing landslide-prone areas.
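The following sketch illustrates the general ensemble-plus-AUROC workflow under stated assumptions: ADTree is not available in scikit-learn, so a plain decision tree stands in as the base learner inside a bagging/random-subspace-style ensemble, and the landslide conditioning factors are simulated for illustration only.

```python
# Hedged sketch: bagged decision trees (stand-in for the ADTree ensembles)
# evaluated by AUROC across different train/test splits.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2_000, n_features=12, random_state=1)

for train_frac in [0.6, 0.7, 0.8, 0.9]:  # the 60/40 ... 90/10 splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_frac, random_state=1)
    # Bagging (BA) ensemble; random-subspace (RS) behaviour is approximated by
    # also subsampling features (max_features < 1.0).
    model = BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                              n_estimators=100, max_features=0.7, random_state=1)
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"train fraction {train_frac:.0%}: AUROC = {auc:.3f}")
```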


SOIL ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. 565-578
Author(s):  
Wartini Ng ◽  
Budiman Minasny ◽  
Wanderson de Sousa Mendes ◽  
José Alexandre Melo Demattê

Abstract. The number of samples used in the calibration data set affects the quality of the generated predictive models using visible, near and shortwave infrared (VIS–NIR–SWIR) spectroscopy for soil attributes. Recently, the convolutional neural network (CNN) has been regarded as a highly accurate model for predicting soil properties on a large database. However, it has not yet been ascertained how large the sample size should be for the CNN model to be effective. This paper investigates the effect of the training sample size on the accuracy of deep learning and machine learning models. It aims at providing an estimate of how many calibration samples are needed to improve the model performance of soil property predictions with CNN as compared to conventional machine learning models. In addition, this paper also looks at a way to interpret CNN models, which are commonly labelled as a black box. It is hypothesised that the performance of machine learning models will increase with an increasing number of training samples, but will plateau when it reaches a certain number, while the performance of CNN will keep improving. The performances of two machine learning models (partial least squares regression – PLSR; Cubist) are compared against the CNN model. A VIS–NIR–SWIR spectral library from Brazil, containing 4251 unique sites with averages of two to three samples per depth (a total of 12,044 samples), was divided into calibration (3188 sites) and validation (1063 sites) sets. A subset of the calibration data set was then created to represent smaller calibration data sets of 125, 300, 500, 1000, 1500, 2000, 2500 and 2700 unique sites, equivalent to sample sizes of approximately 350, 840, 1400, 2800, 4200, 5600, 7000 and 7650. All three models (PLSR, Cubist and CNN) were generated for each sample size of unique sites for the prediction of five different soil properties, i.e. cation exchange capacity, organic carbon, sand, silt and clay content. These calibration subset sampling and modelling processes were repeated 10 times to provide a better representation of the model performances. Learning curves showed that accuracy increased with an increasing number of training samples. At a lower number of samples (< 1000), PLSR and Cubist performed better than CNN. The performance of CNN outweighed that of the PLSR and Cubist models at sample sizes of 1500 and 1800, respectively. It can be recommended that deep learning is most efficient for spectral modelling at sample sizes above 2000. The accuracy of the PLSR and Cubist models appears to reach a plateau above sample sizes of 4200 and 5000, respectively, while the accuracy of CNN has not plateaued. A sensitivity analysis of the CNN model demonstrated its ability to determine important wavelength regions that affected the predictions of various soil attributes.
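As a rough illustration of the learning-curve setup, the sketch below fits only the PLSR baseline on synthetic spectra at several calibration sizes and reports validation R²; it assumes scikit-learn and is not the study's code, and the simulated spectra and target are placeholders.

```python
# Sketch of a PLSR learning curve on simulated spectra (stand-in for VIS-NIR-SWIR data).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 8_000, 500
X = rng.normal(size=(n_samples, n_wavelengths)).cumsum(axis=1)   # smooth-ish synthetic spectra
y = X[:, 100] * 0.5 - X[:, 350] * 0.3 + rng.normal(scale=0.5, size=n_samples)

X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=2_000, random_state=0)

for n in [350, 1_400, 4_200, 5_600]:  # calibration sizes echoing the abstract
    pls = PLSRegression(n_components=15)
    pls.fit(X_cal[:n], y_cal[:n])
    print(f"n={n:>5}  PLSR validation R2 = {r2_score(y_val, pls.predict(X_val)):.3f}")
```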


2021 ◽  
Author(s):  
Wanderson Bucker Moraes ◽  
Laurence V Madden ◽  
Pierce A. Paul

Since Fusarium head blight (FHB) intensity is usually highly variable within a plot, the number of spikes rated for FHB index (IND) quantification must be considered when designing experiments. In addition, quantification of the sources of IND heterogeneity is crucial for defining sampling protocols. Field experiments were conducted to quantify the variability of IND (‘field severity’) at different spatial scales and to investigate the effects of sample size on the estimated plot-level mean IND and its accuracy. A total of 216 seven-row × 6-m-long plots of a moderately resistant and a susceptible cultivar were spray-inoculated with different Fusarium graminearum spore concentrations at anthesis to generate a range of IND levels. A one-stage cluster sampling approach was used to estimate IND, with an average of 32 spikes rated at each of 10 equally spaced points per plot. Plot-level mean IND ranged from 0.9 to 37.9%. Heterogeneity of IND, quantified by fitting unconditional hierarchical linear models, was higher among spikes within clusters than among clusters within plots or among plots. The projected relative error of mean IND increased as mean IND decreased and as sample size decreased below 100 spikes per plot. Simple random samples were drawn with replacement 50,000 times from the original dataset for each plot and used to estimate the effects of sample size on mean IND. Samples of 100 or more spikes resulted in more precise estimates of mean IND than smaller samples. Poor sampling may result in inaccurate estimates of IND and poor interpretation of results.
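The resampling step can be illustrated with a short simulation: draw simple random samples with replacement from a plot's spike-level ratings and track how the error of the estimated plot mean changes with sample size. The ratings below are simulated, and the number of draws is reduced from the 50,000 used in the study for brevity.

```python
# Sketch of the with-replacement resampling analysis on simulated spike-level IND ratings.
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for ~320 spike-level IND ratings (%) in one plot (10 clusters x ~32 spikes).
plot_ratings = rng.gamma(shape=1.5, scale=8.0, size=320).clip(0, 100)
true_mean = plot_ratings.mean()

n_draws = 5_000
for sample_size in [20, 50, 100, 200]:
    means = np.array([rng.choice(plot_ratings, size=sample_size, replace=True).mean()
                      for _ in range(n_draws)])
    rel_error = np.abs(means - true_mean).mean() / true_mean * 100
    print(f"{sample_size:>3} spikes: mean relative error of plot mean = {rel_error:.1f}%")
```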


2019 ◽  
Author(s):  
Wartini Ng ◽  
Budiman Minasny ◽  
Wanderson de Sousa Mendes ◽  
José A. M. Demattê

Abstract. The number of samples used in the calibration dataset affects the quality of the generated predictive models using visible, near and shortwave infrared (VIS-NIR-SWIR) spectroscopy for soil attributes. Recently, the convolutional neural network (CNN) has been regarded as a highly accurate model for predicting soil properties on a large database; however, it has not yet been ascertained how large the sample size should be for the CNN model to be effective. This paper aims at providing an estimate of how many calibration samples are needed to improve the model performance of soil property predictions with CNN. It is hypothesized that the larger the amount of data, the more accurate the CNN model. The performance of two commonly used machine learning models (partial least squares regression (PLSR) and Cubist) is compared against the CNN model. A VIS-NIR-SWIR spectral library from Brazil containing 4251 unique sites, with averages of 2–3 samples per depth (a total of 12,044 samples), was divided into calibration (3188 sites) and validation (1063 sites) sets. A subset of the calibration dataset was then created to represent smaller calibration datasets of 125, 300, 500, 1000, 1500, 2000, 2500 and 2700 unique sites, equivalent to sample sizes of approximately 350, 840, 1400, 2800, 4200, 5600, 7000 and 7650. All three models (PLSR, Cubist and CNN) were generated for each sample size of unique sites for the prediction of five different soil properties, i.e. cation exchange capacity, organic matter, sand, silt and clay content. These calibration subset sampling and modelling processes were repeated ten times to provide a better representation of the model performances. When the PLSR and Cubist models were compared to the CNN model, similar results were observed: the performance of the CNN outweighed that of the PLSR and Cubist models at sample sizes of 1500 and 1800, respectively. It can be recommended that deep learning is most efficient for spectral modelling at sample sizes above 2000. The accuracy of the PLSR and Cubist models seemed to reach a plateau above sample sizes of 4200 and 5000, respectively. A sensitivity analysis was performed on the CNN model to determine the important wavelength regions that affected the predictions of various soil attributes.


2019 ◽  
Vol 8 (2S11) ◽  
pp. 3700-3705

The extraordinary research in the field of unsupervised machine learning has led the non-technical media to expect Robot Lords overthrowing humans in the near future. Whatever the media exaggeration may be, the results of recent advances in Deep Learning research are so impressive that it has become very difficult to differentiate between human-made and computer-made content. This paper tries to establish a grounding for new researchers by surveying different real-time applications of Deep Learning. It is not a complete study of all applications of Deep Learning; rather, it focuses on some of the most highly researched themes and popular applications in domains such as Image Processing, Sound/Speech Processing, and Video Processing.


2021 ◽  
Author(s):  
Denis A Engemann ◽  
Apolline Mellot ◽  
Richard Hoechenberger ◽  
Hubert Banville ◽  
David Sabbagh ◽  
...  

Population-level modeling can define quantitative measures of individual aging by applying machine learning to large volumes of brain images. These measures of brain age, obtained from the general population, have helped characterize disease severity in neurological populations, improving estimates of diagnosis or prognosis. Magnetoencephalography (MEG) and electroencephalography (EEG) have the potential to further generalize this approach towards prevention and public health by enabling assessments of brain health at large scales in socioeconomically diverse environments. However, more research is needed to define methods that can handle the complexity and diversity of M/EEG signals across diverse real-world contexts. To catalyse this effort, here we propose reusable benchmarks of competing machine learning approaches for brain age modeling. We benchmarked popular classical machine learning pipelines and deep learning architectures previously used for pathology decoding or brain age estimation in four international M/EEG cohorts from diverse countries and cultural contexts, including recordings from more than 2500 participants. Our benchmarks were built on top of the M/EEG adaptations of the BIDS standard, providing tools that can be applied with minimal modification to any M/EEG dataset provided in BIDS format. Our results suggest that, regardless of whether classical machine learning or deep learning was used, the highest performance was reached by pipelines and architectures involving spatially aware representations of the M/EEG signals, leading to R² scores between 0.60 and 0.71. Hand-crafted features paired with random forest regression provided robust benchmarks even in situations in which other approaches failed. Taken together, this set of benchmarks, accompanied by open-source software and high-level Python scripts, can serve as a starting point and quantitative reference for future efforts at developing M/EEG-based measures of brain aging. The generality of the approach renders this benchmark reusable for other related objectives such as modeling specific cognitive variables or clinical endpoints.
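A conceptual sketch of the hand-crafted-features baseline follows, assuming scikit-learn; the features and ages are simulated, so it only illustrates the shape of the evaluation (random forest regression scored by cross-validated R²), not the published benchmark pipelines.

```python
# Sketch of a brain-age baseline: summary features per recording -> random forest,
# scored with cross-validated R^2. Features and ages are simulated placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_features = 2_500, 60      # e.g. band-power summaries per sensor group
X = rng.normal(size=(n_subjects, n_features))
age = 45 + 10 * X[:, :5].sum(axis=1) / 5 + rng.normal(scale=8, size=n_subjects)

model = RandomForestRegressor(n_estimators=300, random_state=0)
r2 = cross_val_score(model, X, age, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f} +/- {r2.std():.2f}")
```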


2021 ◽  
Author(s):  
Dong Jin Park ◽  
Min Woo Park ◽  
Homin Lee ◽  
Young-Jin Kim ◽  
Yeongsic Kim ◽  
...  

Abstract. Artificial intelligence is a concept that includes machine learning and deep learning. The deep learning model used in this study is a deep neural network (DNN) with two or more hidden layers. In this study, a multi-layer perceptron (MLP) and machine learning models (XGBoost, LGBM) were used. An MLP consists of at least three layers: an input layer, a hidden layer, and an output layer. In general, machine-learning tree models or linear models are widely used for classification. We analyzed our data by applying deep learning (MLP) to improve performance, which yielded good results. The deep learning and ML models showed differences in predictive power and disease classification patterns. We used a confusion matrix and analyzed feature importance using the SHAP value method. Here, we present a protocol to confirm that the use of deep learning can show good performance in disease classification using hospital numerical structured data (laboratory tests).
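A hedged sketch of this evaluation workflow is shown below, assuming the xgboost, shap, and scikit-learn libraries on simulated tabular data; it illustrates the confusion-matrix and SHAP-importance steps rather than reproducing the authors' protocol.

```python
# Sketch: gradient-boosted trees on structured (laboratory-style) data, with a
# confusion matrix and SHAP-based feature importance. Data are simulated.
import numpy as np
import shap
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=30, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4)
model.fit(X_tr, y_tr)

print(confusion_matrix(y_te, model.predict(X_te)))

# SHAP values rank which features drive the predictions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
mean_abs = np.abs(shap_values).mean(axis=0)
print("top features by mean |SHAP|:", np.argsort(mean_abs)[::-1][:5])
```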

