Confound Removal and Normalization in Practice: A Neuroimaging Based Sex Prediction Case Study

Author(s):  
Shammi More ◽  
Simon B. Eickhoff ◽  
Julian Caspers ◽  
Kaustubh R. Patil

Abstract Machine learning (ML) methods are increasingly being used to predict pathologies and biological traits from neuroimaging data. Here, controlling for confounds is essential to obtain unbiased estimates of generalization performance and to identify the features driving predictions. However, a systematic evaluation of the advantages and disadvantages of the available alternatives is lacking, which makes it difficult to compare results across studies and to build deployment-quality models. We evaluated two commonly used confound removal schemes, whole-data confound regression (WDCR) and cross-validated confound regression (CVCR), to understand their effectiveness and the biases they induce in generalization-performance estimation. Additionally, we studied the interaction of the confound removal schemes with Z-score normalization, a common practice in ML modelling. We applied eight combinations of confound removal schemes and normalization (pipelines) to decode sex from resting-state functional MRI (rfMRI) data while controlling for two confounds, brain size and age. We show that both schemes effectively remove linear univariate and multivariate confounding effects, resulting in reduced model performance, with CVCR providing better generalization estimates, i.e., closer to out-of-sample performance than WDCR. We found no effect of normalizing before or after confound removal. In the presence of dataset and confound shift, the four confound removal procedures we tested yielded mixed results, raising new questions. We conclude that CVCR is the better method for controlling confounding effects in neuroimaging studies. We believe that our in-depth analyses shed light on the choices associated with confound removal and hope that they generate more interest in this problem, which is instrumental to numerous applications.
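The practical difference between the two schemes is where the confound regression is fit: WDCR residualizes the features against the confounds once on the whole dataset before cross-validation, while CVCR fits the confound regression inside each fold on the training data only and applies it to the test fold. A minimal sketch with entirely synthetic stand-in data (the variable names, toy data, and classifier are illustrative, not the authors' pipeline):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Toy stand-ins: X = imaging features, y = binary target (e.g. sex),
# C = confounds (e.g. brain size, age). Entirely synthetic.
n, p = 200, 50
C = rng.normal(size=(n, 2))
y = (C[:, 0] + rng.normal(size=n) > 0).astype(int)
X = rng.normal(size=(n, p)) + C[:, :1]  # features partly driven by a confound

def residualize(X_fit, C_fit, X_apply, C_apply):
    """Regress the confounds out of each feature; the regression is fit on
    (X_fit, C_fit) and applied to (X_apply, C_apply)."""
    reg = LinearRegression().fit(C_fit, X_fit)
    return X_apply - reg.predict(C_apply)

def cv_accuracy(X, y, C, scheme="CVCR", n_splits=5):
    if scheme == "WDCR":  # confound regression on the whole data, once
        X = residualize(X, C, X, C)
    scores = []
    for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(X):
        X_tr, X_te = X[tr], X[te]
        if scheme == "CVCR":  # confound regression fit inside each fold
            X_te = residualize(X[tr], C[tr], X[te], C[te])
            X_tr = residualize(X[tr], C[tr], X[tr], C[tr])
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y[tr])
        scores.append(clf.score(X_te, y[te]))
    return float(np.mean(scores))

print("WDCR:", round(cv_accuracy(X, y, C, "WDCR"), 3))
print("CVCR:", round(cv_accuracy(X, y, C, "CVCR"), 3))
```

Because the toy features carry signal only through the confound, both schemes should pull accuracy down toward chance; the point of the sketch is the placement of `residualize` relative to the cross-validation split.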

2017 ◽  
Author(s):  
Heath R. Pardoe ◽  
Ruben Kuzniecky

Abstract This paper describes NAPR, a cloud-based framework for accessing age prediction models created using machine learning-based analysis of neuroimaging data. The NAPR service is provided at https://www.cloudneuro.org and allows external users to predict the age of individual subjects from their own MRI data. As a demonstration of the NAPR approach, age prediction models were trained on healthy control data from the ABIDE, CoRR, DLBS and NKI Rockland neuroimaging datasets (total N = 2367). MRI scans were processed using Freesurfer v5.3. Age prediction models were trained using relevance vector machine and Gaussian process machine learning techniques. NAPR will allow for rigorous and transparent out-of-sample assessment of age prediction model performance, and may therefore assist in translating neuroimaging-based modelling techniques to the clinic.


2018 ◽  
Author(s):  
Lukas Snoek ◽  
Steven Miletić ◽  
H. Steven Scholte

ABSTRACT Over the past decade, multivariate pattern analyses, and especially decoding analyses, have become a popular alternative to traditional mass-univariate analyses in neuroimaging research. However, a fundamental limitation of decoding analyses is that the source of information driving the decoder is ambiguous, which becomes problematic when the to-be-decoded variable is confounded by variables that are not of primary interest. In this study, we use a comprehensive set of simulations and analyses of empirical data to evaluate two techniques that were previously proposed and used to control for confounding variables in decoding analyses: counterbalancing and confound regression. For our empirical analyses, we attempt to decode gender from structural MRI data while controlling for the confound 'brain size'. We show that both methods introduce strong biases in decoding performance: counterbalancing leads to better performance than expected (i.e., positive bias), which our simulations show is due to the subsampling process tending to remove samples that are hard to classify; confound regression, on the other hand, leads to worse performance than expected (i.e., negative bias), even resulting in significant below-chance performance in some scenarios. In our simulations, we show that below-chance accuracy can be predicted by the variance of the distribution of correlations between the features and the target. Importantly, we show that this negative bias disappears in both the empirical analyses and the simulations when the confound regression procedure is performed in every fold of the cross-validation routine, yielding plausible model performance.
From these results, we conclude that foldwise confound regression is the only method that appropriately controls for confounds, and it can thus be used to gain more insight into the exact source(s) of information driving one's decoding analysis.
HIGHLIGHTS
- The interpretation of decoding models is ambiguous when dealing with confounds.
- We evaluate two methods, counterbalancing and confound regression, in their ability to control for confounds.
- Counterbalancing leads to positive bias because it removes hard-to-classify samples.
- Confound regression leads to negative bias because it yields data with less signal than expected by chance.
- Our simulations demonstrate a tight relationship between model performance in decoding analyses and the sample distribution of the correlation coefficient.
- The negative bias observed in confound regression can be remedied by cross-validating the confound regression procedure.
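Counterbalancing, as evaluated above, amounts to subsampling the data so that the confound distribution is matched across the two classes. A toy sketch of that subsampling step (binning strategy, variable names, and data are illustrative, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: binary target y with a confound c (e.g. brain size)
# whose mean differs between the classes. Entirely synthetic.
n = 300
y = rng.integers(0, 2, size=n)
c = rng.normal(loc=y * 0.8, scale=1.0)  # confound correlated with target

def counterbalance(y, c, n_bins=10):
    """Subsample so the confound distribution is matched across classes:
    within each confound bin, keep equally many samples from each class."""
    edges = np.quantile(c, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(c, edges)
    keep = []
    for b in np.unique(bins):
        idx0 = np.flatnonzero((bins == b) & (y == 0))
        idx1 = np.flatnonzero((bins == b) & (y == 1))
        k = min(len(idx0), len(idx1))  # discard the surplus class in this bin
        keep.extend(rng.choice(idx0, k, replace=False))
        keep.extend(rng.choice(idx1, k, replace=False))
    return np.sort(np.array(keep))

keep = counterbalance(y, c)
print("confound-target correlation before:",
      round(np.corrcoef(c, y)[0, 1], 3),
      "after:", round(np.corrcoef(c[keep], y[keep])[0, 1], 3))
```

The discarded "surplus" samples are exactly where the positive bias described above can creep in: they tend to be the confound-extreme, hard-to-classify cases.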


Author(s):  
Stefan Hahn ◽  
Jessica Meyer ◽  
Michael Roitzsch ◽  
Christiaan Delmaar ◽  
Wolfgang Koch ◽  
...  

Spray applications enable a uniform distribution of substances on surfaces in a highly efficient manner and can therefore be found at workplaces as well as in consumer environments. A systematic literature review on modelling exposure from spraying activities was conducted, and the current status and further needs were discussed with experts at a symposium. This review summarizes the current knowledge about the models and their level of conservatism and accuracy. We found that extracting relevant information on model performance for spraying from published studies, and interpreting model accuracy, proved challenging, as the studies often accounted for only a small part of potential spray applications. To achieve better-quality exposure estimates in the future, more systematic evaluation of the models would be beneficial, taking into account a representative variety of spray equipment and application patterns. Model predictions could be improved by more accurate consideration of variation in spray equipment. Inter-model harmonization with regard to spray input parameters, and appropriate grouping of spray exposure situations, is recommended. From a user perspective, a platform or database with information on different spraying equipment and techniques, together with agreed standard parameters for specific spraying scenarios from different regulations, may be useful.


2003 ◽  
Vol 8 (3) ◽  
pp. 375-383 ◽  
Author(s):  
Rodolfo de Castro Ribas Jr. ◽  
Maria Lucia Seidl de Moura ◽  
Isabela Dias Soares ◽  
Alessandra Aparecida do Nascimento Gomes ◽  
Marc H. Bornstein

This review has several objectives: to describe and discuss theoretical conceptions of the construct of socioeconomic status (SES) and to argue for its vital role in psychological research; to present and analyze procedures employed to measure SES and trends in their utilization; and to review and discuss the use of SES measures in the Brazilian psychological literature. What has usually been called SES is the relative position of individuals, families, and groups in a given hierarchy, frequently converted into a score produced by a scale. The main indicators and procedures used to measure SES are discussed with regard to their advantages and disadvantages. A review of the literature offers evidence of the importance of SES in different psychological processes. A systematic evaluation of articles from the PsycARTICLES database revealed that the percentage of articles published annually that employed SES increased steadily and substantially from 1988 through 2000, and that SES has consistently been applied more in some research areas (e.g., developmental, clinical, and social psychology). A content analysis of the use of SES in articles published from 1981 through 2001 in three prominent Brazilian psychology journals showed that reliable SES measures are not commonly used in the Brazilian psychological literature. The results of these reviews and analyses are discussed in terms of their implications for further progress of the psychological literature, especially in Brazil, with regard to SES.


2021 ◽  
Author(s):  
Astrid Rybner ◽  
Emil Trenckner Jessen ◽  
Marie Damsgaard Mortensen ◽  
Stine Nyhus Larsen ◽  
Ruth Grossman ◽  
...  

Background: Machine learning (ML) approaches show increasing promise for identifying vocal markers of Autism Spectrum Disorder (ASD). Nonetheless, it is unclear to what extent such markers generalize to new speech samples collected in diverse settings, such as a different speech task or a different language. Aim: In this paper, we systematically assess the generalizability of ML findings across a variety of contexts. Methods: We re-train a promising published ML model of vocal markers of ASD on novel cross-linguistic datasets, following a rigorous pipeline to minimize overfitting, including cross-validated training and ensemble models. We test the generalizability of the models by evaluating them on i) different participants from the same study, performing the same task; ii) the same participants, performing a different (but similar) task; and iii) a different study with participants speaking a different language, performing the same type of task. Results: While model performance is similar to previously published findings when trained and tested on data from the same study (out-of-sample performance), there is considerable variance between studies. Crucially, the models do not generalize well to new similar tasks, and not at all to new languages. The ML pipeline is openly shared. Conclusion: Generalizability of ML models of vocal markers of ASD, and of biobehavioral markers more generally, remains a serious concern. We outline three recommendations researchers could follow to be more explicit about generalizability and to improve it in future studies.
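The three generalization tests above amount to training on one (study, task) context and scoring the frozen model on each held-out context. A minimal sketch of that evaluation loop with entirely synthetic stand-in datasets (the dataset keys, data generator, and classifier are hypothetical, not the study's corpora or model):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def fake_dataset(shift):
    """Synthetic stand-in for a (study, task) speech-feature dataset;
    `shift` mimics a distribution shift between contexts."""
    X = rng.normal(loc=shift, size=(100, 20))
    y = rng.integers(0, 2, size=100)
    return X, y

datasets = {
    "study_A_task_1": fake_dataset(0.0),  # training context
    "study_A_task_2": fake_dataset(0.5),  # same study, different task
    "study_B_task_1": fake_dataset(2.0),  # different study / language
}

# Fit once on the training context, then freeze the model.
X_tr, y_tr = datasets["study_A_task_1"]
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

# Score the frozen model on every context to expose generalization gaps.
for name, (X_te, y_te) in datasets.items():
    print(name, round(model.score(X_te, y_te), 3))
```

The first line of output is in-sample by construction; the protocol's point is that the remaining contexts are never touched during fitting.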


JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Anthony Finch ◽  
Alexander Crowell ◽  
Yung-Chieh Chang ◽  
Pooja Parameshwarappa ◽  
Jose Martinez ◽  
...  

Abstract Objective: Attention networks learn an intelligent weighted-averaging mechanism over a series of entities, providing gains in both performance and interpretability. In this article, we propose a novel time-aware transformer-based network and compare it to another leading model with similar characteristics. We also decompose model performance along several critical axes and examine which features contribute most to our model's performance. Materials and Methods: Using datasets representing patient records obtained between 2017 and 2019 from the Kaiser Permanente Mid-Atlantic States medical system, we construct four attentional models of varying complexity for two targets (patient mortality and hospitalization). We examine how incorporating transfer learning and demographic features contributes to model success. We also test the performance of a model proposed in recent medical modeling literature. We compare these models on out-of-sample data using the area under the receiver-operator characteristic curve (AUROC) and average precision as measures of performance. We also analyze the attentional weights these models assign to patient diagnoses. Results: We found that our model significantly outperformed the alternative on the mortality prediction task (91.96% AUROC against 73.82% AUROC). Our model also outperformed on the hospitalization task, although the models were considerably more competitive there (82.41% AUROC against 80.33% AUROC). Furthermore, we found that demographic features and transfer learning features, which are frequently omitted from new models proposed in the EMR modeling space, contributed significantly to the success of our model. Discussion: We proposed an original construction of deep learning electronic medical record models which achieved very strong performance. Our model construction outperformed a leading literature alternative on several tasks, even when input data were held constant between them. We obtained further improvements by incorporating several methods that are frequently overlooked in new model proposals, suggesting that it will be useful to explore these options further in the future.
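The evaluation above pairs AUROC with average precision on held-out data; with scikit-learn that pairing can be sketched as follows (the labels and model scores below are synthetic, not the study's data, and "model_a"/"model_b" are placeholder names):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(3)

# Synthetic held-out labels and predicted probabilities from two models.
y_true = rng.integers(0, 2, size=500)
scores_a = np.clip(y_true * 0.4 + rng.normal(0.3, 0.25, 500), 0, 1)  # informative
scores_b = rng.uniform(size=500)                                     # near chance

for name, s in [("model_a", scores_a), ("model_b", scores_b)]:
    print(name,
          "AUROC:", round(roc_auc_score(y_true, s), 3),
          "AP:", round(average_precision_score(y_true, s), 3))
```

AUROC is insensitive to class imbalance while average precision is not, which is why reporting both gives a fuller picture on skewed clinical targets like mortality.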


2018 ◽  
Author(s):  
Gab Abramowitz ◽  
Nadja Herger ◽  
Ethan Gutmann ◽  
Dorit Hammerling ◽  
Reto Knutti ◽  
...  

Abstract. The rationale for using multi-model ensembles in climate change projections and impacts research is often based on the expectation that different models constitute independent estimates, so that a range of models allows a better characterisation of the uncertainties in the representation of the climate system than a single model. However, it is known that research groups share literature, ideas for representations of processes, parameterisations, evaluation data sets and even sections of model code. Thus, nominally different models might have similar biases because of similarities in the way they represent a subset of processes, or even be near duplicates of others, weakening the assumption that they constitute independent estimates. If there are near-replicates of some models, then treating all models equally is likely to bias the inferences made using these ensembles. The challenge is to establish the degree to which this might be true for any given application. While this issue is recognized by many in the community, quantifying and accounting for model dependence in anything other than an ad-hoc way is challenging. Here we present a synthesis of the range of disparate attempts to define, quantify and address model dependence in multi-model climate ensembles in a common conceptual framework, and provide guidance on how users can test the efficacy of approaches that move beyond the equally weighted ensemble. In the upcoming Coupled Model Intercomparison Project phase 6 (CMIP6), several new models that are closely related to existing models are anticipated, as well as large ensembles from some models. We argue that quantitatively accounting for dependence in addition to model performance, and thoroughly testing the effectiveness of the approach used will be key to a sound interpretation of the CMIP ensembles in future scientific studies.
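One simple way to move beyond the equally weighted ensemble, in the spirit of the dependence problem described above, is to down-weight members that are near-replicates of one another. A toy numerical sketch (the correlation threshold, weighting rule, and data are illustrative assumptions, not a method proposed in the paper):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic "truth" and three toy model outputs: models 1 and 2 are
# near-duplicates, model 3 is an independent estimate.
truth = rng.normal(size=50)
base = truth + rng.normal(scale=0.5, size=50)
models = np.stack([
    base,                                    # model 1
    base + rng.normal(scale=0.05, size=50),  # near-duplicate of model 1
    truth + rng.normal(scale=0.5, size=50),  # independent model
])

# Down-weight each model by the number of highly correlated members
# (including itself), then renormalize, so replicated models share
# one "vote" instead of dominating the multi-model mean.
corr = np.corrcoef(models)
n_similar = (corr > 0.95).sum(axis=1)
w = 1.0 / n_similar
w /= w.sum()

equal_mean = models.mean(axis=0)
weighted_mean = (w[:, None] * models).sum(axis=0)
print("equal-weight RMSE:",
      round(float(np.sqrt(((equal_mean - truth) ** 2).mean())), 3))
print("dependence-weighted RMSE:",
      round(float(np.sqrt(((weighted_mean - truth) ** 2).mean())), 3))
```

Real dependence measures are of course far subtler than a correlation threshold, which is exactly the point the synthesis above makes: the effectiveness of any such weighting needs to be tested, not assumed.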


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi137-vi137
Author(s):  
Niklas Tillmanns ◽  
Avery Lum ◽  
W R Brim ◽  
Harry Subramanian ◽  
Ming Lin ◽  
...  

Abstract PURPOSE: Generalizability, reproducibility and objectivity are critical elements that need to be considered when translating machine learning models into clinical practice. While a large body of literature has been published on machine learning methods for segmentation of brain tumors, a systematic evaluation of paper quality and reproducibility has not been done. We investigated the use of the "Transparent Reporting of studies on prediction models for Individual Prognosis Or Diagnosis" (TRIPOD) items among papers published in this relatively new and growing field. METHODS: Following PRISMA guidelines, a literature review was performed on four databases (Ovid Embase, Ovid MEDLINE, Cochrane trials (CENTRAL) and Web of Science Core Collection), first in October 2020 and a second time in February 2021. Keywords and controlled vocabulary included artificial intelligence, machine learning, deep learning, radiomics, magnetic resonance imaging, glioma, and glioblastoma. The publications were assessed according to the TRIPOD items. RESULTS: 37 publications from our database search were screened against TRIPOD, yielding an average score of 12.08, with a maximum score of 16 and a minimum of 7. The best-scoring item was interpretation (item 19), on which all papers scored a point. The lowest-scoring items were the title, the abstract, risk groups and model performance (items 1, 2, 11 and 16), on which no paper scored a point. Less than 1% of the papers discussed the problem of missing data (item 9) and the funding of research (item 22). CONCLUSION: TRIPOD analysis showed that a majority of the papers do not score high on the critical elements that allow reproducibility, translation, and objectivity of research. An average score of 12.08 (40%) indicates that publications in this field typically achieve relatively low reporting quality.
The categories that were consistently poorly described include the ML network description, the measurement of model performance, title details and the inclusion of information in the abstract.


Science ◽  
2018 ◽  
Vol 360 (6394) ◽  
pp. 1222-1227 ◽  
Author(s):  
P. K. Reardon ◽  
Jakob Seidlitz ◽  
Simon Vandekar ◽  
Siyuan Liu ◽  
Raihaan Patel ◽  
...  

Brain size variation over primate evolution and human development is associated with shifts in the proportions of different brain regions. Individual brain size can vary almost twofold among typically developing humans, but the consequences of this for brain organization remain poorly understood. Using in vivo neuroimaging data from more than 3000 individuals, we find that larger human brains show greater areal expansion in distributed frontoparietal cortical networks and related subcortical regions than in limbic, sensory, and motor systems. This areal redistribution recapitulates cortical remodeling across evolution, manifests by early childhood in humans, and is linked to multiple markers of heightened metabolic cost and neuronal connectivity. Thus, human brain shape is systematically coupled to naturally occurring variations in brain size through a scaling map that integrates spatiotemporally diverse aspects of neurobiology.

