A novel machine learning algorithm has the potential to reduce by 1/3 the quantity of ILR episodes needing review

Abstract. Background: Implantable Loop Recorders (ILRs) are increasingly used and generate a high workload for timely adjudication of ECG recordings. In particular, the excessive false positive rate leads to a significant review burden. Purpose: A novel machine learning algorithm was developed to reclassify ILR episodes in order to reduce the false positive rate by 80% while maintaining 99% sensitivity. This study aims to evaluate the impact of this algorithm on the number of abnormal episodes reported in Medtronic ILRs. Methods: Across 20 European centers, all Medtronic ILR patients were enrolled during the second half of 2020. Using a remote monitoring platform, every transmitted ILR episode was collected and anonymised. Every ILR-detected episode with a transmitted ECG was reclassified by the new algorithm using the same labels as the ILR (asystole, brady, AT/AF, VT, artifact, normal). We measured the number of episodes identified as false positives and reclassified as normal by the algorithm, and their proportion among all episodes. Results: In 370 patients, ILRs recorded 3755 episodes, including 305 patient-triggered episodes and 629 with no ECG transmitted. The remaining 2821 episodes were analyzed by the novel algorithm, which reclassified 1227 of them as normal rhythm. These reclassified episodes accounted for 43% of analyzed episodes and 32.6% of all episodes recorded. Conclusion: A novel machine learning algorithm significantly reduces the quantity of episodes flagged as abnormal and typically reviewed by healthcare professionals. Funding Acknowledgement: Type of funding sources: None. Figure 1. ILR episodes analysis
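The proportions reported in the Results follow directly from the episode counts. A minimal Python sketch of that arithmetic, using only the figures stated in the abstract (the variable names are illustrative, not from the study):

```python
# Episode counts as stated in the abstract.
total_episodes = 3755
patient_triggered = 305
no_ecg_transmitted = 629
reclassified_normal = 1227

# Episodes left for the algorithm after excluding patient-triggered
# episodes and episodes without a transmitted ECG.
analyzed = total_episodes - patient_triggered - no_ecg_transmitted

# Share of reclassified episodes among analyzed and among all episodes.
share_of_analyzed = reclassified_normal / analyzed
share_of_all = reclassified_normal / total_episodes

print(f"analyzed episodes: {analyzed}")
print(f"share of analyzed episodes: {share_of_analyzed:.1%}")
print(f"share of all recorded episodes: {share_of_all:.1%}")
```

The second ratio is the roughly one-third reduction quoted in the title.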

Identification of newborns at risk for autism using electronic medical records and machine learning

10.1101/19008367 ◽

2019 ◽

Author(s):

Rayees Rahman ◽

Arad Kodesh ◽

Stephen Z Levine ◽

Sven Sandin ◽

Abraham Reichenberg ◽

...

Keyword(s):

Machine Learning ◽

Autism Spectrum Disorder ◽

Positive Predictive Value ◽

Electronic Medical Records ◽

Predictive Value ◽

False Positive ◽

Medical Records ◽

False Positive Rate ◽

Autism Spectrum ◽

Positive Rate

Abstract. Importance: Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome. Objective: Develop a machine learning (ML) method predicting the diagnosis of ASD in offspring in a general population sample, using parental electronic medical records (EMR) available before childbirth. Design: Prognostic study of EMR data within a single Israeli health maintenance organization, for the parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. The complete EMR record of the parents was used to develop various ML models to predict the risk of having a child with ASD. Main outcomes and measures: Routinely available parental sociodemographic information, medical histories and prescribed medications data until the offspring's birth were used to generate features to train various machine learning algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross validation, by computing C statistics, sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value, PPV). Results: All ML models tested had similar performance, achieving an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85% for predicting ASD in this dataset. Conclusion and relevance: ML algorithms combined with EMR capture early life ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.

Key points. Question: Can autism risk in children be predicted using the pre-birth electronic medical record (EMR) of the parents? Findings: In this population-based study that included 1,397 children with autism spectrum disorder (ASD) and 94,741 non-ASD children, we developed a machine learning classifier for predicting the likelihood of childhood diagnosis of ASD with an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85%. Meaning: The results presented serve as a proof of principle of the potential utility of EMR for the identification of a large proportion of future children at high risk of ASD.

Analyzing the Impact of Climate Factors on GNSS-Derived Displacements by Combining the Extended Helmert Transformation and XGboost Machine Learning Algorithm

Journal of Sensors ◽

10.1155/2021/9926442 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Hanlin Liu ◽

Linqiang Yang ◽

Linchao Li

Keyword(s):

Machine Learning ◽

Puerto Rico ◽

Reference Frame ◽

Learning Algorithm ◽

Virgin Islands ◽

Machine Learning Algorithm ◽

Climate Factors ◽

Helmert Transformation ◽

The Impact

A variety of climate factors influence the precision of long-term Global Navigation Satellite System (GNSS) monitoring data. To analyze precisely the effect of different climate factors on long-term GNSS monitoring records, this study combines the extended seven-parameter Helmert transformation with a machine learning algorithm, Extreme Gradient Boosting (XGBoost), to establish a hybrid model. We established a local-scale reference frame, the stable Puerto Rico and Virgin Islands reference frame of 2019 (PRVI19), using ten continuously operating long-term GNSS sites located in the rigid portion of the Puerto Rico and Virgin Islands (PRVI) microplate. The stability of PRVI19 is approximately 0.4 mm/year and 0.5 mm/year in the horizontal and vertical directions, respectively. The stable reference frame PRVI19 avoids the risk of bias due to long-term plate motions when studying localized ground deformation. Furthermore, we trained the XGBoost model on the postprocessed long-term GNSS records and daily climate data, and quantitatively evaluated the importance of various daily climate factors for the GNSS time series. The results show that wind is the most influential factor, with a unitless index of 0.013. Notably, we used the model with climate and GNSS records to predict the GNSS-derived displacements. The predicted displacements have a slightly lower root mean square error than the results fitted using the spline method (prediction: 0.22 versus fitted: 0.31). This indicates that the proposed model, which takes the climate records into account, yields suitable predictions for long-term GNSS monitoring.
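For readers unfamiliar with the Helmert transformation mentioned above, the following is a minimal sketch of the standard static seven-parameter form (three translations, three small rotations, one scale). It is not the authors' implementation: the abstract does not spell out the "extended" variant, so it is not reproduced here. The function name, the position-vector rotation convention, and the sample values are assumptions made for illustration.

```python
def helmert_7param(point, translation, rotation, scale):
    """Apply a small-angle seven-parameter Helmert transformation.

    point, translation: (x, y, z) in metres
    rotation: (rx, ry, rz) in radians, assumed small
    scale: unitless scale change (e.g. 1e-9 for one part per billion)
    Assumed convention: position-vector rotation.
    """
    x, y, z = point
    tx, ty, tz = translation
    rx, ry, rz = rotation
    k = 1.0 + scale
    # Linearised (small-angle) rotation applied to the input coordinates.
    xr = x - rz * y + ry * z
    yr = rz * x + y - rx * z
    zr = -ry * x + rx * y + z
    # Scale the rotated coordinates, then translate.
    return (tx + k * xr, ty + k * yr, tz + k * zr)


# Hypothetical values, chosen only to exercise the function.
shifted = helmert_7param(
    point=(1000.0, 2000.0, 3000.0),
    translation=(0.5, -0.2, 0.1),
    rotation=(0.0, 0.0, 0.0),
    scale=0.0,
)
print(shifted)
```

With zero rotation and zero scale change the transformation reduces to a pure translation, which is a convenient sanity check before fitting real parameters.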

Machine Learning for Automated Polyp Detection in Computed Tomography Colonography

Machine Learning ◽

10.4018/978-1-60960-818-7.ch407 ◽

2012 ◽

pp. 830-850

Author(s):

Abhilash Alexander Miranda ◽

Olivier Caelen ◽

Gianluca Bontempi

Keyword(s):

Machine Learning ◽

Computed Tomography ◽

False Positive ◽

False Positive Rate ◽

Learning Algorithms ◽

Colorectal Polyps ◽

Machine Learning Algorithms ◽

Computed Tomography Colonography ◽

Positive Rate ◽

Independent Features

This chapter presents a comprehensive scheme for automated detection of colorectal polyps in computed tomography colonography (CTC), with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors' automated CTC scheme introduces two orientation-independent features which encode the shape characteristics that aid in classification of polyps and non-polyps with high accuracy, a low false positive rate, and low computational cost, making the scheme suitable for colorectal cancer screening initiatives. Experiments using state-of-the-art machine learning algorithms, viz., lazy learning, support vector machines, and naïve Bayes classifiers, reveal the robustness of the two features in detecting polyps at 100% sensitivity for polyps with diameter greater than 10 mm while attaining low total false positive rates of 3.05, 3.47 and 0.71 per CTC dataset, respectively, at specificities above 99% when tested on 58 CTC datasets. The results were validated using colonoscopy reports provided by expert radiologists.

Identification of newborns at risk for autism using electronic medical records and machine learning

European Psychiatry ◽

10.1192/j.eurpsy.2020.17 ◽

2020 ◽

Vol 63 (1) ◽

Author(s):

Rayees Rahman ◽

Arad Kodesh ◽

Stephen Z. Levine ◽

Sven Sandin ◽

Abraham Reichenberg ◽

...

Keyword(s):

Machine Learning ◽

General Population ◽

Electronic Medical Records ◽

False Positive ◽

Medical Records ◽

False Positive Rate ◽

Autism Spectrum ◽

Health Maintenance ◽

C Statistic ◽

Positive Rate

Abstract. Background. Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome. The aim of the current study was to test the ability of machine learning (ML) models applied to electronic medical records (EMRs) to predict ASD early in life, in a general population sample. Methods. We used EMR data from a single Israeli Health Maintenance Organization, including EMR information for parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. Routinely available parental sociodemographic information, parental medical histories, and prescribed medications data were used to generate features to train various ML algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation by computing the area under the receiver operating characteristic curve (AUC; C-statistic), sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value [PPV]). Results. All ML models tested had similar performance. The average performance across all models was a C-statistic of 0.709, sensitivity of 29.93%, specificity of 98.18%, accuracy of 95.62%, false positive rate of 1.81%, and PPV of 43.35% for predicting ASD in this dataset. Conclusions. We conclude that ML algorithms combined with EMR capture early life ASD risk and reveal previously unknown features associated with ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.
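The evaluation measures named in the Methods are all derived from the four cells of a confusion matrix. A minimal sketch of those definitions follows; the function name is illustrative and the counts are hypothetical, chosen only to show the calculation, not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion-matrix counts."""
    return {
        # Share of true cases that the model flags.
        "sensitivity": tp / (tp + fn),
        # Share of non-cases that the model correctly leaves unflagged.
        "specificity": tn / (tn + fp),
        # Share of non-cases that are wrongly flagged (1 - specificity).
        "false_positive_rate": fp / (fp + tn),
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Precision: share of flagged cases that are true cases.
        "ppv": tp / (tp + fp),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=30, fp=20, tn=980, fn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because cases are rare in a general-population sample, accuracy is dominated by the non-cases, which is why sensitivity and PPV are reported alongside it.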

Impact of Encoding of High Cardinality Categorical Data to Solve Prediction Problems

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9044 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4197-4201

Author(s):

Heena Gupta ◽

V. Asha

Keyword(s):

Machine Learning ◽

Categorical Data ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Prediction Problem ◽

Encoding Scheme ◽

The Impact ◽

Prediction Problems

Prediction problems are important in many domains for assessing prices and preferences among people. The problem varies with the kind of data involved: data may be nominal or ordinal, and may involve many categories or few. For a machine learning algorithm to use a categorical variable, the variable must be encoded before any further operation can be performed. Various encoding schemes are available, such as label encoding, count encoding and one-hot encoding. This paper aims to understand the impact of various encoding schemes on accuracy in prediction problems with high-cardinality categorical data. The paper also proposes an encoding scheme based on curated strings. The domain chosen for this purpose is predicting doctors' fees in various cities for doctors with different profiles and qualifications.
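The three standard encodings named in the abstract can be sketched in a few lines of plain Python. This is an illustration only: the function names and the sample city values are hypothetical, and the paper's proposed curated-string scheme is not reproduced because the abstract does not describe it.

```python
from collections import Counter


def label_encode(values):
    """Map each category to an integer in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values]


def count_encode(values):
    """Replace each category with its frequency in the data."""
    counts = Counter(values)
    return [counts[v] for v in values]


def one_hot_encode(values):
    """Produce one 0/1 column per distinct category (alphabetical column order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical categorical column.
cities = ["Delhi", "Mumbai", "Delhi", "Chennai"]
print(label_encode(cities))
print(count_encode(cities))
print(one_hot_encode(cities))
```

One-hot encoding adds a column for every distinct value, so its width grows with cardinality, whereas label and count encoding always produce a single column. That trade-off is what makes the choice of encoding matter for high-cardinality data.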

Data Mining Validation of Fluconazole Breakpoints Established by the European Committee on Antimicrobial Susceptibility Testing

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.00081-09 ◽

2009 ◽

Vol 53 (7) ◽

pp. 2949-2954 ◽

Cited By ~ 18

Author(s):

Isabel Cuesta ◽

Concha Bielza ◽

Pedro Larrañaga ◽

Manuel Cuenca-Estrella ◽

Fernando Laguna ◽

...

Keyword(s):

Machine Learning ◽

Antimicrobial Susceptibility ◽

Roc Curve ◽

False Positive ◽

Statistical Power ◽

Susceptibility Testing ◽

False Positive Rate ◽

Antimicrobial Susceptibility Testing ◽

European Committee ◽

Positive Rate

ABSTRACT European Committee on Antimicrobial Susceptibility Testing (EUCAST) breakpoints classify Candida strains with a fluconazole MIC ≤ 2 mg/liter as susceptible, those with a fluconazole MIC of 4 mg/liter as representing intermediate susceptibility, and those with a fluconazole MIC > 4 mg/liter as resistant. Machine learning models are supported by complex statistical analyses assessing whether the results have statistical relevance. The aim of this work was to use supervised classification algorithms to analyze the clinical data used to produce EUCAST fluconazole breakpoints. Five supervised classifiers (J48, Classification and Regression Trees [CART], OneR, Naïve Bayes, and Simple Logistic) were used to analyze two cohorts of patients with oropharyngeal candidosis and candidemia. The target variable was the outcome of the infections, and the predictor variables consisted of values for the MIC or the proportion between the dose administered and the MIC of the isolate (dose/MIC). Statistical power was assessed by determining values for sensitivity and specificity, the false-positive rate, the area under the receiver operating characteristic (ROC) curve, and the Matthews correlation coefficient (MCC). CART obtained the best statistical power for a MIC > 4 mg/liter for detecting failures (sensitivity, 87%; false-positive rate, 8%; area under the ROC curve, 0.89; MCC index, 0.80). For dose/MIC determinations, the target was >75, with a sensitivity of 91%, a false-positive rate of 10%, an area under the ROC curve of 0.90, and an MCC index of 0.80. Other classifiers gave similar breakpoints with lower statistical power. EUCAST fluconazole breakpoints have been validated by means of machine learning methods. These computer tools must be incorporated in the process for developing breakpoints to avoid researcher bias, thus enhancing the statistical power of the model.
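Two pieces of the abstract lend themselves to a short sketch: the breakpoint rule itself and the Matthews correlation coefficient used to score the classifiers. The function names and the confusion-matrix counts below are hypothetical. The abstract only defines categories for MIC ≤ 2, MIC = 4 and MIC > 4 mg/liter, so treating any value between 2 and 4 as intermediate is an assumption of this sketch.

```python
import math


def eucast_fluconazole_category(mic_mg_per_l):
    """Classify a Candida isolate by fluconazole MIC, per the breakpoints quoted above."""
    if mic_mg_per_l <= 2:
        return "susceptible"
    if mic_mg_per_l <= 4:
        # Assumption: values between 2 and 4 are grouped with MIC = 4.
        return "intermediate"
    return "resistant"


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(eucast_fluconazole_category(2))
print(eucast_fluconazole_category(4))
print(eucast_fluconazole_category(8))

# Hypothetical counts for illustration only; not the study's data.
print(round(mcc(tp=40, fp=10, tn=45, fn=5), 3))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it stays informative when treatment failures are much rarer than successes.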

A Machine-Learning Algorithm for Estimating and Ranking the Impact of Environmental Risk Factors in Exploratory Epidemiological Studies

Statistical Modeling for Biological Systems ◽

10.1007/978-3-030-34675-1_8 ◽

2020 ◽

pp. 137-156

Author(s):

Jessica G. Young ◽

Alan E. Hubbard ◽

Brenda Eskenazi ◽

Nicholas P. Jewell

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Environmental Risk ◽

Learning Algorithm ◽

Epidemiological Studies ◽

Environmental Risk Factors ◽

Machine Learning Algorithm ◽

The Impact

A hybrid approach to reducing the false positive rate in unsupervised machine learning intrusion detection

SoutheastCon 2016 ◽

10.1109/secon.2016.7506773 ◽

2016 ◽

Cited By ~ 3

Author(s):

Angela Denise Landress

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

False Positive ◽

False Positive Rate ◽

Hybrid Approach ◽

Unsupervised Machine Learning ◽

Positive Rate

The impact of P-hacking on "Redefine statistical significance"

10.31234/osf.io/bp2z4 ◽

2017 ◽

Author(s):

Harry Crane

Keyword(s):

False Positive ◽

Statistical Significance ◽

False Positive Rate ◽

Perceived Benefits ◽

Replication Rate ◽

Recent Proposal ◽

Positive Rate ◽

Replication Crisis ◽

The Impact ◽

Lower Cutoff

A recent proposal to "redefine statistical significance" (Benjamin et al., Nature Human Behaviour, 2017) claims that false positive rates "would immediately improve" by factors greater than two and replication rates would double simply by changing the conventional cutoff for 'statistical significance' from P<0.05 to P<0.005. I analyze the veracity of these claims, focusing especially on how Benjamin et al. neglect the effects of P-hacking in assessing the impact of their proposal. My analysis shows that once P-hacking is accounted for, the perceived benefits of the lower threshold all but disappear, prompting two main conclusions: (i) The claimed improvements to false positive rate and replication rate in Benjamin et al. (2017) are exaggerated and misleading. (ii) There are plausible scenarios under which the lower cutoff will make the replication crisis worse.
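To see why P-hacking interacts with the choice of cutoff, consider a deliberately simple model that is not the analysis in the paper: a researcher studying a true null effect tries several independent analyses and reports a finding if any one of them crosses the threshold. The function name and the numbers of attempts are assumptions made for illustration.

```python
def chance_of_false_positive(alpha, attempts):
    """Probability that at least one of `attempts` independent tests of a
    true null hypothesis falls below the significance cutoff `alpha`."""
    return 1.0 - (1.0 - alpha) ** attempts


for alpha in (0.05, 0.005):
    for attempts in (1, 5, 20):
        rate = chance_of_false_positive(alpha, attempts)
        print(f"cutoff {alpha}, {attempts} attempts: {rate:.1%}")
```

Under this toy model the effective false positive rate depends on the number of attempts as well as the cutoff, so a stricter nominal threshold can be offset by more analytic flexibility.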

A New Data-Driven Turbulence Model Framework for Unsteady Flows Applied to Wall-Jet and Wall-Wake Flows

Volume 2A: Turbomachinery ◽

10.1115/gt2019-90179 ◽

2019 ◽

Author(s):

Chitrarth Lav ◽

Jimmy Philip ◽

Richard D. Sandberg

Keyword(s):

Machine Learning ◽

Mass Flow ◽

Vortex Shedding ◽

Learning Algorithm ◽

Data Driven ◽

Wall Jet ◽

Machine Learning Algorithm ◽

Length Scales ◽

Wake Flows ◽

The Impact

Abstract. Unsteady flow prediction for turbomachinery applications relies heavily on unsteady RANS (URANS). For flows that exhibit vortex shedding, such as the wall-jet/wake flows considered in this study, URANS is unable to predict the momentum mixing with sufficient accuracy. We suggest a novel framework to improve that prediction, whereby the deterministic scales associated with vortex shedding are resolved while the stochastic scales of pure turbulence are modelled. The framework first separates the stochastic from the deterministic length scales and then develops a bespoke turbulence closure for the stochastic scales using a data-driven machine-learning algorithm. The novelty of the method lies in the use of machine learning to develop closures tailored to URANS calculations. For the wall-jet/wake flow, three different mass flow ratios (0.86, 1.07 and 1.26) have been considered, and a high-fidelity dataset of the idealised geometry is used for model development. This study serves as an a priori analysis, where the closures obtained from the machine-learning algorithm are evaluated before their implementation in URANS. The analysis looks at the impact of using all length scales versus only the stochastic scales for closure development, and at the impact of the extent of the spatial domain used for developing the closure. It is found that a two-layer approach, using bespoke trained models for the near-wall and the jet/wake regions, produces the best results. Finally, the generalisability of the developed closures is evaluated by applying a closure developed using one mass flow ratio to the other cases.
