Could machine learning aid the production of evidence reviews? A retrospective test of RobotAnalyst

2019 ◽  
Vol 29 (Supplement_4) ◽  
Author(s):  
A Hookway ◽  
S Price ◽  
T Knight

Abstract Background The Observatory Evidence Service (OES) at Public Health Wales supports evidence-informed decision making by conducting evidence reviews, which follow systematic review methodology, on complex public health topics. Machine-learning technologies have the potential to aid in screening studies for inclusion in reviews, and the OES has tested one such system, RobotAnalyst, to assess its accuracy and to determine whether it would increase the efficiency of the review process. Methods Retrospective testing was undertaken using three previously completed evidence reviews. For each test, references were uploaded into RobotAnalyst and the decisions made by the original reviewers were input in blocks of 25 to form a training set. The “update predictions” function generated a predicted inclusion decision for the remaining references at each test point, and these were compared to the original review decisions. We calculated RobotAnalyst’s sensitivity, specificity, positive predictive value, false inclusion and exclusion rates, and the proportion of missed references. Results Mixed levels of performance were observed. An overall increase in sensitivity as more studies were added to the training set was detected for two of the three reference sets when screened at title stage, but only in one case did RobotAnalyst produce relatively high sensitivity (over 90%). This was observed in reference test set one (n = 500 references), where sensitivity increased from 51% at the start of testing to 91% after 250 references had been manually marked on the system. Although performance tended to improve as more studies were added to the training sets, the increases were not always linear. Conclusions There may be some promise in using RobotAnalyst as a second screener, especially on larger reference sets where the human resource demands of duplicate screening are considerable. We are continuing to test RobotAnalyst both retrospectively and prospectively. Key messages Retrospective testing of RobotAnalyst showed mixed levels of performance. RobotAnalyst could potentially be used as a second screener for evidence reviews.
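The retrospective design described above can be sketched generically: reviewer decisions are fed to a text classifier in blocks of 25, the remaining references are predicted at each test point, and sensitivity is computed against the original decisions. The snippet below is a minimal sketch of that design, not RobotAnalyst itself; the file name and the title, abstract, and included columns are assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

refs = pd.read_csv("review_references.csv")  # hypothetical: title, abstract, included (0/1)
texts = (refs["title"].fillna("") + " " + refs["abstract"].fillna("")).tolist()
labels = refs["included"].to_numpy()

X = TfidfVectorizer(stop_words="english").fit_transform(texts)

for n_trained in range(25, len(labels), 25):
    if len(set(labels[:n_trained])) < 2:
        continue  # need at least one include and one exclude decision to train
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[:n_trained], labels[:n_trained])
    preds = clf.predict(X[n_trained:])
    # sensitivity = recall on the "include" class, measured on the unscreened remainder
    sens = recall_score(labels[n_trained:], preds, zero_division=0)
    print(f"after {n_trained} references screened: sensitivity = {sens:.2f}")
```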

2021 ◽  
Vol 27 (3) ◽  
pp. 189-199
Author(s):  
Ilias Tougui ◽  
Abdelilah Jilbab ◽  
Jamal El Mhamdi

Objectives: With advances in data availability and computing capabilities, artificial intelligence and machine learning technologies have evolved rapidly in recent years. Researchers have taken advantage of these developments in healthcare informatics and created reliable tools to predict or classify diseases using machine learning-based algorithms. To correctly quantify the performance of those algorithms, the standard approach is to use cross-validation, where the algorithm is trained on a training set and its performance is measured on a validation set. Both datasets should be subject-independent to simulate the expected behavior of a clinical study. This study compares two cross-validation strategies, the subject-wise and the record-wise techniques; the subject-wise strategy correctly mimics the process of a clinical study, while the record-wise strategy does not. Methods: We started by creating a dataset of smartphone audio recordings of subjects diagnosed with and without Parkinson’s disease. This dataset was then divided into training and holdout sets using subject-wise and record-wise divisions. The training set was used to measure the performance of two classifiers (support vector machine and random forest) and to compare six cross-validation techniques that simulated either the subject-wise or the record-wise process. The holdout set was used to calculate the true error of the classifiers. Results: The record-wise division and the record-wise cross-validation techniques overestimated the performance of the classifiers and underestimated the classification error. Conclusions: In a diagnostic scenario, the subject-wise technique is the proper way to estimate a model’s performance, and record-wise techniques should be avoided.
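The contrast between the two strategies can be illustrated with scikit-learn: record-wise splitting lets records from the same subject appear in both folds (leaking subject-specific signal), while grouped splitting keeps each subject intact. This is a minimal synthetic sketch, not the study's pipeline; the data are random and the subject effect is injected artificially.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, records_per_subject, n_features = 50, 10, 20
subjects = np.repeat(np.arange(n_subjects), records_per_subject)

# Synthetic recordings: each subject has a characteristic feature signature,
# but the label (here, subject parity) is unrelated to that signature.
subject_effect = rng.normal(size=(n_subjects, n_features))
X = subject_effect[subjects] + 0.5 * rng.normal(size=(subjects.size, n_features))
y = subjects % 2

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Record-wise CV: records from the same subject can land in both the training
# and validation folds, so the model can "recognise" subjects and look accurate.
record_wise = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Subject-wise CV: all records from a subject stay in one fold, mimicking how a
# deployed diagnostic model would face entirely new patients.
subject_wise = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5), groups=subjects)

print("record-wise accuracy:", record_wise.mean())
print("subject-wise accuracy:", subject_wise.mean())
```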


Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Yizhao Ni ◽  
Kathleen Alwell ◽  
Charles J Moomaw ◽  
Daniel Woo ◽  
Opeolu Adeoye ◽  
...  

Introduction: Epidemiological studies utilizing administrative databases typically use International Classification of Diseases (ICD) codes to identify stroke cases and estimate incidence rates. However, these codes are limited in sensitivity/specificity across study designs and stroke types. Few studies utilize physician chart review of patient records to confirm cases for improved accuracy, as this is labor intensive. We sought to develop a machine learning (ML) approach that could adjudicate potential stroke events. Methods: We utilized 8081 hospitalized stroke events in the Greater Cincinnati/Northern Kentucky Stroke Study. The study coordinators identified events with stroke-related diagnoses (ICD-9 codes 430-438) from 17 regional hospitals in 2005 and 2010 and performed detailed chart abstraction. Information (e.g., diagnostic tests) was abstracted from patients’ medical records for each event, followed by physician case adjudication. Utilizing all clinical variables, an ML algorithm (logistic regression) was used to predict stroke cases and subtypes (ischemic, hemorrhagic, TIA, and non-stroke). Linear regression (LR) was applied to calibrate ML outputs and estimate prediction intervals based on gold-standard physician adjudication. The ML and LR models were trained on one year of data and tested on the other year. The model results were compared with those obtained using ICD-9 codes alone (ischemic: 434/436; hemorrhagic: 430-432; TIA: 435; non-stroke: other codes) calibrated by LR analysis. Results: Prediction intervals generated by ML covered the majority of the true numbers of stroke events (Table). Compared with ICD-9 codes, the ML algorithm achieved better sensitivity/specificity and more “hits” with narrower prediction intervals. Conclusions: The ML algorithm showed promise in matching physician adjudication and subtyping stroke cases. Future work is required to refine the methods to automate stroke epidemiology with improved accuracy and granularity.
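The core modeling step described above is a multiclass logistic regression trained on one study year and evaluated on the other. The sketch below illustrates that step only (it omits the LR calibration and prediction intervals); the file names and feature columns are hypothetical placeholders, not the study's variables.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

features = ["age", "nihss_score", "afib", "diabetes", "prior_stroke"]  # hypothetical
target = "adjudicated_subtype"  # ischemic / hemorrhagic / TIA / non-stroke

train = pd.read_csv("gcnkss_2005.csv")  # hypothetical file names
test = pd.read_csv("gcnkss_2010.csv")

# Train on one year of physician-adjudicated events, test on the other year.
clf = LogisticRegression(max_iter=1000)
clf.fit(train[features], train[target])

pred = clf.predict(test[features])
print(classification_report(test[target], pred))
```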


2020 ◽  
pp. 000348942095036
Author(s):  
Felix Parker ◽  
Martin B. Brodsky ◽  
Lee M. Akst ◽  
Haider Ali

Objective: Computer-aided analysis of laryngoscopy images has the potential to add objectivity to subjective evaluations. Automated classification of biomedical images is extremely challenging due to the precision required and the limited amount of annotated data available for training. Convolutional neural networks (CNNs) have the potential to improve image analysis and have demonstrated good performance in many settings. This study applied machine-learning technologies to laryngoscopy to determine the accuracy of computer recognition of known laryngeal lesions found in patients post-extubation. Methods: This is a proof-of-concept study that used a convenience sample of transnasal, flexible, distal-chip laryngoscopy images from patients post-extubation in the intensive care unit. After manually annotating images at the pixel level, we applied a CNN-based method for analysis of granulomas and ulcerations to test potential machine-learning approaches for laryngoscopy analysis. Results: A total of 127 images from 25 patients were manually annotated for the presence and shape of these lesions: 100 for training and 27 for evaluating the system. There were 193 ulcerations (148 in the training set; 45 in the evaluation set) and 272 granulomas (208 in the training set; 64 in the evaluation set) identified. Time to annotate each image was approximately 3 minutes. Machine-based analysis demonstrated per-pixel sensitivity of 82.0% and 62.8% for granulomas and ulcerations, respectively; specificity was 99.0% and 99.6%. Conclusion: This work demonstrates the feasibility of machine learning via CNN-based methods to add objectivity to laryngoscopy analysis, suggesting that CNNs may aid in laryngoscopy analysis for other conditions in the future.
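Per-pixel lesion classification of this kind is typically framed as semantic segmentation: a CNN outputs a class logit for every pixel and is trained against pixel-level masks. The following PyTorch sketch shows the general shape of such a model; the architecture, image size, class set, and random stand-in data are assumptions, not the published method.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Small fully convolutional network: background / granuloma / ulceration."""
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, n_classes, 1),  # per-pixel class logits
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegNet()
images = torch.randn(2, 3, 256, 256)        # stand-in for laryngoscopy frames
masks = torch.randint(0, 3, (2, 256, 256))  # stand-in pixel-level annotations
loss = nn.CrossEntropyLoss()(model(images), masks)
loss.backward()
print("training loss:", loss.item())
```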


Author(s):  
Kunal Parikh ◽  
Tanvi Makadia ◽  
Harshil Patel

Dengue is unquestionably one of the biggest health concerns in India and in many other developing countries, and unfortunately many people have lost their lives to it. Every year, approximately 390 million dengue infections occur around the world, of which about 500,000 are severe and approximately 25,000 result in death. Many factors influence the spread of dengue, such as temperature, humidity, precipitation, inadequate public health infrastructure, and others. In this paper, we propose a method to perform predictive analytics on a dengue dataset using the k-nearest neighbors (KNN) machine-learning algorithm. This analysis would help predict future cases and could thereby save many lives.
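A KNN classifier on climate features of the kind listed above can be set up in a few lines with scikit-learn. This is a minimal sketch under assumed data: the CSV name, column names, and risk labels are hypothetical, not the paper's dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("dengue_weekly.csv")  # hypothetical file and columns
X = df[["temperature", "humidity", "precipitation"]]
y = df["outbreak_risk"]  # e.g. "low" / "medium" / "high"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# k-nearest neighbors: classify each week by the majority label of its
# 5 closest weeks in feature space.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, knn.predict(X_test)))
```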


2020 ◽  
Vol 163 (6) ◽  
pp. 1156-1165
Author(s):  
Juan Xiao ◽  
Qiang Xiao ◽  
Wei Cong ◽  
Ting Li ◽  
Shouluan Ding ◽  
...  

Objective To develop an easy-to-use nomogram for discrimination of malignant thyroid nodules and to compare its diagnostic efficiency with the Kwak and American College of Radiology (ACR) Thyroid Imaging, Reporting and Data System (TI-RADS). Study Design Retrospective diagnostic study. Setting The Second Hospital of Shandong University. Subjects and Methods From March 2017 to April 2019, 792 patients with 1940 thyroid nodules were included in the training set; from May 2019 to December 2019, 174 patients with 389 nodules were included in the validation set. A multivariable logistic regression model was used to develop a nomogram for discriminating malignant nodules. To compare the diagnostic performance of the nomogram with the Kwak and ACR TI-RADS, the area under the receiver operating characteristic curve, sensitivity, specificity, and positive and negative predictive values were calculated. Results The nomogram consisted of 7 factors: composition, orientation, echogenicity, border, margin, extrathyroidal extension, and calcification. In the training set, for all nodules, the area under the curve (AUC) for the nomogram was 0.844, which was higher than that of the Kwak TI-RADS (0.826, P = .008) and the ACR TI-RADS (0.810, P < .001). For the 822 nodules >1 cm, the AUC of the nomogram was 0.891, which was higher than that of the Kwak TI-RADS (0.852, P < .001) and the ACR TI-RADS (0.853, P < .001). In the validation set, the AUC of the nomogram was also higher than those of the Kwak and ACR TI-RADS (P < .05), both in the whole series and separately for nodules >1 cm or ≤1 cm. Conclusions Compared with the Kwak and ACR TI-RADS, the nomogram had better performance in discriminating malignant thyroid nodules.
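The statistical core of such a nomogram is a multivariable logistic regression over the seven ultrasound features, whose coefficients are then rendered as a point scale; the discrimination reported as AUC can be reproduced directly from the fitted model. Below is a minimal sketch under assumed data files and feature encodings (coded 0/1 or ordinal), not the study's code.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

features = ["composition", "orientation", "echogenicity", "border",
            "margin", "extrathyroidal_extension", "calcification"]

train = pd.read_csv("nodules_train.csv")       # hypothetical file and columns
valid = pd.read_csv("nodules_validation.csv")

# Multivariable logistic regression on the seven sonographic features;
# the nomogram is a graphical rendering of this model's coefficients.
model = LogisticRegression(max_iter=1000)
model.fit(train[features], train["malignant"])

auc = roc_auc_score(valid["malignant"], model.predict_proba(valid[features])[:, 1])
print(f"validation AUC: {auc:.3f}")
```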


2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

ABSTRACT Introduction Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performance. Materials and Methods Five classification methods, including K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory, are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply linear regression to predict the continuous MAP values over the next 60 minutes. Results The support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly, with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP 60 minutes in the future with a root mean square error (a frequently used measure of the differences between predicted and observed values) of 10 mmHg. After converting continuous MAP predictions into AHE binary predictions, we achieve 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion We were able to predict AHE 30 minutes in advance with precision and recall above 80% on this large real-world dataset. The predictions of the regression model provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, compared to predicting AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
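The two-step idea of forecasting MAP with linear regression and then thresholding the forecast into a binary AHE flag can be sketched compactly. The snippet below uses synthetic stand-in data and an assumed hypotension threshold of 65 mmHg; the feature construction and threshold are illustrative assumptions, not the study's definitions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(80, 10, size=(1000, 12))      # stand-in: recent MAP samples per record
y_map = X[:, -1] + rng.normal(0, 5, 1000)    # stand-in: MAP 60 minutes later

# Step 1: regress future MAP on the recent MAP history.
reg = LinearRegression().fit(X[:800], y_map[:800])
map_pred = reg.predict(X[800:])

# Step 2: convert the continuous forecast into a binary AHE prediction.
AHE_THRESHOLD = 65.0  # mmHg, an assumed operational definition of hypotension
ahe_true = (y_map[800:] < AHE_THRESHOLD).astype(int)
ahe_pred = (map_pred < AHE_THRESHOLD).astype(int)

print("recall:", recall_score(ahe_true, ahe_pred, zero_division=0))
print("precision:", precision_score(ahe_true, ahe_pred, zero_division=0))
```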


Author(s):  
Petar Radanliev ◽  
David De Roure ◽  
Kevin Page ◽  
Max Van Kleek ◽  
Omar Santos ◽  
...  

Abstract Multiple governmental agencies and private organisations have made commitments to the colonisation of Mars. Such colonisation requires complex systems and infrastructure that could be very costly to repair or replace in the event of cyber-attacks. This paper surveys deep learning algorithms, IoT cyber security and risk models, and established mathematical formulas to identify the best approach for developing a dynamic and self-adapting system for predictive cyber risk analytics supported by Artificial Intelligence and Machine Learning and real-time intelligence in edge computing. The paper presents a new mathematical approach for integrating concepts of cognition engine design, edge computing, and Artificial Intelligence and Machine Learning to automate anomaly detection. This engine instigates a step change by applying Artificial Intelligence and Machine Learning embedded at the edge of IoT networks to deliver safe and functional real-time intelligence for predictive cyber risk analytics. This will enhance capacities for risk analytics and assist in the creation of a comprehensive and systematic understanding of the opportunities and threats that arise when edge computing nodes are deployed and when Artificial Intelligence and Machine Learning technologies are migrated to the periphery of the internet and into local IoT networks.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nita Vangeepuram ◽  
Bian Liu ◽  
Po-hsiang Chiu ◽  
Linhua Wang ◽  
Gaurav Pandey

Abstract Prediabetes and diabetes mellitus (preDM/DM) have become alarmingly prevalent among youth in recent years. However, simple questionnaire-based screening tools to reliably assess diabetes risk are only available for adults, not youth. As a first step in developing such a tool, we used a large-scale dataset from the National Health and Nutrition Examination Survey (NHANES) to examine the performance of a published pediatric clinical screening guideline in identifying youth with preDM/DM based on American Diabetes Association diagnostic biomarkers. We assessed the agreement between the clinical guideline and biomarker criteria using established evaluation measures (sensitivity, specificity, positive/negative predictive value, F-measure for the positive/negative preDM/DM classes, and Kappa). We also compared the performance of the guideline to that of machine learning (ML) based preDM/DM classifiers derived from the NHANES dataset. Approximately 29% of the 2858 youth in our study population had preDM/DM based on biomarker criteria. The clinical guideline had a sensitivity of 43.1% and specificity of 67.6%, positive/negative predictive values of 35.2%/74.5%, positive/negative F-measures of 38.8%/70.9%, and a Kappa of 0.1 (95% CI: 0.06–0.14). The performance of the guideline varied across demographic subgroups. Some ML-based classifiers performed comparably to or better than the screening guideline, especially in identifying preDM/DM youth (p = 5.23 × 10⁻⁵). We demonstrated that a recommended pediatric clinical screening guideline did not perform well in identifying preDM/DM status among youth. Additional work is needed to develop a simple yet accurate screener for youth diabetes risk, potentially by using advanced ML methods and a wider range of clinical and behavioral health data.
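The agreement measures listed above (sensitivity, specificity, PPV/NPV, class-specific F-measures, and Cohen's kappa) can all be computed from a 2x2 table of guideline screen-positive flags against biomarker-based preDM/DM status. The sketch below uses synthetic labels, not NHANES data, purely to show how the metrics are derived.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score, f1_score

rng = np.random.default_rng(1)
biomarker = rng.binomial(1, 0.29, size=2858)   # stand-in "true" preDM/DM status
guideline = rng.binomial(1, 0.40, size=2858)   # stand-in screen-positive flag

tn, fp, fn, tp = confusion_matrix(biomarker, guideline).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
f_pos = f1_score(biomarker, guideline)               # F-measure, preDM/DM class
f_neg = f1_score(biomarker, guideline, pos_label=0)  # F-measure, non-preDM/DM class
kappa = cohen_kappa_score(biomarker, guideline)

print(sensitivity, specificity, ppv, npv, f_pos, f_neg, kappa)
```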

