Predicting Central Serous Chorioretinopathy Recurrence Using Machine Learning

Purpose: To predict central serous chorioretinopathy (CSC) recurrence 3 and 6 months after laser treatment by using machine learning.Methods: Clinical and imaging features of 461 patients (480 eyes) with CSC were collected at Zhongshan Ophthalmic Center (ZOC) and Xiamen Eye Center (XEC). The ZOC data (416 eyes of 401 patients) were used as the training dataset and the internal test dataset, while the XEC data (64 eyes of 60 patients) were used as the external test dataset. Six different machine learning algorithms and an ensemble model were trained to predict recurrence in patients with CSC. After completing the initial detailed investigation, we designed a simplified model using only clinical data and OCT features.Results: The ensemble model exhibited the best performance among the six algorithms, with accuracies of 0.941 (internal test dataset) and 0.970 (external test dataset) at 3 months and 0.903 (internal test dataset) and 1.000 (external test dataset) at 6 months. The simplified model showed a comparable level of predictive power.Conclusion: Machine learning achieves high accuracies in predicting the recurrence of CSC patients. The application of an intelligent recurrence prediction model for patients with CSC can potentially facilitate recurrence factor identification and precise individualized interventions.

Download Full-text

Streamlining Quality Review of Mass Spectrometry Data in the Clinical Laboratory by Use of Machine Learning

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2018-0238-oa ◽

2019 ◽

Vol 143 (8) ◽

pp. 990-998 ◽

Cited By ~ 2

Author(s):

Min Yu ◽

Lindsay A. L. Bazydlo ◽

David E. Bruns ◽

James H. Harrison

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Turnaround Time ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Test Dataset ◽

Manual Review

Context.— Turnaround time and productivity of clinical mass spectrometric (MS) testing are hampered by time-consuming manual review of the analytical quality of MS data before release of patient results. Objective.— To determine whether a classification model created by using standard machine learning algorithms can verify analytically acceptable MS results and thereby reduce manual review requirements. Design.— We obtained retrospective data from gas chromatography–MS analyses of 11-nor-9-carboxy-delta-9-tetrahydrocannabinol (THC-COOH) in 1267 urine samples. The data for each sample had been labeled previously as either analytically unacceptable or acceptable by manual review. The dataset was randomly split into training and test sets (848 and 419 samples, respectively), maintaining equal proportions of acceptable (90%) and unacceptable (10%) results in each set. We used stratified 10-fold cross-validation in assessing the abilities of 6 supervised machine learning algorithms to distinguish unacceptable from acceptable assay results in the training dataset. The classifier with the highest recall was used to build a final model, and its performance was evaluated against the test dataset. Results.— In comparison testing of the 6 classifiers, a model based on the Support Vector Machines algorithm yielded the highest recall and acceptable precision. After optimization, this model correctly identified all unacceptable results in the test dataset (100% recall) with a precision of 81%. Conclusions.— Automated data review identified all analytically unacceptable assays in the test dataset, while reducing the manual review requirement by about 87%. This automation strategy can focus manual review only on assays likely to be problematic, allowing improved throughput and turnaround time without reducing quality.

Download Full-text

Handcrafted and Deep Learning-Based Radiomic Models Can Distinguish GBM from Brain Metastasis

Journal of Oncology ◽

10.1155/2021/5518717 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Zhiyuan Liu ◽

Zekun Jiang ◽

Li Meng ◽

Jun Yang ◽

Ying Liu ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Brain Metastasis ◽

Area Under The Curve ◽

Classification Performance ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Test Dataset ◽

Contrast Enhanced ◽

Magnetic Resonance Imaging Mri

Objective. The purpose of this study was to investigate the feasibility of applying handcrafted radiomics (HCR) and deep learning-based radiomics (DLR) for the accurate preoperative classification of glioblastoma (GBM) and solitary brain metastasis (BM). Methods. A retrospective analysis of the magnetic resonance imaging (MRI) data of 140 patients (110 in the training dataset and 30 in the test dataset) with GBM and 128 patients (98 in the training dataset and 30 in the test dataset) with BM confirmed by surgical pathology was performed. The regions of interest (ROIs) on T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), and contrast-enhanced T1WI (T1CE) were drawn manually, and then, HCR and DLR analyses were performed. On this basis, different machine learning algorithms were implemented and compared to find the optimal modeling method. The final classifiers were identified and validated for different MRI modalities using HCR features and HCR + DLR features. By analyzing the receiver operating characteristic (ROC) curve, the area under the curve (AUC), accuracy, sensitivity, and specificity were calculated to evaluate the predictive efficacy of different methods. Results. In multiclassifier modeling, random forest modeling showed the best distinguishing performance among all MRI modalities. HCR models already showed good results for distinguishing between the two types of brain tumors in the test dataset (T1WI, AUC = 0.86; T2WI, AUC = 0.76; T1CE, AUC = 0.93). By adding DLR features, all AUCs showed significant improvement (T1WI, AUC = 0.87; T2WI, AUC = 0.80; T1CE, AUC = 0.97; p < 0.05 ). The T1CE-based radiomic model showed the best classification performance (AUC = 0.99 in the training dataset and AUC = 0.97 in the test dataset), surpassing the other MRI modalities ( p < 0.05 ). The multimodality radiomic model also showed robust performance (AUC = 1 in the training dataset and AUC = 0.84 in the test dataset). Conclusion. Machine learning models using MRI radiomic features can help distinguish GBM from BM effectively, especially the combination of HCR and DLR features.

Download Full-text

Derivation of Respiratory Metrics in Health and Asthma; Machine Learning Methodology (Preprint)

10.2196/preprints.25178 ◽

2020 ◽

Author(s):

Joseph Prinable ◽

Peter Jones ◽

David Boland ◽

Alistair McEwan ◽

Cindy Thamrin

Keyword(s):

Machine Learning ◽

Short Term Memory ◽

Reference Signal ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Local Health ◽

Strongly Correlated ◽

Human Research Ethics ◽

Expiration Time ◽

Wide Variability

BACKGROUND The ability to continuously monitor breathing metrics may have indications for general health as well as respiratory conditions such as asthma. However, few studies have focused on breathing due to a lack of available wearable technologies. OBJECTIVE Examine the performance of two machine learning algorithms in extracting breathing metrics from a finger-based pulse oximeter, which is amenable to long-term monitoring. METHODS Pulse oximetry data was collected from 11 healthy and 11 asthma subjects who breathed at a range of controlled respiratory rates. UNET and Long Short-Term memory (LSTM) algorithms were applied to the data, and results compared against breathing metrics derived from respiratory inductance plethysmography measured simultaneously as a reference. RESULTS The UNET vs LSTM model provided breathing metrics which were strongly correlated with those from the reference signal (all p<0.001, except for inspiratory:expiratory ratio). The following relative mean bias(95% confidence interval) were observed: inspiration time 1.89(-52.95, 56.74)% vs 1.30(-52.15, 54.74)%, expiration time -3.70(-55.21, 47.80)% vs -4.97(-56.84, 46.89)%, inspiratory:expiratory ratio -4.65(-87.18, 77.88)% vs -5.30(-87.07, 76.47)%, inter-breath intervals -2.39(-32.76, 27.97)% vs -3.16(-33.69, 27.36)%, and respiratory rate 2.99(-27.04 to 33.02)% vs 3.69(-27.17 to 34.56)%. CONCLUSIONS Both machine learning models show strongly correlation and good comparability with reference, with low bias though wide variability for deriving breathing metrics in asthma and health cohorts. Future efforts should focus on improvement of performance of these models, e.g. by increasing the size of the training dataset at the lower breathing rates. CLINICALTRIAL Sydney Local Health District Human Research Ethics Committee (#LNR\16\HAWKE99 ethics approval).

Download Full-text

Exploratory Analysis of Driving Force of Wildfires in Australia: An Application of Machine Learning within Google Earth Engine

Remote Sensing ◽

10.3390/rs13010010 ◽

2020 ◽

Vol 13 (1) ◽

pp. 10

Author(s):

Andrea Sulova ◽

Jamal Jokar Arsanjani

Keyword(s):

Climate Change ◽

Machine Learning ◽

Random Forest ◽

Google Earth ◽

Summer Season ◽

Driving Factors ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Training Dataset ◽

Google Earth Engine

Recent studies have suggested that due to climate change, the number of wildfires across the globe have been increasing and continue to grow even more. The recent massive wildfires, which hit Australia during the 2019–2020 summer season, raised questions to what extent the risk of wildfires can be linked to various climate, environmental, topographical, and social factors and how to predict fire occurrences to take preventive measures. Hence, the main objective of this study was to develop an automatized and cloud-based workflow for generating a training dataset of fire events at a continental level using freely available remote sensing data with a reasonable computational expense for injecting into machine learning models. As a result, a data-driven model was set up in Google Earth Engine platform, which is publicly accessible and open for further adjustments. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed other algorithms and hence it was used further to explore the driving factors using variable importance analysis. The study indicates the probability of fire occurrences across Australia as well as identifies the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodical approach and achieved results and drawn conclusions can be of great importance to policymakers, environmentalists, and climate change researchers, among others.

Download Full-text

Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences

Bioinformatics ◽

10.1093/bioinformatics/btv674 ◽

2015 ◽

Vol 32 (6) ◽

pp. 821-827 ◽

Cited By ~ 19

Author(s):

Enrique Audain ◽

Yassel Ramos ◽

Henning Hermjakob ◽

Darren R. Flower ◽

Yasset Perez-Riverol

Keyword(s):

Machine Learning ◽

Isoelectric Point ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Basis Set ◽

Superior Performance ◽

Supplementary Information ◽

Training Dataset ◽

Accurate Estimation ◽

Prediction Methods

Abstract Motivation: In any macromolecular polyprotic system—for example protein, DNA or RNA—the isoelectric point—commonly referred to as the pI—can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge—and thus the electrophoretic mobility—of the ampholyte sums to zero. Different modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to their pI is also widely used in current proteomics sample preparation procedures previous to the LC-MS/MS analysis. Therefore accurate theoretical prediction of pI would expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. Results: Using data from the database PIP-DB and one publically available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed a superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset and their resulting performance will strongly depend of the quality of that data. In contrast with Iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction. Contact: [email protected] Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR. Supplementary information: Supplementary data are available at Bioinformatics online.

Download Full-text

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric criseis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.

Download Full-text

Detection of misinformation on garlic and COVID-19 in Twitter: A machine learning-based approach (Preprint)

10.2196/preprints.33056 ◽

2021 ◽

Author(s):

Myeong Gyu Kim ◽

Jae Hyun Kim ◽

Kyungim Kim

Keyword(s):

Machine Learning ◽

Social Media ◽

Latent Dirichlet Allocation ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Polynomial Kernel ◽

Support Vector ◽

Accurate Information ◽

Probability Number

BACKGROUND Garlic-related misinformation is prevalent whenever a virus outbreak occurs. Again, with the outbreak of coronavirus disease 2019 (COVID-19), garlic-related misinformation is spreading through social media sites, including Twitter. Machine learning-based approaches can be used to detect misinformation from vast tweets. OBJECTIVE This study aimed to develop machine learning algorithms for detecting misinformation on garlic and COVID-19 in Twitter. METHODS This study used 5,929 original tweets mentioning garlic and COVID-19. Tweets were manually labeled as misinformation, accurate information, and others. We tested the following algorithms: k-nearest neighbors; random forest; support vector machine (SVM) with linear, radial, and polynomial kernels; and neural network. Features for machine learning included user-based features (verified account, user type, number of followers, and follower rate) and text-based features (uniform resource locator, negation, sentiment score, Latent Dirichlet Allocation topic probability, number of retweets, and number of favorites). A model with the highest accuracy in the training dataset (70% of overall dataset) was tested using a test dataset (30% of overall dataset). Predictive performance was measured using overall accuracy, sensitivity, specificity, and balanced accuracy. RESULTS SVM with the polynomial kernel model showed the highest accuracy of 0.670. The model also showed a balanced accuracy of 0.757, sensitivity of 0.819, and specificity of 0.696 for misinformation. Important features in the misinformation and accurate information classes included topic 4 (common myths), topic 13 (garlic-specific myths), number of followers, topic 11 (misinformation on social media), and follower rate. Topic 3 (cooking recipes) was the most important feature in the others class. CONCLUSIONS Our SVM model showed good performance in detecting misinformation. The results of our study will help detect misinformation related to garlic and COVID-19. It could also be applied to prevent misinformation related to dietary supplements in the event of a future outbreak of a disease other than COVID-19.

Download Full-text

Using Supervised Machine Learning Algorithms for Automated Lithology Prediction from Wireline Log Data

10.2118/208559-ms ◽

2021 ◽

Author(s):

Marian Popescu ◽

Rebecca Head ◽

Tim Ferriday ◽

Kate Evans ◽

Jose Montero ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Training Dataset ◽

Depth Interval ◽

Log Data ◽

Machine Learning Approach ◽

Lithology Prediction ◽

Logging While Drilling

Abstract This paper presents advancements in machine learning and cloud deployment that enable rapid and accurate automated lithology interpretation. A supervised machine learning technique is described that enables rapid, consistent, and accurate lithology prediction alongside quantitative uncertainty from large wireline or logging-while-drilling (LWD) datasets. To leverage supervised machine learning, a team of geoscientists and petrophysicists made detailed lithology interpretations of wells to generate a comprehensive training dataset. Lithology interpretations were based on applying determinist cross-plotting by utilizing and combining various raw logs. This training dataset was used to develop a model and test a machine learning pipeline. The pipeline was applied to a dataset previously unseen by the algorithm, to predict lithology. A quality checking process was performed by a petrophysicist to validate new predictions delivered by the pipeline against human interpretations. Confidence in the interpretations was assessed in two ways. The prior probability was calculated, a measure of confidence in the input data being recognized by the model. Posterior probability was calculated, which quantifies the likelihood that a specified depth interval comprises a given lithology. The supervised machine learning algorithm ensured that the wells were interpreted consistently by removing interpreter biases and inconsistencies. The scalability of cloud computing enabled a large log dataset to be interpreted rapidly; >100 wells were interpreted consistently in five minutes, yielding >70% lithological match to the human petrophysical interpretation. Supervised machine learning methods have strong potential for classifying lithology from log data because: 1) they can automatically define complex, non-parametric, multi-variate relationships across several input logs; and 2) they allow classifications to be quantified confidently. Furthermore, this approach captured the knowledge and nuances of an interpreter's decisions by training the algorithm using human-interpreted labels. In the hydrocarbon industry, the quantity of generated data is predicted to increase by >300% between 2018 and 2023 (IDC, Worldwide Global DataSphere Forecast, 2019–2023). Additionally, the industry holds vast legacy data. This supervised machine learning approach can unlock the potential of some of these datasets by providing consistent lithology interpretations rapidly, allowing resources to be used more effectively.

Download Full-text

An Ensemble Model for Predicting Chronic Diseases Using Machine Learning Algorithms

Smart Computing Techniques and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-16-1502-3_34 ◽

2021 ◽

pp. 337-345

Author(s):

B. Manjulatha ◽

Suresh Pabboju

Keyword(s):

Machine Learning ◽

Chronic Diseases ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Ensemble Model

Download Full-text

Machine Learning Algorithms to Predict Recurrence within 10 Years after Breast Cancer Surgery: A Prospective Cohort Study

Cancers ◽

10.3390/cancers12123817 ◽

2020 ◽

Vol 12 (12) ◽

pp. 3817

Author(s):

Shi-Jer Lou ◽

Ming-Feng Hou ◽

Hong-Tai Chang ◽

Chong-Chi Chiu ◽

Hao-Hsien Lee ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Learning Algorithms ◽

External Validation ◽

Model Development ◽

Cancer Surgery ◽

Machine Learning Algorithms ◽

Breast Cancer Surgery ◽

Training Dataset

No studies have discussed machine learning algorithms to predict recurrence within 10 years after breast cancer surgery. This study purposed to compare the accuracy of forecasting models to predict recurrence within 10 years after breast cancer surgery and to identify significant predictors of recurrence. Registry data for breast cancer surgery patients were allocated to a training dataset (n = 798) for model development, a testing dataset (n = 171) for internal validation, and a validating dataset (n = 171) for external validation. Global sensitivity analysis was then performed to evaluate the significance of the selected predictors. Demographic characteristics, clinical characteristics, quality of care, and preoperative quality of life were significantly associated with recurrence within 10 years after breast cancer surgery (p < 0.05). Artificial neural networks had the highest prediction performance indices. Additionally, the surgeon volume was the best predictor of recurrence within 10 years after breast cancer surgery, followed by hospital volume and tumor stage. Accurate recurrence within 10 years prediction by machine learning algorithms may improve precision in managing patients after breast cancer surgery and improve understanding of risk factors for recurrence within 10 years after breast cancer surgery.

Download Full-text