Identification and prediction of ALS subgroups using machine learning

Author(s):  
Faraz Faghri ◽  
Fabian Brunn ◽  
Anant Dadu ◽  
Elisabetta Zucchi ◽  
Ilaria Martinelli ◽  
...  

Background: The disease entity known as amyotrophic lateral sclerosis (ALS) is now known to represent a collection of overlapping syndromes. A better understanding of this heterogeneity and the ability to distinguish ALS subtypes would improve the clinical care of patients and enhance our understanding of the disease. Subtype profiles could be incorporated into the clinical trial design to improve our ability to detect a therapeutic effect. A variety of classification systems have been proposed over the years based on empirical observations, but it is unclear to what extent they genuinely reflect ALS population substructure.

Methods: We applied machine learning algorithms to a prospective, population-based cohort consisting of 2,858 Italian patients diagnosed with ALS for whom detailed clinical phenotype data were available. We replicated our findings in an independent population-based cohort of 1,097 Italian ALS patients.

Findings: We found that semi-supervised machine learning based on UMAP applied to the output of a multi-layered perceptron neural network produced the optimum clustering of the ALS patients in the discovery cohort. These clusters roughly corresponded to the six clinical subtypes defined by the Chiò classification system (bulbar ALS, respiratory ALS, flail arm ALS, classical ALS, pyramidal ALS, and flail leg ALS). The same clusters were identified in the replication cohort. A supervised learning approach based on ensemble learning identified twelve clinical parameters that predicted ALS clinical subtype with high accuracy (area under the curve = 0.94).

Interpretation: Our data-driven study provides insight into the ALS population's substructure and demonstrates that the Chiò classification system robustly identifies ALS subtypes. We provide an interactive website (https://share.streamlit.io/anant-dadu/machinelearningforals/main) so that clinical researchers can predict the clinical subtype of an ALS patient based on a small number of clinical parameters.

Funding: National Institute on Aging and the Italian Ministry of Health.
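As a rough illustration of the kind of pipeline this abstract describes (an MLP whose output is embedded with UMAP and then clustered), the sketch below uses scikit-learn and the umap-learn package on synthetic placeholder data. The feature matrix, the labels used to fit the MLP, the choice of k-means for the clustering step, and all hyperparameters are illustrative assumptions, not details taken from the study.

```python
# Hypothetical sketch only: train an MLP on clinical features, embed its class-probability
# output with UMAP, then cluster the embedding into six groups (matching the six Chiò
# subtypes named in the abstract). All data and hyperparameters are placeholders.
import numpy as np
import umap                                    # from the umap-learn package
from sklearn.neural_network import MLPClassifier
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(2858, 20))                # placeholder clinical phenotype matrix
y_seed = rng.integers(0, 6, size=2858)         # placeholder labels used only to fit the MLP

X_std = StandardScaler().fit_transform(X)
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
mlp.fit(X_std, y_seed)

# Embed the MLP output in 2-D with UMAP, then cluster.
embedding = umap.UMAP(n_neighbors=30, min_dist=0.1, random_state=0).fit_transform(
    mlp.predict_proba(X_std)
)
clusters = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(clusters))                   # cluster sizes
```

The supervised subtype-prediction step reported in the abstract (ensemble learning over twelve clinical parameters, AUC = 0.94) could be sketched analogously with a gradient-boosted classifier scored by roc_auc_score.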

2016 ◽  
Vol 55 (3) ◽  
pp. 1055-1067 ◽  
Author(s):  
Timo Pekkala ◽  
Anette Hall ◽  
Jyrki Lötjönen ◽  
Jussi Mattila ◽  
Hilkka Soininen ◽  
...  

2020 ◽  
Author(s):  
Han-Saem Kim ◽  
Chang-Guk Sun ◽  
Hyung-Ik Cho ◽  
Moon-Gyo Lee

Earthquake-induced land deformation and structure failure are more severe over soft soils than over firm soils and rocks owing to the seismic site effect and liquefaction. The site-specific seismic site effect, which governs the amplification of ground motion, has spatial uncertainty that depends on local subsurface, surface geological, and topographic conditions. When the 2017 Pohang earthquake (M 5.4), South Korea's second-strongest earthquake in decades, occurred, severe damage influenced by variable site-effect indicators was observed, concentrated in basin and basin-edge regions covered by unconsolidated Quaternary sediments. Site characterization that considers empirical correlations between geotechnical site-response parameters and surface proxies is therefore essential. Furthermore, when many variables with only tenuously related correlations are involved, machine learning classification models can be more precise than parametric methods. In this study, a multivariate seismic site classification system was established using machine learning techniques on a geospatial big data platform.

Supervised machine learning classification techniques, specifically random forest, support vector machine (SVM), and artificial neural network (ANN) algorithms, were adopted. Supervised machine learning algorithms analyze a set of labeled training data consisting of input data and desired output values, and produce an inferred function that can be used to make predictions from new input data. To optimize the classification criteria while accounting for geotechnical uncertainty and local site effects, training datasets transformed with principal component analysis (PCA) were verified with k-fold cross-validation. The best-performing training algorithm was then selected using evaluation metrics derived from the confusion matrix, namely the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC).

For the southeastern region of South Korea, boring log information (strata, standard penetration test, etc.), a geological map (1:50,000 scale), a digital terrain model (5 m × 5 m resolution), and a soil map (1:250,000 scale) were collected and compiled as geospatial big data. As a preliminary step, mesh-type geospatial information was built using advanced geostatistical interpolation and simulation methods to produce spatially coincident datasets of geotechnical response parameters and surface proxies.

Site classification systems use seismic response parameters related to the geotechnical characteristics of the study area as the classification criteria. The current site classification systems in South Korea and the United States recommend Vs30, the average shear wave velocity (Vs) over the uppermost 30 m of ground. This criterion uses only the dynamic characteristics of the site without considering its geometric distribution characteristics. Thus, the geospatial information used for the input layer included geo-layer thickness, surface proxies (elevation, slope, geological category, soil category), the average Vs of the soil layer (Vs,soil), and the site period (TG). The Vs30-based site class was used as the categorical label. Finally, the site class can be predicted from proxies alone using the optimized classification techniques.
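A minimal sketch of the model-comparison stage described above (PCA-transformed features, random forest / SVM / ANN classifiers, k-fold cross-validation scored by ROC-AUC) might look like the following. The feature matrix, class labels, and hyperparameters are synthetic placeholders standing in for the geospatial inputs named in the abstract.

```python
# Illustrative sketch, not the study's implementation: proxies and geotechnical parameters
# -> PCA -> supervised classifiers, scored with stratified k-fold cross-validation and
# one-vs-rest ROC-AUC. All data here are synthetic placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
# Columns stand in for the input layer named above: geo-layer thickness, elevation, slope,
# encoded geological/soil categories, Vs,soil and site period TG.
X = rng.normal(size=(5000, 7))
y = rng.integers(0, 4, size=5000)              # Vs30-based site classes as categorical labels

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "svm": SVC(kernel="rbf", probability=True, random_state=0),
    "ann": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), clf)
    auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc_ovr").mean()
    print(f"{name}: mean ROC-AUC (one-vs-rest) = {auc:.3f}")
```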


2019 ◽  
Vol 24 (1) ◽  
pp. 197-206
Author(s):  
Niko Murrell ◽  
Ryan Bradley ◽  
Nikhil Bajaj ◽  
Julie Gordon Whitney ◽  
George T.-C. Chiu

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ce Shi ◽  
Mengyi Wang ◽  
Tiantian Zhu ◽  
Ying Zhang ◽  
Yufeng Ye ◽  
...  

Purpose: To develop an automated classification system using a machine learning classifier to distinguish clinically unaffected eyes in patients with keratoconus from a normal control population based on a combination of Scheimpflug camera images and ultra-high-resolution optical coherence tomography (UHR-OCT) imaging data.

Methods: A total of 121 eyes from 121 participants were classified by 2 cornea experts into 3 groups: normal (50 eyes), keratoconus (38 eyes), or subclinical keratoconus (33 eyes). All eyes were imaged with a Scheimpflug camera and UHR-OCT. Corneal morphological features were extracted from the imaging data. A neural network was used to train a model based on these features to distinguish eyes with subclinical keratoconus from normal eyes. Fisher's score was used to rank the discriminative power of each feature. Receiver operating characteristic (ROC) curves were calculated to obtain the areas under the ROC curves (AUCs).

Results: The classification model combining all features from the Scheimpflug camera and UHR-OCT markedly improved the ability to discriminate between normal eyes and eyes with subclinical keratoconus (AUC = 0.93). The within-eye variation in the corneal epithelial thickness profile extracted from UHR-OCT imaging ranked highest in differentiating eyes with subclinical keratoconus from normal eyes.

Conclusion: The automated machine learning classification system based on the combination of Scheimpflug camera data and UHR-OCT imaging data showed excellent performance in discriminating eyes with subclinical keratoconus from normal eyes. The epithelial features extracted from the OCT images were the most valuable in the discrimination process. This classification system has the potential to improve the detection of subclinical keratoconus and the efficiency of keratoconus screening.
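The analysis described above could be sketched roughly as follows: features are ranked with a two-class Fisher score, a small neural network is trained on the combined feature set, and discrimination is summarised by ROC-AUC. The data, the particular Fisher score formula used here, and the network size are illustrative assumptions, not the study's actual implementation.

```python
# Hedged sketch on synthetic data: Fisher-score feature ranking plus a small neural
# network classifier, evaluated with ROC-AUC on held-out eyes.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
X = rng.normal(size=(83, 12))                  # 50 normal + 33 subclinical eyes, 12 features
y = np.array([0] * 50 + [1] * 33)              # 0 = normal, 1 = subclinical keratoconus

def fisher_score(X, y):
    """Per-feature two-class Fisher score: (mu1 - mu0)^2 / (var1 + var0)."""
    X0, X1 = X[y == 0], X[y == 1]
    return (X1.mean(0) - X0.mean(0)) ** 2 / (X1.var(0) + X0.var(0))

ranking = np.argsort(fisher_score(X, y))[::-1]
print("Feature ranking by Fisher score:", ranking)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_tr)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(scaler.transform(X_tr), y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(scaler.transform(X_te))[:, 1])
print(f"ROC-AUC on held-out eyes: {auc:.2f}")
```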


2021 ◽  
Vol 8 (1) ◽  
pp. 64-76
Author(s):  
Cosimo Ieracitano ◽  
Annunziata Paviglianiti ◽  
Maurizio Campolo ◽  
Amir Hussain ◽  
Eros Pasero ◽  
...  

BMJ Open ◽  
2019 ◽  
Vol 9 (8) ◽  
pp. e028015 ◽  
Author(s):  
Mathias Carl Blom ◽  
Awais Ashfaq ◽  
Anita Sant'Anna ◽  
Philip D Anderson ◽  
Markus Lingman

Objectives: The aim of this work was to train machine learning models to identify patients at end of life with clinically meaningful diagnostic accuracy, using 30-day mortality in patients discharged from the emergency department (ED) as a proxy.

Design: Retrospective, population-based registry study.

Setting: Swedish health services.

Primary and secondary outcome measures: All-cause 30-day mortality.

Methods: Electronic health records (EHRs) and administrative data were used to train six supervised machine learning models to predict all-cause mortality within 30 days in patients discharged from EDs in southern Sweden, Europe.

Participants: The models were trained using 65,776 ED visits and validated on 55,164 visits from a separate ED to which the models were not exposed during training.

Results: The outcome occurred in 136 visits (0.21%) in the development set and in 83 visits (0.15%) in the validation set. The model with the highest discrimination attained ROC-AUC 0.95 (95% CI 0.93 to 0.96), with sensitivity 0.87 (95% CI 0.80 to 0.93) and specificity 0.86 (0.86 to 0.86) on the validation set.

Conclusions: Multiple models displayed excellent discrimination on the validation set and outperformed available indexes for short-term mortality prediction in terms of ROC-AUC (by indirect comparison). The practical utility of the models is increased by the fact that the data they were trained on did not require costly de novo collection but were real-world data generated as a by-product of routine care delivery.
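A hedged sketch of the externally validated set-up described above (models fit on visits from one ED and evaluated on a held-out ED, reporting ROC-AUC, sensitivity, and specificity) is shown below. The features, the 0.5 decision threshold, and the two model choices are placeholder assumptions; the study trained six models on Swedish EHR and administrative data that are not reproduced here.

```python
# Minimal sketch on synthetic data: fit on a development ED, evaluate on a separate
# validation ED, and report ROC-AUC plus sensitivity/specificity at a fixed threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(3)
X_dev, y_dev = rng.normal(size=(65776, 15)), rng.binomial(1, 0.0021, size=65776)
X_val, y_val = rng.normal(size=(55164, 15)), rng.binomial(1, 0.0015, size=55164)

for name, clf in {
    "logistic_regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}.items():
    clf.fit(X_dev, y_dev)
    p = clf.predict_proba(X_val)[:, 1]
    auc = roc_auc_score(y_val, p)
    tn, fp, fn, tp = confusion_matrix(y_val, (p >= 0.5).astype(int)).ravel()
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp)
    print(f"{name}: ROC-AUC={auc:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```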


2020 ◽  
Vol 14 (2) ◽  
pp. 140-159
Author(s):  
Anthony-Paul Cooper ◽  
Emmanuel Awuni Kolog ◽  
Erkki Sutinen

This article builds on previous research exploring the content of church-related tweets. It does so by examining whether the qualitative thematic coding of such tweets can, in part, be automated by the use of machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
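For readers interested in how such a comparison is typically set up, the sketch below shows a Naïve Bayes classifier over TF-IDF features of hand-coded tweets, scored with precision, recall, and F-measure. The example tweets, theme labels, and pipeline choices are invented placeholders; the article's actual dataset and feature representation are not reproduced here.

```python
# Hypothetical sketch: Naive Bayes over TF-IDF features of human-coded tweets, reporting
# macro-averaged precision, recall and F-measure. Tweets and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

tweets = [
    "Wonderful service at church this morning",
    "Looking forward to the youth group meeting tonight",
    "The sermon on forgiveness really spoke to me",
    "Volunteers needed for the church food bank",
] * 25
labels = ["worship", "community", "worship", "community"] * 25   # hand-coded themes

X_tr, X_te, y_tr, y_te = train_test_split(tweets, labels, test_size=0.3, random_state=0)
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_tr, y_tr)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_te, model.predict(X_te), average="macro"
)
print(f"Precision={precision:.2f}, Recall={recall:.2f}, F-measure={f1:.2f}")
```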

