The classification of neurodegenerative disease from acoustic speech data

Abstract: Neurodegenerative diseases often affect speech. Speech acoustics can be used as objective clinical markers of pathology. Previous investigations of pathological speech have primarily compared controls with one specific condition and excluded comorbidities. We broaden the utility of speech markers by examining how multiple acoustic features can delineate diseases. We used supervised machine learning with gradient boosting (CatBoost) to differentiate healthy speech and speech from people with multiple sclerosis or Friedreich ataxia. Participants performed a diadochokinetic task where they repeated alternating syllables. We extracted 74 spectral and temporal prosodic features from the speech recordings, which were subjected to machine learning. Results showed that Friedreich ataxia, multiple sclerosis and healthy controls were all identified with high accuracy (over 82%). Twenty-one acoustic features were strong markers of neurodegenerative diseases, falling under the categories of spectral qualia, spectral power, and speech rate. We demonstrated that speech markers can delineate neurodegenerative diseases and distinguish healthy speech from pathological speech with high accuracy. Findings emphasize the importance of examining speech outcomes when assessing indicators of neurodegenerative disease. We propose large-scale initiatives to broaden the scope for differentiating other neurological diseases and affective disorders.


How to Predict the Long-term Course of Neurodegenerative Diseases?

Jornada de Jóvenes Investigadores del I3A ◽

10.26754/jjii3a.4886 ◽

2020 ◽

Vol 8 ◽

Author(s):

Alberto Montolío Marco ◽

José Cegoñino Banzo ◽

Elena García Martín ◽

Amaya Pérez del Palomar Aldea

Keyword(s):

Machine Learning ◽

Multiple Sclerosis ◽

Neurodegenerative Diseases ◽

Neurodegenerative Disease ◽

Specific Treatment ◽

Machine Learning Techniques ◽

Learning Techniques

The aim of this work is to predict the disability state in neurodegenerative diseases such as multiple sclerosis (MS), using clinical databases and machine learning techniques. This prediction could help clinicians select a more specific treatment for MS patients.


A Comparative Study of Supervised Machine Learning Algorithms for the Prediction of Long-Range Chromatin Interactions

Genes ◽

10.3390/genes11090985 ◽

2020 ◽

Vol 11 (9) ◽

pp. 985 ◽

Cited By ~ 2

Author(s):

Thomas Vanhaeren ◽

Federico Divina ◽

Miguel García-Torres ◽

Francisco Gómez-Vela ◽

Wim Vanhoof ◽

...

Keyword(s):

Machine Learning ◽

Transcription Factors ◽

Long Range ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

The Other ◽

Supervised Machine Learning ◽

Chromatin Interaction ◽

Gradient Boosting ◽

Chromatin Interactions

The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long range chromatin interaction maps provided by Chromatin Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental data of chromatin interactions is not available. Several methods have been based on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, different approaches vary both in the way they model chromatin interactions and in the machine learning-based strategy they rely on, making it challenging to carry out performance comparison of existing methods. In this study, we use publicly available 1D sequencing signals to model cohesin-mediated chromatin interactions in two human cell lines and evaluate the prediction performance of six popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines, multi-layer perceptron and deep learning. Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other five methods, yielding accuracies of about 95%. We show that chromatin features in close genomic proximity to the anchors cover most of the predictive information, as has been previously reported. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, and besides architectural proteins, transcription factors are shown to be highly informative. 
Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best suited algorithm for this task and highlights cell-type specific binding of transcription factors at the anchors as important determinants of chromatin wiring mediated by cohesin.


Supervised Machine-learning Predictive Analytics for Prediction of Postinduction Hypotension

Anesthesiology ◽

10.1097/aln.0000000000002374 ◽

2018 ◽

Vol 129 (4) ◽

pp. 675-688 ◽

Cited By ~ 45

Author(s):

Samir Kendale ◽

Prathamesh Kulkarni ◽

Andrew D. Rosenberg ◽

Jing Wang

Keyword(s):

Machine Learning ◽

Receiver Operating Characteristic Curve ◽

Operating Characteristic ◽

Predictive Analytics ◽

Characteristic Curve ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Machine Learning Methods ◽

Gradient Boosting Machine ◽

Operating Characteristic Curve

Background: Hypotension is a risk factor for adverse perioperative outcomes. Machine-learning methods allow large amounts of data for development of robust predictive analytics. The authors hypothesized that machine-learning methods can provide prediction for the risk of postinduction hypotension. Methods: Data was extracted from the electronic health record of a single quaternary care center from November 2015 to May 2016 for patients over age 12 that underwent general anesthesia, without procedure exclusions. Multiple supervised machine-learning classification techniques were attempted, with postinduction hypotension (mean arterial pressure less than 55 mmHg within 10 min of induction by any measurement) as primary outcome, and preoperative medications, medical comorbidities, induction medications, and intraoperative vital signs as features. Discrimination was assessed using cross-validated area under the receiver operating characteristic curve. The best performing model was tuned and final performance assessed using split-set validation. Results: Out of 13,323 cases, 1,185 (8.9%) experienced postinduction hypotension. Area under the receiver operating characteristic curve using logistic regression was 0.71 (95% CI, 0.70 to 0.72), support vector machines was 0.63 (95% CI, 0.58 to 0.60), naive Bayes was 0.69 (95% CI, 0.67 to 0.69), k-nearest neighbor was 0.64 (95% CI, 0.63 to 0.65), linear discriminant analysis was 0.72 (95% CI, 0.71 to 0.73), random forest was 0.74 (95% CI, 0.73 to 0.75), neural nets 0.71 (95% CI, 0.69 to 0.71), and gradient boosting machine 0.76 (95% CI, 0.75 to 0.77). Test set area for the gradient boosting machine was 0.74 (95% CI, 0.72 to 0.77). Conclusions: The success of this technique in predicting postinduction hypotension demonstrates feasibility of machine-learning models for predictive analytics in the field of anesthesiology, with performance dependent on model selection and appropriate tuning.
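
The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition in plain Python. It is not the authors' code; the function name `roc_auc` and the toy labels and scores are assumptions made up for illustration.

```python
from itertools import product


def roc_auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration (1 = event, 0 = no event).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of cases, so it suits small examples; a rank-based implementation would normally be preferred for datasets of the size described in the abstract.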


Effective Parameter Optimization & Classification using Bat-Inspired Algorithm with Improving NSSA

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1498.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3343-3349

Keyword(s):

Machine Learning ◽

Optimal Parameter ◽

Personal Information ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Security Measures ◽

End User ◽

Effective Strategies ◽

Made In

Network security is an important aspect of communication-related activities. In recent times, the advent of more sophisticated technologies has changed the way information is shared with everyone in any part of the world. Concurrently, these advancements are misused to intentionally compromise end-user devices and steal personal information. The number of attacks made on targeted devices is increasing over time. Even though the security mechanisms used to defend the network are enhanced and updated periodically, intruders develop new advanced methods to penetrate the system. To counter these threats, effective strategies must be applied to enhance the security measures in the network. In this paper, a machine learning-based approach is proposed to identify the patterns of different categories of attacks made in the past. The KDD Cup 1999 dataset is used to develop this predictive model. A bat optimization algorithm identifies the optimal parameter subset. Supervised machine learning algorithms were employed to train the model on the data and make predictions. The performance of the system is evaluated through metrics such as accuracy and precision. Four classification algorithms were used, of which the gradient boosting model outperformed the other benchmarked algorithms in data classification, based on the accuracy obtained.


Automated Detection of Multiple Sclerosis Lesions Using Texture-based Features and a Hybrid Classifier

Caspian Journal of Neurological Sciences ◽

10.32598/cjns.6.20.220.1 ◽

2020 ◽

Vol 6 (1) ◽

pp. 16-30

Author(s):

Somayeh Raiesdana ◽

Keyword(s):

Machine Learning ◽

Multiple Sclerosis ◽

Fuzzy Inference ◽

Brain Mri ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Inference System ◽

Common Technique ◽

Magnetic Resonance Imaging Mri

Background: Multiple Sclerosis (MS) is the most frequent non-traumatic neurological disease capable of causing disability in young adults. Detection of MS lesions with magnetic resonance imaging (MRI) is the most common technique. However, manual interpretation of vast amounts of data is often tedious and error-prone. Furthermore, changes in lesions are often subtle and extremely unrepresentative. Objectives: To develop an automated non-subjective method for the detection and quantification of MS lesions. Materials & Methods: This paper focuses on the automatic detection and classification of MS lesions in brain MRI images. Two datasets, one simulated and the other recorded in hospital, are utilized in this work. A novel hybrid algorithm combining image processing and machine learning techniques is implemented. To this end, first, intricate morphological patterns are extracted from MRI images via texture analysis. Then, statistical texture-based features are extracted. Afterward, two supervised machine learning algorithms, i.e., the Hidden Markov Model (HMM) and Adaptive Neuro-Fuzzy Inference System (ANFIS), are employed within a hybrid platform. The hybrid system makes decisions based on ensemble learning. The stacking technique is used to apply predictions from both models to train a perceptron as a decisive model. Results: Experimental results on both datasets indicate that the proposed hybrid method outperforms the HMM and ANFIS classifiers by reducing false positives. Furthermore, the performance of the proposed method was confirmed in comparison with state-of-the-art methods. Conclusion: Remarkable results of the proposed method motivate advanced detection systems employing other MRI sequences and their combination.


Fine-grained classification of social science journal articles using textual data: A comparison of supervised machine learning approaches

Quantitative Science Studies ◽

10.1162/qss_a_00106 ◽

2020 ◽

pp. 1-26

Author(s):

Joshua Eykens ◽

Raf Guns ◽

Tim C.E. Engels

Keyword(s):

Social Sciences ◽

Machine Learning ◽

Social Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Fine Grained ◽

Textual Data

We compare two supervised machine learning algorithms—Multinomial Naïve Bayes and Gradient Boosting—to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories are assigned to a document make this task challenging. To collect the training data, we query three discipline specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier chaining model, allowing for an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data. It can be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social sciences publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social sciences documents.


Supervised machine learning based liver disease prediction approach with LASSO feature selection

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i6.3242 ◽

2021 ◽

Vol 10 (6) ◽

pp. 3369-3376

Author(s):

Saima Afrin ◽

F. M. Javed Mehedi Shamrat ◽

Tafsirul Islam Nibir ◽

Mst. Fahmida Muntasim ◽

Md. Shakil Moharram ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Liver Disease ◽

Decision Tree ◽

Medical Science ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Support Vector ◽

Machine Learning Classification ◽

Prediction Approach

The use of machine learning techniques is increasing rapidly in the field of medical science for detecting various diseases such as liver disease (LD). Around the globe, a large number of people die because of this deadly disease. Diagnosing the disease at an early stage allows early treatment, which can help cure the patient. In this research paper, a method is proposed to diagnose LD using supervised machine learning classification algorithms, namely logistic regression, decision tree, random forest, AdaBoost, KNN, linear discriminant analysis, gradient boosting and support vector machine (SVM). We also applied a least absolute shrinkage and selection operator (LASSO) feature selection technique to our dataset to identify the attributes most highly correlated with LD. The predictions made by the algorithms with 10-fold cross-validation (CV) are evaluated in terms of accuracy, sensitivity, precision and f1-score. It is observed that the decision tree algorithm has the best performance, with accuracy, precision, sensitivity and f1-score values of 94.295%, 92%, 99% and 96% respectively with the inclusion of LASSO. Furthermore, a comparison with recent studies is shown to demonstrate the significance of the proposed system.
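
The 10-fold cross-validation mentioned above partitions the data into ten folds and uses each fold once as the held-out test set. The sketch below shows only the index bookkeeping, assuming contiguous, unshuffled folds; the helper name `kfold_indices` and the sample count of 25 are hypothetical and not taken from the paper. In practice the rows would usually be shuffled or stratified by class before splitting.

```python
def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Illustrative run on a made-up dataset of 25 rows.
folds = list(kfold_indices(25, k=10))
```

Each model would be fitted on `train` and scored on `test`, and the per-fold scores averaged to give the cross-validated estimate.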


Machine learning methods to predict mechanical ventilation and mortality in patients with COVID-19

PLoS ONE ◽

10.1371/journal.pone.0249285 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249285

Author(s):

Limin Yu ◽

Alexandra Halalau ◽

Bhavinkumar Dalal ◽

Amr E. Abbas ◽

Felicia Ivascu ◽

...

Keyword(s):

Machine Learning ◽

Mechanical Ventilation ◽

Hospital Mortality ◽

Emergency Room ◽

Vital Signs ◽

High Accuracy ◽

Gradient Boosting ◽

Learning Models ◽

Increased Risk ◽

Machine Learning Models

Background: The coronavirus disease 2019 (COVID-19) pandemic has affected millions of people across the globe. It is associated with a high mortality rate and has created a global crisis by straining medical resources worldwide. Objectives: To develop and validate machine-learning models for prediction of mechanical ventilation (MV) for patients presenting to the emergency room and for prediction of in-hospital mortality once a patient is admitted. Methods: Two cohorts were used for the two different aims. For the first aim, prediction of MV, 1980 COVID-19 patients were enrolled. Data from 1036 patients, including demographics, past smoking and drinking history, past medical history and vital signs at the emergency room (ER), laboratory values, and treatments, were collected for training, and 674 patients were enrolled for validation using the XGBoost algorithm. For the second aim, prediction of in-hospital mortality, 3491 patients hospitalized via the ER were enrolled. CatBoost, a new gradient-boosting algorithm, was applied for training and validation of the cohort. Results: Older age, higher temperature, increased respiratory rate (RR) and a lower oxygen saturation (SpO2) from the first set of vital signs were associated with an increased risk of MV amongst the 1980 patients in the ER. The model had a high accuracy of 86.2% and a negative predictive value (NPV) of 87.8%. Requirement for MV, a higher RR, body mass index (BMI) and a longer length of stay in the hospital were the major features associated with in-hospital mortality. The second model had a high accuracy of 80% with an NPV of 81.6%. Conclusion: Machine learning models using the XGBoost and CatBoost algorithms can predict the need for mechanical ventilation and mortality with very high accuracy in COVID-19 patients.
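
Accuracy and negative predictive value (NPV), the two figures reported above, both follow directly from the confusion matrix: accuracy is the share of all predictions that are correct, and NPV is the share of negative predictions that are truly negative. A minimal sketch follows; the function name `accuracy_and_npv` and the counts are invented for illustration and are not the study's data.

```python
def accuracy_and_npv(tp, fp, tn, fn):
    """Return (accuracy, negative predictive value) from confusion counts."""
    total = tp + fp + tn + fn
    predicted_negative = tn + fn
    if total == 0 or predicted_negative == 0:
        raise ValueError("need at least one case and one negative prediction")
    accuracy = (tp + tn) / total
    npv = tn / predicted_negative
    return accuracy, npv


# Made-up counts: 100 patients, 50 of them predicted negative.
acc, npv = accuracy_and_npv(tp=40, fp=10, tn=45, fn=5)
```

A high NPV matters in this setting because a negative prediction is the one a clinician would act on to rule out the need for escalation.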


Identification of Core Suppliers Based on E-Invoice Data Using Supervised Machine Learning

Journal of Risk and Financial Management ◽

10.3390/jrfm11040070 ◽

2018 ◽

Vol 11 (4) ◽

pp. 70 ◽

Cited By ~ 1

Author(s):

Jung-sik Hong ◽

Hyeongyu Yeo ◽

Nam-Wook Cho ◽

Taeuk Ahn

Keyword(s):

Machine Learning ◽

Random Forests ◽

Area Under The Curve ◽

High Accuracy ◽

Supervised Machine Learning ◽

Machine Learning Method ◽

Learning Method ◽

Machine Learning Technique ◽

Novel Approach ◽

Learning Technique

Since not all suppliers are to be managed in the same way, a purchasing strategy requires proper supplier segmentation so that the most suitable strategies can be used for different segments. Most existing methods for supplier segmentation, however, either depend on subjective judgements or require significant effort. To overcome these limitations, this paper proposes a novel approach for supplier segmentation. The objective of this paper is to develop an automated and effective way to identify core suppliers, whose profit impact on a buyer is significant. To achieve this objective, the application of a supervised machine learning technique, Random Forests (RF), to e-invoice data is proposed. To validate its effectiveness, the proposed method has been applied to real e-invoice data obtained from an automobile parts manufacturer. High accuracy and area under the curve (AUC) results attest to the applicability of our approach. Our method is envisioned to be of value for automating the identification of core suppliers. The main benefits of the proposed approach include the enhanced efficiency of supplier segmentation procedures. In addition, by applying a machine learning method to e-invoice data, our method yields more reliable segmentation in terms of selecting and weighting variables.


Geospatial Analysis of Environmental Atmospheric Risk Factors in Neurodegenerative Diseases: A Systematic Review

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17228414 ◽

2020 ◽

Vol 17 (22) ◽

pp. 8414

Author(s):

Mariana Oliveira ◽

André Padrão ◽

André Ramalho ◽

Mariana Lobo ◽

Ana Cláudia Teodoro ◽

...

Keyword(s):

Risk Factors ◽

Multiple Sclerosis ◽

Systematic Review ◽

Neurodegenerative Diseases ◽

Nitrogen Dioxide ◽

Neurodegenerative Disease ◽

Environmental Influence ◽

Sun Exposure ◽

Geospatial Analysis ◽

Atmospheric Factors

Despite the vast evidence on the environmental influence in neurodegenerative diseases, studies considering a geospatial approach are scarce. We conducted a systematic review to identify studies concerning environmental atmospheric risk factors for neurodegenerative diseases that have used geospatial analysis/tools. PubMed, Web of Science, and Scopus were searched for all scientific studies that included a neurodegenerative disease, an environmental atmospheric factor, and a geographical analysis. Of the 34 included papers, approximately 60% were related to multiple sclerosis (MS), making it the most studied neurodegenerative disease in the context of this study. Sun exposure (n = 13) followed by the most common exhaust gases (n = 10 for nitrogen dioxide (NO2) and n = 5 for carbon monoxide (CO)) were the most studied atmospheric factors. Only one study used a geospatial interpolation model, although 13 studies used remote sensing data to compute atmospheric factors. In 20% of papers, we found an inverse correlation between sun exposure and multiple sclerosis. No consensus was reached in the analysis of nitrogen dioxide and Parkinson’s disease, but it was related to dementia and amyotrophic lateral sclerosis. This systematic review (number CRD42020196188 in PROSPERO’s database) provides an insight into the available evidence regarding the geospatial influence of environmental factors on neurodegenerative diseases.
