Demonstration of the potential of white-box machine learning approaches to gain insights from cardiovascular disease electrocardiograms

We present the results from a white-box machine learning approach to detect cardiac arrhythmias using electrocardiographic data. A C5.0 is trained to recognize four classes using common features. The four classes are (i) atrial fibrillation and atrial flutter, (ii) tachycardias (iii), sinus bradycardia and (iv) sinus rhythm. Data from 10,646 subjects, 83% of whom have at least one arrhythmia and 17% of whom exhibit a normal sinus rhythm, are used. The C5.0 is trained using 10-fold cross-validation and is able to achieve a balanced accuracy of 95.35%. By using the white-box machine learning approach, a clear and comprehensible tree structure can be revealed, which has selected the 5 most important features from a total of 24 features. These 5 features are ventricular rate, RR-Interval variation, atrial rate, age and difference between longest and shortest RR-Interval. The combination of ventricular rate, RR-Interval variation and atrial rate is especially relevant to achieve classification accuracy, which can be disclosed through the tree. The tree assigns unique values to distinguish the classes. These findings could be applied in medicine in the future. It can be shown that a white-box machine learning approach can reveal granular structures, thus confirming known linear relationships and also revealing nonlinear relationships. To highlight the strength of the C5.0 with respect to this structural revelation, the results of further white-box machine learning and black-box machine learning algorithms are presented.

Download Full-text

A Machine Learning Approach to Study Glycosidase Activities from Bifidobacterium

Microorganisms ◽

10.3390/microorganisms9051034 ◽

2021 ◽

Vol 9 (5) ◽

pp. 1034

Author(s):

Carlos Sabater ◽

Lorena Ruiz ◽

Abelardo Margolles

Keyword(s):

Machine Learning ◽

Supervised Classification ◽

Machine Learning Algorithms ◽

Learning Approach ◽

Human Milk Oligosaccharides ◽

Future Studies ◽

High Fiber ◽

Machine Learning Approach ◽

Prebiotic Oligosaccharides

This study aimed to recover metagenome-assembled genomes (MAGs) from human fecal samples to characterize the glycosidase profiles of Bifidobacterium species exposed to different prebiotic oligosaccharides (galacto-oligosaccharides, fructo-oligosaccharides and human milk oligosaccharides, HMOs) as well as high-fiber diets. A total of 1806 MAGs were recovered from 487 infant and adult metagenomes. Unsupervised and supervised classification of glycosidases codified in MAGs using machine-learning algorithms allowed establishing characteristic hydrolytic profiles for B. adolescentis, B. bifidum, B. breve, B. longum and B. pseudocatenulatum, yielding classification rates above 90%. Glycosidase families GH5 44, GH32, and GH110 were characteristic of B. bifidum. The presence or absence of GH1, GH2, GH5 and GH20 was characteristic of B. adolescentis, B. breve and B. pseudocatenulatum, while families GH1 and GH30 were relevant in MAGs from B. longum. These characteristic profiles allowed discriminating bifidobacteria regardless of prebiotic exposure. Correlation analysis of glycosidase activities suggests strong associations between glycosidase families comprising HMOs-degrading enzymes, which are often found in MAGs from the same species. Mathematical models here proposed may contribute to a better understanding of the carbohydrate metabolism of some common bifidobacteria species and could be extrapolated to other microorganisms of interest in future studies.

Download Full-text

Employing a Machine Learning Approach to Detect Combined Internet of Things Attacks against Two Objective Functions Using a Novel Dataset

Security and Communication Networks ◽

10.1155/2020/2804291 ◽

2020 ◽

Vol 2020 ◽

pp. 1-17

Author(s):

John Foley ◽

Naghmeh Moradpoor ◽

Henry Ochenyi

Keyword(s):

Machine Learning ◽

Low Power ◽

Vulnerability Analysis ◽

Machine Learning Algorithms ◽

Learning Approach ◽

Lossy Networks ◽

Global Networks ◽

Machine Learning Approach ◽

Iot Devices ◽

Network Metrics

One of the important features of routing protocol for low-power and lossy networks (RPLs) is objective function (OF). OF influences an IoT network in terms of routing strategies and network topology. On the contrary, detecting a combination of attacks against OFs is a cutting-edge technology that will become a necessity as next generation low-power wireless networks continue to be exploited as they grow rapidly. However, current literature lacks study on vulnerability analysis of OFs particularly in terms of combined attacks. Furthermore, machine learning is a promising solution for the global networks of IoT devices in terms of analysing their ever-growing generated data and predicting cyberattacks against such devices. Therefore, in this paper, we study the vulnerability analysis of two popular OFs of RPL to detect combined attacks against them using machine learning algorithms through different simulated scenarios. For this, we created a novel IoT dataset based on power and network metrics, which is deployed as part of an RPL IDS/IPS solution to enhance information security. Addressing the captured results, our machine learning approach is successful in detecting combined attacks against two popular OFs of RPL based on the power and network metrics in which MLP and RF algorithms are the most successful classifier deployment for single and ensemble models.

Download Full-text

Phenotyping Cardiogenic Shock

Journal of the American Heart Association ◽

10.1161/jaha.120.020085 ◽

2021 ◽

Author(s):

Elric Zweck ◽

Katherine L. Thayer ◽

Ole K. L. Helgestad ◽

Manreet Kanwar ◽

Mohyee Ayouty ◽

...

Keyword(s):

Machine Learning ◽

Hospital Mortality ◽

Cardiogenic Shock ◽

Treatment Strategies ◽

Machine Learning Algorithms ◽

Learning Approach ◽

Tailored Treatment ◽

Machine Learning Approach ◽

Clinical Profiles ◽

Patient Enrollment

Background Cardiogenic shock (CS) is a heterogeneous syndrome with varied presentations and outcomes. We used a machine learning approach to test the hypothesis that patients with CS have distinct phenotypes at presentation, which are associated with unique clinical profiles and in‐hospital mortality. Methods and Results We analyzed data from 1959 patients with CS from 2 international cohorts: CSWG (Cardiogenic Shock Working Group Registry) (myocardial infarction [CSWG‐MI; n=410] and acute‐on‐chronic heart failure [CSWG‐HF; n=480]) and the DRR (Danish Retroshock MI Registry) (n=1069). Clusters of patients with CS were identified in CSWG‐MI using the consensus k means algorithm and subsequently validated in CSWG‐HF and DRR. Patients in each phenotype were further categorized by their Society of Cardiovascular Angiography and Interventions staging. The machine learning algorithms revealed 3 distinct clusters in CS: "non‐congested (I)", "cardiorenal (II)," and "cardiometabolic (III)" shock. Among the 3 cohorts (CSWG‐MI versus DDR versus CSWG‐HF), in‐hospital mortality was 21% versus 28% versus 10%, 45% versus 40% versus 32%, and 55% versus 56% versus 52% for clusters I, II, and III, respectively. The "cardiometabolic shock" cluster had the highest risk of developing stage D or E shock as well as in‐hospital mortality among the phenotypes, regardless of cause. Despite baseline differences, each cluster showed reproducible demographic, metabolic, and hemodynamic profiles across the 3 cohorts. Conclusions Using machine learning, we identified and validated 3 distinct CS phenotypes, with specific and reproducible associations with mortality. These phenotypes may allow for targeted patient enrollment in clinical trials and foster development of tailored treatment strategies in subsets of patients with CS.

Download Full-text

A machine learning approach to predict ethnicity using personal name and census location in Canada

PLoS ONE ◽

10.1371/journal.pone.0241239 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0241239

Author(s):

Kai On Wong ◽

Osmar R. Zaïane ◽

Faith G. Davis ◽

Yutaka Yasui

Keyword(s):

Machine Learning ◽

First Nations ◽

Predictive Value ◽

Large Scale ◽

Performance Metrics ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approach ◽

Machine Learning Approach

Background Canada is an ethnically-diverse country, yet its lack of ethnicity information in many large databases impedes effective population research and interventions. Automated ethnicity classification using machine learning has shown potential to address this data gap but its performance in Canada is largely unknown. This study conducted a large-scale machine learning framework to predict ethnicity using a novel set of name and census location features. Methods Using census 1901, the multiclass and binary class classification machine learning pipelines were developed. The 13 ethnic categories examined were Aboriginal (First Nations, Métis, Inuit, and all-combined)), Chinese, English, French, Irish, Italian, Japanese, Russian, Scottish, and others. Machine learning algorithms included regularized logistic regression, C-support vector, and naïve Bayes classifiers. Name features consisted of the entire name string, substrings, double-metaphones, and various name-entity patterns, while location features consisted of the entire location string and substrings of province, district, and subdistrict. Predictive performance metrics included sensitivity, specificity, positive predictive value, negative predictive value, F1, Area Under the Curve for Receiver Operating Characteristic curve, and accuracy. Results The census had 4,812,958 unique individuals. For multiclass classification, the highest performance achieved was 76% F1 and 91% accuracy. For binary classifications for Chinese, French, Italian, Japanese, Russian, and others, the F1 ranged 68–95% (median 87%). The lower performance for English, Irish, and Scottish (F1 ranged 63–67%) was likely due to their shared cultural and linguistic heritage. Adding census location features to the name-based models strongly improved the prediction in Aboriginal classification (F1 increased from 50% to 84%). Conclusions The automated machine learning approach using only name and census location features can predict the ethnicity of Canadians with varying performance by specific ethnic categories.

Download Full-text

A Machine Learning Approach for One-Stop Learning

Data Mining and Knowledge Discovery Technologies ◽

10.4018/978-1-59904-960-1.ch013 ◽

2008 ◽

pp. 333-357 ◽

Cited By ~ 1

Author(s):

Marco A. Alvarez ◽

SeungJin Lim

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approach ◽

Internet Users ◽

Machine Learning Approach ◽

Supervised Learning Algorithms ◽

One Stop ◽

The Subject

Current search engines impose an overhead to motivated students and Internet users who employ the Web as a valuable resource for education. The user, searching for good educational materials for a technical subject, often spends extra time to filter irrelevant pages or ends up with commercial advertisements. It would be ideal if, given a technical subject by user who is educationally motivated, suitable materials with respect to the given subject are automatically identified by an affordable machine processing of the recommendation set returned by a search engine for the subject. In this scenario, the user can save a significant amount of time in filtering out less useful Web pages, and subsequently the user’s learning goal on the subject can be achieved more efficiently without clicking through numerous pages. This type of convenient learning is called One-Stop Learning (OSL). In this paper, the contributions made by Lim and Ko in (Lim and Ko, 2006) for OSL are redefined and modeled using machine learning algorithms. Four selected supervised learning algorithms: Support Vector Machine (SVM), AdaBoost, Naive Bayes and Neural Networks are evaluated using the same data used in (Lim and Ko, 2006). The results presented in this paper are promising, where the highest precision (98.9%) and overall accuracy (96.7%) obtained by using SVM is superior to the results presented by Lim and Ko. Furthermore, the machine learning approach presented here, demonstrates that the small set of features used to represent each Web page yields a good solution for the OSL problem.

Download Full-text

A Machine Learning Approach for Studying the Comorbidities of Complex Diagnoses

Behavioral Sciences ◽

10.3390/bs9120122 ◽

2019 ◽

Vol 9 (12) ◽

pp. 122 ◽

Cited By ~ 2

Author(s):

Marina Sánchez-Rico ◽

Jesús M. Alvarado

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Learning Approach ◽

Clinical Database ◽

Hierarchical Agglomerative Cluster ◽

Dimensionality Reduction Technique ◽

Machine Learning Approach ◽

Wide Variability ◽

Methodological Problems ◽

Hierarchical Agglomerative Cluster Analysis

The study of diagnostic associations entails a large number of methodological problems regarding the application of machine learning algorithms, collinearity and wide variability being some of the most prominent ones. To overcome these, we propose and tested the usage of uniform manifold approximation and projection (UMAP), a very recent, popular dimensionality reduction technique. We showed its effectiveness by using it on a large Spanish clinical database of patients diagnosed with depression, to whom we applied UMAP before grouping them using a hierarchical agglomerative cluster analysis. By extensively studying its behavior and results, validating them with purely unsupervised metrics, we show that they are consistent with well-known relationships, which validates the applicability of UMAP to advance the study of comorbidities.

Download Full-text

A Novel Ensemble Machine Learning Approach for Bioarchaeological Sex Prediction

Technologies ◽

10.3390/technologies9020023 ◽

2021 ◽

Vol 9 (2) ◽

pp. 23

Author(s):

Evan Muzzall

Keyword(s):

Machine Learning ◽

Missing Data ◽

Central Italy ◽

Area Under The Curve ◽

Machine Learning Algorithms ◽

Low Rank ◽

Future Research ◽

Learning Approach ◽

Machine Learning Approach ◽

Metric Distances

I present a novel machine learning approach to predict sex in the bioarchaeological record. Eighteen cranial interlandmark distances and five maxillary dental metric distances were recorded from n = 420 human skeletons from the necropolises at Alfedena (600–400 BCE) and Campovalano (750–200 BCE and 9–11th Centuries CE) in central Italy. A generalized low rank model (GLRM) was used to impute missing data and Area under the Curve—Receiver Operating Characteristic (AUC-ROC) with 20-fold stratified cross-validation was used to evaluate predictive performance of eight machine learning algorithms on different subsets of the data. Additional perspectives such as this one show strong potential for sex prediction in bioarchaeological and forensic anthropological contexts. Furthermore, GLRMs have the potential to handle missing data in ways previously unexplored in the discipline. Although results of this study look promising (highest AUC-ROC = 0.9722 for predicting binary male/female sex), the main limitation is that the sexes of the individuals included were not known but were estimated using standard macroscopic bioarchaeological methods. However, future research should apply this machine learning approach to known-sex reference samples in order to better understand its value, along with the more general contributions that machine learning can make to the reconstruction of past human lifeways.

Download Full-text

A Tree Based Machine Learning Approach for PTB Diagnostic Dataset

Journal of Physics Conference Series ◽

10.1088/1742-6596/2115/1/012042 ◽

2021 ◽

Vol 2115 (1) ◽

pp. 012042

Author(s):

S Premanand ◽

Sathiya Narayanan

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Deep Learning ◽

Learning Algorithms ◽

Primary Objective ◽

Machine Learning Algorithms ◽

Learning Approach ◽

Related Data ◽

Machine Learning Approach ◽

Health Related

Abstract The primary objective of this particular paper is to classify the health-related data without feature extraction in Machine Learning, which hinder the performance and reliability. The assumption of our work will be like, can we able to get better result for health-related data with the help of Tree based Machine Learning algorithms without extracting features like in Deep Learning. This study performs better classification with Tree based Machine Learning approach for the health-related medical data. After doing pre-processing, without feature extraction, i.e., from raw data signal with the help of Machine Learning algorithms we are able to get better results. The presented paper which has better result even when compared to some of the advanced Deep Learning architecture models. The results demonstrate that overall classification accuracy of Random Forest, XGBoost, LightGBM and CatBoost, Tree-based Machine Learning algorithms for normal and abnormal condition of the datasets was found to be 97.88%, 98.23%, 98.03% and 95.57% respectively.

Download Full-text

Sentiment Analysis on Social Media using Machine Learning Approach

10.22541/au.163620143.37655829/v1 ◽

2021 ◽

Author(s):

Erick Omuya ◽

George Okeyo ◽

Michael Kimwele

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approach ◽

K Nearest Neighbor ◽

Machine Learning Approach

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos on Twitter, Facebook and other social media which they share. Social media therefore generates a lot of data that is rich in sentiments from these updates. Sentiment analysis has been used to determine opinions of clients, for instance, relating to a particular product or company. Knowledge based approach and Machine learning approach are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is however distorted by noise, the curse of dimensionality, the data domains and size of data used for training and testing. This research aims at developing a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improves sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on the data from social media. The model is tested using Naïve Bayes, Support Vector Machines and K-Nearest neighbor machine learning algorithms and its performance compared with that of two other Sentiment Analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.

Download Full-text

A Novel Smart City-Based Framework on Perspectives for Application of Machine Learning in Combating COVID-19

BioMed Research International ◽

10.1155/2021/5546790 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Absalom E. Ezugwu ◽

Ibrahim Abaker Targio Hashem ◽

Olaide N. Oyelade ◽

Mubarak Almutari ◽

Mohammed A. Al-Garadi ◽

...

Keyword(s):

Machine Learning ◽

Smart Cities ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Learning Approach ◽

Learning Framework ◽

National Healthcare ◽

Machine Learning Approach ◽

The Internet Of Things ◽

Analyze Data

The spread of COVID-19 worldwide continues despite multidimensional efforts to curtail its spread and provide treatment. Efforts to contain the COVID-19 pandemic have triggered partial or full lockdowns across the globe. This paper presents a novel framework that intelligently combines machine learning models and the Internet of Things (IoT) technology specifically to combat COVID-19 in smart cities. The purpose of the study is to promote the interoperability of machine learning algorithms with IoT technology by interacting with a population and its environment to curtail the COVID-19 pandemic. Furthermore, the study also investigates and discusses some solution frameworks, which can generate, capture, store, and analyze data using machine learning algorithms. These algorithms can detect, prevent, and trace the spread of COVID-19 and provide a better understanding of the disease in smart cities. Similarly, the study outlined case studies on the application of machine learning to help fight against COVID-19 in hospitals worldwide. The framework proposed in the study is a comprehensive presentation on the major components needed to integrate the machine learning approach with other AI-based solutions. Finally, the machine learning framework presented in this study has the potential to help national healthcare systems in curtailing the COVID-19 pandemic in smart cities. In addition, the proposed framework is poised as a pointer for generating research interests that would yield outcomes capable of been integrated to form an improved framework.

Download Full-text