Bacterial Immunogenicity Prediction by Machine Learning Methods

The identification of protective immunogens is the crucial first step in the long and expensive process of vaccine design and development. Machine learning (ML) methods are very effective in data mining and in the analysis of big data such as microbial proteomes, and they can significantly reduce the experimental work needed to discover novel vaccine candidates. Here, we applied six supervised ML methods (partial least squares-based discriminant analysis, k nearest neighbor (kNN), random forest (RF), support vector machine (SVM), random subspace method (RSM), and extreme gradient boosting (xgboost)) to a set of 317 known bacterial immunogens and 317 bacterial non-immunogens and derived models for immunogenicity prediction. The models were validated by internal cross-validation in 10 groups from the training set and by an external test set. All of them showed good predictive ability, but the xgboost model was the best at identifying immunogens, recognizing 84% of the known immunogens in the test set. The combined RSM-kNN model was the best at recognizing non-immunogens, identifying 92% of them in the test set. The three best-performing ML models (xgboost, RSM-kNN, and RF) were implemented in the new version of the VaxiJen server, and the prediction of bacterial immunogens is now based on majority voting.
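The majority-voting step mentioned at the end of the abstract can be illustrated with a minimal sketch. This is not the VaxiJen implementation: the function name `majority_vote` and the per-model outputs are hypothetical, and the only assumption taken from the abstract is that three binary classifiers each cast one vote.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that most models predicted."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Hypothetical outputs for one protein (1 = immunogen, 0 = non-immunogen).
votes = {"xgboost": 1, "RSM-kNN": 0, "RF": 1}
label = majority_vote(list(votes.values()))
print(label)
```

With three binary voters a tie cannot occur, which is one practical reason to combine an odd number of models.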

Application of Artificial Intelligence and Machine Learning Techniques in Classifying Extent of Dementia Across Alzheimer's Image Data

International Journal of Quantitative Structure-Property Relationships ◽

10.4018/ijqspr.2021040103 ◽

2021 ◽

Vol 6 (2) ◽

pp. 29-46

Author(s):

Robin Ghosh ◽

Anirudh Reddy Cingreddy ◽

Venkata Melapu ◽

Sravanthi Joginipelli ◽

Supratik Kar

Keyword(s):

Neural Network ◽

Machine Learning ◽

Nearest Neighbor ◽

Image Data ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Mild Dementia ◽

Extreme Gradient Boosting

Alzheimer's disease (AD) is one of the most common forms of dementia and the sixth-leading cause of death in older adults. The presented study illustrates applications of deep learning (DL) and associated methods for multiclass image detection, which could have a broader impact on identifying dementia stages and may guide therapy in the future. The studied datasets contain around 6,400 magnetic resonance imaging (MRI) images, each assigned to one of four Alzheimer's severity classes: mild dementia, very mild dementia, non-dementia, and moderate dementia. These four image classes were used to classify the dementia stage of each patient with a convolutional neural network (CNN) algorithm. Employing the CNN-based in silico model, the authors classified and predicted the different AD stages with an accuracy of around 97.19%. In addition, machine learning (ML) techniques such as extreme gradient boosting (XGB), support vector machine (SVM), k-nearest neighbor (KNN), and artificial neural network (ANN) offered accuracies of 96.62%, 96.56%, 94.62%, and 89.88%, respectively.

IgA Nephropathy Prediction in Children with Machine Learning Algorithms

Future Internet ◽

10.3390/fi12120230 ◽

2020 ◽

Vol 12 (12) ◽

pp. 230

Author(s):

Ping Zhang ◽

Rongqin Wang ◽

Nianfeng Shi

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Immunoglobulin A ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Chi Square ◽

Extreme Gradient Boosting

Immunoglobulin A nephropathy (IgAN) is the most common primary glomerular disease worldwide and a major cause of renal failure. IgAN prediction in children with machine learning algorithms has rarely been studied. We retrospectively analyzed electronic medical records from the Nanjing Eastern War Zone Hospital and chose eXtreme Gradient Boosting (XGBoost), random forest (RF), CatBoost, support vector machine (SVM), k-nearest neighbor (KNN), and extreme learning machine (ELM) models to predict the probability that a patient would reach end-stage renal disease (ESRD) within five years. We used the chi-square test to select the 16 most relevant features as model input and designed a decision-making system (DMS) for IgAN prediction in children based on XGBoost and the Django framework. The receiver operating characteristic (ROC) curve was used to evaluate the models, and XGBoost performed best. The AUC value, accuracy, precision, recall, and F1-score of XGBoost were 85.11%, 78.60%, 75.96%, 76.70%, and 76.33%, respectively. The XGBoost model can help physicians and pediatric patients by providing predictions regarding IgAN, and a DMS based on it can assist physicians in treating IgAN in children effectively to prevent deterioration.
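As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall quoted in the abstract; the function name `f1_score` is our own and is not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for XGBoost in the abstract.
f1 = f1_score(0.7596, 0.7670)
print(f"{f1:.4f}")  # 0.7633
```

The recomputed value agrees with the reported F1-score of 76.33%.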

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Stress Classification of ECG-Derived HRV Features Extracted from Wearable Devices

10.20944/preprints202103.0644.v1 ◽

2021 ◽

Author(s):

Kayisan Mary Dalmeida ◽

Giovanni Luca Masala

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Wearable Devices ◽

Mental Wellbeing ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Automobile Crashes ◽

Machine Learning Model ◽

Stress Classification

Stress has been identified as one of the major causes of automobile crashes, which lead to high rates of fatalities and injuries each year. Stress can be measured via physiological measurements, and this study focuses on features that can be extracted by common wearable devices, mainly heart rate variability (HRV). The study aims to develop a predictive model that accurately classifies stress levels from ECG-derived HRV features obtained from automobile drivers, testing different machine learning methodologies such as K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Random Forest (RF) and Gradient Boosting (GB). Moreover, the models with the highest predictive power will be used as a reference for developing a machine learning model that classifies stress from HRV features obtained from wearable devices. We demonstrate that MLP was the best stress classifier, achieving a recall of 80%. The proposed method can also be used in any application in which it is important to monitor the stress level, e.g., physical rehabilitation, anxiety relief, or mental wellbeing.

Machine learning-based patient classification system for adults with stroke: A systematic review

Chronic Illness ◽

10.1177/17423953211067435 ◽

2021 ◽

pp. 174239532110674

Author(s):

Suebsarn Ruksakulpiwat ◽

Witchuda Thongking ◽

Wendie Zhou ◽

Chitchanok Benjasirisan ◽

Lalipat Phianhasin ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Classification System ◽

Nearest Neighbor ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Optimal Outcomes ◽

And Gender ◽

Meta Analyses

Objective: To evaluate the existing evidence of a machine learning-based classification system that stratifies patients with stroke. Methods: The authors carried out a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations for a review article. PubMed, MEDLINE, Web of Science, and CINAHL Plus Full Text were searched from January 2015 to February 2021. Results: Twelve studies were included in this systematic review, and fifteen algorithms were used in them. The most common form of machine learning (ML) used to classify stroke patients was the support vector machine (SVM) (n = 8 studies), followed by random forest (RF) (n = 7 studies), decision tree (DT) (n = 4 studies), gradient boosting (GB) (n = 4 studies), neural networks (NNs) (n = 3 studies), deep learning (n = 2 studies), and k-nearest neighbor (k-NN) (n = 2 studies). Forty-four input features were used in the included studies, and age and gender were the most common features in the ML models. Discussion: No single algorithm performed better or worse than all others at classifying patients with stroke, in part because different input data require different algorithms to achieve optimal outcomes.

Classification of Parkinson’s disease and essential tremor based on balance and gait characteristics from wearable motion sensors via machine learning techniques: a data-driven approach

Journal of NeuroEngineering and Rehabilitation ◽

10.1186/s12984-020-00756-5 ◽

2020 ◽

Vol 17 (1) ◽

Author(s):

Sanghee Moon ◽

Hyun-Je Song ◽

Vibhash D. Sharma ◽

Kelly E. Lyons ◽

Rajesh Pahwa ◽

...

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Motion Sensors ◽

Learning Models ◽

K Nearest Neighbor ◽

Gait Characteristics ◽

Machine Learning Models

Background: Parkinson’s disease (PD) and essential tremor (ET) are movement disorders that can have similar clinical characteristics, including tremor and gait difficulty. These disorders can be misdiagnosed, leading to a delay in appropriate treatment. The aim of the study was to determine whether balance and gait variables obtained with wearable inertial motion sensors can be used to differentiate between PD and ET using machine learning. Additionally, we compared the classification performance of several machine learning models. Methods: This retrospective study included balance and gait variables collected during the instrumented stand and walk test from people with PD (n = 524) and with ET (n = 43). The performance of several machine learning techniques, including neural networks, support vector machine, k-nearest neighbor, decision tree, random forest, and gradient boosting, was compared with a dummy model or logistic regression using F1-scores. Results: Machine learning models classified PD and ET based on balance and gait characteristics better than the dummy model (F1-score = 0.48) or logistic regression (F1-score = 0.53). The highest F1-score was 0.61 for the neural network, followed by 0.59 for gradient boosting, 0.56 for random forest, 0.55 for support vector machine, 0.53 for decision tree, and 0.49 for k-nearest neighbor. Conclusions: This study demonstrated the utility of machine learning models to classify different movement disorders based on balance and gait characteristics collected from wearable sensors. Future studies using a well-balanced data set are needed to confirm the potential clinical utility of machine learning models to discern between PD and ET.
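The effect of the class imbalance (524 PD vs. 43 ET) on a baseline score can be sketched as follows. The abstract does not say how the dummy model works or how the F1-scores were averaged, so two assumptions are made here: the dummy always predicts the majority class, and the F1-score is macro-averaged over both classes. The helper name `f1` is ours.

```python
def f1(tp, fp, fn):
    """F1-score from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


n_pd, n_et = 524, 43  # class sizes from the abstract

# Assumed baseline: every participant is labelled PD (the majority class).
f1_pd = f1(tp=n_pd, fp=n_et, fn=0)   # PD as the positive class
f1_et = f1(tp=0, fp=0, fn=n_et)      # ET as the positive class
macro_f1 = (f1_pd + f1_et) / 2
print(round(macro_f1, 2))  # 0.48
```

Under these assumptions the baseline rounds to 0.48, the same value the abstract reports for the dummy model, although the abstract does not confirm that this is how that figure was obtained.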

Comparative Study of Several Machine Learning Algorithms for Classification of Unifloral Honeys

Foods ◽

10.3390/foods10071543 ◽

2021 ◽

Vol 10 (7) ◽

pp. 1543

Author(s):

Fernando Mateo ◽

Andrea Tarazona ◽

Eva María Mateo

Keyword(s):

Machine Learning ◽

Discriminant Analysis ◽

Pollen Grains ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbors ◽

Test Set ◽

Extreme Gradient Boosting ◽

Classical Methodology

Unifloral honeys are in high demand among honey consumers, especially in Europe. To ensure that a honey belongs to a highly valued botanical class, the classical methodology is palynological analysis to identify and count pollen grains. Highly trained personnel are needed to perform this task, which complicates the characterization of honey botanical origins. Organoleptic assessment of honey by expert personnel helps to confirm such classification. In this study, the ability of different machine learning (ML) algorithms to correctly classify seven types of Spanish honeys of single botanical origins (rosemary, citrus, lavender, sunflower, eucalyptus, heather and forest honeydew) was investigated comparatively. The botanical origin of the samples was ascertained by pollen analysis complemented with organoleptic assessment. Physicochemical parameters such as electrical conductivity, pH, water content, carbohydrates and color of unifloral honeys were used to build the dataset. The following ML algorithms were tested: penalized discriminant analysis (PDA), shrinkage discriminant analysis (SDA), high-dimensional discriminant analysis (HDDA), nearest shrunken centroids (PAM), partial least squares (PLS), C5.0 tree, extremely randomized trees (ET), weighted k-nearest neighbors (KKNN), artificial neural networks (ANN), random forest (RF), support vector machine (SVM) with linear and radial kernels, and extreme gradient boosting trees (XGBoost). The ML models were optimized by repeated 10-fold cross-validation, primarily on the basis of log loss or accuracy metrics, and their performance was compared on a test set in order to select the best predicting model. Models built using PDA produced the best results in terms of overall accuracy on the test set. ANN, ET, RF and XGBoost models also provided good results, while SVM proved to be the worst.
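The log loss metric named above penalizes confident wrong predictions more heavily than accuracy does. A minimal multiclass sketch follows; the function name `log_loss`, the integer class indices, and the example probabilities are all illustrative and are not taken from the study.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    true_labels: class indices; predicted_probs: one probability list per sample.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# Hypothetical predictions for two samples over three honey classes.
labels = [0, 2]
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
print(log_loss(labels, probs))
```

A model that spreads probability uniformly over k classes scores log(k), which gives a simple reference point when reading cross-validated log loss values.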

A hybrid evolutionary learning classification for robot ground pattern recognition

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202940 ◽

2021 ◽

pp. 1-15

Author(s):

Jiankai Zuo ◽

Yaying Zhang

Keyword(s):

Nearest Neighbor ◽

Fitness Function ◽

Gradient Boosting ◽

Support Vector ◽

Evolutionary Learning ◽

Ensemble Classifiers ◽

Improved Genetic Algorithm ◽

K Nearest Neighbor ◽

Obvious Effect ◽

Extreme Gradient Boosting

In the field of intelligent robot engineering, whether for humanoid, bionic or vehicle robots, the driving forms of standing, moving and walking and the recognition of the surrounding environment have always been a focus and a difficulty of research. To address such problems, Naive Bayes Classifier (NBC), Support Vector Machine (SVM), k-Nearest-Neighbor (KNN), Decision Tree (DT), Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) were introduced to conduct experiments. The six individual classifiers each work well on a particular type of ground, but their overall performance is poor. Therefore, the paper proposes a “Novel Hybrid Evolutionary Learning” (NHEL) method, which combines the single classifiers by means of weighted voting and adopts an improved genetic algorithm (GA) to obtain the optimal weights. According to the fitness function and the number of evolution steps, the paper designs adaptively changing crossover and mutation rates and applies the conjugate gradient (CG) method to enhance the GA. By making full use of the global search capability of the GA and the fast local search ability of CG, the convergence speed is accelerated and the search precision is improved. The experimental results show that the proposed model performs significantly better than the individual machine learning and ensemble classifiers.
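The weighted-voting combination can be sketched as below. This is not the NHEL method itself: the weights and ground-type labels are hypothetical, and the genetic-algorithm step that the paper uses to find the weights is not shown. The function name `weighted_vote` is ours.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    return max(scores, key=scores.get)


# Hypothetical predictions of the six classifiers for one sample.
predictions = {"NBC": "grass", "SVM": "tile", "KNN": "tile",
               "DT": "grass", "RF": "carpet", "XGBoost": "tile"}
# Hypothetical weights; in the paper these would come from the improved GA.
weights = {"NBC": 0.05, "SVM": 0.20, "KNN": 0.15,
           "DT": 0.10, "RF": 0.25, "XGBoost": 0.25}
print(weighted_vote(predictions, weights))  # tile
```

Setting all weights equal reduces this to plain majority voting, so the weights are what allow a classifier that is strong on one ground type to outvote several weaker ones.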

Vocal Feature Extraction-Based Artificial Intelligent Model for Parkinson’s Disease Detection

Diagnostics ◽

10.3390/diagnostics11061076 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1076

Author(s):

Muntasir Hoq ◽

Mohammed Nazim Uddin ◽

Seung-Bo Park

Keyword(s):

Parkinson's Disease ◽

Nearest Neighbor ◽

Neurodegenerative Disorder ◽

Imbalanced Data ◽

Principal Component ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Extreme Gradient Boosting

As a neurodegenerative disorder, Parkinson’s disease (PD) affects the nerve cells of the human brain. Early detection and treatment can help to relieve the symptoms of PD. Recent PD studies have extracted features from vocal disorders as an early indicator for PD detection, as patients face vocal changes and impairments at the early stages of PD. In this study, two hybrid models based on a Support Vector Machine (SVM) integrated with Principal Component Analysis (PCA) and with a Sparse Autoencoder (SAE) are proposed to detect PD patients based on their vocal features. The first model used PCA to extract and reduce the principal components of the vocal features based on explained variance. The second model used a novel Deep Neural Network (DNN) in the form of an SAE, consisting of multiple hidden layers with L1 regularization, to compress the vocal features into a lower-dimensional latent space. In both models, the reduced features were fed into the SVM as inputs, which performed classification by learning hyperplanes while projecting the data into a higher dimension. Because the data are highly imbalanced, an F1-score, the Matthews Correlation Coefficient (MCC), and a Precision-Recall curve were used along with accuracy to evaluate the proposed models. With its highest accuracy of 0.935, F1-score of 0.951, and MCC value of 0.788, the results show that the proposed SAE-SVM model surpassed not only the PCA-SVM model and other standard models, including Multilayer Perceptron (MLP), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbor (KNN), and Random Forest (RF), but also two recent studies using the same dataset. Oversampling and balancing the dataset with SMOTE boosted the performance of the models.
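To see why MCC is reported alongside accuracy on imbalanced data, consider the sketch below. The confusion-matrix counts are hypothetical and are not taken from the study; the function implements the standard binary MCC formula and returns 0.0 when the denominator is zero, which is a common convention.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced test set: 90 PD patients, 10 healthy controls,
# and a classifier that labels everyone as PD.
tp, tn, fp, fn = 90, 0, 10, 0
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                           # 0.9
print(matthews_corrcoef(tp, tn, fp, fn))  # 0.0
```

The classifier reaches 90% accuracy without learning anything about the minority class, while the MCC of 0.0 exposes that it does no better than a constant guess.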

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the numbers of cancer patients and cancer deaths continue to increase every year, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: To review the application of machine learning in predicting anticancer drug activity, several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), were selected, and examples of their application in anticancer drug design are listed. Results: Machine learning contributes substantially to anticancer drug design and helps researchers by saving time and reducing cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of successful identification and prediction in the area of anticancer drug activity are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.
