Prediction of Train Arrival Delay Using Hybrid ELM-PSO Approach

In this study, a hybrid method combining extreme learning machine (ELM) and particle swarm optimization (PSO) is proposed to forecast train arrival delays that can be used for later delay management and timetable optimization. First, nine characteristics (e.g., buffer time, the train number, and station code) associated with train arrival delays are chosen and analyzed using extra trees classifier. Next, an ELM with one hidden layer is developed to predict train arrival delays by considering these characteristics mentioned before as input features. Furthermore, the PSO algorithm is chosen to optimize the hyperparameter of the ELM compared to Bayesian optimization and genetic algorithm solving the arduousness problem of manual regulating. Finally, a case is studied to confirm the advantage of the proposed model. Contrasted to four baseline models (k-nearest neighbor, categorical boosting, Lasso, and gradient boosting decision tree) across different metrics, the proposed model is demonstrated to be proficient and achieve the highest prediction accuracy. In addition, through a detailed analysis of the prediction error, it is found that our model possesses good robustness and correctness.

Download Full-text

Computational Intelligence-Based Model for Mortality Rate Prediction in COVID-19 Patients

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126429 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6429

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Malak Aljabri ◽

Sumayh S. Aljameel ◽

Mariam Moataz Aly Kamaleldin ◽

...

Keyword(s):

Mortality Rate ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Detection And Identification ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The World ◽

Detection And Diagnosis

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, such as Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN) and DL model (containing six layers with ReLU and output layer with sigmoid activation), to predict the mortality rate in COVID-19 cases. Models were trained using confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.

Download Full-text

A hybrid cost-sensitive ensemble for heart disease prediction

10.21203/rs.2.22946/v1 ◽

2020 ◽

Author(s):

Zhenya Qi ◽

Zuoru Zhang

Keyword(s):

Heart Disease ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Misclassification Cost ◽

Proposed Model ◽

Learning Machine ◽

Fold Cross Validation ◽

Very High

Abstract Heart disease is the primary cause of morbidity and mortality in the world. It includes numerous problems and symptoms. The diagnosis of heart disease is difficult because there are too many factors to analyze. What's more, the misclassification cost could be very high. In this paper, I firstly propose a cost-sensitive ensemble model to improve the accuracy of diagnosis and reduce the misclassification cost. The proposed model contains five heterogeneous classifiers: random forest, logistic regression, support vector machine, extreme learning machine and k-nearest neighbor. Then, experiments are done on three datasets from UCI machine learning repository. The highest classification accuracy of 91.74%, highest G-mean of 90.55%, highest precision of 96.11%, highest recall of 89.61% and lowest misclassification cost of 30.32% are achieved by the proposed model according to ten-fold cross validation. The results demonstrate that the performance of the proposed model is superior to those of previously reported classification techniques.

Download Full-text

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. As a result of the importance of the contextual information for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies are widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three different commonly-used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Download Full-text

Assessing the Relation between Mud Components and Rheology for Loss Circulation Prevention Using Polymeric Gels: A Machine Learning Approach

Energies ◽

10.3390/en14051377 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1377

Author(s):

Musaab I. Magzoub ◽

Raj Kiran ◽

Saeed Salehi ◽

Ibnelwaleed A. Hussein ◽

Mustafa S. Nasser

Keyword(s):

Machine Learning ◽

Rheological Properties ◽

Nearest Neighbor ◽

Drilling Fluid ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Wide Range ◽

Machine Learning Approach ◽

Drilling Operations

The traditional way to mitigate loss circulation in drilling operations is to use preventative and curative materials. However, it is difficult to quantify the amount of materials from every possible combination to produce customized rheological properties. In this study, machine learning (ML) is used to develop a framework to identify material composition for loss circulation applications based on the desired rheological characteristics. The relation between the rheological properties and the mud components for polyacrylamide/polyethyleneimine (PAM/PEI)-based mud is assessed experimentally. Four different ML algorithms were implemented to model the rheological data for various mud components at different concentrations and testing conditions. These four algorithms include (a) k-Nearest Neighbor, (b) Random Forest, (c) Gradient Boosting, and (d) AdaBoosting. The Gradient Boosting model showed the highest accuracy (91 and 74% for plastic and apparent viscosity, respectively), which can be further used for hydraulic calculations. Overall, the experimental study presented in this paper, together with the proposed ML-based framework, adds valuable information to the design of PAM/PEI-based mud. The ML models allowed a wide range of rheology assessments for various drilling fluid formulations with a mean accuracy of up to 91%. The case study has shown that with the appropriate combination of materials, reasonable rheological properties could be achieved to prevent loss circulation by managing the equivalent circulating density (ECD).

Download Full-text

mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation

Bioinformatics ◽

10.1093/bioinformatics/bty1047 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2757-2765 ◽

Cited By ~ 63

Author(s):

Balachandran Manavalan ◽

Shaherin Basith ◽

Tae Hwan Shin ◽

Leyi Wei ◽

Gwang Lee

Keyword(s):

Nearest Neighbor ◽

Feature Representation ◽

Superior Performance ◽

Supplementary Information ◽

Gradient Boosting ◽

Support Vector ◽

Pharmaceutical Drugs ◽

K Nearest Neighbor ◽

Feature Descriptors ◽

Predicted Probability

AbstractMotivationCardiovascular disease is the primary cause of death globally accounting for approximately 17.7 million deaths per year. One of the stakes linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there is no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied for antihypertensive peptide (AHTP) model construction.ResultsIn this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM) using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. While ERT-based trained models performed consistently better than other algorithms regardless of various feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML-algorithms (ERT, GB, RF and SVM) and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets.Availability and implementationThe user-friendly online prediction tool, mAHTPred is freely accessible at http://thegleelab.org/mAHTPred.Supplementary informationSupplementary data are available at Bioinformatics online.

Download Full-text

Land Cover Classification Using the Proposed Texture Model and Fuzzy k-NN Classifier

Optimization Techniques for Problem Solving in Uncertainty - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-5091-4.ch009 ◽

2018 ◽

pp. 226-261

Author(s):

Jenicka S.

Keyword(s):

Nearest Neighbor ◽

Confusion Matrix ◽

Texture Feature ◽

Image Texture ◽

Kappa Statistics ◽

Grey Level ◽

Classification Problems ◽

K Nearest Neighbor ◽

Texture Model ◽

Proposed Model

Texture feature is a decisive factor in pattern classification problems because texture features are not deduced from the intensity of current pixel but from the grey level intensity variations of current pixel with its neighbors. In this chapter, a new texture model called multivariate binary threshold pattern (MBTP) has been proposed with five discrete levels such as -9, -1, 0, 1, and 9 characterizing the grey level intensity variations of the center pixel with its neighbors in the local neighborhood of each band in a multispectral image. Texture-based classification has been performed with the proposed model using fuzzy k-nearest neighbor (fuzzy k-NN) algorithm on IRS-P6, LISS-IV data, and the results have been evaluated based on confusion matrix, classification accuracy, and Kappa statistics. From the experiments, it is found that the proposed model outperforms other chosen existing texture models.

Download Full-text

Bacterial Immunogenicity Prediction by Machine Learning Methods

Vaccines ◽

10.3390/vaccines8040709 ◽

2020 ◽

Vol 8 (4) ◽

pp. 709

Author(s):

Ivan Dimitrov ◽

Nevena Zaharieva ◽

Irini Doytchinova

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Predictive Ability ◽

Initial Step ◽

Majority Voting ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Test Set ◽

Extreme Gradient Boosting

The identification of protective immunogens is the most important and vigorous initial step in the long-lasting and expensive process of vaccine design and development. Machine learning (ML) methods are very effective in data mining and in the analysis of big data such as microbial proteomes. They are able to significantly reduce the experimental work for discovering novel vaccine candidates. Here, we applied six supervised ML methods (partial least squares-based discriminant analysis, k nearest neighbor (kNN), random forest (RF), support vector machine (SVM), random subspace method (RSM), and extreme gradient boosting) on a set of 317 known bacterial immunogens and 317 bacterial non-immunogens and derived models for immunogenicity prediction. The models were validated by internal cross-validation in 10 groups from the training set and by the external test set. All of them showed good predictive ability, but the xgboost model displays the most prominent ability to identify immunogens by recognizing 84% of the known immunogens in the test set. The combined RSM-kNN model was the best in the recognition of non-immunogens, identifying 92% of them in the test set. The three best performing ML models (xgboost, RSM-kNN, and RF) were implemented in the new version of the server VaxiJen, and the prediction of bacterial immunogens is now based on majority voting.

Download Full-text

A Novel Hybrid Model Based on Extreme Learning Machine, k-Nearest Neighbor Regression and Wavelet Denoising Applied to Short-Term Electric Load Forecasting

Energies ◽

10.3390/en10050694 ◽

2017 ◽

Vol 10 (5) ◽

pp. 694 ◽

Cited By ~ 9

Author(s):

◽

Keyword(s):

Extreme Learning Machine ◽

Hybrid Model ◽

Nearest Neighbor ◽

Load Forecasting ◽

Wavelet Denoising ◽

Electric Load ◽

K Nearest Neighbor ◽

Short Term ◽

Model Based ◽

Learning Machine

Download Full-text

Chicago Crime Analysis using R Programming

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1952173 ◽

2019 ◽

pp. 937-944

Author(s):

Monish N

Keyword(s):

Law Enforcement ◽

Data Science ◽

Nearest Neighbor ◽

Crime Rates ◽

K Nearest Neighbor ◽

Proposed Model ◽

The Us ◽

Steep Decline ◽

R Programming ◽

And Behavior

In recent years law enforcement have improved by taking better strategies, computer aided technology, efficient use of resource, etc. As a result of these over the couple of years there has been a steep decline in crime rate in the US (United States). Law enforcement have turned to data science for insights (ranging from reports, corrective analysis and behavior modelling). There has been an overall drop in crime rates in Chicago in recent years. In fact, these rates are at the lowest when compared to the previous decades. This paper uses the criminal dataset found at “data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2” to describe historical trends, insights, etc. in Chicago from 1965 to 2018 and not to assign any casual interpretation of the vanguards of crime rates during this period. Here K-Nearest Neighbor (KNN) classification is used for training and crime predication. Discussions on future investigation can also be found. The proposed model has an accuracy of 83.2%.

Download Full-text

Heart disease prediction model with k-nearest neighbor algorithm

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v10i3.pp225-230 ◽

2021 ◽

Vol 10 (3) ◽

pp. 225

Author(s):

Tssehay Admassu Assegie

Keyword(s):

Heart Disease ◽

Prediction Model ◽

Nearest Neighbor ◽

Predictive Performance ◽

Data Repository ◽

Disease Prediction ◽

K Nearest Neighbor ◽

Proposed Model ◽

K Nearest Neighbor Algorithm ◽

Learning Data

<span>In this study, the author proposed k-nearest neighbor (KNN) based heart disease prediction model. The author conducted an experiment to evaluate the performance of the proposed model. Moreover, the result of the experimental evaluation of the predictive performance of the proposed model is analyzed. To conduct the study, the author obtained heart disease data from Kaggle machine learning data repository. The dataset consists of 1025 observations of which 499 or 48.68% is heart disease negative and 526 or 51.32% is heart disease positive. Finally, the performance of KNN algorithm is analyzed on the test set. The result of performance analysis on the experimental results on the Kaggle heart disease data repository shows that the accuracy of the KNN is 91.99%</span>

Download Full-text