Hypertension-Related Drug Activity Identification Based on Novel Ensemble Method

Hypertension is a chronic disease and a major risk factor for cardiovascular and cerebrovascular diseases, and it often damages target organs. Its prevention and treatment are therefore crucial for human health. In this paper, a novel ensemble method based on a flexible neural tree (FNT) is proposed to identify hypertension-related active compounds. The base classifiers are Multi-Grained Cascade Forest (gcForest), support vector machine (SVM), random forest (RF), AdaBoost, decision tree (DT), Gradient Boosting Decision Tree (GBDT), k-nearest neighbors (KNN), logistic regression, and naïve Bayes (NB). The classification results of the nine classifiers form the input vector of the FNT, which serves as a nonlinear ensemble method to identify hypertension-related drug compounds. The experimental data consist of hypertension-related and hypertension-unrelated compounds collected from the up-to-date literature. The results show that the proposed ensemble method outperforms the single classifiers in terms of ROC curve, AUC, TPR, FPR, precision, specificity, and F1. The proposed method is also compared with the averaged and voting ensemble methods, and the results show that it identifies hypertension-related compounds more accurately than these two classical ensemble methods.
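The averaged and voting ensembles used as baselines in the abstract above can be sketched in a few lines. This is a minimal illustration only: the function names, the 0.5 threshold, and the example scores are assumptions, and the FNT-based nonlinear combiner proposed in the paper is not reproduced here.

```python
from statistics import mean

def average_ensemble(probabilities, threshold=0.5):
    """Label a compound active (1) if the mean predicted probability reaches the threshold."""
    return 1 if mean(probabilities) >= threshold else 0

def voting_ensemble(probabilities, threshold=0.5):
    """Label a compound active (1) if a strict majority of base classifiers vote active."""
    votes = sum(1 for p in probabilities if p >= threshold)
    return 1 if votes * 2 > len(probabilities) else 0

# Hypothetical outputs of three base classifiers for one compound.
scores = [0.95, 0.45, 0.45]
print(average_ensemble(scores), voting_ensemble(scores))
```

With these hypothetical scores the two baselines disagree: one confident classifier pulls the average above the threshold, while the majority vote stays negative. A learned combiner such as the FNT is meant to weigh the base outputs more flexibly than either fixed rule.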

Analysis of Heart Disease Using Parallel and Sequential Ensemble Methods With Feature Selection Techniques

International Journal of Big Data and Analytics in Healthcare ◽ 10.4018/ijbdah.20210101.oa4 ◽ 2021 ◽ Vol 6 (1) ◽ pp. 40-56

Author(s): Dhyan Chandra Yadav ◽ Saurabh Pal

Keyword(s): Machine Learning ◽ Heart Disease ◽ Random Forest ◽ Decision Tree ◽ Classification Accuracy ◽ Ensemble Methods ◽ Machine Learning Algorithms ◽ Ensemble Method ◽ Gradient Boosting ◽ High Classification Accuracy

This paper organizes a heart disease dataset from the UCI repository. The organized dataset describes the correlations between the variables and the class-level target variable, and the experiment analyzes the variables with different machine learning algorithms. Reviewing previous prediction work, the authors find that some machine learning algorithms did not work properly or did not reach 100% classification accuracy because of overfitting, underfitting, noisy data, and residual errors in base-level decision trees. This research uses Pearson correlation and chi-square feature selection to assess the correlation strength of the heart disease attributes. The main objective is to achieve the highest classification accuracy with the fewest errors, so the authors use parallel and sequential ensemble methods to reduce the drawbacks above. Both kinds of ensemble are built on decision tree-based algorithms: J48, reduced error pruning, and decision stump. Random forest serves as the parallel ensemble method, with random selection in prediction, and AdaBoost, Gradient Boosting, and XGBoost serve as the sequential meta-classifiers. The experiment has two parts. The first part combines J48, reduced error pruning, and decision stump into a random forest ensemble; this parallel ensemble reached 100% classification accuracy with low error. The second part combines J48, reduced error pruning, and decision stump with three sequential ensemble methods, namely AdaBoostM1, XGBoost, and Gradient Boosting. XGBoost gave higher classification accuracy and lower error than AdaBoostM1 and Gradient Boosting, reaching 98.05% classification accuracy, but the random forest ensemble reached the highest classification accuracy, 100%, with low error.
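Pearson-correlation feature selection, one of the two selection techniques named in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the feature names, and the toy values are hypothetical and are not taken from the UCI dataset or from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (spread_x * spread_y)

def rank_features(columns, target):
    """Return feature names sorted by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: two features and a binary class label.
columns = {"feature_a": [1, 2, 3, 4], "feature_b": [1, 0, 1, 0]}
target = [0, 0, 1, 1]
print(rank_features(columns, target))
```

Features at the top of the ranking would be kept and passed to the ensemble classifiers; where to cut the list is a modelling choice that the abstract does not specify.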

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽ 10.3390/ijgi10010042 ◽ 2021 ◽ Vol 10 (1) ◽ pp. 42

Author(s): Kieu Anh Nguyen ◽ Walter Chen ◽ Bor-Shiun Lin ◽ Uma Seeboonruang

Keyword(s): Machine Learning ◽ Soil Erosion ◽ Ensemble Methods ◽ Machine Learning Algorithms ◽ Multivariate Adaptive Regression Splines ◽ Gradient Boosting ◽ Support Vector ◽ Ensemble Machine Learning ◽ Boosting Method ◽ Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging methods are bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting methods are Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was sampled using stratified random sampling to achieve a 70/30 split between training and test data, and the process was repeated three times. The performance of the six ensemble methods in three categories was analyzed based on the average of the three attempts. GBM performed the best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that, as a group, the bagging and boosting methods performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.
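The three evaluation measures reported in the abstract above can be computed as sketched below. The code assumes the standard textbook definitions (in particular Willmott's original index of agreement for d); the abstract does not state which variants the study used, and the function names and sample values are illustrative.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / variance

def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d (assumed variant), ranging from 0 to 1."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, p in zip(observed, predicted))
    return 1 - residual / potential

# Hypothetical erosion depths in mm/year, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.5]
print(rmse(observed, predicted), nse(observed, predicted), index_of_agreement(observed, predicted))
```

Lower RMSE and higher NSE and d all indicate closer agreement, which is why the abstract singles out GBM for having the lowest RMSE together with the highest NSE and d.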

DESIGN OF DECISION TREE VIA KERNELIZED HIERARCHICAL CLUSTERING FOR MULTICLASS SUPPORT VECTOR MACHINES

Cybernetics & Systems ◽ 10.1080/01969720601139058 ◽ 2007 ◽ Vol 38 (2) ◽ pp. 187-202 ◽ Cited By ~ 6

Author(s): Zhao Lu ◽ Feng Lin ◽ Hao Ying

Keyword(s): Support Vector Machines ◽ Decision Tree ◽ Hierarchical Clustering ◽ Support Vector ◽ Vector Machines ◽ Multiclass Support Vector Machines

The impact of different parameter sets on the classification of asteroid types

10.5194/epsc2021-807 ◽ 2021

Author(s): Hanna Klimczak ◽ Wojciech Kotłowski ◽ Dagmara Oszkiewicz ◽ Francesca DeMeo ◽ Agnieszka Kryszczyńska ◽ ...

Keyword(s): Gradient Boosting ◽ Support Vector ◽ Multilayer Perceptrons ◽ Machine Learning Methods ◽ Vector Machines ◽ Science Centre ◽ The Difference ◽ The Impact

The aim of the project is to classify asteroids according to the most commonly used asteroid taxonomy (Bus-DeMeo et al. 2009) using various machine learning methods, such as Logistic Regression, Naive Bayes, Support Vector Machines, Gradient Boosting, and Multilayer Perceptrons. Different parameter sets are used for classification in order to compare the quality of prediction with a limited amount of data, namely the difference in performance between using the 0.45 μm to 2.45 μm spectral range and using multiple spectral features, as well as performing Principal Component Analysis to reduce the dimensionality of the spectral data. This work has been supported by grant No. 2017/25/B/ST9/00740 from the National Science Centre, Poland.

Multi-Class Taxonomy of Well Integrity Anomalies Applying Inductive Learning Algorithms: Analytical Approach for Artificial-Lift Wells

10.2118/206129-ms ◽ 2021

Author(s): Mostafa Sa'eed Yakoot ◽ Adel Mohamed Salem Ragab ◽ Omar Mahmoud

Keyword(s): Decision Tree ◽ Confusion Matrix ◽ Learning Algorithms ◽ Oil And Gas Industry ◽ Classification Model ◽ Gradient Boosting ◽ Support Vector ◽ Risk Category ◽ Well Integrity ◽ Extreme Gradient Boosting

Well integrity has become a crucial field that receives increasing attention and is published on intensively in industry research. It is important to maintain the integrity of each individual well to ensure that wells operate as expected for their designated life (or longer) with all risks kept as low as reasonably practicable, or as specified. Machine learning (ML) and artificial intelligence (AI) models are now used intensively in the oil and gas industry. ML relies on powerful algorithms and a robust database. Developing an efficient classification model for well integrity (WI) anomalies is now feasible because the database holds an enormous number of well failures, well barrier integrity tests, and analyses. Circa 9000 data points were collected from WI tests performed on 800 wells in the Gulf of Suez, Egypt, over almost 10 years. Moreover, the data have been quality-controlled and quality-assured by experienced engineers. The data contain different forms of WI failures, and the contributing parameter set includes a total of 23 barrier elements. The data were structured and fed into 11 different ML algorithms to build an automated, systematic tool for calculating the imposed risk category of any well. The deployed models were compared to identify the most dependable predictive model. The 11 models include both supervised and ensemble learning algorithms, such as random forest, support vector machine (SVM), decision tree, and scalable boosting techniques. Of the 11 models, extreme gradient boosting (XGB), categorical boosting (CatBoost), and decision tree proved the most reliable. Moreover, novel evaluation metrics for the confusion matrix of each model are introduced to overcome the problem that existing metrics do not consider domain knowledge during model evaluation. The resulting model will help to use company resources efficiently and to dedicate personnel effort to high-risk wells, leading to progressive improvements in business, safety, environment, and performance. This paper would be a milestone in the design and creation of the Well Integrity Database Management Program through the combination of integrity and ML.

Novel Design of Decision-Tree-Based Support Vector Machines Multi-class Classifier

Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence - Lecture Notes in Computer Science ◽ 10.1007/978-3-540-74205-0_90 ◽ 2007 ◽ pp. 871-880 ◽ Cited By ~ 1

Author(s): Liaoying Zhao ◽ Xiaorun Li ◽ Guangzhou Zhao

Keyword(s): Support Vector Machines ◽ Decision Tree ◽ Support Vector ◽ Vector Machines ◽ Novel Design

STree: A Single Multi-class Oblique Decision Tree Based on Support Vector Machines

10.1007/978-3-030-85713-4_6 ◽ 2021 ◽ pp. 54-64

Author(s): Ricardo Montañana ◽ Jose A. Gámez ◽ Jose M. Puerta

Keyword(s): Support Vector Machines ◽ Decision Tree ◽ Support Vector ◽ Vector Machines

Decision Tree as an Accelerator for Support Vector Machines

Advances in Character Recognition ◽ 10.5772/52227 ◽ 2012 ◽ Cited By ~ 7

Author(s): Fu Chang ◽ Chan-Cheng Liu

Keyword(s): Support Vector Machines ◽ Decision Tree ◽ Support Vector ◽ Vector Machines

Exploiting Visual Features in Financial Time Series Prediction

International Journal of Cognitive Informatics and Natural Intelligence ◽ 10.4018/ijcini.2020040104 ◽ 2020 ◽ Vol 14 (2) ◽ pp. 61-76

Author(s): Adil Gürsel Karaçor ◽ Turan Erman Erkan

Keyword(s): Financial Time Series ◽ Time Series Prediction ◽ Visual Features ◽ Gradient Boosting ◽ Support Vector ◽ Foreign Exchange Rates ◽ Extreme Gradient Boosting ◽ Vector Machines ◽ Direction Of Movement ◽ Visual Properties

The possibility of enhancing prediction accuracy for foreign exchange rates was investigated in two ways: first, by applying an outside-the-box approach to modeling price graphs that exploits their visual properties, and second, by employing the most efficient methods to detect patterns and classify the direction of movement. The approach exploits the visual properties of price graphs, making use of density regions along with the high and low values that describe the shape; hence, the authors propose the name 'Finance Vision.' The data used in the predictive model consist of 1-hour past price values of 4 different currency pairs between 2003 and 2016. The prediction performances of state-of-the-art methods (Extreme Gradient Boosting, Artificial Neural Network, and Support Vector Machines) are compared over the same data with the same sets of features. Results show that density-based visual features contribute considerably to prediction performance.

Steganalysis of Adaptive Multi-Rate Speech Based on Extreme Gradient Boosting

Electronics ◽ 10.3390/electronics9030522 ◽ 2020 ◽ Vol 9 (3) ◽ pp. 522

Author(s): Congcong Sun ◽ Hui Tian ◽ Chin-Chen Chang ◽ Yewang Chen ◽ Yiqiao Cai ◽ ...

Keyword(s): Statistical Characteristics ◽ Gradient Boosting ◽ Support Vector ◽ Final State ◽ Markov Transition Matrix ◽ Markov Transition ◽ Extreme Gradient Boosting ◽ Vector Machines ◽ Feature Based ◽ Series Of Experiments

Steganalysis of adaptive multi-rate (AMR) speech is a hot topic for controlling cybercrimes grounded in steganography in related speech streams. In this paper, we first present a novel AMR steganalysis model, which utilizes extreme gradient boosting (XGBoost) as the classifier, instead of support vector machines (SVM) adopted in the previous schemes. Compared with the SVM-based model, this new model can facilitate the excavation of potential information from the high-dimensional features and can avoid overfitting. Moreover, to further strengthen the preceding features based on the statistical characteristics of pulse pairs, we present the convergence feature based on the Markov chain to reflect the global characterization of pulse pairs, which is essentially the final state of the Markov transition matrix. Combining the convergence feature with the preceding features, we propose an XGBoost-based steganalysis scheme for AMR speech streams. Finally, we conducted a series of experiments to assess our presented scheme and compared it with previous schemes. The experimental results demonstrate that the proposed scheme is feasible, and can provide better performance in terms of detecting the existing steganography methods based on AMR speech streams.
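The "final state of the Markov transition matrix" mentioned above can be read as the limiting (stationary) distribution of the chain. The sketch below shows that general idea only, using power iteration on a row-stochastic matrix. It assumes the chain is ergodic so that the limit exists; the function name and the 2x2 matrix are hypothetical, and the paper's actual construction of the feature from pulse pairs is not given in the abstract.

```python
def stationary_distribution(transition, tol=1e-12, max_iter=10_000):
    """Approximate the limiting distribution of a row-stochastic matrix by power iteration."""
    n = len(transition)
    state = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: next_state[j] = sum_i state[i] * P[i][j]
        next_state = [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(next_state, state)) < tol:
            return next_state
        state = next_state
    return state  # best estimate if the tolerance was not reached

# Hypothetical two-state transition matrix (each row sums to 1).
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary_distribution(P))
```

For this toy matrix the iteration settles near (5/6, 1/6), which is the distribution left unchanged by a further step of the chain. A vector of this kind summarizes the long-run behavior of the chain in one fixed-length feature.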
