Stacked Framework for Ensemble of Heterogeneous Classification Algorithms

Author(s):  
H. Benjamin Fredrick David ◽  
A. Suruliandi ◽  
S. P. Raja

Ensemble methods build a collection of classifiers and classify fresh instances by taking a weighted vote of their individual predictions. Reducing error while increasing accuracy remains a central problem in ensemble classification. This paper presents a novel generic object-oriented voting- and weighting-adapted stacking framework for using an ensemble of classifiers for prediction. This universal framework operates on the weighted average of the probabilities of any suite of base learners, and the final prediction is the aggregate of their respective votes. For illustrative purposes, three familiar heterogeneous classifiers, namely the Support Vector Machine, K-Nearest Neighbor and Naïve Bayes, are used as candidates for ensemble classification under the proposed stacked framework. Further, the ensemble classifier built on the framework is compared with others and evaluated at various cross-validation levels and percentage splits on a range of benchmark datasets. The outcome distinguishes the framework from the competition. The proposed framework is used to predict the crime propensity of prisoners most accurately, with 99.9901% accuracy.
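The weighted soft-voting step described in this abstract can be sketched as follows. The learner names, class labels, probabilities and weights are illustrative assumptions, not values from the paper.

```python
# Sketch of weighted soft voting: each base learner contributes class
# probabilities, which are averaged with per-learner weights; the class
# with the highest aggregate probability is the final prediction.

def weighted_soft_vote(probas, weights):
    """probas: list of dicts {class_label: probability}, one per learner.
    weights: list of floats, one per learner."""
    total = sum(weights)
    combined = {}
    for p, w in zip(probas, weights):
        for label, prob in p.items():
            combined[label] = combined.get(label, 0.0) + w * prob / total
    return max(combined, key=combined.get), combined

# Three hypothetical base learners (e.g. SVM, k-NN, Naive Bayes):
svm_p = {"crime": 0.7, "no_crime": 0.3}
knn_p = {"crime": 0.6, "no_crime": 0.4}
nb_p  = {"crime": 0.4, "no_crime": 0.6}

# Give the (hypothetically stronger) SVM twice the weight of the others.
label, scores = weighted_soft_vote([svm_p, knn_p, nb_p], [2.0, 1.0, 1.0])
```

Here the aggregate probability of "crime" is (2·0.7 + 0.6 + 0.4)/4 = 0.6, so the ensemble predicts "crime" even though one base learner disagrees.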

Author(s):  
Tiffany Elsten ◽  
Mark de Rooij

Abstract: Nearest Neighbor classification is an intuitive distance-based classification method. It has, however, two drawbacks: (1) it is sensitive to the number of features, and (2) it gives no information about the importance of single features or pairs of features. In stacking, a set of base learners is combined into one overall ensemble classifier by means of a meta-learner. In this manuscript we combine univariate and bivariate nearest neighbor classifiers that are by themselves easily interpretable. Furthermore, we combine these classifiers with a Lasso method, resulting in a sparse ensemble of nonlinear main and pairwise interaction effects. We christened the new method SUBiNN: Stacked Uni- and Bivariate Nearest Neighbors. SUBiNN overcomes the two drawbacks of simple nearest neighbor methods. In extensive simulations and on benchmark data sets, we evaluate the predictive performance of SUBiNN and compare it to other nearest neighbor ensemble methods as well as Random Forests and Support Vector Machines. Results indicate that SUBiNN often outperforms other nearest neighbor methods and is well capable of identifying noise features, but that Random Forests is often, though not always, the best classifier.
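The base-learner stage of SUBiNN can be sketched as below: one nearest-neighbor classifier per single feature (univariate) and per feature pair (bivariate), whose predictions form the matrix a Lasso meta-learner would then be fit on. The Lasso step is omitted here, 1-NN stands in for the general k-NN case, and the toy data are assumptions.

```python
# Build the stacked base-learner prediction matrix from univariate and
# bivariate 1-NN classifiers, one per feature / feature pair.
from itertools import combinations

def nn_predict(train_X, train_y, x):
    """1-NN prediction using squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def subinn_base_matrix(X, y, X_new):
    n_feat = len(X[0])
    # univariate learners: one column subset per feature ...
    learners = [(j,) for j in range(n_feat)]
    # ... plus bivariate learners: one per feature pair
    learners += list(combinations(range(n_feat), 2))
    matrix = []
    for x in X_new:
        row = []
        for cols in learners:
            sub_X = [[r[c] for c in cols] for r in X]
            row.append(nn_predict(sub_X, y, [x[c] for c in cols]))
        matrix.append(row)
    return matrix, learners

X = [[0.0, 1.0, 5.0], [1.0, 0.0, 4.0], [9.0, 9.0, 0.0]]
y = [0, 0, 1]
m, learners = subinn_base_matrix(X, y, [[8.0, 8.5, 0.5]])
```

With three features this yields 3 univariate + 3 bivariate base learners; a sparse Lasso fit on these six prediction columns would then reveal which single features and pairs matter.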


2020 ◽  
Vol 5 (2) ◽  
Author(s):  
Adinda miftahul Ilmi Habiba ◽  
Agi Prasetiadi ◽  
Cepi Ramdani

This study assesses the health of coral reefs in regions of Indonesia using factors such as tourist arrivals, latitude, longitude, temperature, year, resident population, youth population, and number of industries. The method used is machine learning with the K-Nearest Neighbor, Support Vector Machine, and Ensemble Classifier algorithms; the ensemble uses Random Forest to extract the decision-tree branches, i.e. the features, most relevant to the output. The study is intended as a reference: regions whose coral reefs are in poor condition can learn from regions whose reefs are in good condition by examining which factors place a region's reefs in the good category. The results show that for the K-Nearest Neighbor algorithm the influential factors are tourist arrivals, latitude, longitude, temperature, year, and resident population; for the Support Vector Machine, tourist arrivals, latitude, temperature, and year; and for the Ensemble Classifier, tourist arrivals, latitude, longitude, temperature, and number of industries. In this case the Support Vector Machine performed better than the K-Nearest Neighbor and the Ensemble Classifier. Keywords: Ecosystem, Ensemble Classifier, K-Nearest Neighbor, Machine Learning, Support Vector Machine
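The feature-relevance idea in this abstract can be illustrated with permutation importance: a feature is scored by how much accuracy drops when its values are scrambled. This is a minimal, deterministic sketch; the tiny model, threshold and data are invented for illustration, and a cyclic shift stands in for the usual random shuffle.

```python
# Permutation-importance sketch: compare accuracy before and after
# breaking the link between one feature and the labels.

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature):
    base = accuracy(model, X, y)
    col = [x[feature] for x in X]
    col = col[1:] + col[:1]  # deterministic cyclic shift as a stand-in for a random shuffle
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    return base - accuracy(model, X_perm, y)

# Toy "model": predicts healthy reef (1) when temperature (feature 0) < 29.
model = lambda x: 1 if x[0] < 29 else 0
X = [[27, 5], [28, 9], [30, 2], [31, 7]]  # (temperature, irrelevant feature)
y = [1, 1, 0, 0]

drop_temp = permutation_importance(model, X, y, 0)   # large drop: relevant
drop_noise = permutation_importance(model, X, y, 1)  # zero drop: irrelevant
```

Scrambling the temperature column halves the toy model's accuracy, while scrambling the unused second feature changes nothing, which is exactly the signal used to rank factors by relevance.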


2020 ◽  
Vol 20 (3) ◽  
pp. 75-85
Author(s):  
Shefali Dhingra ◽  
Poonam Bansal

Abstract: A Content Based Image Retrieval (CBIR) system is an efficient search engine capable of retrieving images from huge repositories by extracting visual features, including color, texture and shape. Texture is the most prominent feature among them. This investigation focuses on the classification complications that arise with big datasets. Texture techniques are explored together with machine learning algorithms in order to increase retrieval efficiency. We have tested our system on three texture techniques using various classifiers: Support Vector Machine, K-Nearest Neighbor (KNN), Naïve Bayes and Decision Tree (DT). Various evaluation metrics, such as precision, recall, false alarm rate and accuracy, are computed to measure the competence of the designed CBIR system on two benchmark datasets, Wang and Brodatz. Results show that on both datasets the KNN and DT classifiers deliver superior results compared to the others.
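One texture descriptor of the kind such a CBIR system might extract can be sketched as follows: gray-level co-occurrence counts over horizontal neighbor pairs, reduced to a single contrast statistic. Real texture techniques use several offsets and statistics; the toy images here are illustrative assumptions, not the paper's method.

```python
# GLCM-style contrast: sum over pixel pairs (i, j) of (i - j)^2 * P(i, j),
# where P is the empirical distribution of horizontally adjacent gray levels.

def glcm_contrast(image):
    counts = {}
    total = 0
    for row in image:
        for a, b in zip(row, row[1:]):  # horizontal neighbor pairs
            counts[(a, b)] = counts.get((a, b), 0) + 1
            total += 1
    return sum((i - j) ** 2 * c / total for (i, j), c in counts.items())

flat = [[1, 1, 1], [1, 1, 1]]       # uniform texture
stripes = [[0, 3, 0], [0, 3, 0]]    # strongly alternating texture

c_flat = glcm_contrast(flat)
c_stripes = glcm_contrast(stripes)
```

A uniform patch scores zero contrast while the striped patch scores high, so such statistics separate texture classes before any classifier is applied.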


Stainless steel is one of the most extensively used materials in engineering applications, household products and construction, because it is environmentally friendly and can be recycled. The principal purpose of this paper is to implement different data science algorithms for predicting the mechanical properties of stainless steel. Integrating data science techniques into materials science and engineering helps manufacturers, designers, researchers and students understand the selection, discovery and development of materials for various engineering applications. Data science algorithms help determine the properties of a material without performing any experiments. Techniques such as Random Forest, Neural Network, Linear Regression, K-Nearest Neighbor, Support Vector Machine, Decision Tree and ensemble methods are used for predicting tensile strength from processing parameters of stainless steel such as carbon content, sectional size, temperature and manufacturing process. The research was developed as part of an AICTE grant sanctioned under the RPS scheme [19] and aims to implement different data science algorithms for predicting the tensile strength of steel and to identify the algorithm with the best prediction accuracy.
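As a minimal sketch of one of the listed approaches, k-nearest-neighbor regression can predict tensile strength from processing parameters. All feature values and strengths below are invented for illustration, not real steel data.

```python
# k-NN regression: predict the target as the mean of the k nearest
# training points in feature space (squared Euclidean distance).

def knn_regress(train_X, train_y, x, k):
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y_i)
        for row, y_i in zip(train_X, train_y)
    )
    return sum(y_i for _, y_i in dists[:k]) / k

# features: (carbon %, temperature in deg C); target: tensile strength (MPa)
X = [(0.05, 20.0), (0.08, 20.0), (0.05, 300.0), (0.10, 300.0)]
y = [520.0, 560.0, 480.0, 510.0]

pred = knn_regress(X, y, (0.06, 20.0), k=2)
```

In practice the features would be scaled first, since temperature in degrees otherwise dominates carbon content in the distance computation.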


2018 ◽  
Vol 7 (8) ◽  
pp. 308 ◽  
Author(s):  
Han Zheng ◽  
Zanyang Cui ◽  
Xingchen Zhang

Recognizing Modes of Driving Railway Trains (MDRT) can help solve railway freight transportation problems in driver behavior research, auto-driving system design and capacity utilization optimization. Previous studies have focused on analyses and applications of MDRT, but there is currently no approach to automatically and effectively identify MDRT in the context of big data. In this study, we propose an integrated approach comprising data preprocessing, feature extraction, classifier modeling, training and parameter tuning, and model evaluation to infer MDRT from GPS data. The highlights of this study are as follows. First, we propose methods for extracting Driving Segmented Standard Deviation Features (DSSDF), combined with classical features, to improve identification performance. Second, we find the most suitable classifier for identifying MDRT by comparing the performance of K-Nearest Neighbor, Support Vector Machines, AdaBoost, Random Forest, Gradient Boosting Decision Tree, and XGBoost. From the real-data experiment we conclude that: (i) the ensemble classifier XGBoost produces the best performance, with an accuracy of 92.70%; (ii) the group of DSSDF plays an important role in identifying MDRT, with an accuracy improvement of 11.2% (using XGBoost). The proposed approach has been applied to capacity utilization optimization and new driver training for the Baoshen Railway.
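The segmented standard-deviation idea behind the DSSDF features can be sketched as follows: split a GPS-derived speed series into fixed-length segments and take the standard deviation of each. The segment length and the toy speed series are assumptions, not values from the paper.

```python
# Segmented standard-deviation features: one population std per
# fixed-length, non-overlapping window of the speed series.
import math

def segmented_std(series, seg_len):
    feats = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        seg = series[start:start + seg_len]
        mean = sum(seg) / seg_len
        feats.append(math.sqrt(sum((v - mean) ** 2 for v in seg) / seg_len))
    return feats

# Toy GPS speed trace (km/h): steady cruising, then oscillating driving.
speeds = [60.0, 60.0, 60.0, 60.0, 40.0, 80.0, 40.0, 80.0]
features = segmented_std(speeds, seg_len=4)
```

The steady segment yields a feature of 0 and the oscillating one a feature of 20, the kind of contrast a classifier can exploit to distinguish driving modes.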


The success of its students earns an institution a good name and popularity. Owing to the large size of student databases, it is difficult to track the performance and activities of each student. Educational data mining is used to identify the performance and status of students individually. In this study, Educational Data Classification (EDC) using data mining techniques and kernel ensemble classification based on Support Vector Machine (SVM) kernels, namely linear, polynomial, quadratic and Radial Basis Function (RBF), is discussed. Initially, data preprocessing converts the raw data into an understandable format. An ensemble classifier based on the linear, polynomial, quadratic and RBF SVM kernels is then used to classify the student data. Data mining supports the final decision on student performance, such as class activities and interaction with the electronic learning system. The system is evaluated on the Kalboard 360 dataset and achieves a classification accuracy of 72.52% using SVM kernel ensemble classification.
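The four SVM kernels named above can be written out as functions of two feature vectors. The hyperparameter values (degree, gamma, coef0) are illustrative defaults, not the paper's settings.

```python
# The four kernel functions used by the SVM kernel ensemble:
# linear, polynomial, quadratic (a degree-2 polynomial), and RBF.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(u, v):
    return dot(u, v)

def polynomial(u, v, degree=3, coef0=1.0):
    return (dot(u, v) + coef0) ** degree

def quadratic(u, v, coef0=1.0):
    return (dot(u, v) + coef0) ** 2

def rbf(u, v, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

u, v = [1.0, 0.0], [1.0, 1.0]
```

Each kernel induces a different decision boundary (linear, curved, or highly local), which is why an ensemble over kernels can outperform any single choice.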


Author(s):  
MAYY M. AL-TAHRAWI ◽  
RAED ABU ZITAR

Many techniques and algorithms for automatic text categorization have been devised and proposed in the literature. However, there is still much room for researchers in this area to improve existing algorithms or come up with new techniques for text categorization (TC). Polynomial Networks (PNs) have never before been used in TC. This can be attributed to the huge datasets used in TC, as well as to the high computational demands of the technique itself. In this paper, we investigate and propose using PNs in TC. The proposed PN classifier achieved competitive classification performance in our experiments. More importantly, this high performance is achieved with one-shot (non-iterative) training and using just 0.25%–0.5% of the corpora features. Experiments are conducted on two benchmark datasets in TC: Reuters-21578 and the 20 Newsgroups. Five well-known classifiers are tested on the same data and feature subsets: the state-of-the-art Support Vector Machines (SVM), Logistic Regression (LR), k-Nearest Neighbor (kNN), Naive Bayes (NB), and Radial Basis Function (RBF) networks.


2017 ◽  
Vol 2017 ◽  
pp. 1-27 ◽  
Author(s):  
MadhuSudana Rao Nalluri ◽  
Kannan K. ◽  
Manisha M. ◽  
Diptendu Sinha Roy

With the widespread adoption of e-Healthcare and telemedicine applications, accurate, intelligent disease diagnosis systems have been profoundly coveted. In recent years, numerous individual machine learning-based classifiers have been proposed and tested, and it is now widely accepted that a single classifier cannot effectively classify and diagnose all diseases. This has led to a number of recent research attempts to arrive at a consensus using ensemble classification techniques. In this paper, a hybrid system for diagnosing ailments is proposed that optimizes the individual parameters of two classifier techniques, namely the support vector machine (SVM) and the multilayer perceptron (MLP). We employ three recent evolutionary algorithms to optimize the parameters of these classifiers, leading to six alternative hybrid disease diagnosis systems, also referred to as hybrid intelligent systems (HISs). Multiple objectives, namely prediction accuracy, sensitivity, and specificity, are considered to compare the efficacy of the proposed hybrid systems with existing ones. The proposed model is evaluated on 11 benchmark datasets, and the obtained results demonstrate that our hybrid diagnosis systems perform better in terms of disease prediction accuracy, sensitivity, and specificity. Pertinent statistical tests were carried out to substantiate the efficacy of the obtained results.
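The evolutionary parameter-tuning idea can be sketched with a minimal (1+1) evolution strategy that mutates one hyperparameter and keeps the mutant when fitness improves. The fitness function below is a stand-in for the cross-validated accuracy of an SVM or MLP, and all values, including the seed, are assumptions rather than the paper's algorithms.

```python
# (1+1) evolution strategy: propose a Gaussian mutation of the current
# best parameter value; accept it when fitness does not decrease.
import random

def evolve(fitness, start, sigma, steps, seed=42):
    rng = random.Random(seed)
    best, best_fit = start, fitness(start)
    for _ in range(steps):
        cand = best + rng.gauss(0.0, sigma)
        cand_fit = fitness(cand)
        if cand_fit >= best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

# Stand-in fitness: peaks (value 1.0) at a parameter value of 2.0;
# in the paper this would be a classifier's validation performance.
fitness = lambda c: 1.0 / (1.0 + (c - 2.0) ** 2)

best, best_fit = evolve(fitness, start=0.0, sigma=0.5, steps=200)
```

Because only improving mutants are kept, the fitness trajectory is monotone; real evolutionary algorithms add populations, crossover and restarts on top of this core accept/reject loop.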


2018 ◽  
Vol 7 (3) ◽  
pp. 1114
Author(s):  
Lakshmi K S ◽  
G Vadivu ◽  
Suja Subramanian

Advancement in medical technology has resulted in the bulk creation of electronic medical health records. These records contain valuable data that are not fully utilized. Efficient use of data mining techniques helps discover potentially relevant facts from medical records, and classification plays an important role in disease prediction. In this paper we develop a prediction model for hyperlipidemia based on ensemble classification. The Support Vector Machine, Naïve Bayes, KNN and Decision Tree methods are combined to form the ensemble classifier, and the performance of each classifier is also evaluated separately. An overall accuracy of 97.07% is obtained with the ensemble approach, which is better than the performance of any individual classifier.
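One common way to combine the four base learners named above is hard majority voting, sketched below. The class labels and votes are illustrative, and the paper's exact combination rule may differ.

```python
# Hard majority vote over the per-patient predictions of the base learners.
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties resolve to the first-seen label."""
    return Counter(predictions).most_common(1)[0][0]

# e.g. SVM, Naive Bayes, k-NN, Decision Tree predictions for one record:
votes = ["hyperlipidemia", "normal", "hyperlipidemia", "hyperlipidemia"]
decision = majority_vote(votes)
```

With four voters a 3-to-1 split is decisive; an even split falls back to the tie rule, which is one reason odd-sized ensembles are often preferred.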


Author(s):  
Neelam Mukhtar ◽  
Mohammad Abid Khan

Over the last decade, sentiment analysis of languages such as English and Chinese has been a particular focus of attention, while resource-poor languages such as Urdu have been mostly ignored by the research community; Urdu is the focus of this research. After acquiring data from various blogs covering about 14 different genres, the data were annotated with the help of human annotators. Three well-known classifiers, namely Support Vector Machine, Decision Tree and k-Nearest Neighbor (k-NN), are tested, their outputs compared and their results improved over several iterations through steps that include stop-word removal, feature extraction, and the identification and extraction of important features. Initially, the performance of the classifiers is unsatisfactory, with all three achieving accuracy below 50%. An ensemble of the classifiers is also tried, but the results are not fruitful in terms of high accuracy. The results are analyzed carefully and improvements are made, including feature extraction, that raise the performance of these classifiers to a satisfactory level. It is further concluded that k-NN performs better than the Support Vector Machine and Decision Tree in terms of accuracy, precision, recall and F-measure.
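The preprocessing steps described above (stop-word removal followed by feature extraction) can be sketched as a bag-of-words pipeline. The stop-word list and English tokens are stand-ins for the Urdu resources actually used in the paper.

```python
# Stop-word removal followed by bag-of-words feature extraction:
# documents become count vectors over a shared sorted vocabulary.

STOP_WORDS = {"the", "is", "a", "and"}  # illustrative stand-in list

def preprocess(tokens):
    """Lowercase and drop stop words."""
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]

def bag_of_words(docs):
    vocab = sorted({t for doc in docs for t in doc})
    return vocab, [[doc.count(t) for t in vocab] for doc in docs]

docs = [preprocess(["The", "movie", "is", "great"]),
        preprocess(["a", "dull", "and", "slow", "movie"])]
vocab, X = bag_of_words(docs)
```

The resulting count vectors are what classifiers like SVM, Decision Tree or k-NN consume; the feature identification step in the paper would then prune or reweight these columns.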

