A nearest-neighbor-based ensemble classifier and its large-sample optimality

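ANALISIS KESEHATAN TERUMBU KARANG BERDASARKAN KARAKTERISTIK SUNGAI, LAUT, DAN POPULASI AREA PEMUKIMAN MENGGUNAKAN MACHINE LEARNING [Coral Reef Health Analysis Based on River, Sea, and Residential-Area Population Characteristics Using Machine Learning]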

IJIS - Indonesian Journal On Information System ◽

10.36549/ijis.v5i2.119 ◽

2020 ◽

Vol 5 (2) ◽

Author(s):

Adinda miftahul Ilmi Habiba ◽

Agi Prasetiadi ◽

Cepi Ramdani

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Ensemble Classifier ◽

Support Vector ◽

Learning Support ◽

K Nearest Neighbor

This study assesses coral reef health in regions of Indonesia using several factors: number of visiting tourists, latitude, longitude, temperature, year, resident population, number of youths, and number of industries. The method is machine learning with the K-Nearest Neighbor, Support Vector Machine, and Ensemble Classifier algorithms; the ensemble uses random forest to select the tree branches, or decision features, most relevant to the output. The study is intended as a reference so that regions whose coral reefs are still in poor condition can learn from regions whose reefs are already in good condition, by examining which factors lead a region's reefs to be classed as good. The results show that, for K-Nearest Neighbor, the factors that influence coral reef health are visiting tourists, latitude, longitude, temperature, year, and resident population; for Support Vector Machine, they are visiting tourists, latitude, temperature, and year; and for the Ensemble Classifier, they are visiting tourists, latitude, longitude, temperature, and number of industries. In this case, Support Vector Machine performed better than K-Nearest Neighbor and the Ensemble Classifier. Keywords: Ecosystem, Ensemble Classifier, K-Nearest Neighbor, Machine Learning, Support Vector Machine


LARGE SAMPLE PROPERTIES OF NEAREST NEIGHBOR DENSITY FUNCTION ESTIMATORS

Statistical Decision Theory and Related Topics ◽

10.1016/b978-0-12-307560-4.50018-1 ◽

1977 ◽

pp. 269-279 ◽

Cited By ~ 23

Author(s):

David S. Moore ◽

James W. Yackel

Keyword(s):

Density Function ◽

Nearest Neighbor ◽

Large Sample


Identifying Modes of Driving Railway Trains from GPS Trajectory Data: An Ensemble Classifier-Based Approach

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7080308 ◽

2018 ◽

Vol 7 (8) ◽

pp. 308 ◽

Cited By ~ 4

Author(s):

Han Zheng ◽

Zanyang Cui ◽

Xingchen Zhang

Keyword(s):

Nearest Neighbor ◽

Capacity Utilization ◽

Real Data ◽

Parameter Tuning ◽

Integrated Approach ◽

Ensemble Classifier ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Trajectory Data

Recognizing Modes of Driving Railway Trains (MDRT) can help to solve railway freight transportation problems in driver behavior research, auto-driving system design and capacity utilization optimization. Previous studies have focused on analyses and applications of MDRT, but there is currently no approach to automatically and effectively identify MDRT in the context of big data. In this study, we propose an integrated approach including data preprocessing, feature extraction, classifiers modeling, training and parameter tuning, and model evaluation to infer MDRT using GPS data. The highlights of this study are as follows: First, we propose methods for extracting Driving Segmented Standard Deviation Features (DSSDF) combined with classical features for the purpose of improving identification performances. Second, we find the most suitable classifier for identifying MDRT based on a comparison of performances of K-Nearest Neighbor, Support Vector Machines, AdaBoost, Random Forest, Gradient Boosting Decision Tree, and XGBoost. From the real-data experiment, we conclude that: (i) The ensemble classifier XGBoost produces the best performance with an accuracy of 92.70%; (ii) The group of DSSDF plays an important role in identifying MDRT with an accuracy improvement of 11.2% (using XGBoost). The proposed approach has been applied in capacity utilization optimization and new driver training for the Baoshen Railway.


Stacked Framework for Ensemble of Heterogeneous Classification Algorithms

Journal of Circuits System and Computers ◽

10.1142/s0218126621502698 ◽

2021 ◽

pp. 2150269

Author(s):

H. Benjamin Fredrick David ◽

A. Suruliandi ◽

S. P. Raja

Keyword(s):

Nearest Neighbor ◽

Weighted Average ◽

Ensemble Methods ◽

Ensemble Classifier ◽

Ensemble Classification ◽

Support Vector ◽

Weighted Vote ◽

Ensemble Of Classifiers ◽

Avant Garde ◽

Benchmark Datasets

Ensemble methods fabricate a sequence of classifiers for classifying fresh instances by procuring a weighted vote of their individual predictions. Toning down the error and increasing accuracy is an avant-garde problem in ensemble classification. This paper presents a novel generic object-oriented voting and weighting adapted stacking framework for utilizing an ensemble of classifiers for prediction. This universal framework operates based on the weighted average of the probabilities of any suite of base learners and the final prediction is the aggregate of their respective votes. For illustrative purposes, three familiar heterogeneous classifiers, such as the Support Vector Machine, k-Nearest Neighbor and Naïve Bayes, are utilized as candidates for ensemble classification using the proposed stacked framework. Further, the ensemble classifier built upon the framework is compared with others and evaluated using various cross-validation levels and percentage splits on a range of benchmark datasets. The outcome distinguishes the framework from the competition. The proposed framework is used to predict the crime propensity of prisoners most accurately, with 99.9901% accuracy.
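The weighted averaging of base-learner probabilities described in this abstract can be illustrated with a minimal sketch. It is not the authors' framework: the function names, the example probabilities, and the weights are all hypothetical, and the tie-breaking rule (alphabetical order of labels) is an assumption made only to keep the output deterministic.

```python
from typing import Dict, Sequence


def weighted_average_proba(
    probas: Sequence[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Weighted average of per-class probabilities from several base learners."""
    if not probas or len(probas) != len(weights):
        raise ValueError("need one weight per base learner")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    labels = set().union(*probas)  # every class label seen by any learner
    return {
        label: sum(w * p.get(label, 0.0) for p, w in zip(probas, weights)) / total
        for label in labels
    }


def weighted_soft_vote(
    probas: Sequence[Dict[str, float]], weights: Sequence[float]
) -> str:
    """Pick the label with the highest weighted-average probability."""
    scores = weighted_average_proba(probas, weights)
    # Assumption: ties are broken alphabetically, because max() keeps the
    # first maximal element of the sorted label list.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical outputs of three base learners for a single instance.
svm = {"high": 0.60, "low": 0.40}
knn = {"high": 0.30, "low": 0.70}
nb = {"high": 0.55, "low": 0.45}

print(weighted_soft_vote([svm, knn, nb], [2.0, 1.0, 1.0]))  # SVM weighted double
print(weighted_soft_vote([svm, knn, nb], [1.0, 1.0, 1.0]))  # equal weights
```

With these made-up numbers the two calls disagree, which shows that the choice of weights, and not only the base learners, can decide the final prediction.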


A Signature Verification System with Ensemble Classifier

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c5445.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 4132-4136

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Ensemble Classifier ◽

Signature Verification ◽

K Nearest Neighbor ◽

Verification System ◽

Handwritten Signature

The handwritten signature is considered one of the established authentication processes for studying the behavioral nature of a person. This paper focuses on verifying offline handwritten signatures (for English scripts) as either genuine or forged. The samples considered are genuine signatures, skilled forgeries, and simple forgeries. Verification is carried out by ensembling three base classifiers: Naive Bayes (NB), K-Nearest Neighbor (KNN), and K-means. The accuracies obtained for skilled and simple forgeries are 86% and 92%, respectively.


SUBiNN: a stacked uni- and bivariate kNN sparse ensemble

Advances in Data Analysis and Classification ◽

10.1007/s11634-021-00462-7 ◽

2021 ◽

Author(s):

Tiffany Elsten ◽

Mark de Rooij

Keyword(s):

Random Forests ◽

Nearest Neighbor ◽

Ensemble Methods ◽

Predictive Performance ◽

Ensemble Classifier ◽

Support Vector ◽

Data Sets ◽

Vector Machines ◽

Lasso Method ◽

Nearest Neighbor Classifiers

Nearest Neighbor classification is an intuitive distance-based classification method. It has, however, two drawbacks: (1) it is sensitive to the number of features, and (2) it does not give information about the importance of single features or pairs of features. In stacking, a set of base-learners is combined in one overall ensemble classifier by means of a meta-learner. In this manuscript we combine univariate and bivariate nearest neighbor classifiers that are by themselves easily interpretable. Furthermore, we combine these classifiers by a Lasso method that results in a sparse ensemble of nonlinear main and pairwise interaction effects. We christened the new method SUBiNN: Stacked Uni- and Bivariate Nearest Neighbors. SUBiNN overcomes the two drawbacks of simple nearest neighbor methods. In extensive simulations and using benchmark data sets, we evaluate the predictive performance of SUBiNN and compare it to other nearest neighbor ensemble methods as well as Random Forests and Support Vector Machines. Results indicate that SUBiNN often outperforms other nearest neighbor methods, that SUBiNN is well capable of identifying noise features, but that Random Forests is often, but not always, the best classifier.
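The first stage of such a stacked ensemble, building one nearest neighbor base learner per single feature and per pair of features, can be sketched roughly as follows. This is an illustration under stated assumptions, not the SUBiNN implementation: the function names are hypothetical, labels are assumed to be binary (0 or 1), base-learner outputs are computed leave-one-out, and the Lasso meta-learner that would weight the resulting columns is deliberately left out.

```python
from itertools import combinations
from math import dist
from typing import List, Optional, Sequence, Tuple


def feature_subsets(n_features: int) -> List[Tuple[int, ...]]:
    """All single features followed by all unordered pairs of features."""
    singles = [(j,) for j in range(n_features)]
    pairs = list(combinations(range(n_features), 2))
    return singles + pairs


def knn_positive_fraction(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    query: Sequence[float],
    features: Tuple[int, ...],
    k: int,
    skip: Optional[int] = None,
) -> float:
    """Fraction of the k nearest neighbours with label 1, using only `features`."""
    q = [query[f] for f in features]
    candidates = [
        (dist([row[f] for f in features], q), label)
        for i, (row, label) in enumerate(zip(train_x, train_y))
        if i != skip  # leave the query's own row out during training
    ]
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(label for _, label in nearest) / len(nearest)


def stacked_inputs(
    train_x: Sequence[Sequence[float]], train_y: Sequence[int], k: int = 3
) -> Tuple[List[Tuple[int, ...]], List[List[float]]]:
    """One column per feature subset: leave-one-out kNN output for each row."""
    subsets = feature_subsets(len(train_x[0]))
    rows = [
        [knn_positive_fraction(train_x, train_y, row, s, k, skip=i) for s in subsets]
        for i, row in enumerate(train_x)
    ]
    return subsets, rows


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [1.0, 9.0], [10.0, 4.0], [11.0, 8.0]]
y = [0, 0, 1, 1]
subsets, meta = stacked_inputs(X, y, k=1)
for subset, column in zip(subsets, zip(*meta)):
    print(subset, column)
```

In a full stacking setup, the columns of `meta` would be passed to a sparse meta-learner, so that subsets whose columns carry no signal (here, the noise feature alone) could receive zero weight.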


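Metode Ensemble Classifier untuk Mendeteksi Jenis Attention Deficit Hyperactivity Disorder (ADHD) pada Anak Usia Dini [An Ensemble Classifier Method for Detecting the Type of Attention Deficit Hyperactivity Disorder (ADHD) in Early Childhood]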

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2019631313 ◽

2019 ◽

Vol 6 (3) ◽

pp. 301

Author(s):

Indri - Ati ◽

Ari Kusyanti

Keyword(s):

Attention Deficit ◽

Nearest Neighbor ◽

Ensemble Classifier ◽

Development Stage ◽

Majority Voting ◽

Nearest Neighbour ◽

K Nearest Neighbor ◽

K Value ◽

Attention Deficit Hyperactive Disorder ◽

Hyperactivity Disorder

In the early stages of development, some children experience difficulties, including trouble staying still, concentrating, and controlling their behavior. A child who has impaired attention and difficulty controlling behavior appropriately may be described as having ADHD (Attention Deficit Hyperactive Disorder). This is a serious problem because children with ADHD experience social and emotional behavioral problems and learning difficulties at school, which affect their development into adulthood. ADHD symptoms therefore need to be identified early so that they can be handled quickly and appropriately. This research produces an application that detects the type of ADHD from symptoms entered by the user and displays the classification result automatically. The application uses the Ensemble Classifier method, which combines several classifiers to improve accuracy. At the classification stage, each data point is evaluated with K-Nearest Neighbor (KNN), Fuzzy K-Nearest Neighbor (FKNN), and Neighbor Weighted K-Nearest Neighbor (NWKNN). The outputs of the three classifiers are then combined by the Ensemble Classifier method using majority voting to determine the class. The highest accuracy of the ensemble classifier is 95%, at the optimal value k = 10. However, for larger k, above k = 20, the accuracy of each algorithm decreases. This is because all of the algorithms determine the class from the neighbors considered, so the more neighbors are taken into account, the greater the chance of misclassification.
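The majority-voting step that combines the three classifiers' outputs can be sketched as below. This is a minimal illustration, not the application described in the abstract: the function names and the example labels are hypothetical, the three prediction lists merely stand in for KNN, FKNN, and NWKNN outputs, and breaking a tie in favor of the earliest-listed classifier is an assumption.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the most frequent label among the classifiers' votes."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    best = max(counts.values())
    # Assumption: on a tie, the label voted by the earliest-listed classifier wins.
    return next(label for label in votes if counts[label] == best)


def ensemble_predict(per_classifier: Sequence[Sequence[str]]) -> List[str]:
    """per_classifier[c][i] is the label classifier c assigns to sample i."""
    return [majority_vote(sample_votes) for sample_votes in zip(*per_classifier)]


# Hypothetical predictions for three samples (labels are placeholders).
knn_pred = ["inattentive", "hyperactive", "combined"]
fknn_pred = ["inattentive", "combined", "combined"]
nwknn_pred = ["hyperactive", "combined", "inattentive"]

print(ensemble_predict([knn_pred, fknn_pred, nwknn_pred]))
```

With three voters and three or more classes a three-way tie is possible, which is why the tie-breaking rule has to be fixed explicitly.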


Classification of Fatigue Phases in Healthy and Diabetic Adults Using Wearable Sensor

Sensors ◽

10.3390/s20236897 ◽

2020 ◽

Vol 20 (23) ◽

pp. 6897

Author(s):

Lilia Aljihmani ◽

Oussama Kerdjidj ◽

Yibo Zhu ◽

Ranjana K. Mehta ◽

Madhav Erraguntla ◽

...

Keyword(s):

Nearest Neighbor ◽

Ensemble Classifier ◽

Machine Learning Techniques ◽

Support Vector ◽

Dominant Hand ◽

Ensemble Classifiers ◽

Window Length ◽

K Nearest Neighbor ◽

Hand Tremor ◽

Time Frequency

Fatigue is defined as “a loss of force-generating capacity” in a muscle that can intensify tremor. Tremor quantification can facilitate early detection of fatigue onset so that preventative or corrective controls can be taken to minimize work-related injuries and improve the performance of tasks that require high levels of accuracy. We focused on developing a system that recognizes and classifies voluntary effort and detects phases of fatigue. The experiment was designed to extract and evaluate hand-tremor data during the performance of both rest and effort tasks. The data were collected from the wrist and finger of the participant’s dominant hand. To investigate tremor, time- and frequency-domain features were extracted from the accelerometer signal for segments of 45 and 90 samples/window. Advanced signal processing and machine-learning techniques such as decision tree, k-nearest neighbor, support vector machine, and ensemble classifiers were applied to discover models to classify rest and effort tasks and the phases of fatigue. Classifier performance was assessed on various metrics using 5-fold cross-validation. The recognition of rest and effort tasks using an ensemble classifier based on the random subspace method and a window length of 45 samples was deemed to be the most accurate (96.1%). The highest accuracy (~98%) in distinguishing between early and late fatigue phases was achieved using the same classifier and window length.
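Window-based feature extraction of the kind mentioned in this abstract might look like the following sketch. Only the window length of 45 samples comes from the abstract; the choice of features (mean, standard deviation, root mean square), the use of non-overlapping windows, and the dropping of a trailing partial window are assumptions, and the frequency-domain features are omitted.

```python
from math import sqrt
from statistics import fmean, pstdev
from typing import Dict, List, Sequence


def window_features(signal: Sequence[float], window: int = 45) -> List[Dict[str, float]]:
    """Time-domain features for consecutive, non-overlapping windows.

    A trailing segment shorter than `window` is dropped (an assumption).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    features = []
    for start in range(0, len(signal) - window + 1, window):
        segment = list(signal[start:start + window])
        features.append(
            {
                "mean": fmean(segment),
                "std": pstdev(segment),  # population standard deviation
                "rms": sqrt(fmean([x * x for x in segment])),
            }
        )
    return features


# Made-up accelerometer trace: 100 samples give two full windows of 45.
trace = [1.0] * 100
for row in window_features(trace, window=45):
    print(row)
```

Each returned dictionary would form one row of the feature table passed to a classifier; a window of 90 samples can be obtained by changing the `window` argument.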
