A nearest-neighbor-based ensemble classifier and its large-sample optimality

Author(s):  
Majid Mojirsheibani ◽  
William Pouliot
2020 ◽  
Vol 5 (2) ◽  
Author(s):  
Adinda miftahul Ilmi Habiba ◽  
Agi Prasetiadi ◽  
Cepi Ramdani

This study assesses the health of coral reefs in regions of Indonesia using several factors: tourist arrivals, latitude, longitude, temperature, year, resident population, number of young people, and number of industries. The methods used are the machine learning algorithms K-Nearest Neighbor, Support Vector Machine, and an Ensemble Classifier; the ensemble uses random forest to select the tree branches, or decision features, most relevant to the output. The study is intended as a reference so that regions whose coral reefs are still in poor condition can learn from regions whose reefs are already in good condition, by looking at which factors lead a region's reefs to be classified as good. The results show that, for K-Nearest Neighbor, the factors influencing coral reef health are tourist arrivals, latitude, longitude, temperature, year, and resident population; for Support Vector Machine, they are tourist arrivals, latitude, temperature, and year; and for the Ensemble Classifier, they are tourist arrivals, latitude, longitude, temperature, and number of industries. In this case, Support Vector Machine performed better than K-Nearest Neighbor and the Ensemble Classifier. Keywords: Ecosystem, Ensemble Classifier, K-Nearest Neighbor, Machine Learning, Support Vector Machine


2018 ◽  
Vol 7 (8) ◽  
pp. 308 ◽  
Author(s):  
Han Zheng ◽  
Zanyang Cui ◽  
Xingchen Zhang

Recognizing Modes of Driving Railway Trains (MDRT) can help to solve railway freight transportation problems in driver behavior research, auto-driving system design, and capacity utilization optimization. Previous studies have focused on analyses and applications of MDRT, but there is currently no approach to automatically and effectively identify MDRT in the context of big data. In this study, we propose an integrated approach, comprising data preprocessing, feature extraction, classifier modeling, training and parameter tuning, and model evaluation, to infer MDRT from GPS data. The highlights of this study are as follows. First, we propose methods for extracting Driving Segmented Standard Deviation Features (DSSDF), combined with classical features, to improve identification performance. Second, we find the most suitable classifier for identifying MDRT by comparing the performance of K-Nearest Neighbor, Support Vector Machines, AdaBoost, Random Forest, Gradient Boosting Decision Tree, and XGBoost. From the real-data experiment, we conclude that: (i) the ensemble classifier XGBoost produces the best performance, with an accuracy of 92.70%; (ii) the DSSDF group plays an important role in identifying MDRT, with an accuracy improvement of 11.2% (using XGBoost). The proposed approach has been applied in capacity utilization optimization and new driver training for the Baoshen Railway.


Author(s):  
H. Benjamin Fredrick David ◽  
A. Suruliandi ◽  
S. P. Raja

Ensemble methods build a sequence of classifiers and classify new instances by taking a weighted vote of their individual predictions. Reducing error and increasing accuracy remains a central problem in ensemble classification. This paper presents a novel, generic, object-oriented stacking framework, adapted for voting and weighting, that uses an ensemble of classifiers for prediction. The framework operates on the weighted average of the probabilities of any suite of base learners, and the final prediction is the aggregate of their respective votes. For illustrative purposes, three familiar heterogeneous classifiers, namely the Support Vector Machine, k-Nearest Neighbor and Naïve Bayes, are used as candidates for ensemble classification with the proposed stacked framework. Further, the ensemble classifier built upon the framework is compared with others and evaluated using various cross-validation levels and percentage splits on a range of benchmark datasets. The outcome distinguishes the framework from the competition. The proposed framework is used to predict the crime propensity of prisoners, with 99.9901% accuracy.
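The weighted averaging of base-learner probabilities described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' framework: the function name `weighted_soft_vote`, the class labels, the probabilities, and the weights are all hypothetical.

```python
from typing import Dict, Sequence


def weighted_soft_vote(
    probabilities: Sequence[Dict[str, float]],
    weights: Sequence[float],
) -> str:
    """Return the label with the highest weighted-average probability."""
    if len(probabilities) != len(weights) or not probabilities:
        raise ValueError("need one weight per base learner")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: Dict[str, float] = {}
    for probs, weight in zip(probabilities, weights):
        for label, p in probs.items():
            # Accumulate each learner's normalized, weighted contribution.
            combined[label] = combined.get(label, 0.0) + weight * p / total_weight
    # Ties resolve to the label that was inserted first.
    return max(combined, key=combined.get)


# Hypothetical per-class probabilities from three base learners.
svm_probs = {"high": 0.70, "low": 0.30}
knn_probs = {"high": 0.40, "low": 0.60}
nb_probs = {"high": 0.45, "low": 0.55}

prediction = weighted_soft_vote([svm_probs, knn_probs, nb_probs], [0.5, 0.25, 0.25])
print(prediction)
```

With these made-up numbers the heavily weighted first learner dominates; shifting weight toward the other two learners flips the result, which is the behavior a weighting scheme is meant to control.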


2019 ◽  
Vol 8 (4) ◽  
pp. 4132-4136

Handwritten signatures are considered one of the established means of authentication based on a person's behavioral characteristics. This paper focuses on verifying offline handwritten signatures (for English scripts) as either genuine or forged. The samples considered are genuine signatures, skilled forgeries, and simple forgeries. Verification is carried out by an ensemble of three base classifiers: Naive Bayes (NB), K-Nearest Neighbor (KNN), and K-means. The accuracies obtained for skilled and simple forgeries are 86% and 92%, respectively.


Author(s):  
Tiffany Elsten ◽  
Mark de Rooij

Nearest Neighbor classification is an intuitive distance-based classification method. It has, however, two drawbacks: (1) it is sensitive to the number of features, and (2) it does not give information about the importance of single features or pairs of features. In stacking, a set of base-learners is combined into one overall ensemble classifier by means of a meta-learner. In this manuscript we combine univariate and bivariate nearest neighbor classifiers, which are themselves easily interpretable. Furthermore, we combine these classifiers with a Lasso method, which results in a sparse ensemble of nonlinear main and pairwise interaction effects. We christened the new method SUBiNN: Stacked Uni- and Bivariate Nearest Neighbors. SUBiNN overcomes the two drawbacks of simple nearest neighbor methods. In extensive simulations and using benchmark data sets, we evaluate the predictive performance of SUBiNN and compare it to other nearest neighbor ensemble methods as well as Random Forests and Support Vector Machines. Results indicate that SUBiNN often outperforms other nearest neighbor methods and is well capable of identifying noise features, but that Random Forests is often, though not always, the best classifier.
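The base-learner stage described in this abstract, one nearest neighbor classifier per single feature and per pair of features, might look like the sketch below. It is an illustration under assumptions and not the SUBiNN implementation: it uses 1-nearest-neighbor with squared Euclidean distance, the helper names `feature_subsets` and `nn_predict` and the toy data are hypothetical, and the Lasso meta-learner that weights the base predictions is omitted.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]


def feature_subsets(n_features: int) -> List[Tuple[int, ...]]:
    """All single features followed by all pairs of features."""
    singles = [(j,) for j in range(n_features)]
    pairs = list(combinations(range(n_features), 2))
    return singles + pairs


def nn_predict(train_x: Sequence[Point], train_y: Sequence[str],
               query: Point, subset: Tuple[int, ...]) -> str:
    """1-nearest-neighbor label using only the features in `subset`."""
    def sq_dist(row: Point) -> float:
        return sum((row[j] - query[j]) ** 2 for j in subset)

    # Index of the closest training row; ties go to the earliest row.
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))
    return train_y[best]


# Toy data (assumed): feature 0 separates the classes, feature 1 is noise.
train_x = [(0.0, 5.0), (0.2, 1.0), (1.0, 4.8), (1.2, 0.9)]
train_y = ["a", "a", "b", "b"]
query = (0.1, 4.8)

base_predictions = {s: nn_predict(train_x, train_y, query, s)
                    for s in feature_subsets(2)}
print(base_predictions)
```

In this toy case the classifier built on the noise feature alone disagrees with the other two. A sparse meta-learner could assign a base learner like that a zero weight, which is how a stacked ensemble of this kind can flag noise features.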


2019 ◽  
Vol 6 (3) ◽  
pp. 301
Author(s):  
Indri - Ati ◽  
Ari Kusyanti

In early development, some children experience difficulties such as being unable to sit still, to concentrate, or to control their behavior. When a child has impaired attention and difficulty behaving appropriately, the condition may be ADHD (Attention Deficit Hyperactivity Disorder). This is a serious problem because children with ADHD experience social and emotional behavioral problems and learning difficulties at school, which affect their development into adulthood. ADHD symptoms therefore need to be identified early so that they can be handled quickly and appropriately. This study produces an application that detects the type of ADHD from symptoms entered by the user and displays the classification result automatically. The application uses an ensemble classifier, a method that combines several classifiers to improve accuracy. In the classification stage, each data point is evaluated with K-Nearest Neighbor (KNN), Fuzzy K-Nearest Neighbor (FKNN), and Neighbor Weighted K-Nearest Neighbor (NWKNN). The outputs of the three classifiers are then combined by the ensemble classifier, which uses majority voting to determine the class. The highest accuracy of the ensemble classifier is 95%, obtained at the optimal value k = 10. For larger values of k, above k = 20, the accuracy of each algorithm decreases. This is because all of the algorithms classify according to the number of neighbors, so the more neighbors are taken into account, the greater the chance of misclassification.
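The majority-voting step that combines the three classifiers can be sketched as below. This is a minimal illustration, not the application's code: the function name `majority_vote` and the class labels are hypothetical, and ties are resolved in favor of the label that appears first, which is an assumption the abstract does not address.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most common label; ties go to the label that appears first."""
    if not labels:
        raise ValueError("at least one prediction is required")
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of the three base classifiers for one case.
predictions = {"KNN": "inattentive", "FKNN": "combined", "NWKNN": "inattentive"}
final_label = majority_vote(list(predictions.values()))
print(final_label)
```

With three voters and more than two classes, all three can disagree, so an explicit tie rule such as the one assumed here is needed in practice.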


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6897
Author(s):  
Lilia Aljihmani ◽  
Oussama Kerdjidj ◽  
Yibo Zhu ◽  
Ranjana K. Mehta ◽  
Madhav Erraguntla ◽  
...  

Fatigue is defined as “a loss of force-generating capacity” in a muscle that can intensify tremor. Tremor quantification can facilitate early detection of fatigue onset so that preventative or corrective controls can be taken to minimize work-related injuries and improve the performance of tasks that require high levels of accuracy. We focused on developing a system that recognizes and classifies voluntary effort and detects phases of fatigue. The experiment was designed to extract and evaluate hand-tremor data during the performance of both rest and effort tasks. The data were collected from the wrist and finger of the participant’s dominant hand. To investigate tremor, time- and frequency-domain features were extracted from the accelerometer signal for segments of 45 and 90 samples per window. Advanced signal processing and machine-learning techniques, such as decision tree, k-nearest neighbor, support vector machine, and ensemble classifiers, were applied to build models that classify rest and effort tasks and the phases of fatigue. Classifier performance was assessed on various metrics using 5-fold cross-validation. For recognizing rest and effort tasks, an ensemble classifier based on the random subspace method, with a window length of 45 samples, was the most accurate (96.1%). The highest accuracy (~98%) in distinguishing early from late fatigue phases was achieved using the same classifier and window length.
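The windowed feature extraction mentioned in this abstract can be illustrated with a short sketch. It rests on several assumptions that the abstract does not confirm: windows are non-overlapping, only three time-domain features (mean, standard deviation, root mean square) are computed, and the input is a synthetic sine wave standing in for an accelerometer trace. The function name `window_features` is hypothetical, and frequency-domain features are left out.

```python
import math
import statistics
from typing import Dict, List, Sequence


def window_features(signal: Sequence[float], window: int = 45) -> List[Dict[str, float]]:
    """Split a signal into non-overlapping windows and summarize each one."""
    if window < 2:
        raise ValueError("window must hold at least two samples")
    features = []
    # Trailing samples that do not fill a whole window are dropped.
    for start in range(0, len(signal) - window + 1, window):
        segment = signal[start:start + window]
        features.append({
            "mean": statistics.fmean(segment),
            "std": statistics.stdev(segment),
            "rms": math.sqrt(sum(x * x for x in segment) / window),
        })
    return features


# Synthetic trace (illustrative only): 90 samples of a sine wave.
trace = [math.sin(2 * math.pi * i / 15) for i in range(90)]
rows = window_features(trace, window=45)
print(len(rows), rows[0])
```

Each resulting row would be one training example for a classifier; a shorter window yields more examples per recording, at the cost of less signal per example.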

