Nearest Neighbour (NN) and k-Nearest Neighbour (kNN) Supervised Classification Algorithms

Author(s):  
Monica Borda ◽  
Romulus Terebes ◽  
Raul Malutan ◽  
Ioana Ilea ◽  
Mihaela Cislariu ◽  
...  
2008 ◽  
Vol 17 (03) ◽  
pp. 415-431 ◽  
Author(s):  
JASON CHAN ◽  
IRENA KOPRINSKA ◽  
JOSIAH POON

Traditional supervised classification algorithms require a large number of labelled examples to perform accurately. Semi-supervised classification algorithms attempt to overcome this major limitation by also using unlabelled examples. Unlabelled examples have also been used to improve nearest neighbour text classification in a method called bridging. In this paper, we propose the use of bridging in a semi-supervised setting. We introduce a new bridging algorithm that can be used as a base classifier in most semi-supervised approaches. We empirically show that the classification performance of two semi-supervised algorithms, self-learning and co-training, improves with the use of our new bridging algorithm in comparison to using the standard classifier, JRipper. We propose a similarity metric for short texts and also study the performance of self-learning with a number of instance selection heuristics.
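
As an illustration of the self-learning idea mentioned above, the following is a minimal sketch under stated assumptions. It is not the authors' bridging algorithm: a plain 1-nearest-neighbour rule on numeric points stands in for the base classifier, "closest to an already labelled point" stands in for the instance selection heuristic, and the names `nearest_label` and `self_train` are hypothetical.

```python
from math import dist


def nearest_label(x, labelled):
    """Return (label, distance) of the labelled point closest to x."""
    point, label = min(labelled, key=lambda pl: dist(x, pl[0]))
    return label, dist(x, point)


def self_train(labelled, unlabelled, per_round=1):
    """Self-learning sketch: in each round, take the unlabelled points that
    lie closest to the labelled set, label them with the 1-NN rule, and
    move them into the labelled set. Assumes `labelled` is non-empty."""
    labelled = list(labelled)
    pool = list(unlabelled)
    while pool:
        # Most confident first: smallest distance to any labelled point.
        pool.sort(key=lambda x: nearest_label(x, labelled)[1])
        chosen, pool = pool[:per_round], pool[per_round:]
        # Predict for the whole batch before adding any of it.
        newly = [(x, nearest_label(x, labelled)[0]) for x in chosen]
        labelled.extend(newly)
    return labelled


# Toy data (made up): two labelled seeds and four unlabelled points.
seeds = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
unlabelled = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (8.0, 8.0)]
for point, label in self_train(seeds, unlabelled):
    print(point, label)
```

Because newly labelled points join the training set, later predictions can lean on earlier ones, which is both the benefit and the risk of self-learning: an early mistake propagates.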


Author(s):  
Zainuri Saringat ◽  
Aida Mustapha ◽  
R. D. Rohmat Saedudin ◽  
Noor Azah Samsudin

Chronic Kidney Disease (CKD) is one of the leading causes of death, contributed to by other illnesses such as diabetes, hypertension, lupus, anemia or weak bones that lead to bone fractures. Early prediction of CKD is important in order to contain the disease. However, instead of predicting the severity of CKD, the objective of this paper is to predict the diagnosis of CKD based on the symptoms or attributes observed in a particular case, whether the stage is acute or chronic. To achieve this, a classification model is proposed to label the stage of severity for kidney disease patients. The experiments then investigated the performance of the proposed classification model based on eight supervised classification algorithms: ZeroR, Rule Induction, Support Vector Machine, Naïve Bayes, Decision Tree, Decision Stump, k-Nearest Neighbour, and Classification via Regression. The performance of all the classifiers is evaluated based on accuracy, precision, and recall. The results showed that the regression classifier performs best in the kidney diagnostic procedure.
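
The three evaluation measures named above can be computed from true and predicted labels as in the sketch below. The function name `evaluate`, the label strings, and the toy predictions are assumptions for illustration only and are not taken from the paper's data.

```python
def evaluate(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels: five cases, three of them truly "chronic".
y_true = ["chronic", "chronic", "chronic", "acute", "acute"]
y_pred = ["chronic", "chronic", "acute", "chronic", "acute"]
print(evaluate(y_true, y_pred, positive="chronic"))
```

Here two of the three predicted "chronic" cases are correct (precision 2/3), two of the three true "chronic" cases are found (recall 2/3), and three of five predictions match overall (accuracy 0.6).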


2020 ◽  
Vol 19 (01) ◽  
pp. 2040015
Author(s):  
Ahmad Alaiad ◽  
Hassan Najadat ◽  
Belal Mohsen ◽  
Khaled Balhaf

Background and objective: Chronic kidney disease (CKD) is one of the deadly diseases that can affect many vital organs in the human body, such as the heart, liver, and lungs. Many individuals might be at an early stage of kidney disease and not show any signs, which might lead to sudden death. Previous research showed that early prediction of CKD is very important in the medical field for physicians’ decision-making and patients’ health and life. To this end, constructing an efficient prediction system for CKD, which is the goal of this paper, often reduces medical errors and overall healthcare cost. Methods: Classification and association rule mining techniques were integrated and utilised to construct an efficient system for predicting and diagnosing CKD and its causes using Weka and SPSS as platform environments. In particular, five classification algorithms, namely, naive Bayes, decision tree, support vector machine, K-nearest neighbour, and JRip, were used to achieve the research goal. In addition, the Apriori algorithm was used to discover strong relationship rules between attributes. The experiments were conducted on a real medical dataset collected from hospitals and patient monitoring systems. Results: The experiments achieved a high accuracy of 98.50% for the K-nearest neighbour (KNN) classifier and 96.00% when using the classifier based on association rules (JRip). Conclusions: We conclude by showing that applying an integrative approach that combines classification algorithms and association rule mining can significantly improve the classification accuracy and be more useful for CKD prediction. This research also has several theoretical and practical implications for the medical field and healthcare industry.
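
Apriori-style rule mining ranks candidate rules by support and confidence. The sketch below shows only those two measures, not the Weka implementation or the full candidate-generation procedure; the attribute names and transactions are invented for illustration, and `support` and `confidence` are hypothetical helper names.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    ante = support(antecedent, transactions)
    return both / ante if ante else 0.0


# Made-up patient records, each a set of observed attributes.
records = [
    {"hypertension", "diabetes", "ckd"},
    {"hypertension", "ckd"},
    {"diabetes"},
    {"hypertension", "diabetes", "ckd"},
]
print(support({"hypertension", "ckd"}, records))
print(confidence({"hypertension"}, {"ckd"}, records))
print(confidence({"diabetes"}, {"ckd"}, records))
```

A rule is usually called "strong" when both values exceed user-chosen minimum thresholds; the thresholds used in the paper are not stated in the abstract.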


2021 ◽  
Author(s):  
Jorge Cabrera Alvargonzalez ◽  
Ana Larranaga Janeiro ◽  
Sonia Perez ◽  
Javier Martinez Torres ◽  
Lucia Martinez Lamas ◽  
...  

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been and remains one of the major challenges humanity has faced thus far. Over the past few months, large amounts of information have been collected that are only now beginning to be assimilated. In the present work, the existence of residual information in the massive numbers of rRT-PCRs that tested positive out of the almost half a million tests that were performed during the pandemic is investigated. This residual information is believed to be highly related to a pattern in the number of cycles that are necessary to detect positive samples as such. Thus, a database of more than 20,000 positive samples was collected, and two supervised classification algorithms (a support vector machine and a neural network) were trained to temporally locate each sample based solely and exclusively on the number of cycles determined in the rRT-PCR of each individual. Finally, the results obtained from the classification show how the appearance of each wave is coincident with the surge of each of the variants present in the region of Galicia (Spain) during the development of the SARS-CoV-2 pandemic and clearly identified with the classification algorithm.


Author(s):  
Tobias Scheffer

For many classification problems, unlabeled training data are inexpensive and readily available, whereas labeling training data imposes costs. Semi-supervised classification algorithms aim at utilizing information contained in unlabeled data in addition to the (few) labeled data.


2020 ◽  
Vol 37 (4) ◽  
pp. 723-739 ◽  
Author(s):  
Anton Sokolov ◽  
Egor Dmitriev ◽  
Cyril Gengembre ◽  
Hervé Delbarre

Abstract: This work considers the classification of atmospheric meteorological events, such as sea breezes, fogs, and high winds, in coastal areas. In situ wind, temperature, humidity, pressure, radiance, and turbulence meteorological measurements are used as predictors. Local atmospheric events of 2013–14 were analyzed and classified manually using data of the measurement campaign in the coastal area of the English Channel in Dunkirk, France. The results of that categorization allowed the training of a few supervised classification algorithms using the data of an ultrasonic anemometer as predictors. The comparison was carried out for the K-nearest-neighbors classifier, support vector machine, and two Bayesian classifiers (quadratic discriminant analysis and the Parzen–Rozenblatt window). The analysis showed that the K-nearest-neighbors and quadratic discriminant analysis classifiers achieve the best classification accuracy (up to 80% correctly classified meteorological events). The latter classifier has higher calculation speed and is less sensitive to unbalanced data and the overtraining problem. The most informative atmospheric parameters for event recognition were identified for each algorithm. The results obtained showed that supervised classification algorithms contribute to the automation of processing and analyzing local meteorological measurements.
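
For readers unfamiliar with the K-nearest-neighbors classifier compared above, a minimal sketch follows. It uses Euclidean distance and an unweighted majority vote, which are common defaults and not necessarily the settings used in the paper; the two features (wind speed, relative humidity), their values, and the function name `knn_predict` are made up for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(x, train, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Sort training pairs (point, label) by Euclidean distance to x.
    neighbours = sorted(train, key=lambda pl: dist(x, pl[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Invented training set: (wind speed in m/s, relative humidity in %).
train = [
    ((2.0, 95.0), "fog"),
    ((3.0, 90.0), "fog"),
    ((2.5, 92.0), "fog"),
    ((15.0, 60.0), "high wind"),
    ((18.0, 55.0), "high wind"),
    ((16.0, 58.0), "high wind"),
]
print(knn_predict((2.8, 93.0), train))
print(knn_predict((17.0, 57.0), train))
```

In practice the features would normally be rescaled first, since otherwise the variable with the largest numeric range dominates the distance.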

