neighbor classification
Recently Published Documents


TOTAL DOCUMENTS

396
(FIVE YEARS 61)

H-INDEX

43
(FIVE YEARS 3)

2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Yunsheng Song ◽  
Xiaohan Kong ◽  
Chao Zhang

Owing to its lack of assumptions about the underlying data distribution and its strong generalization ability, the k-nearest neighbor (kNN) classification algorithm is widely used in face recognition, text classification, sentiment analysis, and other fields. However, kNN must compute the similarity between the unlabeled instance and all training instances during prediction, which makes large-scale data difficult to handle. To overcome this difficulty, an increasing number of acceleration algorithms based on data partition have been proposed; however, they lack a theoretical analysis of the effect of data partition on classification performance. This paper provides such an analysis using empirical risk minimization and proposes a large-scale k-nearest neighbor classification algorithm based on neighbor-relationship preservation. The search for the nearest neighbors is converted into a constrained optimization problem, and an estimate is derived of the difference in the objective function value at the optimal solution with and without data partition. According to this estimate, minimizing the similarity between instances assigned to different subsets largely reduces the effect of data partition; the minibatch k-means clustering algorithm is chosen to perform the partition for its effectiveness and efficiency. Finally, the nearest neighbors of the test instance are searched in a set formed by successively merging candidate subsets until the neighbors no longer change, where the candidate subsets are selected by the similarity between the test instance and the cluster centers. Experiments on public datasets show that the proposed algorithm largely retains the same nearest neighbors as the original kNN classifier, with no significant difference in classification accuracy, and gives better results than two state-of-the-art algorithms.
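The acceleration scheme the abstract describes can be sketched roughly as follows. This is a hypothetical reconstruction, not the authors' code: the training set is partitioned with minibatch k-means, candidate clusters are ranked by the query's distance to each center, and neighbors are searched in successively merged clusters until the k-neighbor set stabilizes.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def partitioned_knn_predict(X_train, y_train, x_query, k=5, n_clusters=8):
    # Partition the training set with minibatch k-means.
    km = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=0).fit(X_train)
    labels = km.labels_
    # Rank clusters by the query's distance to each cluster center.
    order = np.argsort(np.linalg.norm(km.cluster_centers_ - x_query, axis=1))
    candidate_idx = np.array([], dtype=int)
    prev_neighbors = None
    for c in order:
        # Merge the next candidate cluster and re-search the k neighbors.
        candidate_idx = np.concatenate([candidate_idx, np.where(labels == c)[0]])
        d = np.linalg.norm(X_train[candidate_idx] - x_query, axis=1)
        neighbors = candidate_idx[np.argsort(d)[:k]]
        if prev_neighbors is not None and set(neighbors) == set(prev_neighbors):
            break  # neighbor set did not change; stop merging clusters
        prev_neighbors = neighbors
    # Majority vote among the k neighbors found.
    vals, counts = np.unique(y_train[neighbors], return_counts=True)
    return vals[np.argmax(counts)]
```

The early stop avoids scanning every partition, which is the source of the speed-up; in the worst case all clusters are merged and the result matches exact kNN.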


Author(s):  
Abdellah Agrima ◽  
Ilham Mounir ◽  
Abdelmajid Farchi ◽  
Laila Elmaazouzi ◽  
Badia Mounir

In this article, we present an automatic technique for recognizing emotional states from speech signals. The main focus is an efficient, reduced set of acoustic features that allows us to recognize four basic human emotions (anger, sadness, joy, and neutral). The proposed feature vector is composed of twenty-eight measurements corresponding to standard acoustic features such as formants and fundamental frequency (obtained with the Praat software), together with new features based on the energies in specific frequency bands and their distributions (computed with MATLAB code). The measurements are extracted from consonant/vowel (CV) syllabic units derived from the Moroccan Arabic dialect emotional database (MADED) corpus. The collected data are then used to train a k-nearest-neighbor (KNN) classifier that performs the automated recognition phase. The results reach 64.65% accuracy in multi-class classification and 94.95% for classification between positive and negative emotions.
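The recognition phase reduces to a standard KNN pipeline over the 28-dimensional feature vector. A minimal sketch, with random vectors standing in for the real acoustic measurements (formants, F0, band energies) and integer labels standing in for the four emotions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Illustrative placeholder data: 400 utterances x 28 acoustic measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 28))
y = rng.integers(0, 4, size=400)  # 0=anger, 1=sadness, 2=joy, 3=neutral

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Standardize features first: KNN is distance-based, so unscaled features
# (e.g. Hz-valued formants vs. band-energy ratios) would dominate the metric.
scaler = StandardScaler().fit(X_train)
clf = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
acc = clf.score(scaler.transform(X_test), y_test)
```

On the random placeholder data the accuracy is near chance (about 0.25 for four classes); the structure of the pipeline, not the score, is the point here.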


2021 ◽  
Vol 11 (21) ◽  
pp. 10361
Author(s):  
Decheng Hsieh ◽  
Lieuhen Chen ◽  
Taiping Sun

The discretionary damages for mental suffering in fatal car accident cases in Taiwan are subjective, uncertain, and unpredictable; plaintiffs, defendants, and their lawyers therefore find it difficult to judge whether spending much money and time on a lawsuit is worthwhile, and which legal factors judges will consider dominant when assessing the damages. To address these problems, we propose k-nearest neighbor, classification and regression trees, and random forests as regression learners to build optimal predictive models, and we reveal the importance ranking of legal factors by permutation feature importance. The experimental results show that the random forest model outperformed the other models and achieved good performance, and that "the mental suffering damages that the plaintiff claims" and "the age of the victim" play important roles in assessments of mental suffering damages in fatal car accident cases in Taiwan. Litigants and their lawyers can therefore predict the discretionary damages in advance, decide wisely whether to litigate, and focus on the crucial legal factors when developing their litigation strategy.
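The random forest plus permutation-importance workflow can be sketched as follows. The data here are synthetic stand-ins for the legal-factor table: feature 0 plays the role of "claimed damages" and is constructed to dominate the target, so it should rank first.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the case data: 300 cases, 5 legal factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance: shuffle one feature at a time on held-out data
# and measure how much the model's score drops.
imp = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]  # most important first
```

Permutation importance is measured on held-out data, which avoids the bias of impurity-based importances toward high-cardinality features.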


2021 ◽  
Author(s):  
Yong Li

BACKGROUND Preventing in-hospital mortality in patients with ST-segment elevation myocardial infarction (STEMI) is a crucial step. OBJECTIVE The objective of our research was to develop and externally validate a diagnostic model of in-hospital mortality in acute STEMI patients using artificial intelligence methods. METHODS Because our datasets were highly imbalanced, we evaluated the effect of down-sampling: a down-sampling technique was applied to the original dataset to create one balanced dataset, yielding two datasets in total (original and down-sampled). We divided one American population non-randomly into a training set and a test set, and used another American population as the validation set. We developed and externally validated the diagnostic model with several artificial intelligence methods: logistic regression, decision tree, extreme gradient boosting (XGBoost), a K-nearest-neighbor classification model, and a multi-layer perceptron. We used the confusion matrix together with the area under the receiver operating characteristic curve (AUC) to evaluate the pros and cons of these models. RESULTS The strongest predictors of in-hospital mortality were age, female sex, cardiogenic shock, atrial fibrillation (AF), ventricular fibrillation (VF), in-hospital bleeding, and medical history such as hypertension and old myocardial infarction. The F2 scores in the training, test, and validation sets were 0.70, 0.70, and 0.54 for logistic regression; 0.74, 0.52, and 0.54 for XGBoost; 0.72, 0.51, and 0.52 for the decision tree; 0.64, 0.47, and 0.49 for the K-nearest-neighbor model; and 0.71, 0.54, and 0.54 for the multi-layer perceptron.
The AUCs in the training, test, and validation sets were 0.72, 0.73, and 0.76 for logistic regression; 0.75, 0.73, and 0.75 for XGBoost; 0.75, 0.71, and 0.74 for the decision tree; 0.71, 0.69, and 0.72 for the K-nearest-neighbor model; and 0.73, 0.74, and 0.75 for the multi-layer perceptron. The diagnostic model built by logistic regression was the best. CONCLUSIONS The strongest predictors of in-hospital mortality were age, female sex, cardiogenic shock, AF, VF, in-hospital bleeding, and medical history such as hypertension and old myocardial infarction. Using artificial intelligence methods, we developed and externally validated a diagnostic model of in-hospital mortality in acute STEMI patients; the model built by logistic regression was the best. CLINICALTRIAL We registered this study with the WHO International Clinical Trials Registry Platform (ICTRP) (registration number: ChiCTR1900027129; registered 1 November 2019). http://www.chictr.org.cn/edit.aspx?pid=44888&htm=4.
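The core of the methods section — down-sampling the majority class, fitting a logistic regression, and scoring with the confusion matrix, AUC, and F2 — can be sketched on synthetic data. Everything here is a generic illustration (imbalanced toy data, not the STEMI cohort):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, fbeta_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Imbalanced synthetic cohort: about 5% positives (in-hospital deaths).
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Down-sample the majority class in the training set to balance it;
# the test set keeps its natural class distribution.
pos = np.where(y_tr == 1)[0]
neg = np.where(y_tr == 0)[0]
rng = np.random.default_rng(0)
idx = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])

clf = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
proba = clf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, proba)
f2 = fbeta_score(y_te, clf.predict(X_te), beta=2)  # F2 weights recall over precision
cm = confusion_matrix(y_te, clf.predict(X_te))
```

The F2 score (beta=2) is a sensible choice in this setting because missing a death (a false negative) is costlier than a false alarm, so recall is weighted above precision.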


2021 ◽  
Vol 13 (2) ◽  
pp. 76-83
Author(s):  
Ridho Ananda ◽  
Agi Prasetiadi

Classification is a data mining task that predicts which group an object belongs to. The prediction can be performed using similarity measures, classification trees, or regression. Procrustes, on the other hand, is a technique for matching two configurations that has been applied to outlier detection. Since misclassified objects can be regarded as outliers, Procrustes has the potential to tackle the misclassification problem. Therefore, this paper proposes the Procrustes classification algorithm (PrCA) and the Procrustes nearest neighbor classification algorithm (PNNCA). Their results were compared with classical classification algorithms, namely k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), AdaBoost (AB), Random Forest (RF), Logistic Regression (LR), and Ridge Regression (RR), on the iris, cancer, liver, seeds, and wine datasets. The minimum and maximum accuracy values obtained by PrCA were 0.610 and 0.925, while those of PNNCA were 0.610 and 0.963. PrCA was generally better than k-NN, SVM, and AB, while PNNCA was generally better than k-NN, SVM, AB, and RF. Based on these results, PrCA and PNNCA deserve to be proposed as a new approach to classification.
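The abstract does not spell out the internals of PrCA, but the Procrustes primitive it builds on — matching two point configurations up to translation, scaling, and rotation, and reporting the residual disparity — is available as `scipy.spatial.procrustes`. A minimal illustration: a configuration compared against a rotated, scaled, translated copy of itself should give disparity near zero.

```python
import numpy as np
from scipy.spatial import procrustes

# Configuration A: four corners of a unit square.
A = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

# Configuration B: A rotated 90 degrees, scaled by 2.5, and translated.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 2.5 * A @ R.T + np.array([3.0, -1.0])

# Procrustes removes translation, scale, and rotation; the leftover
# disparity (sum of squared pointwise differences) measures true mismatch.
mtx1, mtx2, disparity = procrustes(A, B)
```

In a classification setting, a small disparity between an object's configuration and a group's configuration would indicate a good match; a large disparity flags an outlier, which is the intuition the paper exploits.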


Author(s):  
Lin Qiu ◽  
Yanpeng Qu ◽  
Changjing Shang ◽  
Longzhi Yang ◽  
Fei Chao ◽  
...  
