Efficiently Producing the K Nearest Neighbors in the Skyline on Vertically Partitioned Tables

2013 ◽  
Vol 3 (2) ◽  
pp. 58-77
Author(s):  
Marlene Goncalves ◽  
Maria-Esther Vidal

Criteria that induce a Skyline naturally represent users' preference conditions that are useful for discarding irrelevant data in large datasets. However, in the presence of high-dimensional Skyline spaces, the size of the Skyline can still be very large, making it infeasible for users to process this set of points. To identify the best points among the Skyline, the Top-k Skyline approach has been proposed. Top-k Skyline uses discriminatory criteria to induce a total order of the points that comprise the Skyline, and recognizes the best or top-k points based on these criteria. In this article the authors model queries as multi-dimensional points that represent bounds of VPT (Vertically Partitioned Table) property values, and datasets as sets of multi-dimensional points; the problem is to locate the k best tuples in the dataset whose distance to the query is minimized. A tuple is among the k best tuples whenever no other tuple is better in all dimensions and closer to the query point, i.e., the k best tuples correspond to the k nearest points to the query that are incomparable or belong to the Skyline. The authors name these tuples the k nearest neighbors in the Skyline. The authors propose a hybrid approach that combines Skyline and Top-k solutions and develop two algorithms: TKSI and k-NNSkyline. The proposed algorithms identify, among the Skyline tuples, the k ones with the lowest values of the distance metric, i.e., the k nearest neighbors to the multi-dimensional query that are incomparable. Empirically, the authors study the performance and quality of TKSI and k-NNSkyline. The experimental results show that TKSI is able to speed up the computation of the Top-k Skyline by at least 50% with respect to state-of-the-art solutions, whenever k is smaller than the size of the Skyline. Additionally, the results suggest that k-NNSkyline outperforms existing solutions by up to three orders of magnitude.
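
The notion of the k nearest neighbors in the Skyline can be illustrated with a minimal, naive sketch (a quadratic pairwise dominance test plus a distance sort, not the authors' TKSI or k-NNSkyline algorithms), assuming lower values are preferred in every dimension:

    # Minimal sketch, not the paper's algorithms: filter the skyline with a
    # pairwise dominance test, then keep the k points closest to the query.
    import math

    def dominates(p, q):
        """p dominates q if p is at least as good in every dimension and strictly better in one."""
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    def k_nearest_in_skyline(points, query, k):
        skyline = [p for p in points
                   if not any(dominates(q, p) for q in points if q != p)]
        skyline.sort(key=lambda p: math.dist(p, query))
        return skyline[:k]

    # Example: tuples as 2-dimensional points, query at the origin.
    data = [(1, 9), (2, 4), (3, 3), (5, 1), (6, 6)]
    print(k_nearest_in_skyline(data, (0, 0), 2))   # -> [(3, 3), (2, 4)]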

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2940
Author(s):  
Luciano Ortenzi ◽  
Simone Figorilli ◽  
Corrado Costa ◽  
Federico Pallottino ◽  
Simona Violino ◽  
...  

The degree of olive maturation is a very important factor to consider at harvest time, as it influences the organoleptic quality of the final product, for both oil and table use. The Jaén index, evaluated by measuring the average coloring of olive fruits (peel and pulp), is currently considered one of the most indicative methods for determining the olive ripening stage, but it is a slow assay and its results are not objective. The aim of this work is to identify the ripeness degree of olive lots through a real-time, repeatable, and objective machine vision method that uses RGB image analysis based on a k-nearest neighbors classification algorithm. To compensate for different lighting scenarios, pictures were subjected to an automatic colorimetric calibration method, an advanced 3D algorithm using known values. To check the performance of the automatic machine vision method, a comparison was made with two visual operator image evaluations. For 10 images, the number of black, green, and purple olives was also visually evaluated by these two operators. The accuracy of the method was 60%. The system could easily be implemented in a specific mobile app developed for the automatic assessment of olive ripeness directly in the field, enabling advanced georeferenced data analysis.
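
A toy sketch of the classification step only (the colorimetric calibration is omitted, and the mean-RGB features, colour labels, and values below are invented for illustration, not taken from the paper):

    # Illustrative sketch: classify olive colour classes from mean RGB values
    # with scikit-learn's k-nearest neighbors classifier. Data are hypothetical.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[ 60, 120,  40],   # green
                  [ 55, 110,  45],   # green
                  [120,  60,  90],   # purple
                  [110,  55,  85],   # purple
                  [ 30,  30,  35],   # black
                  [ 25,  28,  30]])  # black
    y = ["green", "green", "purple", "purple", "black", "black"]

    knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    print(knn.predict([[100, 50, 80]]))   # -> ['purple'] on this toy data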


2013 ◽  
Vol 51 ◽  
pp. 27-34 ◽  
Author(s):  
Jesús Bobadilla ◽  
Fernando Ortega ◽  
Antonio Hernando ◽  
Guillermo Glez-de-Rivera

Author(s):  
Ann Nosseir ◽  
Seif Eldin A. Ahmed

Having a system that classifies different types of fruit and identifies their quality would be of value in various areas, especially in the mass production of fruit products. This paper presents a novel system that differentiates between four fruit types and distinguishes decayed fruit from fresh. The algorithms used are based on the colour and texture features of the fruit images: they extract the RGB values together with first-order statistics and second-order Gray Level Co-occurrence Matrix (GLCM) statistics. To discriminate between the fruit types, Fine, Medium, Coarse, Cosine, Cubic, and Weighted K-Nearest Neighbors classifiers are applied; their accuracies are 96.3%, 93.8%, 25%, 83.8%, 90%, and 95%, respectively. These steps are tested with 46 pictures, taken with a mobile phone, of fruits in season at the time, i.e., banana, apple, and strawberry. All types were accurately identified. To tell decayed fruit apart from fresh, linear and quadratic Support Vector Machine (SVM) classifiers differentiated between them based on the colour segmentation and texture feature values of each fruit image. The accuracy of the linear SVM is 96% and that of the quadratic SVM is 98%.
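
A hedged sketch of this kind of feature pipeline (not the paper's code): first-order colour statistics plus second-order GLCM texture features, fed to a polynomial degree-2 SVM as a stand-in for the quadratic SVM; the images and labels below are random placeholders:

    # Sketch: RGB statistics + GLCM texture features, classified with an SVM.
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops
    from sklearn.svm import SVC

    def fruit_features(rgb_image):
        """Mean/std per RGB channel plus GLCM contrast, energy and homogeneity."""
        first_order = [rgb_image[..., c].mean() for c in range(3)] + \
                      [rgb_image[..., c].std() for c in range(3)]
        gray = rgb_image.mean(axis=2).astype(np.uint8)   # simple grayscale
        glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                            symmetric=True, normed=True)
        second_order = [graycoprops(glcm, p)[0, 0]
                        for p in ("contrast", "energy", "homogeneity")]
        return np.array(first_order + second_order)

    # Toy usage with random "images"; labels 0 = fresh, 1 = decayed (invented).
    rng = np.random.default_rng(0)
    images = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(6)]
    X = np.stack([fruit_features(img) for img in images])
    y = [0, 0, 0, 1, 1, 1]
    clf = SVC(kernel="poly", degree=2).fit(X, y)
    print(clf.predict(X[:1]))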


2019 ◽  
Vol 16 (10) ◽  
pp. 4425-4430 ◽  
Author(s):  
Devendra Prasad ◽  
Sandip Kumar Goyal ◽  
Avinash Sharma ◽  
Amit Bindal ◽  
Virendra Singh Kushwah

Machine Learning is a growing area of computer science. This article focuses on prediction analysis using the K-Nearest Neighbors (KNN) Machine Learning algorithm. Data in the dataset are processed, analyzed, and predicted using the specified algorithm. Various Machine Learning algorithms, along with their pros and cons, are introduced. The KNN algorithm is studied in detail and implemented on the specified data with certain parameters. The research work elucidates prediction analysis and demonstrates the prediction of restaurant quality.
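
Since the paper's restaurant dataset and parameter choices are not given here, the following is only a generic sketch of a KNN prediction pipeline with feature scaling and a small grid search over k:

    # Generic KNN prediction sketch with hypothetical features and labels.
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))             # hypothetical restaurant features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical "good quality" label

    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
    grid = GridSearchCV(pipe, {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, round(grid.best_score_, 3))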


2019 ◽  
Vol 12 (4) ◽  
pp. 72
Author(s):  
Sara Alomari ◽  
Salha Abdullah

Concept maps have been used as an effective learning method to assist learners in identifying relationships between pieces of information, especially when teaching materials cover many topics or concepts. However, building a concept map manually is a long and tedious task: it is time-consuming and demands intensive effort in reading the full content and reasoning about the relationships among concepts. Due to this inefficiency, many studies have been carried out to develop intelligent algorithms using several data mining techniques. In this research, the authors aim to improve the Text Analysis-Association Rules Mining (TA-ARM) algorithm by using the weighted K-nearest neighbors (KNN) algorithm instead of traditional KNN. The weighted KNN is expected to improve the classification accuracy, which will, in turn, enhance the quality of the generated concept map.
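
The weighted-KNN substitution itself is straightforward to illustrate (independently of TA-ARM): in scikit-learn the only change from traditional KNN is weights="distance", which lets closer neighbors count more heavily in the vote. The toy data below are invented:

    from sklearn.neighbors import KNeighborsClassifier

    X = [[0.0], [0.2], [0.3], [1.0], [1.1]]
    y = ["A", "A", "A", "B", "B"]

    uniform  = KNeighborsClassifier(n_neighbors=5, weights="uniform").fit(X, y)
    weighted = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)

    query = [[0.95]]
    print(uniform.predict(query))    # majority of all 5 labels -> ['A']
    print(weighted.predict(query))   # the two much closer B's dominate -> ['B']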


2018 ◽  
Vol 8 (10) ◽  
pp. 1927 ◽  
Author(s):  
Zuzana Dankovičová ◽  
Dávid Sovák ◽  
Peter Drotár ◽  
Liberios Vokorokos

This paper addresses the processing of speech data and their utilization in a decision support system. The main aim of this work is to utilize machine learning methods to recognize pathological speech, particularly dysphonia. We extracted 1560 speech features and used these to train the classification model. Three state-of-the-art classifiers were used: K-nearest neighbors, random forests, and support vector machines. We analyzed the performance of the classifiers with and without gender taken into account. The experimental results showed that it is possible to recognize pathological speech with a classification accuracy as high as 91.3%.
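
A hedged sketch of such a three-classifier comparison (the 1560-dimensional dysphonia feature set is not available here, so random placeholder features and labels are used):

    # Compare KNN, random forest and SVM with 5-fold cross-validation.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 1560))    # placeholder for extracted speech features
    y = rng.integers(0, 2, size=100)    # 0 = healthy, 1 = dysphonic (invented labels)

    for name, clf in [("KNN", KNeighborsClassifier()),
                      ("Random forest", RandomForestClassifier(random_state=0)),
                      ("SVM", SVC())]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")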


Author(s):  
Xinzhong Zhu ◽  
Xinwang Liu ◽  
Miaomiao Li ◽  
En Zhu ◽  
Li Liu ◽  
...  

The recently proposed multiple kernel k-means with incomplete kernels (MKKM-IK) optimally integrates a group of pre-specified incomplete kernel matrices to improve clustering performance. Though it demonstrates promising performance in various applications, we observe that it does not sufficiently consider the local structure among the data and indiscriminately forces all pairwise sample similarities to align equally with their ideal similarity values. This could make the incomplete kernels less effectively imputed and, in turn, adversely affect the clustering performance. In this paper, we propose a novel localized incomplete multiple kernel k-means (LI-MKKM) algorithm to address this issue. Different from the existing MKKM-IK, LI-MKKM only requires the similarity of a sample to its k-nearest neighbors to align with the ideal similarity values. This helps the clustering algorithm focus on closer sample pairs that should stay together and avoids relying on unreliable similarity evaluations for farther sample pairs. We carefully design a three-step iterative algorithm to solve the resultant optimization problem and theoretically prove its convergence. Comprehensive experiments on eight benchmark datasets demonstrate that our algorithm significantly outperforms comparable state-of-the-art algorithms proposed in the recent literature, verifying the advantage of considering local structure.
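
A minimal sketch of the localization idea only (not the authors' full three-step LI-MKKM optimization or kernel imputation): build a neighbourhood mask from each sample's k nearest neighbours in kernel-induced distance, so that similarity alignment is evaluated only on those local pairs rather than on all pairs:

    import numpy as np

    def local_alignment_mask(K, k):
        """K: n x n kernel matrix; return a 0/1 mask keeping each row's k nearest neighbours."""
        n = K.shape[0]
        d = np.diag(K)
        # squared kernel-induced distance: ||phi(i) - phi(j)||^2 = K_ii + K_jj - 2 K_ij
        dist2 = d[:, None] + d[None, :] - 2 * K
        mask = np.zeros_like(K)
        for i in range(n):
            nn = np.argsort(dist2[i])[:k + 1]   # includes the sample itself
            mask[i, nn] = 1.0
        return np.maximum(mask, mask.T)         # keep the mask symmetric

    # Toy usage with a linear kernel on random data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))
    K = X @ X.T
    print(local_alignment_mask(K, k=2))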

