Stochastic Temporal Data Upscaling Using the Generalized k-Nearest Neighbor Algorithm

2018 ◽  
Vol 2018 ◽  
pp. 1-8
Author(s):  
John Mashford

Three methods of temporal data upscaling, which may collectively be called the generalized k-nearest neighbor (GkNN) method, are considered. The accuracy of the GkNN simulation of month-by-month yield is examined (where the term yield denotes the dependent variable). The notion of an eventually well-distributed time series is introduced, and on the basis of this assumption some properties of the average annual yield and its variance for a GkNN simulation are computed. The total yield over a planning period is determined, and a general framework for considering the GkNN algorithm, based on the notion of stochastically dependent time series, is described; it is shown that for a sufficiently large training set the GkNN simulation has the same statistical properties as the training data. An example of the application of the methodology is given for the problem of simulating the yield of a rainwater tank from monthly climatic data.
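A minimal sketch of one plausible reading of this kind of kNN-based temporal simulation: for each month to be simulated, the k training months nearest in climate space are found and one of their yields is resampled. The function name, the uniform resampling scheme, and the toy data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gknn_simulate(train_climate, train_yield, sim_climate, k=5):
    """Simulate a monthly yield series: for each simulated month, find the k
    training months nearest in climate space and resample one of their yields."""
    train_climate = np.asarray(train_climate, dtype=float)
    train_yield = np.asarray(train_yield, dtype=float)
    sim = np.empty(len(sim_climate))
    for t, c in enumerate(np.asarray(sim_climate, dtype=float)):
        d = np.linalg.norm(train_climate - c, axis=1)  # distance to every training month
        nearest = np.argsort(d)[:k]                    # indices of the k nearest months
        sim[t] = train_yield[rng.choice(nearest)]      # stochastic resampling step
    return sim

# Toy usage: 120 training months with 2 climate variables, 12 months to simulate.
train_X = rng.normal(size=(120, 2))
train_y = rng.gamma(2.0, 10.0, size=120)
print(gknn_simulate(train_X, train_y, rng.normal(size=(12, 2)), k=5))
```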

Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 830
Author(s):  
Seokho Kang

k-nearest neighbor (kNN) is a widely used learning algorithm for supervised learning tasks. In practice, the main challenge when using kNN is its high sensitivity to its hyperparameter setting, including the number of nearest neighbors k, the distance function, and the weighting function. To improve the robustness to hyperparameters, this study presents a novel kNN learning method based on a graph neural network, named kNNGNN. Given training data, the method learns a task-specific kNN rule in an end-to-end fashion by means of a graph neural network that takes the kNN graph of an instance to predict the label of the instance. The distance and weighting functions are implicitly embedded within the graph neural network. For a query instance, the prediction is obtained by performing a kNN search from the training data to create a kNN graph and passing it through the graph neural network. The effectiveness of the proposed method is demonstrated using various benchmark datasets for classification and regression tasks.
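A rough sketch of the query-time pipeline described above, limited to the graph-construction step: the kNN graph of a query instance is built from the training data, and a simple distance-weighted aggregation stands in for the learned graph neural network (the actual kNNGNN model is not reproduced here).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_graph_predict(X_train, y_train, x_query, k=5):
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dist, idx = nn.kneighbors(x_query.reshape(1, -1))  # edges: query -> k nearest training points
    # Stand-in for the GNN: weight neighbor labels by inverse distance.
    w = 1.0 / (dist[0] + 1e-8)
    return np.average(y_train[idx[0]], weights=w)

# Toy usage on random regression data.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 4)), rng.normal(size=200)
print(knn_graph_predict(X, y, rng.normal(size=4), k=7))
```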


Machine learning is empowering many aspects of day-to-day life, from filtering content on social networks to suggesting products that we may be looking for. This technology can take objects such as images as input to find new observations or show items based on user interest. The main focus here is on supervised learning, where the computer learns from input/training data and predicts results based on experience. We discuss the machine learning algorithms Naïve Bayes classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and apply these classifiers to the Malgenome and Drebin datasets, which are Android malware datasets. Android is an operating system that is gaining popularity, and with the rise in demand for these devices comes a rise in Android malware. Traditional techniques used to detect malware were unable to detect unknown applications. We have run these datasets through different machine learning classifiers and recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.
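A hedged sketch of this kind of comparative setup: several scikit-learn classifiers evaluated with cross-validation on a feature matrix X and label vector y. Loading the Drebin/Malgenome feature vectors is left out; the classifier hyperparameters shown are defaults, not the paper's choices.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

classifiers = {
    "Naive Bayes": GaussianNB(),
    "k-Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Decision Tree": DecisionTreeClassifier(),
    "Boosted Trees": GradientBoostingClassifier(),
    "SVM": SVC(),
}

def compare(X, y, cv=5):
    # Mean cross-validated accuracy per classifier.
    return {name: cross_val_score(clf, X, y, cv=cv).mean()
            for name, clf in classifiers.items()}
```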


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N^2). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
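A minimal sketch of landmark-based manifold learning in this spirit: a subset of points is selected as landmarks and the embedding is computed only on that subset, keeping the similarity matrix small. Random sampling stands in for the paper's local-curvature-variation criterion, and Isomap stands in for the specific manifold learner; both are assumptions for illustration.

```python
import numpy as np
from sklearn.manifold import Isomap

def landmark_embedding(X, n_landmarks=500, n_components=10, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    landmarks = rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)
    # Fit the "manifold skeleton" on landmarks only: memory scales with
    # n_landmarks^2 rather than N^2.
    skeleton = Isomap(n_components=n_components).fit(X[landmarks])
    # Remaining points are mapped onto the learned skeleton out-of-sample.
    return skeleton.transform(X), landmarks
```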


2020 ◽  
Vol 202 ◽  
pp. 16005
Author(s):  
Chashif Syadzali ◽  
Suryono Suryono ◽  
Jatmiko Endro Suseno

Customer behavior classification can be useful to assist companies in conducting business intelligence analysis. Data mining techniques can classify customer behavior using the K-Nearest Neighbor algorithm based on the customer's life cycle, which consists of prospect, responder, active, and former. The data used for classification include age, gender, number of donations, donation retention, and number of user visits. From 2,114 records, the classification into customer categories is 1.18% active, 8.99% prospect, 4.26% responder, and 85.57% former. Evaluating system accuracy over the range K = 1 to K = 20 shows that the highest accuracy, 94.3731%, is obtained at K = 4. The classification of user behavior produced from the training data can be used in business intelligence analysis that helps companies determine business strategies by identifying the optimal market target.
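A small sketch of this kind of k-selection experiment: kNN accuracy is measured for k = 1..20 on a held-out split and the best value is reported. The split ratio and random seed are illustrative assumptions; the actual customer dataset is not reproduced here.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def best_k(X, y, k_max=20, test_size=0.3, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                              random_state=seed, stratify=y)
    scores = {}
    for k in range(1, k_max + 1):
        pred = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).predict(X_te)
        scores[k] = accuracy_score(y_te, pred)
    k_opt = max(scores, key=scores.get)
    return k_opt, scores[k_opt]
```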


Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for the diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and the ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a series of experiments and used 5-fold cross-validation. The results obtained from the experiments show that our CNN model achieves 88.25% and 81.74% accuracy in leukemia-versus-healthy and multiclass classification of all subtypes, respectively. Finally, we also show that the CNN model performs better than the other well-known machine learning algorithms.
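A hedged sketch of image augmentation for enlarging a training set of this kind. The seven specific transformations are not named in the abstract, so common choices are shown with torchvision; the operations and their parameters are assumptions, not the paper's exact pipeline.

```python
from torchvision import transforms

# Each pass of a PIL blood-cell image through `augment` yields a new
# synthetic training sample.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=20),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```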


Mathematics ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 413 ◽  
Author(s):  
Chris Lytridis ◽  
Anna Lekova ◽  
Christos Bazinas ◽  
Michail Manios ◽  
Vassilis G. Kaburlasos

Our interest is in time series classification regarding cyber–physical systems (CPSs), with emphasis on human-robot interaction. We propose an extension of the k-nearest neighbor (kNN) classifier to time-series classification using intervals' numbers (INs). More specifically, we partition a time series into windows of equal length, and from the data in each window we induce a distribution which is represented by an IN. This preserves the time dimension in the representation. All-order data statistics, represented by an IN, are employed implicitly as features; moreover, parametric non-linearities are introduced in order to tune the geometrical relationship (i.e., the distance) between signals and consequently tune classification performance. In conclusion, we introduce the windowed IN kNN (WINkNN) classifier, whose application is demonstrated comparatively on two benchmark datasets regarding, first, electroencephalography (EEG) signals and, second, audio signals. The results obtained by WINkNN are superior in both problems; in addition, no ad-hoc data preprocessing is required. Potential future work is discussed.
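A minimal sketch of the windowed idea: each signal is split into equal-length windows, each window is reduced to a small distributional summary (quantiles here, standing in for the intervals' numbers used by WINkNN), and an ordinary kNN classifier runs on the concatenated summaries. The window count and quantile levels are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def window_features(signal, n_windows=8, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    windows = np.array_split(np.asarray(signal, dtype=float), n_windows)
    # One distribution summary per window preserves the time dimension.
    return np.concatenate([np.quantile(w, quantiles) for w in windows])

def fit_windowed_knn(signals, labels, k=3):
    X = np.vstack([window_features(s) for s in signals])
    return KNeighborsClassifier(n_neighbors=k).fit(X, labels)
```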


2009 ◽  
Vol 19 (12) ◽  
pp. 4197-4215 ◽  
Author(s):  
ANGELIKI PAPANA ◽  
DIMITRIS KUGIUMTZIS

We study some of the most commonly used mutual information estimators, based on histograms of fixed or adaptive bin size, k-nearest neighbors, and kernels, and focus on the optimal selection of their free parameters. We examine the consistency of the estimators (convergence to a stable value with the increase of time series length) and the degree of deviation among the estimators. The optimization of parameters is assessed by quantifying the deviation of the estimated mutual information from its true or asymptotic value as a function of the free parameter. Moreover, some commonly used criteria for parameter selection are evaluated for each estimator. The comparative study is based on Monte Carlo simulations on time series from several linear and nonlinear systems of different lengths and noise levels. The results show that the k-nearest neighbor estimator is the most stable and the least affected by its method-specific parameter. A data-adaptive criterion for optimal binning is suggested for linear systems but is found to be rather conservative for nonlinear systems. It turns out that the binning and kernel estimators give the least deviation in identifying the lag of the first minimum of mutual information for nonlinear systems, and are stable in the presence of noise.
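A short sketch of a kNN mutual-information estimate applied to a time series: MI between x(t) and x(t+lag) is estimated for a range of lags with scikit-learn's kNN-based estimator, and the lag of the first local minimum is reported. This is a generic illustration of the estimator family, not the paper's exact experimental code; the free parameter here is the neighbor count k.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_profile(x, max_lag=30, k=5):
    x = np.asarray(x, dtype=float)
    mis = []
    for lag in range(1, max_lag + 1):
        # kNN-based MI estimate between the series and its lagged copy.
        mi = mutual_info_regression(x[:-lag].reshape(-1, 1), x[lag:],
                                    n_neighbors=k, random_state=0)[0]
        mis.append(mi)
    return np.array(mis)

def first_minimum(mis):
    # Lag of the first local minimum of the MI profile (None if monotone).
    for i in range(1, len(mis) - 1):
        if mis[i] < mis[i - 1] and mis[i] < mis[i + 1]:
            return i + 1
    return None
```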


2017 ◽  
Vol 52 (3) ◽  
pp. 2019-2037 ◽  
Author(s):  
Francisco Martínez ◽  
María Pilar Frías ◽  
María Dolores Pérez ◽  
Antonio Jesús Rivera

Author(s):  
Muhammad Croassacipto ◽  
Muhammad Ichwan ◽  
Dina Budhi Utami

Hands can produce a variety of poses, and each pose can carry a meaning or purpose usable as a form of communication, with the meanings determined by general agreement among those who communicate. Hand poses allow human-computer interaction that is faster, more intuitive, and in line with the natural function of the human body; this is called Handsign. One such system is the Kodàly Handsign, created by the Hungarian composer Zoltán Kodály as a concept in music education in Hungary. These hand signs are used in interactive angklung performances to determine the tone to be played, with the tone classified from hand poses by the K-Nearest Neighbor (KNN) algorithm. The classification is performed on data extracted from a Leap Motion Controller, which provides pitch, roll, and yaw values following the basic aircraft-attitude convention. The experiment was run five times with k values of 1, 3, 5, 7, and 9, using test data consisting of 874 Do', 702 Si, 913 La, 612 Sol, 661 Fa, 526 Mi, 891 Re, and 1004 Do poses against 21,099 training data. The test results show that hand poses are recognized best at the optimal value k = 1, with an accuracy of 94.87%.
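A small sketch of the classification step described above: a kNN model over the three orientation features (pitch, roll, yaw) predicting the solfège tone. Feature extraction from the Leap Motion Controller itself is not shown; the arrays below are random placeholders, not the study's data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

TONES = ["Do", "Re", "Mi", "Fa", "Sol", "La", "Si", "Do'"]

def train_handsign_knn(features, tone_labels, k=1):
    """features: array of shape (n_samples, 3) holding pitch, roll, yaw."""
    return KNeighborsClassifier(n_neighbors=k).fit(features, tone_labels)

# Toy usage with random placeholder data.
rng = np.random.default_rng(0)
X = rng.uniform(-90, 90, size=(800, 3))
y = rng.choice(TONES, size=800)
model = train_handsign_knn(X, y, k=1)
print(model.predict([[10.0, -5.0, 30.0]]))
```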

