Exchange Market Liquidity Prediction with the K-Nearest Neighbor Approach: Crypto vs. Fiat Currencies

In this paper, we compare predictions of market liquidity in crypto and fiat currencies from two traditional time series methods, the autoregressive moving average (ARMA) and the generalized autoregressive conditional heteroskedasticity (GARCH) models, with those from a machine learning algorithm, the k-nearest neighbor (KNN) approach. We measure market liquidity as the log rates of bid-ask spreads in a sample of three cryptocurrencies (Bitcoin, Ethereum, and Ripple) and 16 major fiat currencies from 9 February 2018 to 8 February 2019. We find that the KNN approach is better suited than the ARMA and GARCH models for capturing the short-term market liquidity of cryptocurrencies, possibly because of the complexity of the market's microstructure. Among the traditional time series models, ARMA models perform well when estimating the liquidity of fiat currencies in developed markets, whereas GARCH models do the same for fiat currencies in emerging markets. Nevertheless, our results show that the KNN approach predicts the log rates of the bid-ask spreads of both crypto and fiat currencies better than the ARMA and GARCH models.
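As a rough illustration of how a KNN forecast of a liquidity series can work, the sketch below finds the past windows most similar to the most recent one and averages the values that followed them. It is not the paper's implementation: the function names, the window length, the value of k, and the reading of "log rate" as the log of the ratio of consecutive spreads are all assumptions made for this example.

```python
import math

def log_rates(spreads):
    # Assumed definition: log of the ratio of consecutive bid-ask spreads.
    return [math.log(b / a) for a, b in zip(spreads, spreads[1:])]

def knn_forecast(series, window=2, k=2):
    # The query is the most recent `window` observations.
    query = series[-window:]
    candidates = []
    # Pair every earlier window with the value that followed it.
    for start in range(len(series) - window):
        past = series[start:start + window]
        successor = series[start + window]
        candidates.append((math.dist(past, query), successor))
    # Average the successors of the k most similar past windows.
    nearest = sorted(candidates)[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical repeating series: the window [1.0, 2.0] was always followed by 3.0.
example_series = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
next_value = knn_forecast(example_series, window=2, k=2)
```

Unlike ARMA or GARCH, this approach fits no parametric model; the forecast depends only on which historical patterns resemble the current one.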

Efficient detection of hacker community based on twitter data using complex networks and machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210458 ◽

2021 ◽

pp. 1-17

Author(s):

Ahmed Al-Tarawneh ◽

Ja’afer Al-Saraireh

Keyword(s):

Machine Learning ◽

Complex Networks ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Efficient Detection ◽

Suggested Keywords

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use such platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets with machine learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, and are therefore limited in capturing the hidden meaning between the lines of tweets. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers who share their posts and have similar interests, from a hackers’ community on Twitter. The list is built from a set of suggested keywords that hackers commonly use in their tweets. A complex network is then generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is fed into a machine learning process that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Accuracy was measured as the share of correctly classified instances, averaged over the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, yielding about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.

k-Nearest Neighbor Learning with Graph Neural Networks

Mathematics ◽

10.3390/math9080830 ◽

2021 ◽

Vol 9 (8) ◽

pp. 830

Author(s):

Seokho Kang

Keyword(s):

Neural Network ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Weighting Function ◽

High Sensitivity ◽

Training Data ◽

K Nearest Neighbor ◽

Main Challenge ◽

Benchmark Datasets ◽

Graph Neural Networks

k-nearest neighbor (kNN) is a widely used learning algorithm for supervised learning tasks. In practice, the main challenge when using kNN is its high sensitivity to its hyperparameter setting, including the number of nearest neighbors k, the distance function, and the weighting function. To improve the robustness to hyperparameters, this study presents a novel kNN learning method based on a graph neural network, named kNNGNN. Given training data, the method learns a task-specific kNN rule in an end-to-end fashion by means of a graph neural network that takes the kNN graph of an instance to predict the label of the instance. The distance and weighting functions are implicitly embedded within the graph neural network. For a query instance, the prediction is obtained by performing a kNN search from the training data to create a kNN graph and passing it through the graph neural network. The effectiveness of the proposed method is demonstrated using various benchmark datasets for classification and regression tasks.
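The sensitivity to hyperparameters described above can be seen with a plain kNN classifier in which k, the distance function, and the weighting function are explicit arguments. This is a minimal sketch of ordinary kNN, not the kNNGNN method; the function names and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def uniform_weight(d):
    # Every neighbor counts equally.
    return 1.0

def inverse_distance_weight(d, eps=1e-9):
    # Closer neighbors count more; eps avoids division by zero.
    return 1.0 / (d + eps)

def knn_predict(train_x, train_y, query, k=3,
                distance=euclidean, weight=uniform_weight):
    # Keep the k training points closest to the query.
    neighbors = sorted(
        ((distance(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    # Accumulate (possibly weighted) votes per label.
    votes = defaultdict(float)
    for d, label in neighbors:
        votes[label] += weight(d)
    return max(votes, key=votes.get)

# Hypothetical toy data: three "a" points near the query, four "b" points far away.
train_x = [(0, 0), (1, 0), (0, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["a", "a", "a", "b", "b", "b", "b"]
query = (1, 1)

small_k = knn_predict(train_x, train_y, query, k=3)
large_k = knn_predict(train_x, train_y, query, k=7)
large_k_weighted = knn_predict(train_x, train_y, query, k=7,
                               weight=inverse_distance_weight)
```

On this toy data the predicted label changes from "a" to "b" when k grows from 3 to 7 with uniform weights, and returns to "a" once inverse-distance weighting is used, which is the kind of dependence on settings that a learned kNN rule is meant to reduce.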

Intelligent Dynamic Identification Technique of Industrial Products in a Robotic Workplace

Sensors ◽

10.3390/s21051797 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1797

Author(s):

Ján Vachálek ◽

Dana Šišmišová ◽

Pavol Vašek ◽

Jan Rybář ◽

Juraj Slovák ◽

...

Keyword(s):

Machine Learning ◽

Control Charts ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Conveyor Belt ◽

Standard Uncertainty ◽

K Nearest Neighbor ◽

Industrial Products ◽

Dynamic Identification ◽

Identification Technique

The article deals with aspects of identifying industrial products in motion based on their color. An automated robotic workplace with a conveyor belt, a robot, and an industrial color sensor is created for this purpose. Measured data are processed in a database and then statistically evaluated in the form of type A and type B standard uncertainties in order to obtain combined standard uncertainty results. Based on the acquired data, control charts of the RGB color components are created for the identified products. The influence of product speed on the identification measurement process and on process stability is monitored. When identification is uncertain, i.e., when measured values fall outside the control chart limits, the K-nearest neighbor machine learning algorithm is used. This algorithm estimates the most likely match based on the Euclidean distances to the value being classified. The result is a comprehensive system for identifying products moving on a conveyor belt, in which data collection and statistical analysis combined with machine learning demonstrate reliability for industrial use.
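The two-stage decision described above, control chart limits first and kNN only when the limits do not settle the identification, could be sketched as follows. This is an illustration under stated assumptions, not the authors' system: the product names, the RGB readings, the three-sigma limit width, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical reference RGB readings per product.
reference = {
    "red_part": [(200, 30, 28), (204, 33, 30), (198, 29, 31), (202, 31, 27)],
    "green_part": [(35, 180, 40), (32, 184, 43), (37, 178, 38), (34, 182, 41)],
}

def control_limits(samples, width=3.0):
    # Per-channel limits: mean +/- width * sample standard deviation (assumed rule).
    limits = []
    for channel in zip(*samples):
        m, s = mean(channel), stdev(channel)
        limits.append((m - width * s, m + width * s))
    return limits

def within_limits(reading, limits):
    return all(lo <= v <= hi for v, (lo, hi) in zip(reading, limits))

def knn_label(reading, k=3):
    # Majority label among the k reference readings nearest in Euclidean distance.
    dists = sorted(
        (math.dist(reading, sample), label)
        for label, samples in reference.items()
        for sample in samples
    )
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)

def identify(reading):
    # Stage 1: accept the product whose control limits contain the reading.
    matches = [label for label, samples in reference.items()
               if within_limits(reading, control_limits(samples))]
    if len(matches) == 1:
        return matches[0], "control chart"
    # Stage 2: the reading is outside the limits (or ambiguous), so use kNN.
    return knn_label(reading), "kNN fallback"

in_control = identify((201, 31, 29))
out_of_control = identify((190, 40, 35))
```

Here the first reading falls inside the limits of one product, while the second lies outside every chart and is resolved by its nearest reference readings.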

Limiting distributions of maximum likelihood estimators for unstable autoregressive moving-average time series with general autoregressive heteroscedastic errors

10.1214/aos/1030563979 ◽

1998 ◽

Vol 26 (1) ◽

pp. 84-125 ◽

Cited By ~ 69

Author(s):

Shiqing Ling ◽

W. K. Li

Keyword(s):

Time Series ◽

Maximum Likelihood ◽

Moving Average ◽

Maximum Likelihood Estimators ◽

Autoregressive Moving Average ◽

Limiting Distributions ◽

Heteroscedastic Errors

KNN classifier based approach for multi-class sentiment analysis of twitter data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.12656 ◽

2018 ◽

Vol 7 (3) ◽

pp. 1372

Author(s):

Soudamini Hota ◽

Sudhir Pathak

Keyword(s):

Sentiment Analysis ◽

Opinion Mining ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Online News ◽

Classification Algorithm ◽

Sentiment Classification ◽

Supervised Machine Learning ◽

K Nearest Neighbor ◽

News Reports

‘Sentiment’ literally means ‘Emotions’. Sentiment analysis, synonymous with opinion mining, is a type of data mining that refers to the analysis of data obtained from microblogging sites, social media updates, online news reports, user reviews, etc., in order to study people's sentiments towards an event, organization, product, brand, person, etc. In this work, sentiment is classified into multiple classes. The proposed methodology, based on the KNN classification algorithm, shows an improvement over an existing methodology based on the SVM classification algorithm. The data used for analysis were taken from Twitter, the most popular microblogging site, and extracted using Python’s Tweepy. The N-gram modeling technique was used for feature extraction, and the supervised machine learning algorithm k-nearest neighbor was used for sentiment classification. The performance of the proposed and existing techniques is compared in terms of accuracy, precision, and recall. The analysis concludes that the proposed technique performs better on all of these standard evaluation measures.

PANK-A financial time series prediction model integrating principal component analysis, affinity propagation clustering and nested k-nearest neighbor regression

Journal of Interdisciplinary Mathematics ◽

10.1080/09720502.2018.1456825 ◽

2018 ◽

Vol 21 (3) ◽

pp. 717-728 ◽

Cited By ~ 5

Author(s):

Li Tang ◽

Heping Pan ◽

Yiyong Yao

Keyword(s):

Principal Component Analysis ◽

Time Series ◽

Prediction Model ◽

Nearest Neighbor ◽

Financial Time Series ◽

Time Series Prediction ◽

Principal Component ◽

K Nearest Neighbor ◽

Financial Time ◽

Affinity Propagation Clustering

The K Nearest Neighbor Algorithm for Imputation of Missing Longitudinal Prenatal Alcohol Data

10.21203/rs.3.rs-32456/v2 ◽

2021 ◽

Author(s):

Ayesha Sania ◽

Nicolo Pini ◽

Morgan Nelson ◽

Michael Myers ◽

Lauren Shuffrey ◽

...

Keyword(s):

Missing Data ◽

Missing Values ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Drinking Behavior ◽

Nearest Neighbors ◽

First Trimester ◽

Epidemiologic Studies ◽

K Nearest Neighbor ◽

Timeline Followback

Background: Missing data are a source of bias in epidemiologic studies. This is problematic in alcohol research, where data missingness is linked to drinking behavior. Methods: The Safe Passage study was a prospective investigation of prenatal drinking and fetal/infant outcomes (n=11,083). Daily alcohol consumption for the last reported drinking day and the 30 days prior was recorded using the Timeline Followback method. Of 3.2 million person-days, data were missing for 0.36 million. We imputed missing data using a machine learning algorithm, “K Nearest Neighbor” (K-NN). K-NN imputes missing values for a participant using data from the participants closest to that participant. Imputed values were weighted by the distances from the nearest neighbors and matched for day of week. Validation was done on data randomly deleted for 5-15 consecutive days. Results: Data from 5 nearest neighbors and segments of 55 days provided imputed values with the least imputation error. After deleting data segments from first-trimester data with no missing days, there was no difference between actual and predicted values for 64% of the deleted segments. For 31% of the segments, imputed data were within +/-1 drink/day of the actual values. Conclusions: K-NN can be used to impute missing data in longitudinal studies of alcohol use during pregnancy with high accuracy.
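A distance-weighted K-NN imputation of the kind described above can be sketched in a few lines. This is a simplified illustration, not the study's procedure: it omits day-of-week matching and segment handling, and the distance definition, function names, and toy records are assumptions.

```python
import math

def distance(a, b):
    # Root-mean-square difference over days observed in both records (assumed metric).
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def impute(target, donors, k=2, eps=1e-6):
    # Fill each missing day with a distance-weighted average of the values
    # reported on that day by the k nearest donors.
    filled = list(target)
    for day, value in enumerate(target):
        if value is not None:
            continue
        candidates = sorted(
            (distance(target, donor), donor[day])
            for donor in donors
            if donor[day] is not None and math.isfinite(distance(target, donor))
        )[:k]
        if not candidates:
            continue  # no usable donor: leave the value missing
        weights = [1.0 / (dist + eps) for dist, _ in candidates]
        filled[day] = (
            sum(w * v for w, (_, v) in zip(weights, candidates)) / sum(weights)
        )
    return filled

# Hypothetical daily drink counts; None marks a missing day.
target = [0, 2, None, 1]
donors = [[0, 2, 3, 1], [0, 2, 3, 1], [4, 4, 0, 4]]
imputed = impute(target, donors, k=2)
```

In this toy case the two donors identical to the target on its observed days supply the imputed value, and the dissimilar third donor is ignored because k=2.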

The time series regression analysis in evaluating the economic impact of COVID-19 cases in Indonesia

Model Assisted Statistics and Applications ◽

10.3233/mas-210533 ◽

2021 ◽

Vol 16 (3) ◽

pp. 197-210

Author(s):

Utriweni Mukhaiyar ◽

Devina Widyanti ◽

Sandy Vantika

Keyword(s):

Time Series ◽

Exchange Rate ◽

Moving Average ◽

Autoregressive Moving Average ◽

Transfer Function Model ◽

Function Model ◽

Time Series Regression ◽

Vector Autoregressive ◽

Daily Data ◽

The Impact

This study aims to determine the impact of COVID-19 cases in Indonesia on the USD/IDR exchange rate using the Transfer Function Model and the Vector Autoregressive Moving-Average with Exogenous Regressors (VARMAX) Model. This paper uses daily data on COVID-19 cases in Indonesia, the USD/IDR exchange rate, and the IDX Composite for the period from 1 March to 29 June 2020. The analysis shows that: (1) a larger increase in the number of COVID-19 cases in Indonesia significantly weakens the USD/IDR exchange rate, (2) a 1% increase in the number of COVID-19 cases in Indonesia six days earlier weakens the USD/IDR exchange rate by 0.003%, (3) a 1% increase in the number of cases seven days earlier weakens the exchange rate by 0.17%, and (4) a 1% increase in the number of cases eight days earlier weakens the exchange rate by 0.24%.

IntelliFin: Advanced Stock Prediction using Hybrid ML and LSTM Model with Financial Indicators powered by Sentiment Determination using NLP

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d8437.069520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 428-433

Keyword(s):

Stock Market ◽

Stock Prices ◽

Stock Price ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Majority Voting ◽

Support Vector ◽

K Nearest Neighbor ◽

Financial History ◽

The Right

Stock trading has been one of the most important parts of the financial world for decades. People investing in the share market analyze the financial history of a corporation and the news related to it, and study huge amounts of data, in order to predict its stock price trend. The right investment, i.e., buying and selling a company's stock at the right time, leads to monetary benefits and can make one a millionaire overnight. The stock market is an extremely volatile platform in which data is produced in enormous quantities and is influenced by numerous disparate factors such as socio-political issues, financial activities like splits and dividends, and news as well as rumors. This work proposes a novel system, “IntelliFin”, to predict the share market trend. The system uses various stock market technical indicators along with the company's historical market data trends to predict share prices. It also employs sentiment determination on a company's financial and socio-political news for more accurate prediction. The system is implemented using two models. The first is a hybrid LSTM model optimized by an ADAM optimizer. The other is a hybrid ML model that integrates a Support Vector Regressor, a K-Nearest Neighbor classifier, an RF classifier, and a Linear Regressor using a Majority Voting algorithm. Both models employ an NLP-powered sentiment analyzer to account for news that affects stock prices. The models are trained continuously using Reinforcement Learning, implemented with the Q-Learning algorithm, to increase consistency and accuracy. The project aims to support inexperienced investors and help them maximize their profit and minimize or eliminate losses. The developed system will also serve as a tool to aid the decision making of professional investors.
