MODIFIED NEAREST NEIGHBOR METHOD FOR MULTISTEP AHEAD TIME SERIES FORECASTING

Author(s):  
SYED RAHAT ABBAS ◽  
MUHAMMAD ARIF

Multistep-ahead time series forecasting has become an important activity in various fields of science and technology because of its usefulness in managing future events. Nearest neighbor search is a pattern-matching algorithm for forecasting, and its accuracy depends considerably on how similar the pattern found in the database is to the reference pattern. The original time series is embedded in an optimal dimension, which is determined from the autocorrelation function plot. The last vector in the embedded matrix is taken as the reference vector and all previous vectors as candidate vectors. In the nearest neighbor algorithm, the reference vector is matched against all candidate vectors in terms of Euclidean distance, and the best-matched pattern is used for forecasting. In this paper, we propose a hybrid distance measure, based on cross-correlation and Euclidean distance, to improve the nearest neighbor search. The candidate patterns are shortlisted using cross-correlation, and Euclidean distance is then used to select the best-matched pattern. Moreover, in multistep-ahead forecasting, the standard nearest neighbor method introduces a bias in the search, which results in higher forecasting errors. We modify the search methodology to remove this bias by ignoring the latest forecasted value when searching for the nearest neighbor in the subsequent iteration. The proposed algorithm is evaluated on two benchmark time series as well as two real-life time series.
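The two-stage search described in this abstract (shortlist by cross-correlation, then select by Euclidean distance) can be illustrated with a minimal one-step sketch. The function names `embed` and `hybrid_nn_forecast`, the `shortlist` parameter, and the use of zero-lag normalized correlation are assumptions made for illustration; the abstract does not give these details, and the multistep bias correction is not shown.

```python
import numpy as np

def embed(series, dim):
    """Delay-embed a 1-D series into overlapping vectors of length `dim`."""
    series = np.asarray(series, dtype=float)
    return np.array([series[i:i + dim] for i in range(len(series) - dim + 1)])

def hybrid_nn_forecast(series, dim, shortlist=5):
    """One-step forecast: shortlist by correlation, then pick by Euclidean distance."""
    series = np.asarray(series, dtype=float)
    vectors = embed(series, dim)
    reference = vectors[-1]      # last embedded vector
    candidates = vectors[:-1]    # candidate i is followed by series[i + dim]

    # Stage 1 (assumed form): zero-lag normalized correlation with the reference.
    ref_c = reference - reference.mean()
    cand_c = candidates - candidates.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(ref_c) * np.linalg.norm(cand_c, axis=1)
    safe = np.where(denom > 0, denom, 1.0)
    corr = np.where(denom > 0, (cand_c @ ref_c) / safe, -1.0)
    top = np.argsort(corr)[-shortlist:]  # indices of the best-correlated candidates

    # Stage 2: among the shortlist, choose the smallest Euclidean distance.
    dists = np.linalg.norm(candidates[top] - reference, axis=1)
    best = top[np.argmin(dists)]
    return float(series[best + dim])  # value that followed the best match

t = np.arange(50)
wave = np.sin(2 * np.pi * t / 10)
next_value = hybrid_nn_forecast(wave, dim=4)
```

On a clean periodic series such as `wave`, the shortlisted candidates one period back match the reference almost exactly, so the forecast is the value that followed them.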

Author(s):  
SYED RAHAT ABBAS ◽  
MUHAMMAD ARIF

Long-range or multistep-ahead time series forecasting is an important issue in various fields of business, science and technology. In this paper, we propose a modified nearest neighbor based algorithm that can be used for long-range time series forecasting. The selection of an optimal embedding dimension that can unfold the dynamics of the system is improved by upsampling the original time series. Zeroth-order cross-correlation and Euclidean distance criteria are used to select the nearest neighbor from the upsampled time series. The embedding dimension and the number of candidate vectors for nearest neighbor selection play an important role in forecasting. The embedding size is optimized using the autocorrelation function (ACF) plot of the time series. The proposed algorithm is observed to outperform the standard nearest neighbor algorithm, and the cross-correlation based criterion shows better performance than the Euclidean distance criterion.
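The abstract states that the embedding size is read from the ACF plot but does not say which rule is applied. The sketch below computes a sample ACF and applies one hypothetical rule (the first lag at which the ACF falls to or below a threshold); `sample_acf`, `first_lag_below`, and the threshold are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var
                     for k in range(max_lag + 1)])

def first_lag_below(acf, threshold=0.0):
    """Hypothetical rule: first lag whose ACF value is at or below `threshold`."""
    below = np.nonzero(np.asarray(acf) <= threshold)[0]
    return int(below[0]) if below.size else None

alternating = [1.0, -1.0] * 10
acf = sample_acf(alternating, max_lag=3)
lag = first_lag_below(acf)
```

For the alternating series, the lag-1 autocorrelation is strongly negative, so the rule returns lag 1.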


2020 ◽  
Author(s):  
Cameron Hargreaves ◽  
Matthew Dyer ◽  
Michael Gaultois ◽  
Vitaliy Kurlin ◽  
Matthew J Rosseinsky

It is a core problem in any field to reliably tell how close two objects are to being the same, and once this relation has been established we can use this information to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented by assigning the ratio of each element in the material to a vector. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately, the standard metric (the Euclidean distance) gives little to no variance in the resultant distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric which enables the measure of chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resultant distances have greater alignment with chemical understanding than the Euclidean distance, which is demonstrated on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that with no supervision the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical property, with future applications for nearest neighbor search queries in chemical database retrieval systems and supervised ML techniques.
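Because the ground distance is an absolute difference along a single scale, the EMD between two normalized compositions reduces to the one-dimensional case, which can be computed from running differences of the cumulative fractions. The sketch below assumes each composition is a dict of element fractions summing to 1. The `PLACEHOLDER_SCALE` positions and the element labels "A", "B", "C" are hypothetical stand-ins and are not modified Pettifor numbers.

```python
def emd_1d(comp_a, comp_b, position):
    """1-D Earth Mover's Distance between two {element: fraction} compositions."""
    elements = sorted(set(comp_a) | set(comp_b), key=lambda e: position[e])
    total, carried = 0.0, 0.0
    for left, right in zip(elements, elements[1:]):
        # Net mass that must be moved past the gap between `left` and `right`.
        carried += comp_a.get(left, 0.0) - comp_b.get(left, 0.0)
        total += abs(carried) * (position[right] - position[left])
    return total

# Hypothetical scale positions for illustration only.
PLACEHOLDER_SCALE = {"A": 1.0, "B": 2.0, "C": 10.0}

near = emd_1d({"A": 0.5, "B": 0.5}, {"A": 1.0}, PLACEHOLDER_SCALE)
far = emd_1d({"A": 0.5, "C": 0.5}, {"A": 1.0}, PLACEHOLDER_SCALE)
```

Here the Euclidean distance between the fraction vectors is the same in both comparisons, while the EMD is larger when the substituted element lies farther away on the scale (`far` exceeds `near`).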


2016 ◽  
Vol 2016 ◽  
pp. 1-14 ◽  
Author(s):  
Mingjun Deng ◽  
Shiru Qu

Many short-term road travel time forecasting studies are based on time series. However, road travel time depends not only on the historical travel time series but also on the historical flow of the road and its adjacent sections, and few studies have considered this. Based on the correlation between the spatial distribution of flow and the road travel time series, this paper applies nearest neighbor and nonparametric regression methods to build a forecasting model. For the spatial nearest neighbor search, three different spatial distances are defined. In addition, two forecasting functions are introduced: one combines the forecast values with equal (mean) weights, and the other uses the reciprocal of each nearest neighbor's distance as its combination weight. Each of the three distances is applied in the nearest neighbor search and paired with both forecasting functions. Nearest neighbor and nonparametric regression are also applied to the travel time series. Minimizing the forecast error variance is then used as the objective to establish the combination model. The empirical results show that the combination model clearly improves forecast performance. In addition, the evaluation of computational complexity shows that the proposed method can satisfy the real-time requirement.
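The two forecasting functions mentioned in this abstract differ only in how the neighbors' values are weighted. A minimal sketch follows, assuming the neighbors and their distances have already been found; `combine_neighbors` and the small distance floor are illustrative choices rather than details from the paper.

```python
import numpy as np

def combine_neighbors(values, distances, weighting="mean"):
    """Combine neighbor values with equal or reciprocal-distance weights."""
    values = np.asarray(values, dtype=float)
    distances = np.asarray(distances, dtype=float)
    if weighting == "mean":
        weights = np.ones_like(values)
    elif weighting == "inverse":
        # Floor the distance to avoid division by zero on exact matches.
        weights = 1.0 / np.maximum(distances, 1e-12)
    else:
        raise ValueError("weighting must be 'mean' or 'inverse'")
    return float(np.sum(weights * values) / np.sum(weights))

equal_forecast = combine_neighbors([10.0, 20.0], [1.0, 3.0], "mean")
inverse_forecast = combine_neighbors([10.0, 20.0], [1.0, 3.0], "inverse")
```

With reciprocal-distance weights, the closer neighbor (distance 1) pulls the forecast toward its own value, whereas equal weights return the plain average.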


2018 ◽  
Vol 2018 ◽  
pp. 1-17 ◽  
Author(s):  
Hyung-Ju Cho

We investigate the k-nearest neighbor (kNN) join in road networks to determine the k-nearest neighbors (NNs) from a dataset S to every object in another dataset R. The kNN join is a primitive operation and is widely used in many data mining applications. However, it is an expensive operation because it combines the kNN query and the join operation, whereas most existing methods assume the use of the Euclidean distance metric. We alternatively consider the problem of processing kNN joins in road networks where the distance between two points is the length of the shortest path connecting them. We propose a shared execution-based approach called the group-nested loop (GNL) method that can efficiently evaluate kNN joins in road networks by exploiting grouping and shared execution. The GNL method can be easily implemented using existing kNN query algorithms. Extensive experiments using several real-life roadmaps confirm the superior performance and effectiveness of the proposed method in a wide range of problem settings.
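For orientation, the operation itself can be written as a naive baseline: one shortest-path expansion per object in R, followed by picking the k closest objects of S. This is not the GNL method, which shares execution across groups of objects; the adjacency-list graph format and the names `shortest_paths` and `knn_join` are assumptions for illustration.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra over an adjacency list {node: [(neighbor, length), ...]}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def knn_join(graph, R, S, k):
    """Naive kNN join: for each r in R, the k objects of S nearest by network distance."""
    result = {}
    for r in R:
        dist = shortest_paths(graph, r)
        reachable = [(dist[s], s) for s in S if s in dist]
        result[r] = [s for _, s in heapq.nsmallest(k, reachable)]
    return result

# Small undirected road network with edge lengths.
roads = {
    "a": [("b", 1.0), ("d", 5.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("b", 2.0), ("d", 1.0)],
    "d": [("c", 1.0), ("a", 5.0)],
}
joined = knn_join(roads, R=["a"], S=["b", "c", "d"], k=2)
```

The baseline repeats a full expansion for every object in R, which is the cost that grouping and shared execution aim to reduce.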


2017 ◽  
Vol 52 (3) ◽  
pp. 2019-2037 ◽  
Author(s):  
Francisco Martínez ◽  
María Pilar Frías ◽  
María Dolores Pérez ◽  
Antonio Jesús Rivera

Author(s):  
Luis Alexander Calvo-Valverde ◽  
David Elías Alfaro-Barboza

The ability to make short- or long-term predictions is at the heart of much of science. In the last decade, the data science community has been highly interested in predicting real-life events, using data mining techniques to discover meaningful rules or patterns in different data types, including time series. Short-term predictions based on "the shape" of meaningful rules lead to a vast number of applications. The discovery of meaningful rules is achieved through efficient algorithms equipped with a robust and accurate distance measure. Consequently, it is important to choose a distance measure that can deal with noise, entropy and other technical constraints, so that comparing two time series yields accurate similarity results. In this work, we propose that Dynamic Time Warping based on Cubic Spline Interpolation (SIDTW) can be useful for carrying out the similarity computation in two specific algorithms: (1) DiscoverRules() and (2) TestRules(). Mohammad Shokoohi-Yekta et al. developed a framework, using these two algorithms, to find and test meaningful rules from time series. Our research expands the scope of their project by adding a set of well-known similarity search measures, including SIDTW as a novel and enhanced version of DTW.
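As background for the distance measure discussed here, the sketch below implements classic DTW with an absolute-difference local cost. It does not include the cubic spline interpolation step of SIDTW, whose details are not given in the abstract; `dtw_distance` is an illustrative name.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = step + min(cost[i - 1, j],
                                    cost[i, j - 1],
                                    cost[i - 1, j - 1])
    return float(cost[n, m])

stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

A series and a time-stretched copy of it align at zero cost, which is the property that makes DTW attractive for shape-based matching.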


Author(s):  
Yue Pang ◽  
Bo Yao ◽  
Xiangdong Zhou ◽  
Yong Zhang ◽  
Yiming Xu ◽  
...  

Electricity demand forecasting is a very important problem for energy supply and environmental protection. It can be formalized as a hierarchical time series forecasting problem with aggregation constraints that follow the geographical hierarchy, since the sum of the predictions for the disaggregated time series should equal the predictions for the aggregated ones. However, in most previous work, aggregation consistency is ensured at the cost of forecast accuracy. In this paper, we propose a novel clustering-based hierarchical electricity time series forecasting approach. Instead of dealing with the geographical hierarchy directly, we explore electricity consumption patterns by clustering analysis and build a new time series hierarchy based on consumption patterns. We then present a novel hierarchical forecasting method with consumption hierarchical aggregation constraints to improve the electricity demand predictions at the bottom level, followed by a "bottom-up" method to obtain forecasts for the higher geographical levels. In particular, we observe that in our consumption-pattern-based hierarchy the reconciliation error of a bottom-level time series is "correlated" with its membership degree in the corresponding cluster (consumption pattern), and hence we apply this correlation as the regularization term in our forecasting objective function. Extensive experiments on real-life datasets verify that our approach achieves the best prediction accuracy compared with state-of-the-art methods.
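The "bottom-up" step and the aggregation constraint can be illustrated in a few lines: higher-level forecasts are obtained by summing the bottom-level forecasts of their members, so the constraint holds by construction. The series identifiers, group names, and `bottom_up` function are hypothetical; the clustering and regularization parts of the method are not shown.

```python
import numpy as np

def bottom_up(bottom_forecasts, groups):
    """Sum bottom-level forecasts into each higher-level node.

    bottom_forecasts: {series_id: sequence of forecasts over the horizon}
    groups: {parent_id: [series_id, ...]}
    """
    return {
        parent: np.sum([bottom_forecasts[child] for child in children], axis=0)
        for parent, children in groups.items()
    }

# Hypothetical bottom-level forecasts over a two-step horizon.
bottom = {"h1": [1.0, 2.0], "h2": [3.0, 4.0], "h3": [5.0, 6.0]}
hierarchy = {"north": ["h1", "h2"], "total": ["h1", "h2", "h3"]}
upper = bottom_up(bottom, hierarchy)
```

Because every upper-level forecast is a sum of its members, any accuracy gain at the bottom level carries upward without breaking consistency.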

