Kernel Distance Measures for Time Series, Random Fields and Other Structured Data

Author(s):  
Srinjoy Das ◽  
Hrushikesh N. Mhaskar ◽  
Alexander Cloninger

This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. The measure is based on the idea of matching distributions that overlap over only a portion of their region of support. kdiff is inspired by MPdist, which was previously proposed for such datasets and is constructed using Euclidean metrics, whereas kdiff is constructed using non-linear kernel distances. In addition, kdiff accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing the cross similarity to the self similarity yields similarity measures that are more robust to noise and partial occlusions of the relevant signals. kdiff is a more general form of the well-known kernel-based Maximum Mean Discrepancy distance estimated over the embeddings. Theoretical results are provided on separability conditions when kdiff is used as a distance measure for clustering and classification problems in which the embedding distributions can be modeled as two-component mixtures. Applications are demonstrated for clustering of synthetic and real-life time series and image data, and the performance of kdiff is compared with competing distance measures for clustering.
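The abstract does not reproduce the formal definition of kdiff, so the following is only a minimal sketch of the general idea, assuming a Gaussian kernel and an MPdist-style subsequence embedding; the function and parameter names (`embed`, `m`, `q`, `sigma`) are illustrative, not the authors':

```python
import numpy as np

def embed(x, m):
    """Embed a 1-D series into its overlapping length-m subsequences."""
    return np.lib.stride_tricks.sliding_window_view(x, m)

def kernel_dists(A, B, sigma=1.0):
    """Pairwise squared kernel distances d_k(a, b)^2 = 2 - 2*k(a, b)
    for a Gaussian kernel k (so that k(a, a) = 1)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return 2.0 - 2.0 * np.exp(-sq / (2.0 * sigma**2))

def kdiff_sketch(x, y, m=16, q=0.1, sigma=1.0):
    """Illustrative quantile-based kernel distance between two series:
    a lower quantile of the cross-distance distribution is contrasted
    with the same quantile of the two self-distance distributions, so
    distributions overlapping on only part of their support can still
    register as close."""
    X, Y = embed(x, m), embed(y, m)
    cross = np.quantile(kernel_dists(X, Y, sigma), q)
    self_x = np.quantile(kernel_dists(X, X, sigma), q)
    self_y = np.quantile(kernel_dists(Y, Y, sigma), q)
    return cross - 0.5 * (self_x + self_y)
```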

Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2708
Author(s):  
Achilleas Anastasiou ◽  
Peter Hatzopoulos ◽  
Alex Karagrigoriou ◽  
George Mavridoglou

In this work, we focus on the development of new distance measure algorithms, namely the Causality Within Groups (CAWG), the Generalized Causality Within Groups (GCAWG) and the Causality Between Groups (CABG), all of which are based on the well-known Granger causality. The proposed distances, together with the associated algorithms, are suitable for multivariate statistical data analysis, including unsupervised classification (clustering), of multivariate time series, with emphasis on financial and economic data where causal relationships are frequently present. To explore the appropriateness of the proposed methodology, we apply, for illustrative purposes, the proposed algorithms to hierarchical clustering for the classification of 19 EU countries based on seven variables related to health resources in healthcare systems.
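The CAWG, GCAWG and CABG definitions are given in the paper itself; as a hedged illustration of the same idea, a generic Granger-causality-based dissimilarity matrix for hierarchical clustering could be sketched as follows (the helper names and the use of the F-test p-value are assumptions, not the paper's definitions):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def granger_pvalue(cause, effect, lag=2):
    """p-value of the F-test that `cause` Granger-causes `effect`."""
    res = grangercausalitytests(np.column_stack([effect, cause]),
                                maxlag=[lag], verbose=False)
    return res[lag][0]["ssr_ftest"][1]

def causality_distance_matrix(series, lag=2):
    """Symmetric dissimilarity: strong causality in either direction
    gives a small distance (here: the smaller of the two p-values)."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = granger_pvalue(series[i], series[j], lag)
            p_ji = granger_pvalue(series[j], series[i], lag)
            D[i, j] = D[j, i] = min(p_ij, p_ji)
    return D

# hierarchical clustering on the causality-based distances:
# D = causality_distance_matrix(list_of_series)
# labels = fcluster(linkage(squareform(D), method="average"),
#                   t=3, criterion="maxclust")
```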


Author(s):  
Enang, Ekaette Inyang ◽  
Ojua, Doris Nkan ◽  
T. T. Ojewale

This study employed the method of calibration on the product type estimator to propose calibration product type estimators using three distance measures, namely: the chi-square distance measure, the minimum entropy distance measure and the modified chi-square distance measure, for a single constraint. The estimators of the variances of the proposed estimators were also obtained. An empirical study to ascertain the performance of these estimators was carried out using real-life and simulated data sets. The result with the real-life data showed that the proposed calibration product type estimator produced better estimates of the population mean than those obtained from the other distance measures. Results from the simulation study showed that the proposed calibration product type estimators had a high gain in efficiency compared to the product type estimator. The simulation result also showed that the proposed estimators were more consistent and reliable under the Gamma and Exponential distributions, with the Exponential distribution taking the lead. The conventional product type estimator, however, was found to be better if the underlying distribution is normal.
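The proposed product type estimators are specified in the paper; what can be sketched from standard survey-sampling theory is the chi-square calibration step itself under a single constraint, which has a closed-form solution (names here are illustrative):

```python
import numpy as np

def chi_square_calibrated_weights(d, x, X_total):
    """Calibrate design weights d so that sum(w * x) == X_total,
    minimizing the chi-square distance sum((w - d)**2 / d).
    Closed-form solution of the single-constraint problem."""
    lam = (X_total - np.dot(d, x)) / np.dot(d, x * x)
    return d + d * x * lam

# example: the calibrated weights reproduce the auxiliary total exactly
d = np.array([10.0, 12.0, 8.0, 15.0])   # design weights
x = np.array([3.0, 5.0, 2.0, 4.0])      # auxiliary variable
w = chi_square_calibrated_weights(d, x, X_total=200.0)
assert np.isclose(np.sum(w * x), 200.0)
```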


Author(s):  
SYED RAHAT ABBAS ◽  
MUHAMMAD ARIF

Multistep-ahead time series forecasting has become an important activity in various fields of science and technology due to its usefulness in managing future events. Nearest neighbor search is a pattern matching algorithm for forecasting, and the accuracy of the method depends considerably on the similarity of the pattern found in the database to the reference pattern. The original time series is embedded into an optimal dimension, which is determined using the autocorrelation function plot. The last vector in the embedded matrix is taken as the reference vector and all the previous vectors as candidate vectors. In the nearest neighbor algorithm, the reference vector is matched against all the candidate vectors in terms of Euclidean distance, and the best-matched pattern is used for forecasting. In this paper, we propose a hybrid distance measure, based on cross-correlation and Euclidean distance, to improve the search for the nearest neighbor: the candidate patterns are shortlisted using cross-correlation, and Euclidean distance is then used to select the best-matched pattern. Moreover, in multistep-ahead forecasting, the standard nearest neighbor method introduces a bias into the search, which results in higher forecasting errors. We have modified the search methodology to remove this bias by ignoring the latest forecasted value during the search for the nearest neighbor in the subsequent iteration. The proposed algorithm is evaluated on two benchmark time series as well as two real-life time series.
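A minimal sketch of this two-stage search, assuming a plain Pearson correlation for the shortlisting stage (the paper's cross-correlation may differ in detail) and an illustrative shortlist size:

```python
import numpy as np

def hybrid_nearest_neighbor(candidates, reference, shortlist=10):
    """Two-stage pattern matching: shortlist candidate vectors by
    correlation with the reference, then choose the best match among
    the shortlisted vectors by Euclidean distance."""
    corr = np.array([np.corrcoef(c, reference)[0, 1] for c in candidates])
    top = np.argsort(corr)[-shortlist:]          # highest-correlation survivors
    dists = np.linalg.norm(candidates[top] - reference, axis=1)
    return top[np.argmin(dists)]                 # index into `candidates`
```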


Author(s):  
Luis Alexander Calvo-Valverde ◽  
David Elías Alfaro-Barboza

The ability to make short- or long-term predictions is at the heart of much of science. In the last decade, the data science community has been highly interested in foretelling real-life events, using data mining techniques to discover meaningful rules or patterns from different data types, including time series. Short-term predictions based on "the shape" of meaningful rules lead to a vast number of applications. The discovery of meaningful rules is achieved through efficient algorithms equipped with a robust and accurate distance measure. Consequently, it is important to choose wisely a distance measure that can deal with noise, entropy and other technical constraints, to obtain accurate similarity outcomes from the comparison of two time series. In this work, we believe that Dynamic Time Warping based on Cubic Spline Interpolation (SIDTW) can be useful to carry out the similarity computation for two specific algorithms: 1) DiscoverRules() and 2) TestRules(). Mohammad Shokoohi-Yekta et al. developed a framework using these two algorithms to find and test meaningful rules from time series. Our research expanded the scope of their project, adding a set of well-known similarity search measures, including SIDTW as a novel and enhanced version of DTW.
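SIDTW is defined in the paper; one plausible reading, sketched below under the assumption that both series are resampled with a cubic spline before a standard DTW alignment (the `upsample` factor is illustrative):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def sidtw(a, b, upsample=4):
    """DTW computed on cubic-spline-resampled versions of the inputs,
    which smooths the series before alignment."""
    def resample(x):
        t = np.arange(len(x))
        t_fine = np.linspace(0, len(x) - 1, upsample * len(x))
        return CubicSpline(t, x)(t_fine)
    return dtw(resample(a), resample(b))
```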


Author(s):  
D. N. Ojua ◽  
J. A. Abuchu ◽  
E. O. Ojua ◽  
E. I. Enang

The calibration approach adjusts the original design weights by incorporating an auxiliary variable, so that the estimator takes the form of a regression estimator. This method was employed to propose calibration product type estimators using three distance measures, namely: the chi-square distance measure, the minimum entropy distance measure and the modified chi-square distance measure, under double constraints. The estimators of the variances of the proposed estimators were also obtained. An empirical study to ascertain the performance of these estimators was carried out using a secondary data set and data simulated under the distributional assumptions of the Gamma, Normal and Exponential distributions, with varying sample sizes of 10%, 15%, 20% and 25%. The result with the real-life data showed that the calibration product type estimator from the chi-square distance measure estimated the population mean with smaller bias than those obtained from the other distance measures. The real-life data also revealed that the estimator obtained from the chi-square distance measure under two constraints was more efficient than the other three estimators. The simulation studies showed that the proposed calibration product type estimators outperform the conventional product type estimator in terms of efficiency, consistency and reliability under the Gamma and Exponential distributions, with the Exponential distribution taking the lead. The conventional product type estimator, however, was found to be better under the Normal distribution. It was also observed that as the sample size increases there is no significant change in the performance of the proposed estimators, which justifies their use with small sample sizes.
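Complementing the single-constraint sketch above, the double-constraint chi-square calibration reduces to a small linear system for the Lagrange multipliers. Assuming, for illustration, that the two constraints fix the population size and the auxiliary total (the paper's exact constraint set may differ):

```python
import numpy as np

def chi_square_calibration_two_constraints(d, x, N_total, X_total):
    """Chi-square-distance calibration under two constraints:
    sum(w) == N_total and sum(w * x) == X_total.
    The weights take the form w_i = d_i * (1 + l1 + l2 * x_i), with
    the multipliers (l1, l2) found from a 2x2 linear system."""
    A = np.array([[np.sum(d),    np.dot(d, x)],
                  [np.dot(d, x), np.dot(d, x * x)]])
    rhs = np.array([N_total - np.sum(d), X_total - np.dot(d, x)])
    l1, l2 = np.linalg.solve(A, rhs)
    return d * (1.0 + l1 + l2 * x)
```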


Author(s):  
WEI LU ◽  
LIYONG ZHANG ◽  
JIANHUA YANG ◽  
XIAODONG LIU

Most researchers in time series forecasting devote themselves to designing and developing quantitative models that pursue high numerical forecasting accuracy. However, in the real world, numerical accuracy is sometimes unnecessary for human cognition and decision-making, and the numerical results of forecasting based on quantitative models are deficient in interpretability; the development of qualitative forecasting models for time series therefore becomes an evident challenge. In this paper, the improved fuzzy cognitive map (IFCM) is proposed first, and it is then applied to develop a qualitative model for linguistic forecasting of time series, together with fuzzy c-means clustering and a real-coded genetic algorithm (RCGA). Two real-life time series are used to test the developed forecasting model and to compare it with another FCM-based method; the results show that the developed FCM forecasting model is simpler and of higher quality at the linguistic level.
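The IFCM itself, and its training via fuzzy c-means and RCGA, are specified in the paper; a plain fuzzy-cognitive-map update, shown here only to illustrate the qualitative forecasting mechanism, iterates a squashed weighted sum of concept activations:

```python
import numpy as np

def fcm_step(A, W, lam=1.0):
    """One fuzzy-cognitive-map update: each concept's next activation
    is a sigmoid-squashed weighted sum of the current activations.
    A: (k,) concept activations in [0, 1]; W: (k, k) influence weights."""
    return 1.0 / (1.0 + np.exp(-lam * (W.T @ A)))

def fcm_forecast(A0, W, steps=5, lam=1.0):
    """Iterate the map to produce a qualitative multi-step forecast."""
    A, out = A0, []
    for _ in range(steps):
        A = fcm_step(A, W, lam)
        out.append(A)
    return np.array(out)
```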


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Jae Young Choi ◽  
Bumshik Lee

Time series forecasting is essential for various engineering applications in finance, geology, information technology, and other fields. Long Short-Term Memory (LSTM) networks are nowadays gaining renewed interest and are replacing many practical implementations of time series forecasting systems. This paper presents a novel LSTM ensemble forecasting algorithm that effectively combines multiple forecast (prediction) results from a set of individual LSTM networks. The main advantages of our LSTM ensemble method over other state-of-the-art ensemble techniques are as follows: (1) we develop a novel way of dynamically adjusting the combining weights used to merge the multiple LSTM models into the composite prediction output; to this end, the combining weights are updated at each time step in an adaptive and recursive way, using both past prediction errors and a forgetting weight factor; (2) our method captures nonlinear statistical properties in the time series well, which considerably improves the forecasting accuracy; and (3) our method is straightforward to implement and computationally efficient at runtime because it does not require complex optimization to find the combining weights. Comparative experiments demonstrate that our proposed LSTM ensemble method achieves state-of-the-art forecasting performance on four publicly available real-life time series datasets.
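The paper's exact weight-update rule is not reproduced here; a minimal sketch of one common realization of point (1), with hypothetical names, tracks each model's exponentially discounted squared error and weights the models inversely to it:

```python
import numpy as np

def update_combining_weights(errors_sq, prev_scores, forgetting=0.9, eps=1e-8):
    """Recursively track each model's exponentially discounted squared
    error, then set combining weights inversely to that running error."""
    scores = forgetting * prev_scores + (1.0 - forgetting) * errors_sq
    inv = 1.0 / (scores + eps)
    return inv / inv.sum(), scores

# at each time step t (all names hypothetical):
# errors_sq = (individual_preds - y_t) ** 2        # per-LSTM squared errors
# weights, scores = update_combining_weights(errors_sq, scores)
# ensemble_pred = weights @ next_individual_preds  # composite prediction
```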


Author(s):  
Abdul Haseeb Ganie ◽  
Surender Singh

Abstract Picture fuzzy set (PFS) is a direct generalization of fuzzy sets (FSs) and intuitionistic fuzzy sets (IFSs). The concept of PFS is suitable for modeling situations that involve answers of the types yes, no, abstain, and refuse. In this study, we introduce a novel picture fuzzy (PF) distance measure based on direct operations on the membership, non-membership, neutrality, and refusal functions, and on the upper bound of the membership function, of two PFSs. We contrast the proposed PF distance measure with existing PF distance measures and discuss its advantages in pattern classification problems. Applying fuzzy and non-standard fuzzy models to real data is very challenging, as real data are always found in crisp form; here, we also derive some conversion formulae for applying the proposed method to real data sets. Moreover, we introduce a new multi-attribute decision-making (MADM) method using the proposed PF distance measure. In addition, we justify the necessity of the newly proposed MADM method using appropriate counterintuitive examples. Finally, we contrast the performance of the proposed MADM method with classical MADM methods in the PF environment.
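The paper's measure additionally involves the upper bound of the membership function and is not reproduced here; as a hedged baseline, a normalized Hamming-type PF distance over the four degrees (one common normalization among several in the literature) looks like this:

```python
import numpy as np

def pf_hamming_distance(A, B):
    """Normalized Hamming-type distance between two picture fuzzy sets,
    each given as an (n, 3) array of (membership, neutrality,
    non-membership) degrees; the refusal degree is the remainder
    1 - membership - neutrality - non-membership."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    refusal_A = 1.0 - A.sum(axis=1, keepdims=True)
    refusal_B = 1.0 - B.sum(axis=1, keepdims=True)
    diffs = np.abs(np.hstack([A, refusal_A]) - np.hstack([B, refusal_B]))
    return diffs.sum() / (4.0 * len(A))
```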


2018 ◽  
Vol 8 (2) ◽  
pp. 121-132 ◽  
Author(s):  
Esra Akdeniz ◽  
Erol Egrioglu ◽  
Eren Bas ◽  
Ufuk Yolcu

Abstract Real-life time series have complex and non-linear structures. Artificial neural networks have frequently been used in the literature to analyze non-linear time series. Compared with other types of artificial neural networks, high order artificial neural networks are more adaptable to the data because of their expandable model order. In this paper, a new recurrent architecture for Pi-Sigma artificial neural networks is proposed. A learning algorithm based on particle swarm optimization is used to train the proposed neural network. The proposed high order artificial neural network is applied to three real-life time series, and a simulation study is also performed on the Istanbul Stock Exchange data set.
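For context, the standard feed-forward Pi-Sigma unit computes several linear ("sigma") combinations of the input and multiplies ("pi") their outputs before squashing; the recurrent architecture proposed in the paper adds feedback on top of this. A minimal sketch of the standard unit:

```python
import numpy as np

def pi_sigma_forward(x, W, b):
    """Forward pass of a Pi-Sigma unit.
    x: (d,) input; W: (k, d) weights of k summing units; b: (k,) biases."""
    sums = W @ x + b                              # sigma layer: k linear sums
    return 1.0 / (1.0 + np.exp(-np.prod(sums)))   # pi layer product + sigmoid
```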


2016 ◽  
Vol 8 (1) ◽  
pp. 78-98 ◽  
Author(s):  
Dániel Topál ◽  
István Matyasovszkyt ◽  
Zoltán Kern ◽  
István Gábor Hatvani

Abstract Time series often contain breakpoints of different origins, i.e. breakpoints caused by (i) shifts in trend, (ii) other changes in trend and/or (iii) changes in variance. In the present study, artificially generated time series with white and red noise structures are analyzed using three recently developed breakpoint detection methods. The time series are modified so that the exact "locations" of the artificial breakpoints are prescribed, making it possible to evaluate the methods exactly. The study thus provides deeper insight into the behaviour of the three breakpoint detection methods. This experience can help solve breakpoint detection problems in real-life data sets, as demonstrated with two examples taken from the fields of paleoclimate research and petrology.
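A minimal sketch of how such test series can be generated, assuming a mean shift at a prescribed breakpoint in white (phi = 0) or red (AR(1), 0 < phi < 1) noise; the paper's construction also covers trend and variance changes:

```python
import numpy as np

def series_with_breakpoint(n=500, bp=250, shift=1.0, phi=0.0, seed=0):
    """White noise (phi=0) or red noise (AR(1), 0<phi<1) with a
    prescribed level shift at a known breakpoint location `bp`."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]   # AR(1) "red" noise recursion
    x[bp:] += shift                    # artificial breakpoint at index bp
    return x
```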

