A distributed real-time data prediction framework for large-scale time-series data using stream processing

2017 ◽  
Vol 10 (2) ◽  
pp. 145-165 ◽  
Author(s):  
Kehe Wu ◽  
Yayun Zhu ◽  
Quan Li ◽  
Ziwei Wu

Purpose – This paper proposes a data prediction framework for scenarios with forecasting demands over large-scale data sources, e.g., sensor networks, securities exchanges, and electric power secondary systems. Concretely, the framework must meet several difficult requirements: management of very large numbers of data sources, a fast self-adaptive algorithm, reasonably accurate prediction of multiple time series, and real-time operation.

Design/methodology/approach – First, the autoregressive integrated moving average (ARIMA)-based prediction algorithm is introduced. Second, the processing framework is designed, comprising a time-series data storage model based on HBase and a real-time distributed prediction platform based on Storm. The working principle of this platform is then described. Finally, a proof-of-concept testbed is presented to verify the proposed framework.

Findings – Several tests based on power grid monitoring data are reported. The experimental results indicate that predicted data are broadly consistent with actual data, processing efficiency is relatively high, and resource consumption is reasonable.

Originality/value – This paper provides a distributed real-time data prediction framework for large-scale time-series data that meets the requirements of effective management, prediction efficiency, accuracy, and high concurrency for massive data sources.
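The abstract names ARIMA as the per-series prediction core. Below is a minimal sketch of that core in Python using statsmodels; the sliding-window data and the (p, d, q) order are illustrative assumptions, and the Storm/HBase distribution layer is deliberately out of scope.

```python
# Minimal sketch of the ARIMA prediction core described in the abstract.
# The order (p, d, q) and the sample window are illustrative assumptions,
# not values taken from the paper.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def forecast_next(values, steps=1, order=(2, 1, 1)):
    """Fit an ARIMA model on a window of recent samples and
    forecast the next `steps` points."""
    model = ARIMA(np.asarray(values, dtype=float), order=order)
    fitted = model.fit()
    return fitted.forecast(steps=steps)

# Example: one sensor's recent readings -> one-step-ahead prediction.
history = [10.1, 10.4, 10.2, 10.8, 11.0, 10.9, 11.3, 11.5]
print(forecast_next(history, steps=1))
```

In the paper's setting, each Storm task would run this fit/forecast loop for its assigned series, reading windows from HBase; the sketch only shows the per-series step.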

2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Longhai Yang ◽  
Hong Xu ◽  
Xiqiao Zhang ◽  
Shuai Li ◽  
Wenchao Ji

The application and development of new technologies make it possible to acquire real-time vehicle data, from which vehicle behavior can be analyzed. Predicting vehicle behavior provides data support for fine-grained traffic management. Through R/S analysis of speed and acceleration time-series data, this paper shows that both parameters exhibit fractal features. Based on this characteristic analysis of the microscopic parameters, the characteristic indexes of the parameters are quantified, a fractal multistep prediction model of the microscopic parameters is established, and a BP (back-propagation) neural network model is built to estimate the predictable step length of the fractal model. The fractal multistep prediction model is then used to predict speed and acceleration within that predictable step. NGSIM trajectory data are used to test the multistep prediction model. The results show that the proposed fractal multistep prediction model can effectively realize multistep prediction of vehicle speed.
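R/S (rescaled range) analysis, which the authors use to establish the fractal character of speed and acceleration, reduces to estimating the Hurst exponent from the slope of log(R/S) against log(n). A minimal sketch, with illustrative window sizes and synthetic data:

```python
# A minimal sketch of R/S (rescaled range) analysis. Window sizes and the
# synthetic speed trace are illustrative assumptions.
import numpy as np

def hurst_rs(series, window_sizes=(8, 16, 32, 64)):
    """Estimate the Hurst exponent H as the slope of log(R/S) vs log(n).
    H near 0.5 suggests no memory; H > 0.5 suggests persistence."""
    x = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - n + 1, n):
            chunk = x[start:start + n]
            dev = np.cumsum(chunk - chunk.mean())   # cumulative deviation
            r = dev.max() - dev.min()               # range of deviations
            s = chunk.std()
            if s > 0:
                rs_vals.append(r / s)               # rescaled range
        if rs_vals:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_vals)))
    return np.polyfit(log_n, log_rs, 1)[0]          # slope = H

rng = np.random.default_rng(0)
speeds = 20.0 + np.cumsum(rng.normal(0, 0.5, 1024))  # synthetic speed trace
print(f"estimated H = {hurst_rs(speeds):.2f}")
```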


Author(s):  
Seng Hansun

Abstract – Fuzzy time series is a soft computing method that has been used and applied in time-series data analysis. Its main goal is to predict time-series data, and it can be applied widely to any real-time data, including stock market data. Many researchers have contributed to the development of fuzzy time series analysis, such as Chen and Hsu [1], Jilani et al. [2], and Stevenson and Porter [3]. In this research, the fuzzy time series method is applied to one indicator of stock price movement: the Jakarta Composite Index, known in Indonesia as IHSG (Indeks Harga Saham Gabungan). The performance of the proposed method is evaluated by calculating its accuracy and robustness on IHSG data. Through this approach, fuzzy time series may serve as an alternative for predicting the IHSG, an indicator of stock price movement in Indonesia.

Keywords – fuzzy time series, time series data, soft computing, IHSG
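The abstract builds on the Chen and Hsu line of fuzzy time series work; a compact sketch of the classic Chen-style procedure (partition the universe of discourse, fuzzify, build fuzzy logical relationship groups, defuzzify) is shown below. The interval count and index values are illustrative assumptions, not the exact variant the paper evaluates.

```python
# A compact sketch of Chen-style fuzzy time series forecasting. The interval
# count and the synthetic index levels are illustrative assumptions.
import numpy as np

def chen_forecast(series, n_intervals=7):
    x = np.asarray(series, dtype=float)
    lo, hi = x.min() - 1e-9, x.max() + 1e-9
    edges = np.linspace(lo, hi, n_intervals + 1)         # universe partition
    mids = (edges[:-1] + edges[1:]) / 2                  # interval midpoints
    labels = np.clip(np.searchsorted(edges, x, side="right") - 1,
                     0, n_intervals - 1)                 # fuzzify each point

    # Fuzzy logical relationship groups: A_i -> {A_j, ...}
    flrg = {}
    for a, b in zip(labels[:-1], labels[1:]):
        flrg.setdefault(a, set()).add(b)

    # Defuzzify: average the midpoints of the last state's successors.
    last = labels[-1]
    successors = flrg.get(last, {last})
    return mids[sorted(successors)].mean()

closes = [6570, 6602, 6588, 6631, 6650, 6644, 6689, 6702]  # synthetic levels
print(f"next-step forecast: {chen_forecast(closes):.1f}")
```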


2016 ◽  
Vol 50 (1) ◽  
pp. 41-57 ◽  
Author(s):  
Linghe Huang ◽  
Qinghua Zhu ◽  
Jia Tina Du ◽  
Baozhen Lee

Purpose – A wiki is a new form of information production and organization that has become one of the most important knowledge resources. In recent years, as the number of wiki users has grown, the "free rider" problem has become serious. To motivate editors to contribute more to a wiki system, it is important to fully understand their contribution behavior. The purpose of this paper is to explore the dynamics of editors' contribution behavior in wikis.

Design/methodology/approach – After developing a dynamic model of contribution behavior, the authors employed both metrological and clustering methods to process the time-series data. The experimental data were collected from Baidu Baike, a renowned Chinese wiki system similar to Wikipedia.

Findings – There are four categories of editors: "testers," "dropouts," "delayers" and "stickers." Testers contribute the least content and stop contributing soon after editing a few articles. Dropouts stop contributing completely, but only after editing a large amount of content. Delayers did not stop contributing during the observation window, but may stop in the near future. Stickers, who keep contributing and edit the most content, are the core editors. In addition, there are significant time-of-day and holiday effects on the number of editors' contributions.

Originality/value – Using time-series analysis, new characteristics of editors and editor types were identified. This research also used a larger sample than earlier studies, so the results are more representative; they can help managers optimize wiki systems and formulate incentive strategies for editors.
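As a rough illustration of the clustering step, the sketch below groups synthetic editors by the shape of their cumulative-contribution curves with k-means, using k = 4 to match the four reported editor types. The features and data are assumptions, not the paper's actual setup.

```python
# Hedged sketch: cluster editors by the shape of their cumulative
# contribution curves. Data and feature choice are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Rows: editors; columns: edits per week over a 20-week observation window.
weekly_edits = rng.poisson(lam=rng.uniform(0.2, 5.0, size=(200, 1)),
                           size=(200, 20))

# Normalize each editor's cumulative curve so clusters reflect *when*
# editors contribute rather than raw volume.
cumulative = np.cumsum(weekly_edits, axis=1).astype(float)
totals = cumulative[:, -1:].clip(min=1.0)
shape = cumulative / totals

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(shape)
for c in range(4):
    print(f"cluster {c}: {np.sum(km.labels_ == c)} editors")
```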


2021 ◽  
Author(s):  
Sadnan Al Manir ◽  
Justin Niestroy ◽  
Maxwell Adam Levinson ◽  
Timothy Clark

Introduction: Transparency of computation is a requirement for assessing the validity of computed results and research claims based upon them, and it is essential for access to, assessment of, and reuse of computational components. These components may be subject to methodological or other challenges over time. While reference to archived software and/or data is increasingly common in publications, a single machine-interpretable, integrative representation of how results were derived, one that supports defeasible reasoning, has been absent.

Methods: We developed the Evidence Graph Ontology, EVI, in OWL 2, with a set of inference rules, to provide deep representations of supporting and challenging evidence for computations, services, software, data, and results, across arbitrarily deep networks of computations, in connected or fully distinct processes. EVI integrates FAIR practices on data and software with important concepts from provenance models and argumentation theory. It extends PROV for additional expressiveness, with support for defeasible reasoning. EVI treats any computational result or component of evidence as a defeasible assertion, supported by a DAG of the computations, software, data, and agents that produced it.

Results: We have successfully deployed EVI for very-large-scale predictive analytics on clinical time-series data. Every result may reference its own evidence graph as metadata, which can be extended when subsequent computations are executed.

Discussion: Evidence graphs support transparency and defeasible reasoning on results. They are first-class computational objects that reference the datasets and software from which they are derived. They support fully transparent computation, with challenge and support propagation. The EVI approach may be extended to include instruments, animal models, and critical experimental reagents.
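To make the DAG idea concrete, here is a toy sketch, in no way EVI's OWL serialization, of how a challenge to any node in an evidence graph propagates to the results that rest on it. All names are hypothetical.

```python
# Toy illustration of challenge propagation over an evidence DAG.
# This is NOT EVI's OWL 2 representation; node names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    supports: list = field(default_factory=list)  # upstream evidence nodes
    challenged: bool = False

    def is_defeated(self) -> bool:
        """A node is defeated if it, or any evidence it rests on, is challenged."""
        return self.challenged or any(s.is_defeated() for s in self.supports)

dataset = Node("clinical-timeseries-v1")
software = Node("predictor-v2.3")
run = Node("analysis-run-42", supports=[dataset, software])
result = Node("risk-scores", supports=[run])

software.challenged = True   # e.g., a bug report against this software version
print(result.is_defeated())  # True: the result's supporting evidence is undermined
```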


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Jing Zhao ◽  
Shubo Liu ◽  
Xingxing Xiong ◽  
Zhaohui Cai

Privacy protection is one of the major obstacles to data sharing. Time-series data are autocorrelated, continuous, and large in scale. Most existing work on time-series data publication ignores the correlation within time-series data, leaving privacy under-protected. In this paper, we study the problem of publishing correlated time-series data and propose a sliding window-based autocorrelated time-series data publication algorithm, called SW-ATS. Instead of using global sensitivity as in traditional differential privacy mechanisms, we propose periodic sensitivity to provide a stronger degree of privacy guarantee. SW-ATS introduces a sliding window mechanism, with the correlation between the noise-added sequence and the original time-series data guaranteed by sequence indistinguishability, to protect the privacy of the latest data. We prove that SW-ATS satisfies ε-differential privacy. Compared with the state-of-the-art algorithm, SW-ATS reduces the mean absolute error (MAE) by about 25%, improves data utility, and provides stronger privacy protection.
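The general pattern SW-ATS refines, perturbing each release with Laplace noise scaled to a sensitivity computed over a sliding window, can be sketched as follows. The window-range stand-in for the paper's periodic sensitivity is an assumption for illustration only.

```python
# Sketch of sliding-window Laplace perturbation for a time series.
# The window-range sensitivity below is an illustrative stand-in for the
# paper's "periodic sensitivity", not its actual definition.
import numpy as np

def sliding_window_laplace(series, epsilon=1.0, window=10):
    x = np.asarray(series, dtype=float)
    rng = np.random.default_rng(0)
    out = np.empty_like(x)
    for i in range(len(x)):
        w = x[max(0, i - window + 1):i + 1]          # most recent window
        sensitivity = max(w.max() - w.min(), 1e-6)   # window-local sensitivity
        out[i] = x[i] + rng.laplace(scale=sensitivity / epsilon)
    return out

readings = 50 + 5 * np.sin(np.linspace(0, 6, 100))   # synthetic periodic series
private = sliding_window_laplace(readings, epsilon=1.0)
print(np.mean(np.abs(private - readings)))           # empirical MAE of the noise
```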

