Research on the Data-Driven Quality Control Method of Hydrological Time Series Data

Water ◽  
2018 ◽  
Vol 10 (12) ◽  
pp. 1712 ◽  
Author(s):  
Qun Zhao ◽  
Yuelong Zhu ◽  
Dingsheng Wan ◽  
Yufeng Yu ◽  
Xifeng Cheng

Ensuring the quality of hydrological data has become a key issue in the field of hydrology. Based on the characteristics of hydrological data, this paper proposes a data-driven quality control method for hydrological data. For continuous hydrological time series data, two combined forecasting models and one statistical control model are constructed from horizontal, vertical, and statistical perspectives, and the three models provide three confidence intervals. A suspicion level is assigned to each data point according to the number of confidence intervals it violates; the data are controlled accordingly, and suggested values are provided for suspicious and missing data. For discrete hydrological data with large spatiotemporal differences, a similarity-weighted topological map of the neighboring stations is established, centered on the hydrological station under test, and is adjusted continuously with seasonal changes. Lastly, a spatial interpolation model is established to check the data. The experimental results show that the proposed quality control method can effectively detect and control the data, find suspicious and erroneous data, and provide suggested values.
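
The interval-voting step described in this abstract can be sketched as follows. This is only an illustration, not the authors' implementation: the function names `suspicion_level` and `suggested_value`, the interval values, the treatment of missing data, and the use of the mean of interval midpoints as the suggested value are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def suspicion_level(value: Optional[float], intervals: List[Interval]) -> int:
    """Count how many confidence intervals the observation violates.

    Assumption: a missing observation (None) violates every interval.
    """
    if value is None:
        return len(intervals)
    return sum(1 for low, high in intervals if not (low <= value <= high))


def suggested_value(intervals: List[Interval]) -> float:
    """Hypothetical suggestion rule: the mean of the interval midpoints."""
    midpoints = [(low + high) / 2.0 for low, high in intervals]
    return sum(midpoints) / len(midpoints)


# Made-up intervals standing in for the horizontal, vertical,
# and statistical models mentioned in the abstract.
intervals = [(10.0, 12.0), (9.5, 11.5), (9.0, 13.0)]

print(suspicion_level(11.0, intervals))  # 0: inside all three intervals
print(suspicion_level(12.5, intervals))  # 2: outside the first two intervals
print(suggested_value(intervals))        # candidate replacement value
```

A higher count marks a more suspicious observation; how the paper maps counts to control actions is not stated in the abstract.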

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tuan D. Pham

Automated analysis of physiological time series is utilized for many clinical applications in medicine and life sciences. Long short-term memory (LSTM) is a deep recurrent neural network architecture used for classification of time-series data. Here time–frequency and time–space properties of time series are introduced as a robust tool for LSTM processing of long sequential data in physiology. Based on classification results obtained from two databases of sensor-induced physiological signals, the proposed approach has the potential for (1) achieving very high classification accuracy, (2) saving tremendous time for data learning, and (3) being cost-effective and user-comfortable for clinical trials by reducing multiple wearable sensors for data recording.


2021 ◽  
Author(s):  
Tetsuya Yamada ◽  
Shoi Shi

Comprehensive and evidence-based countermeasures against emerging infectious diseases have become increasingly important in recent years. COVID-19 and many other infectious diseases are spread by human movement and contact, but complex transportation networks in the 21st century make it difficult to predict disease spread in rapidly changing situations. It is especially challenging to estimate the network of infection transmission in countries where the traffic and human movement data infrastructure is not yet developed. In this study, we devised a method to estimate the network of transmission of COVID-19 from the time series data of its infection and applied it to determine its spread across areas in Japan. We incorporated the effects of soft lockdowns, such as the declaration of a state of emergency, and changes in the infection network due to government-sponsored travel promotion, and predicted the spread of infection using the Tokyo Olympics as a model. The models used in this study are available online, and our data-driven infection network models are scalable, whether at the level of a city, town, country, or continent, and applicable anywhere in the world, as long as the time-series data of infections per region is available. These estimations of effective distance and the depiction of infectious disease networks based on actual infection data are expected to be useful in devising data-driven countermeasures against emerging infectious diseases worldwide.


2020 ◽  
Vol 34 (10) ◽  
pp. 13720-13721
Author(s):  
Won Kyung Lee

Multivariate time-series forecasting has great potential in various domains. However, it is challenging to find the dependency structure among the time-series variables and the appropriate time lags for each variable, which change dynamically over time. In this study, I suggest a partial correlation-based attention mechanism which overcomes the shortcomings of existing attention mechanisms based on pair-wise comparisons. Moreover, I propose data-driven series-wise multi-resolution convolutional layers to represent the input time-series data for domain-agnostic learning.


2020 ◽  
Vol 12 (1) ◽  
pp. 10
Author(s):  
W Glenn Bond ◽  
Haley Dozier ◽  
Thomas L Arnold ◽  
Michael Y Lam ◽  
Quyen T Dong ◽  
...  

Attempts to leverage operational time-series data in Condition Based Maintenance (CBM) approaches to optimize the life cycle management and Reliability, Availability, and Maintainability (RAM) of military vehicles have encountered several obstacles over decades of data collection. These obstacles have beset similar approaches on civilian ground vehicles, as well as on aircraft and other complex systems. Analysis of operational data is critical because it represents a continuous recording of the state of the system. Applying rudimentary data analytics to operational data can provide insights like fuel usage patterns or observed reliability of one vehicle or even a fleet. Monitoring trends and analyzing patterns in this data over time, however, can provide insight into the health of a vehicle, a complex system, or a fleet, predicting mean time to failure or compiling logistic or life cycle needs. Such High-Performance Data Analytics (HPDA) on operational time-series datasets has been historically difficult due to the large amount of data gathered from vehicle sensors, the lack of association between clusters observed in the data and failures or unscheduled maintenance events, and the deficiency of unsupervised learning techniques for time-series data. We present an HPDA environment and a method of discovering patterns in vehicle operational data that determines models for predicting the likelihood of imminent failure, referred to as Parameter-Based Indicators (PBIs). Our method is a data-driven approach that uses both time-series and relational maintenance data. This hybrid approach combines both supervised and unsupervised machine learning and data analytic techniques to correlate labeled, relational maintenance event data with unlabeled operational time-series data utilizing the DoD High Performance Computing (HPC) capabilities at the U.S. Army Engineer Research and Development Center. 
In leveraging both time-series and relational data, we demonstrate a means of fast, purely data-driven model creation that is more broadly applicable and requires less a priori information than physics-informed, data-driven models. By blending these approaches, this system will be able to relate some life cycle management goals through the workflow to generate specific PBIs that will predict failures or highlight appropriate areas of concern in individual or collective vehicle histories.


2021 ◽  
Vol 17 (4) ◽  
pp. 306-320
Author(s):  
Rahmah Mohd Lokoman ◽  
Fadhilah Yusof ◽  
Nor Eliza Alias ◽  
Zulkifli Yusop

Copula models have been applied in various hydrologic studies; however, most analyses do not consider the non-stationary conditions that may exist in the time series. To investigate the dependence structure between two rainfall stations at Johor Bahru, two methods have been applied. The first method considers the non-stationary condition that exists in the data, while the second method assumes stationarity in the time series data. Through goodness-of-fit (GOF) and simulation tests, the performance of both methods is compared in this study. The results obtained in this study highlight the importance of considering non-stationary conditions in hydrological data.


Author(s):  
Fakhri J. Hasanov ◽  
Jeyhun L. Mikayilov

In this short note, we described step-by-step derivations of the industrial energy demand function from the production function framework and provided researchers with two specifications. We then applied these theoretical specifications to time series data as an empirical analysis. We concluded that theories should be considered at the beginning of empirical analyses, but the data should also be allowed to speak freely. Hence, the main suggestion of this short note is that combining theory-driven and data-driven approaches would be a better strategy in empirical analyses.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Irfan Haider Shakri

Purpose: The purpose of this study is to compare five data-driven machine learning (ML) techniques to predict the time series data of Bitcoin returns, namely, alternating model tree, random forest (RF), multiple linear regression, multi-layer perceptron regression and M5 Tree algorithms. Design/methodology/approach: The data used to forecast time series data of Bitcoin returns ranges from 8 July 2010 to 30 Aug 2020. This study used several predictors to predict Bitcoin returns, including economic policy uncertainty, equity market volatility index, S&P returns, USD/EURO exchange rates, oil and gold prices, volatilities and returns. Five statistical indexes, namely, correlation coefficient, mean absolute error, root mean square error, relative absolute error and root relative squared error, are determined. The results of these metrics are used to develop a colour intensity ranking. Findings: Among the ML techniques used in this study, RF models have shown superior predictive ability for estimating Bitcoin returns. Originality/value: This study is the first of its kind to use and compare ML models in the prediction of Bitcoin. More studies can be carried out in future using further cryptocurrencies and other data-driven ML models.
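
The five statistical indexes named in this abstract can be computed as in the sketch below. It uses the common textbook definitions (relative errors are taken against a predictor that always outputs the mean of the actual values); the abstract does not give the paper's exact formulas, so the definitions, the function name `regression_metrics`, and the sample numbers are assumptions.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """Return correlation, MAE, RMSE, RAE and RRSE for paired series.

    Note: a constant `actual` or `predicted` series makes some
    denominators zero; this sketch does not handle that case.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("series must be non-empty and of equal length")

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_a) for a in actual)    # baseline absolute error
    sq_dev = sum((a - mean_a) ** 2 for a in actual)   # baseline squared error
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / math.sqrt(sq_dev * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / abs_dev,
        "rrse": math.sqrt(sq_err / sq_dev),
    }


# Made-up returns, purely for illustration.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RAE and RRSE below 1 mean the model beats the mean-value baseline, which makes them convenient for ranking models across differently scaled targets.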


2021 ◽  
pp. 2150316
Author(s):  
Qingxiang Feng ◽  
Haipeng Wei ◽  
Jun Hu ◽  
Wenzhe Xu ◽  
Fan Li ◽  
...  

Most existing research on public health events focuses on the number and duration of events in a year or month and is carried out using regression equations. The COVID-19 epidemic, which was discovered in Wuhan, Hubei Province, quickly spread to the whole country and then became a global public health event. During the epidemic, Chinese netizens inquired about the dynamics of the COVID-19 epidemic through the Baidu search platform and learned about relevant epidemic prevention information. These groups’ search behavior data not only reflect people’s attention to the COVID-19 epidemic but also contain its stage characteristics and evolution trend. Therefore, the temporal, spatial, and attribute laws of the propagation of the COVID-19 epidemic can be discovered by mining the time series data of search behavior more deeply. In this study, it is found that transforming time series data into a visibility network using the visibility algorithm can uncover more hidden information in the time series data, which may help us fully understand the attention to the COVID-19 epidemic in Chinese provinces and cities and evaluate deficiencies in the early warning and prevention of major epidemics. Moreover, it will improve the ability to cope with public health crises and the level of social decision-making.
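
The transformation of a time series into a visibility network mentioned in this abstract can be sketched as below. The abstract does not say which variant the authors use, so this example assumes the widely used natural visibility criterion: two samples are linked when every sample between them lies strictly below the straight line joining them. The function name `natural_visibility_graph` and the sample series are hypothetical.

```python
from typing import List, Set, Tuple


def natural_visibility_graph(series: List[float]) -> Set[Tuple[int, int]]:
    """Return the undirected edges (i, j), i < j, of the visibility graph.

    Samples i and j are connected if, for every k with i < k < j,
    series[k] lies strictly below the line from (i, series[i])
    to (j, series[j]). Adjacent samples are always connected.
    """
    edges: Set[Tuple[int, int]] = set()
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k]
                < series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


# Made-up search-volume values, purely for illustration.
edges = natural_visibility_graph([3.0, 1.0, 4.0, 2.0])
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2), (2, 3)]
```

In this toy series the peak at index 2 blocks the view between indices 0 and 3 and between 1 and 3, so it ends up as the best-connected node. This brute-force version is cubic in the series length in the worst case, which is acceptable for short series only.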


2003 ◽  
Vol 06 (02) ◽  
pp. 119-134 ◽  
Author(s):  
LUIS A. GIL-ALANA

In this article we propose the use of a version of the tests of Robinson [32] for testing unit and fractional roots in financial time series data. The tests have a standard null limit distribution and they are the most efficient ones in the context of Gaussian disturbances. We compute finite sample critical values based on non-Gaussian disturbances, and the power properties of the tests are compared when using both the asymptotic and the finite-sample (Gaussian and non-Gaussian) critical values. The tests are applied to the monthly structure of several stock market indexes, and the results show that if the underlying I(0) disturbances are white noise, the confidence intervals include the unit root; however, if they are autocorrelated, the unit root is rejected in favour of smaller degrees of integration. Using t-distributed critical values, the confidence intervals for the non-rejection values are generally narrower than with the asymptotic or the Gaussian finite-sample ones, suggesting that they may better describe the time series behaviour of the data examined.


2021 ◽  
Vol 12 (2) ◽  
pp. 1-21
Author(s):  
Zijian Li ◽  
Ruichu Cai ◽  
Hong Wei Ng ◽  
Marianne Winslett ◽  
Tom Z. J. Fu ◽  
...  

Data-driven models are becoming essential parts of modern mechanical systems, commonly used to capture the behavior of various equipment and varying environmental characteristics. Despite the excellent adaptivity of these data-driven models to high dynamics and aging equipment, they are usually hungry for massive labels, mostly contributed by human engineers at a high cost. Fortunately, domain adaptation enhances model generalization by utilizing the labeled source data and the unlabeled target data. However, the mainstream domain adaptation methods cannot achieve ideal performance on time series data, since they assume that the conditional distributions are equal. This assumption works well for static data but is inapplicable to time series data. Even the first-order Markov dependence assumption requires the dependence between any two consecutive time steps. In this article, we assume that the causal mechanism is invariant and present our Causal Mechanism Transfer Network (CMTN) for time series domain adaptation. By capturing causal mechanisms of time series data, CMTN allows the data-driven models to exploit existing data and labels from similar systems, such that the resulting model on a new system is highly reliable even with limited data. We report our empirical results and lessons learned from two real-world case studies, on chiller plant energy optimization and boiler fault detection, which outperform the existing state-of-the-art method.

