Spatio-temporal modelling for non-stationary point-referenced data

2021 ◽  
Author(s):  
Lindsay Morris

<p><b>Spatial and spatio-temporal phenomena are commonly modelled as Gaussian processes via the geostatistical model (Gelfand & Banerjee, 2017). In the geostatistical model, the spatial dependence structure is modelled using covariance functions. Most commonly, the covariance functions impose an assumption of spatial stationarity on the process: the covariance between observations at particular locations depends only on the distance between the locations (Banerjee et al., 2014). It has been widely recognized that most, if not all, processes manifest a spatially nonstationary covariance structure (Sampson, 2014). If the study domain is small in area or there is not enough data to justify more complicated nonstationary approaches, then stationarity may be assumed for the sake of mathematical convenience (Fouedjio, 2017). However, relationships between variables can vary significantly over space, and a ‘global’ estimate of the relationships may obscure interesting geographical phenomena (Brunsdon et al., 1996; Fouedjio, 2017; Sampson & Guttorp, 1992). </b></p> <p>In this thesis, we considered three non-parametric approaches to flexibly account for non-stationarity in both spatial and spatio-temporal processes. First, we proposed partitioning the spatial domain into sub-regions using the K-means clustering algorithm based on a set of appropriate geographic features. This allowed separate stationary covariance functions to be fitted to the smaller sub-regions, accounting for local differences in covariance across the study region. Secondly, we extended the concept of covariance network regression to model the covariance matrix of both spatial and spatio-temporal processes. The resulting covariance estimates were found to be more flexible in accounting for spatial autocorrelation than standard stationary approaches. The third approach involved geographic random forest methodology using a neighbourhood structure for each location constructed through clustering.
We found that clustering on geographic measures such as longitude and latitude ensured that observations too distant to influence those near a given location were not selected for the neighbourhood in which that location's local random forest was fitted. </p> <p>In addition to developing flexible methods to account for non-stationarity, we developed a pivotal discrepancy measure approach for goodness-of-fit testing of spatio-temporal geostatistical models. We found that partitioning the pivotal discrepancy measures increased the power of the test.</p>
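The first approach, K-means partitioning of the spatial domain before fitting local stationary covariance models, can be sketched as follows; the geographic features, cluster count, and data are illustrative assumptions, not the thesis's actual choices:

```python
# Hypothetical sketch: partition a spatial domain with K-means so that a
# separate stationary covariance model can be fitted within each sub-region.
# Feature choice and the number of clusters are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic point-referenced data: longitude, latitude, plus elevation
# as an example of an "appropriate geographic feature".
coords = rng.uniform(0, 100, size=(500, 2))
elevation = rng.normal(200, 50, size=(500, 1))
features = np.hstack([coords, elevation])

# Partition the domain into sub-regions.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_

# A separate stationary covariance function (e.g. exponential) would then
# be fitted to the points of each sub-region.
for k in range(4):
    print(f"sub-region {k}: {np.sum(labels == k)} locations")
```

Clustering on geographic features (rather than response values) keeps the sub-regions spatially coherent, which is what allows a locally stationary covariance fit within each one.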



2017 ◽  
Vol 6 (3) ◽  
pp. 43
Author(s):  
Nikolai Kolev ◽  
Jayme Pinto

The dependence structure between 756 prices for futures on crude oil and natural gas traded on NYMEX is analyzed using a combination of novel time-series and copula tools. We model the log-returns on each commodity individually by Generalized Autoregressive Score models and account for dependence between them by fitting various copulas to the corresponding error terms. Our basic assumption is that the dependence structure may vary over time, but the ratio between the joint distribution of the error terms and the product of their marginal distributions (i.e., Sibuya's dependence function) remains time-invariant. By performing conventional goodness-of-fit tests, we select the best copula, a member of the newly introduced class of Sibuya-type copulas.
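The copula-fitting step can be illustrated with a minimal sketch: transform each margin's error terms to pseudo-observations and estimate a Gaussian copula from their normal scores. The synthetic errors and the estimator below are illustrative assumptions, not the paper's GAS-based pipeline or its Sibuya-type copulas:

```python
# Illustrative sketch (not the paper's exact method): fit a Gaussian copula
# to the error terms of two marginal models via pseudo-observations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic "error terms" for two commodities with built-in dependence.
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=756)
e_oil, e_gas = z[:, 0], z[:, 1]

# Pseudo-observations: ranks rescaled to the open unit interval.
n = len(e_oil)
u = stats.rankdata(e_oil) / (n + 1)
v = stats.rankdata(e_gas) / (n + 1)

# Gaussian copula parameter: correlation of the normal scores.
rho = np.corrcoef(stats.norm.ppf(u), stats.norm.ppf(v))[0, 1]
print(f"estimated copula correlation: {rho:.3f}")
```

Using ranks rather than raw errors makes the copula estimate invariant to the marginal distributions, which is the point of the copula decomposition.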


Atmosphere ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 238
Author(s):  
Pablo Contreras ◽  
Johanna Orellana-Alvear ◽  
Paul Muñoz ◽  
Jörg Bendix ◽  
Rolando Célleri

The Random Forest (RF) algorithm, a decision-tree-based technique, has become a promising approach for applications addressing runoff forecasting in remote areas. This machine learning approach can overcome the limitations of scarce spatio-temporal data and physical parameters needed for process-based hydrological models. However, the influence of RF hyperparameters is still uncertain and needs to be explored. Therefore, the aim of this study is to analyze the sensitivity of RF runoff forecasting models of varying lead time to the hyperparameters of the algorithm. For this, models were trained by using (a) default and (b) extensive hyperparameter combinations through a grid-search approach that allows reaching the optimal set. Model performances were assessed based on the R2, %Bias, and RMSE metrics. We found that: (i) The most influential hyperparameter is the number of trees in the forest; however, the combination of the tree-depth and number-of-features hyperparameters produced the highest variability-instability in the models. (ii) Hyperparameter optimization significantly improved model performance for higher lead times (12- and 24-h). For instance, the performance of the 12-h forecasting model under default RF hyperparameters improved to R2 = 0.41 after optimization (gain of 0.17). However, for short lead times (4-h) there was no significant model improvement (0.69 < R2 < 0.70). (iii) There is a range of values for each hyperparameter in which the performance of the model is not significantly affected but remains close to the optimal. Thus, a compromise between hyperparameter interactions (i.e., their values) can produce similar high model performances. Model improvements after optimization can be explained from a hydrological point of view: for lead times larger than the concentration time of the catchment, the generalization ability tends to rely more on hyperparameterization than on what the models can learn from the input data.
This insight can help in the development of operational early warning systems.
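A minimal sketch of the grid-search idea on synthetic data; the grid values, features, and target are illustrative assumptions, not the study's catchment data:

```python
# Hedged sketch: tune the RF hyperparameters named above (number of trees,
# tree depth, number of features) for a regression task scored with R^2.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))            # e.g. lagged precipitation/runoff
y = X[:, 0] * 2 + X[:, 1] + rng.normal(scale=0.5, size=300)

grid = {
    "n_estimators": [50, 200],           # number of trees in the forest
    "max_depth": [3, None],              # depth of the tree
    "max_features": [2, 5],              # number of features per split
}
search = GridSearchCV(RandomForestRegressor(random_state=0), grid,
                      scoring="r2", cv=3).fit(X, y)
print("best hyperparameters:", search.best_params_)
print(f"best cross-validated R^2: {search.best_score_:.2f}")
```

Comparing `search.best_score_` against a model fitted with default hyperparameters reproduces the default-versus-optimized comparison described in the abstract.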


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Prabina Kumar Meher ◽  
Anil Rai ◽  
Atmakuri Ramakrishna Rao

Abstract Background Localization of messenger RNAs (mRNAs) plays a crucial role in the growth and development of cells. In particular, it plays a major role in regulating spatio-temporal gene expression. In situ hybridization is a promising experimental technique used to determine the localization of mRNAs, but it is costly and laborious. It is also known that a single mRNA can be present in more than one location, whereas the existing computational tools are capable of predicting only a single location for such mRNAs. Thus, the development of a high-end computational tool is required for reliable and timely prediction of multiple subcellular locations of mRNAs. Hence, we developed the present computational model to predict the multiple localizations of mRNAs. Results The mRNA sequences from 9 different localizations were considered. Each sequence was first transformed to a numeric feature vector of size 5460, based on the k-mer features of sizes 1–6. Out of 5460 k-mer features, 1812 important features were selected by the Elastic Net statistical model. The Random Forest supervised learning algorithm was then employed for predicting the localizations with the selected features. Five-fold cross-validation accuracies of 70.87, 68.32, 68.36, 68.79, 96.46, 73.44, 70.94, 97.42 and 71.77% were obtained for the cytoplasm, cytosol, endoplasmic reticulum, exosome, mitochondrion, nucleus, pseudopodium, posterior and ribosome, respectively. With an independent test set, accuracies of 65.33, 73.37, 75.86, 72.99, 94.26, 70.91, 65.53, 93.60 and 73.45% were obtained for the respective localizations. The developed approach also achieved higher accuracies than the existing localization prediction tools. Conclusions This study presents a novel computational tool for predicting the multiple localizations of mRNAs. Based on the proposed approach, an online prediction server “mLoc-mRNA” is accessible at http://cabgrid.res.in:8080/mlocmrna/.
The developed approach is believed to supplement the existing tools and techniques for the localization prediction of mRNAs.
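The k-mer featurization step can be sketched directly: counting every k-mer of sizes 1–6 over the 4-letter RNA alphabet gives 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 features, matching the vector size reported above. The function below is a minimal illustration, not the mLoc-mRNA implementation:

```python
# Minimal sketch of k-mer featurization for an RNA sequence.
from itertools import product

ALPHABET = "ACGU"

def kmer_vector(seq, kmax=6):
    """Return counts for every k-mer of sizes 1..kmax (length 5460 for kmax=6)."""
    vec = []
    for k in range(1, kmax + 1):
        # Initialize a count for each possible k-mer, in sorted order.
        counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
        for i in range(len(seq) - k + 1):
            sub = seq[i:i + k]
            if sub in counts:           # skip windows with ambiguous bases
                counts[sub] += 1
        vec.extend(counts[key] for key in sorted(counts))
    return vec

v = kmer_vector("AUGGCUAGCUAG")
print(len(v))  # 5460
```

Feature selection (the Elastic Net step) and the Random Forest classifier would then operate on vectors of this form.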


Author(s):  
Fei Yang ◽  
Yanchen Wang ◽  
Peter J. Jin ◽  
Dingbang Li ◽  
Zhenxing Yao

Cellular phone data has been proven to be valuable in the analysis of residents’ travel patterns. Existing studies mostly identify trip ends through rule-based or clustering algorithms. These methods largely depend on subjective experience and users’ communication behaviors. Moreover, limited by privacy policy, the accuracy of these methods is difficult to assess. In this paper, point-of-interest data is applied to supplement the information missing from cellular phone data due to users’ behaviors. Specifically, a random forest model for trip end identification is proposed using multi-dimensional attributes. A field data acquisition test is designed and conducted with communication operators to collect synchronized cellular phone data and real trip information. The proposed identification approach is empirically evaluated with the real trip information. Results show that the overall trip end detection precision and recall reach 95.2% and 88.7% with an average distance error of 269 m, and the time errors of the trip ends are less than 10 min. Compared with the rule-based approach, clustering algorithm, naive Bayes method, and support vector machine, the proposed method has better performance in accuracy and consistency.
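A hedged sketch of the classification-and-evaluation setup: a random forest labels candidate stay points as trip ends and is scored with precision and recall. The attributes, the synthetic labelling rule, and all constants are assumptions for illustration, not the paper's feature set:

```python
# Illustrative sketch: trip-end identification as binary classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 1000
# Hypothetical multi-dimensional attributes per candidate stay point.
X = np.column_stack([
    rng.exponential(30, n),      # stay duration (min)
    rng.poisson(20, n),          # number of cellular records
    rng.uniform(0, 500, n),      # distance to nearest point of interest (m)
])
# Synthetic ground truth: long stays near a POI are treated as trip ends.
y = ((X[:, 0] > 15) & (X[:, 2] < 300)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"precision={precision_score(y_te, pred):.2f}",
      f"recall={recall_score(y_te, pred):.2f}")
```

In the paper's setting, the ground-truth labels come from the synchronized field data collection rather than a rule like the one above.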


2015 ◽  
Vol 8 (1) ◽  
pp. 103-124
Author(s):  
Gabriel Gaiduchevici

Abstract The copula-GARCH approach provides a flexible and versatile method for modeling multivariate time series. In this study we focus on describing the credit risk dependence pattern between the real and financial sectors as described by two representative iTraxx indices. Multi-stage estimation is used for parametric ARMA-GARCH-copula models. We derive critical values for the parameter estimates using asymptotic, bootstrap and copula sampling methods. The results obtained indicate a positive symmetric dependence structure with statistically significant tail dependence coefficients. Goodness-of-fit tests indicate which model provides the best fit to the data.


Author(s):  
J. W. Li ◽  
X. Q. Han ◽  
J. W. Jiang ◽  
Y. Hu ◽  
L. Liu

Abstract. How to establish an effective method for analyzing large geographic spatio-temporal data, and how to quickly and accurately find the value hidden behind geographic information, have become a current research focus. Researchers have found that clustering analysis methods from the data mining field can mine knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important clustering methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome. For example, the two important parameters, the Eps neighbourhood and the MinPts density, need to be set manually, and the guiding principles for parameter setting in the traditional DBSCAN algorithm do not ensure that suitable parameters, and hence accurate clustering results, can be obtained. To solve the misclassification and density-sparsity problems caused by unreasonable parameter selection in the DBSCAN algorithm, this paper proposes a DBSCAN-based efficient density clustering method with improved parameter optimization. Its evaluation index function (Optimal Distance) is obtained by cycling through k-clustering in turn and selecting the optimal solution, and the optimal k-value in k-clustering is used to cluster the samples. Through mathematical and physical analysis, the appropriate Eps and MinPts parameters can then be determined, and the final clustering results are obtained by DBSCAN. Experiments show that this method selects reasonable parameters for DBSCAN clustering, which demonstrates the superiority of the method described in this paper.
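For contrast with the paper's Optimal Distance criterion (which is not reproduced here), a widely used heuristic derives Eps from the k-nearest-neighbour distances and sets MinPts = k. The sketch below uses synthetic data and a simple median-based rule:

```python
# Common k-distance heuristic for DBSCAN parameter selection (an
# illustrative alternative, not the paper's Optimal Distance method).
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
pts = np.vstack([
    rng.normal([0, 0], 0.3, size=(100, 2)),   # dense spatial cluster 1
    rng.normal([5, 5], 0.3, size=(100, 2)),   # dense spatial cluster 2
    rng.uniform(-2, 7, size=(20, 2)),         # background noise
])

k = 4  # MinPts
# n_neighbors = k + 1 because each point's nearest neighbour is itself.
dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(pts).kneighbors(pts)
# Median distance to the k-th neighbour, inflated to bridge in-cluster gaps.
eps = 3 * np.median(dists[:, -1])

labels = DBSCAN(eps=eps, min_samples=k).fit_predict(pts)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"Eps={eps:.2f}, clusters found: {n_clusters}")
```

The factor of 3 is an ad-hoc choice; the paper's contribution is precisely to replace such manual tuning with an optimized selection procedure.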


2022 ◽  
Vol 11 (1) ◽  
pp. 60
Author(s):  
Zhihuan Wang ◽  
Chenguang Meng ◽  
Mengyuan Yao ◽  
Christophe Claramunt

Maritime ports are critical logistics hubs that play an important role in preventing the transmission of imported COVID-19 infections from incoming international ships. This study introduces a data-driven method to dynamically model the infection risks of international ports from imported COVID-19 cases. The approach is based on global Automatic Identification System (AIS) data and a spatio-temporal clustering algorithm that automatically identifies the ports and countries approached by ships and correlates them with national COVID-19 statistics and stopover dates. The infection risk of an individual ship is first modelled by considering the current number of COVID-19 cases in the approached countries, the increase rate of new cases, and the ship's capacity. The infection risk of a maritime port is then calculated as the aggregation of the risks of all ships stopping over on a specific date. This method is applied to track the imported-COVID-19 risk of the main cruise ports worldwide. The results show that the proposed method dynamically estimates the overseas imported-COVID-19 risk level of cruise ports and has the potential to provide valuable support for improving prevention measures and reducing imported COVID-19 cases in seaports.
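The aggregation logic can be sketched as follows; the weighting of case counts, growth rate, and capacity is an assumed stand-in, not the paper's exact risk formula, and the ports, dates, and statistics are made up:

```python
# Hypothetical sketch: ship risk from visited-country statistics, port risk
# as the sum over ships stopping over on a given date.
from collections import defaultdict

def ship_risk(visited, capacity, stats):
    """stats maps country -> (active_cases, new_case_growth_rate)."""
    exposure = sum(cases * (1 + growth)
                   for cases, growth in (stats[c] for c in visited))
    return exposure * capacity / 1e6   # scale factor is illustrative

country_stats = {"A": (50_000, 0.10), "B": (5_000, 0.02)}
stopovers = [  # (port, date, countries visited, ship capacity)
    ("P1", "2021-03-01", ["A"], 3000),
    ("P1", "2021-03-01", ["A", "B"], 1000),
    ("P2", "2021-03-01", ["B"], 2000),
]

# Port risk on a date: aggregate over all ships stopping over there.
port_risk = defaultdict(float)
for port, date, visited, cap in stopovers:
    port_risk[(port, date)] += ship_risk(visited, cap, country_stats)

for key, risk in sorted(port_risk.items()):
    print(key, round(risk, 2))
```

In the study, the visited countries and stopover dates come from AIS trajectories and the clustering step rather than a hand-written list.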


2020 ◽  
Vol 6 (10) ◽  
pp. 2002-2023
Author(s):  
Shahid Latif ◽  
Firuza Mustafa

Floods are becoming the most severe and challenging hydrologic issue in the Kelantan River basin in Malaysia. Flood episodes are usually characterized by flood peak discharge flow, volume and duration series. This study incorporated a copula-based methodology to derive the joint distribution of the annual flood characteristics and the failure probability for assessing the bivariate hydrologic risk. Both the Archimedean and Gaussian copula families were introduced and tested as candidate functions. The copula dependence parameters were estimated using the method-of-moments procedure. The Gaussian copula was recognized as the best-fitted distribution for capturing the dependence structure of the flood peak-volume and peak-duration pairs based on goodness-of-fit test statistics and was further employed to derive the joint return periods. The bivariate hydrologic risks of the flood peak-volume and peak-duration pairs for different return periods (i.e., 5, 10, 20, 50 and 100 years) were estimated and revealed that the risk statistics increase with service lifetime and decrease with return period. In addition, we found that ignoring the mutual dependency can underestimate the failure probabilities: the univariate events produced lower failure probabilities than the bivariate events. Similarly, the variations in bivariate hydrologic risk with changes in flood peak for the different synthetic flood volume and duration series (i.e., 5, 10, 20, 50 and 100 years return periods) under different service lifetimes are demonstrated. The investigation revealed that the bivariate hydrologic risk statistics increase over the project service lifetime (i.e., 30, 50, and 100 years) and decrease with the return period of flood volume and duration.
Overall, this study could provide a basis for making an appropriate flood defence plan and long-lasting infrastructure designs. Doi: 10.28991/cej-2020-03091599
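The failure-probability calculation described above can be illustrated under a Gaussian copula: for a design event with marginal return period T and a service lifetime of n years, the failure probability is 1 - C(u, u)^n with u = 1 - 1/T. The correlation value below is an illustrative assumption, not the fitted Kelantan value:

```python
# Worked sketch: probability that the joint T-year design event (e.g. the
# peak-volume pair) is exceeded at least once in an n-year service lifetime.
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.7          # assumed peak-volume dependence (illustrative)
copula = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

def failure_probability(T, n):
    """P(joint T-year event exceeded at least once in n years)."""
    u = 1 - 1 / T                        # marginal non-exceedance probability
    joint = copula.cdf(norm.ppf([u, u])) # C(u, u): both margins below design
    return 1 - joint ** n

for T in (5, 10, 20, 50, 100):
    print(f"T={T:>3} yr, n=50 yr: failure probability = "
          f"{failure_probability(T, 50):.3f}")
```

The monotone behaviour reported in the abstract falls out directly: the probability grows with the service lifetime n and shrinks with the return period T.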

