Detecting long-memory: Monte Carlo simulations and application to daily streamflow processes

2007 ◽  
Vol 11 (2) ◽  
pp. 851-862 ◽  
Author(s):  
W. Wang ◽  
P. H. A. J. M. Van Gelder ◽  
J. K. Vrijling ◽  
X. Chen

Abstract. Lo's modified rescaled adjusted range test (R/S test) (Lo, 1991), the GPH test (Geweke and Porter-Hudak, 1983) and two approximate maximum likelihood estimation methods, i.e., Whittle's estimator (W-MLE) and the estimator implemented in S-Plus (S-MLE), based on the algorithm of Haslett and Raftery (1989), are evaluated through intensive Monte Carlo simulations for detecting the existence of long-memory. It is shown that it is difficult to find an appropriate lag q for Lo's test for different short-memory autoregressive (AR) and fractionally integrated autoregressive and moving average (ARFIMA) processes, which makes the use of Lo's test very tricky. In general, the GPH test outperforms Lo's test, but in cases of strong short-range dependence (e.g., AR(1) processes with φ=0.95 or even 0.99) the GPH test becomes useless, even for time series of large size. On the other hand, the estimates of d given by S-MLE and W-MLE seem to give a good indication of whether or not long-memory is present. The simulation results show that data size has a significant impact on the power of all four methods, because larger samples allow the asymptotic properties to be inspected more closely. Generally, the power of Lo's test and the GPH test increases with increasing data size, and the estimates of d obtained with the GPH, S-MLE and W-MLE methods converge with increasing data size. If a sufficiently large data set is not available, one should be aware of the possible bias of the estimates. The four methods are applied to daily average discharge series recorded at 31 gauging stations with different drainage areas in eight river basins in Europe, Canada and the USA to detect the existence of long-memory. The results show that the presence of long-memory in 29 daily series is confirmed by at least three methods, whereas the other two series are indicated to be long-memory processes by only two methods. The intensity of long-memory in daily streamflow processes has only a very weak positive relationship with the scale of the watershed.
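As a minimal, illustrative sketch of one of the estimators compared above, the snippet below implements a GPH-style log-periodogram regression for d (Geweke and Porter-Hudak, 1983); the bandwidth choice m = n^0.5, the function name gph_estimate, and the white-noise example are assumptions made here for illustration, not the authors' implementation.

```python
# Hedged sketch of the GPH log-periodogram estimator of the long-memory
# parameter d; d is minus the OLS slope of log I(lambda_j) on
# log(4 sin^2(lambda_j / 2)) over the first m Fourier frequencies.
import numpy as np

def gph_estimate(x, power=0.5):
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** power)                       # number of low frequencies used (assumed bandwidth)
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n                 # Fourier frequencies
    fft = np.fft.fft(x - x.mean())
    periodogram = np.abs(fft[1:m + 1]) ** 2 / (2.0 * np.pi * n)
    y = np.log(periodogram)
    z = np.log(4.0 * np.sin(lam / 2.0) ** 2)
    slope = np.polyfit(z, y, 1)[0]            # OLS slope
    return -slope                             # estimate of d

# Example: white noise should give d close to 0
rng = np.random.default_rng(1)
print(gph_estimate(rng.standard_normal(4096)))
```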

2006 ◽  
Vol 3 (4) ◽  
pp. 1603-1627 ◽  
Author(s):  
W. Wang ◽  
P. H. A. J. M. van Gelder ◽  
J. K. Vrijling ◽  
X. Chen

Abstract. Lo's R/S test (Lo, 1991), the GPH test (Geweke and Porter-Hudak, 1983) and the maximum likelihood estimation method implemented in S-Plus (S-MLE) are evaluated through intensive Monte Carlo simulations for detecting the existence of long-memory. It is shown that it is difficult to find an appropriate lag q for Lo's test for different AR and ARFIMA processes, which makes the use of Lo's test very tricky. In general, the GPH test outperforms Lo's test, but in cases of strong autocorrelation (e.g., AR(1) processes with φ=0.97 or even 0.99) the GPH test is totally useless, even for time series of large size. Although the S-MLE method does not provide a statistical test for the existence of long-memory, the estimates of d given by S-MLE seem to give a good indication of whether or not long-memory is present. Data size has a significant impact on the power of all three methods. Generally, the power of Lo's test and the GPH test increases with increasing data size, and the estimates of d obtained with the GPH test and S-MLE converge with increasing data size. According to the results of Lo's R/S test (Lo, 1991), the GPH test (Geweke and Porter-Hudak, 1983) and the S-MLE method, all daily flow series exhibit long-memory. The intensity of long-memory in daily streamflow processes has only a very weak positive relationship with the scale of the watershed.
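To complement the description above, here is a hedged sketch of Lo's modified rescaled adjusted range statistic V_n(q) = Q_n(q)/sqrt(n), with the long-run variance estimated using Bartlett weights up to lag q; the function name and the choice q = 10 in the example are illustrative, not taken from the paper.

```python
# Sketch of Lo's (1991) modified R/S statistic. Under short memory the
# statistic should fall, asymptotically, inside an acceptance region of
# roughly [0.809, 1.862] at the 5% level.
import numpy as np

def lo_rs_statistic(x, q):
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    partial = np.cumsum(dev)
    r = partial.max() - partial.min()           # adjusted range R_n
    s2 = dev @ dev / n                          # sample variance
    for i in range(1, q + 1):                   # add Bartlett-weighted autocovariances
        gamma_i = dev[:-i] @ dev[i:] / n
        s2 += 2.0 * (1.0 - i / (q + 1.0)) * gamma_i
    return r / np.sqrt(s2) / np.sqrt(n)

# Example on white noise
rng = np.random.default_rng(0)
print(lo_rs_statistic(rng.standard_normal(2000), q=10))
```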


Endocrinology ◽  
2019 ◽  
Vol 160 (10) ◽  
pp. 2395-2400 ◽  
Author(s):  
David J Handelsman ◽  
Lam P Ly

Abstract Hormone assay results below the assay detection limit (DL) can introduce bias into quantitative analysis. Although complex maximum likelihood estimation methods exist, they are not widely used, whereas simple substitution methods are often applied ad hoc to replace the undetectable (UD) results with numeric values so that the full data set can be analysed. However, the bias of substitution methods for steroid measurements has not been reported. Using a large data set (n = 2896) of serum testosterone (T), DHT, and estradiol (E2) concentrations from healthy men, we created modified data sets with increasing proportions of UD samples (≤40%) to which we applied five different substitution methods (deleting UD samples as missing, or substituting UD samples with DL, DL/√2, DL/2, or 0) to calculate univariate descriptive statistics (mean, SD) or bivariate correlations. For all three steroids and for univariate as well as bivariate statistics, bias increased progressively with increasing proportion of UD samples. Bias was worst when UD samples were deleted or substituted with 0 and least when UD samples were substituted with DL/√2, whereas the other methods (DL or DL/2) displayed intermediate bias. Similar findings were replicated in randomly drawn small subsets of 25, 50, and 100 samples. Hence, we propose that in steroid hormone data with ≤40% UD samples, substituting UD with DL/√2 is a simple, versatile, and reasonably accurate method to minimize left-censoring bias, allowing for data analysis with the full data set.
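The substitution rules compared above are simple enough to state in a few lines. The sketch below, using invented lognormal "testosterone-like" data and an arbitrary detection limit, only illustrates the mechanics of the DL/√2 rule and its effect on the mean and SD; it does not reproduce the authors' data or analysis.

```python
# Illustrative substitution of left-censored (below-DL) results.
import numpy as np

def substitute_below_dl(values, dl, rule="dl_sqrt2"):
    """Replace results below the detection limit (DL) with a numeric value."""
    x = np.asarray(values, dtype=float).copy()
    replacement = {"dl": dl, "dl_sqrt2": dl / np.sqrt(2), "dl_2": dl / 2, "zero": 0.0}[rule]
    x[x < dl] = replacement
    return x

# Made-up data: the DL is chosen so that ~20% of samples are undetectable
rng = np.random.default_rng(42)
true = rng.lognormal(mean=2.7, sigma=0.4, size=1000)
dl = np.percentile(true, 20)
censored = substitute_below_dl(true, dl, rule="dl_sqrt2")
print(true.mean(), censored.mean(), true.std(), censored.std())
```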


Author(s):  
Mazen Nassar ◽  
Ahmed Z. Afify ◽  
Mohammed Shakhatreh

This paper addresses the estimation of the unknown parameters of the alpha power exponential distribution (Mahdavi and Kundu, 2017) using nine frequentist estimation methods. We discuss the finite sample properties of the parameter estimates of the alpha power exponential distribution via Monte Carlo simulations. The potentiality of the distribution is analyzed by means of two real data sets from the fields of engineering and medicine. Finally, we use the maximum likelihood method to derive the estimates of the distribution parameters under competing risks data and analyze one real data set.
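As a rough illustration of the maximum likelihood route mentioned above, the sketch below fits the two APE parameters by numerical optimization, assuming a density of the form f(x) = (log α/(α − 1)) λ e^(−λx) α^(1 − e^(−λx)) for x > 0, α > 0, α ≠ 1; the parameterization, starting values, and synthetic data are assumptions for illustration only, not the authors' procedure.

```python
# Hedged sketch: MLE for the alpha power exponential (APE) distribution
# by minimizing the negative log-likelihood with Nelder-Mead.
import numpy as np
from scipy.optimize import minimize

def ape_negloglik(params, x):
    a, lam = params
    if a <= 0 or abs(a - 1.0) < 1e-8 or lam <= 0:
        return np.inf                          # outside the parameter space
    z = np.exp(-lam * x)
    loglik = (np.log(np.log(a) / (a - 1.0)) + np.log(lam)
              - lam * x + (1.0 - z) * np.log(a))
    return -np.sum(loglik)

# Purely synthetic example data
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)
fit = minimize(ape_negloglik, x0=[2.0, 1.0], args=(x,), method="Nelder-Mead")
print(fit.x)    # fitted (alpha, lambda)
```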


2020 ◽  
Vol 39 (5) ◽  
pp. 6419-6430
Author(s):  
Dusan Marcek

To forecast time series data, two methodological frameworks of statistical and computational intelligence modelling are considered. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models with the Maximum Likelihood (ML) estimation method. As a competitive tool to statistical forecasting models, we use the popular classic neural network (NN) of the perceptron type. To train the NN, the Back-Propagation (BP) algorithm and heuristics such as the genetic and micro-genetic algorithms (GA and MGA) are implemented on a large data set. A comparative analysis of the selected learning methods is performed and evaluated. The experiments indicate that the optimal population size is likely 20, which gives the lowest training time among all NNs trained by the evolutionary algorithms, while the prediction accuracy is somewhat lower but still acceptable to managers.
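To make the two modelling routes concrete, the sketch below fits an ARIMA model by maximum likelihood (statsmodels) and trains a small perceptron-type network on lagged inputs (scikit-learn). The toy series, the ARIMA order (1,1,1), the number of lags, and all hyperparameters are placeholders rather than the settings used in the paper, and plain back-propagation here stands in for the BP/GA/MGA training variants that the paper compares.

```python
# Hedged sketch of the ARIMA-vs-neural-network comparison on a toy series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(500))        # toy random-walk-like series

# Statistical route: ARIMA(1,1,1) estimated by maximum likelihood
arima_fit = ARIMA(y, order=(1, 1, 1)).fit()
print(arima_fit.params)

# Computational-intelligence route: one-step-ahead MLP on lagged inputs
lags = 4
X = np.column_stack([y[i:len(y) - lags + i] for i in range(lags)])
target = y[lags:]
mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
mlp.fit(X, target)
print(mlp.predict(X[-1:]))                     # in-sample prediction for the last target
```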


2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Helena Mouriño ◽  
Maria Isabel Barão

Missing-data problems are extremely common in practice. To achieve reliable inferential results, this feature of the data must be taken into account. Suppose that the univariate data set under analysis has missing observations. This paper examines the impact of selecting an auxiliary complete data set, whose underlying stochastic process is to some extent interdependent with the former, to improve the efficiency of the estimators for the relevant parameters of the model. The Vector AutoRegressive (VAR) Model has proven to be an extremely useful tool for capturing the dynamics of bivariate time series. We propose maximum likelihood estimators for the parameters of the VAR(1) Model based on a monotone missing-data pattern. The estimators' precision is also derived. Afterwards, we compare the bivariate modelling scheme with its univariate counterpart. More precisely, the univariate data set with missing observations is modelled by an AutoRegressive Moving Average (ARMA(2,1)) Model. We also analyse the behaviour of the AutoRegressive Model of order one, AR(1), due to its practical importance. We focus on the mean value of the main stochastic process. Through simulation studies, we conclude that the estimator based on the VAR(1) Model is preferable to those derived from the univariate context.
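As a simplified companion to the estimators proposed above, the sketch below recovers the VAR(1) coefficient matrix from complete bivariate data by least squares, which coincides with conditional Gaussian maximum likelihood; the monotone missing-data machinery developed in the paper is not reproduced, and all names and parameter values are illustrative.

```python
# Hedged sketch: estimating y_t = c + A y_{t-1} + e_t from complete data.
import numpy as np

def fit_var1(y):
    """y: (T, 2) array. Returns the intercept c and coefficient matrix A."""
    regressors = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # [1, y_{t-1}]
    coef, *_ = np.linalg.lstsq(regressors, y[1:], rcond=None)     # shape (3, 2)
    c, A = coef[0], coef[1:].T
    return c, A

# Example: simulate a bivariate VAR(1) and recover A
rng = np.random.default_rng(7)
A_true = np.array([[0.5, 0.2], [0.1, 0.4]])
y = np.zeros((2000, 2))
for t in range(1, len(y)):
    y[t] = A_true @ y[t - 1] + 0.5 * rng.standard_normal(2)
print(fit_var1(y)[1])    # should be close to A_true
```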


2018 ◽  
Vol 615 ◽  
pp. A62 ◽  
Author(s):  
G. Valle ◽  
M. Dell’Omodarme ◽  
P. G. Prada Moroni ◽  
S. Degl’Innocenti

Aims. The capability of grid-based techniques to estimate the age together with the convective core overshooting efficiency of main sequence stars in detached eclipsing binary systems has previously been investigated. We have extended this investigation to later evolutionary stages and have evaluated the bias and variability in the recovered age and convective core overshooting parameter, accounting for both observational and internal uncertainties. Methods. We considered synthetic binary systems, whose age and overshooting efficiency should be recovered by applying the SCEPtER pipeline to the same grid of models used to build the mock stars. We focus our attention on a binary system composed of a 2.50 M⊙ primary star coupled with a 2.38 M⊙ secondary. To explore different evolutionary scenarios, we performed the estimation at three different times: when the primary is at the end of the central hydrogen burning, when it is at the bottom of the RGB, and when it is in the helium core burning phase. The Monte Carlo simulations were carried out for two typical values of accuracy on the mass determination, that is, 1% and 0.1%. Results. Adopting typical observational uncertainties, we found that the recovered age and overshooting efficiency are biased towards low values in all three scenarios. For an uncertainty on the masses of 1%, the underestimation is particularly relevant for a primary in the central helium burning stage, reaching −8.5% in age and −0.04 (−25% relative error) in the overshooting parameter β. In the other scenarios, an underestimation of the age by about 4% occurs. A large variability in the fitted values between Monte Carlo simulations was found: for an individual system calibration, the value of the overshooting parameter can vary from β = 0.0 to β = 0.26. When adopting a 0.1% error on the masses, the biases remain nearly unchanged but the global variability is suppressed by a factor of about two. We also explored the effect of a systematic discrepancy between the artificial systems and the model grid by accounting for an offset in the effective temperature of the stars of ±150 K. For a mass error of 1% the overshooting parameter is largely biased towards the edges of the explored range, while for the lower mass uncertainty it is basically unconstrained from 0.0 to 0.2. We also evaluate the possibility of individually recovering the β value for both binary stars. We found that this is impossible for a primary near central hydrogen exhaustion owing to huge biases for the primary star of +0.14 (90% relative error), while in the other cases the fitted β are consistent, but always biased by about −0.04 (−25% relative error). Finally, the possibility of distinguishing models computed with mild overshooting from models with no overshooting was evaluated, resulting in a reassuring power of distinction greater than 80%. However, the scenario with a primary in the central helium burning phase was a notable exception, showing a power of distinction lower than 5%.
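For readers unfamiliar with grid-based recovery, the heavily simplified sketch below conveys the general idea behind a SCEPtER-like estimate: the observables of a mock star are perturbed with Gaussian noise and matched to the nearest grid point in a chi-square sense, and the age and overshooting parameter are read from the selected models. The tiny grid, the observables, and the error sizes are invented placeholders; the actual pipeline uses a dense grid of stellar models and fits both binary components jointly.

```python
# Schematic sketch of Monte Carlo grid matching (not the SCEPtER code).
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical grid rows: (age [Gyr], overshooting beta, Teff [K], logL, mass [Msun])
grid = np.array([
    [0.50, 0.00, 9100.0, 1.65, 2.50],
    [0.55, 0.10, 8950.0, 1.70, 2.50],
    [0.60, 0.20, 8800.0, 1.75, 2.50],
    [0.65, 0.26, 8650.0, 1.80, 2.50],
])
obs_true = np.array([8800.0, 1.75, 2.50])      # Teff, logL, mass of the mock star
errors = np.array([150.0, 0.04, 0.025])        # assumed observational uncertainties

def recover(obs, errs, grid, n_mc=5000):
    ages, betas = [], []
    for _ in range(n_mc):
        perturbed = obs + errs * rng.standard_normal(len(obs))
        chi2 = np.sum(((grid[:, 2:] - perturbed) / errs) ** 2, axis=1)
        best = grid[np.argmin(chi2)]
        ages.append(best[0])
        betas.append(best[1])
    return np.median(ages), np.median(betas)

print(recover(obs_true, errors, grid))
```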


2016 ◽  
Vol 42 (4) ◽  
pp. 637-660 ◽  
Author(s):  
Germán Kruszewski ◽  
Denis Paperno ◽  
Raffaella Bernardi ◽  
Marco Baroni

Logical negation is a challenge for distributional semantics, because predicates and their negations tend to occur in very similar contexts, and consequently their distributional vectors are very similar. Indeed, it is not even clear what properties a “negated” distributional vector should possess. However, when linguistic negation is considered in its actual discourse usage, it often performs a role that is quite different from straightforward logical negation. If someone states, in the middle of a conversation, that “This is not a dog,” the negation strongly suggests a restricted set of alternative predicates that might hold true of the object being talked about. In particular, other canids and middle-sized mammals are plausible alternatives, birds are less likely, skyscrapers and other large buildings virtually impossible. Conversational negation acts like a graded similarity function, of the sort that distributional semantics might be good at capturing. In this article, we introduce a large data set of alternative plausibility ratings for conversationally negated nominal predicates, and we show that simple similarity in distributional semantic space provides an excellent fit to subject data. On the one hand, this fills a gap in the literature on conversational negation, proposing distributional semantics as the right tool to make explicit predictions about potential alternatives of negated predicates. On the other hand, the results suggest that negation, when addressed from a broader pragmatic perspective, far from being a nuisance, is an ideal application domain for distributional semantic methods.
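The core measurement described above reduces to ranking candidate alternatives by cosine similarity to the negated predicate. The toy sketch below uses invented three-dimensional vectors purely to illustrate the ranking; real experiments use high-dimensional, corpus-derived distributional vectors.

```python
# Toy ranking of alternatives to a negated predicate by cosine similarity.
import numpy as np

vectors = {
    "dog":        np.array([0.9, 0.8, 0.1]),
    "wolf":       np.array([0.8, 0.7, 0.2]),
    "cat":        np.array([0.7, 0.9, 0.1]),
    "bird":       np.array([0.4, 0.6, 0.3]),
    "skyscraper": np.array([0.1, 0.1, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

negated = "dog"          # "This is not a dog"
ranking = sorted(
    ((word, cosine(vectors[negated], vec)) for word, vec in vectors.items() if word != negated),
    key=lambda pair: pair[1], reverse=True)
print(ranking)           # plausible alternatives (wolf, cat) rank above skyscraper
```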


2021 ◽  
Author(s):  
Masahide Sato

Abstract Performing isothermal-isochoric Monte Carlo simulations, I examine the types of clusters that dumbbell-like one-patch particles form in a thin space between two parallel walls, assuming that each particle is synthesized through the merging of two particles, one non-attracting and the other attracting, with the inter-particle interaction approximated, for example, by the DLVO model. The shape of these dumbbell-like particles is controlled by the ratio q of the diameters of the two spherical particles and by the dimensionless distance l between them. Using a modified Kern–Frenkel potential, I examine the dependence of the cluster shape on l and q. Large island-like clusters are created when q < 1. With increasing q, the clusters become chain-like. When q increases further, elongated clusters and regular polygonal clusters are created. In the simulations, the cluster shape becomes three-dimensional with increasing l because the thickness of the thin system increases proportionally to l.
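To give a flavour of the orientation-dependent interaction mentioned above, the sketch below encodes a plain Kern–Frenkel-type patch criterion: a square-well attraction that acts only when the particles are within range and each patch direction points at the partner within the patch half-opening angle. The parameter values are arbitrary, and the paper's modified potential and dumbbell geometry are not reproduced here.

```python
# Hedged sketch of a Kern-Frenkel-style one-patch pair energy.
import numpy as np

def kern_frenkel_energy(r_i, r_j, n_i, n_j, sigma=1.0, well=1.5,
                        eps=1.0, cos_delta=0.8):
    rij = r_j - r_i
    d = np.linalg.norm(rij)
    if d < sigma:
        return np.inf                           # hard-core overlap
    if d > well * sigma:
        return 0.0                              # outside the attraction range
    u = rij / d
    patch_i_faces_j = np.dot(n_i, u) >= cos_delta
    patch_j_faces_i = np.dot(n_j, -u) >= cos_delta
    return -eps if (patch_i_faces_j and patch_j_faces_i) else 0.0

# Example: two particles in range with patches facing each other attract
print(kern_frenkel_energy(np.zeros(3), np.array([1.2, 0.0, 0.0]),
                          np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])))
```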


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Masahide Sato

Abstract Performing isothermal-isochoric Monte Carlo simulations, I examine the types of clusters that dumbbell-like one-patch particles form in a thin space between two parallel walls, assuming that each particle is synthesized through the merging of two particles, one non-attracting and the other attracting, with the inter-particle interaction approximated, for example, by the DLVO model. The shape of these dumbbell-like particles is controlled by the ratio q of the diameters of the two spherical particles and by the dimensionless distance l between their centers. Using a modified Kern–Frenkel potential, I examine the dependence of the cluster shape on l and q. Large island-like clusters are created when q < 1. With increasing q, the clusters become chain-like. When q increases further, elongated clusters and regular polygonal clusters are created. In the simulations, the cluster shape becomes three-dimensional with increasing l because the thickness of the thin system increases proportionally to l.

