An easy way to create duration variables in binary cross-sectional time-series data

Author(s): Andrew Q. Philips

In cross-sectional time-series data with a dichotomous dependent variable, failing to account for duration dependence when it exists can lead to faulty inferences. A common solution is to include duration dummies, polynomials, or splines to proxy for duration dependence. Because these are not easy for the typical practitioner to create, I introduce a new command, mkduration, that provides a straightforward way to generate a duration variable for binary cross-sectional time-series data in Stata. mkduration can handle various forms of missing data and allows the duration variable to be easily turned into common parametric and nonparametric approximations.
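As an illustration of the idea behind such a duration variable (not the mkduration implementation itself, which is a Stata program), the counter logic can be sketched in a few lines of Python; the convention of resetting the clock after each event is one common choice and varies across applications:

```python
def make_duration(events):
    """Count periods elapsed in the current spell of a binary event
    series for one unit; the counter restarts after each event (1)."""
    duration = []
    t = 0
    for e in events:
        t += 1
        duration.append(t)
        if e == 1:
            t = 0  # clock resets after an event
    return duration

# one unit's history with events in periods 3 and 5
print(make_duration([0, 0, 1, 0, 1, 0, 0]))  # [1, 2, 3, 1, 2, 1, 2]
```

From this single variable, the usual dummies, cubic polynomials, or splines can then be constructed, which is the transformation the command automates.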

2017, Vol 25 (6), pp. 645-653
Author(s): Yuan Luo, Peter Szolovits, Anand S Dighe, Jason M Baron

Abstract

Objective: A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to “fill in” missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data.

Methods: We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points.

Results: 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone.

Conclusions: 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.
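The mask-and-score evaluation protocol described in the Methods can be sketched as below. The mean imputer here is only a placeholder stand-in (it is not MICE, GP, or 3D-MICE), the synthetic series is invented for illustration, and `nrmse` normalizes the root-mean-square error by the range of the true values, one common convention:

```python
import math
import random

def nrmse(true_vals, pred_vals):
    """Root-mean-square error normalized by the range of the true values."""
    rng = max(true_vals) - min(true_vals)
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals)) / len(true_vals)
    return math.sqrt(mse) / rng

random.seed(0)
series = [50 + 10 * math.sin(i / 3) for i in range(100)]   # synthetic analyte
masked_idx = set(random.sample(range(100), 20))            # randomly mask 20 points
observed = [v for i, v in enumerate(series) if i not in masked_idx]

mean_impute = sum(observed) / len(observed)                # stand-in imputer
truth = [series[i] for i in sorted(masked_idx)]
preds = [mean_impute] * len(truth)
print(round(nrmse(truth, preds), 3))                       # score the imputer
```

In the paper the same masking idea is applied to real laboratory results, and the three imputers are compared on the masked points by exactly this kind of normalized error.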


Author(s): Josep Escrig Escrig, Buddhika Hewakandamby, Georgios Dimitrakis, Barry Azzopardi

Intermittent gas-liquid two-phase flow was generated in a 6 m long, 67 mm diameter pipe mounted on a rotatable frame (from vertical up to −20°). Air and a 5 mPa s silicone oil at atmospheric pressure were studied. Gas superficial velocities between 0.17 and 2.9 m/s and liquid superficial velocities between 0.023 and 0.47 m/s were employed. These runs were repeated at 7 angles, making a total of 420 runs. Cross-sectional void fraction time series were measured over 60 seconds for each run using a wire mesh sensor and twin-plane electrical capacitance tomography. The void fraction time series were analysed to extract the average void fraction, structure velocities, and structure frequencies. Results are presented to illustrate how the pipe angle and the phase superficial velocities affect the behaviour of intermittent flows. Existing correlations proposed to predict the average void fraction and the gas structure velocity and frequency in slug flow were compared with the new experimental results across all intermittent flows, including slug, cap bubble, and churn. Good agreement was found for the gas structure velocity and the mean void fraction. However, no correlation was found to predict the gas structure frequency adequately, especially in vertical and inclined pipes.
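As an illustration of how a mean void fraction and a structure frequency can be extracted from a void-fraction trace, one simple estimator picks the dominant peak of the discrete Fourier transform. The trace below is a synthetic sine, not the authors' data, and the sampling rate is an invented assumption:

```python
import cmath
import math

def dominant_frequency(x, fs):
    """Return the nonzero frequency (Hz) with the largest DFT magnitude;
    one simple way to estimate a periodic structure frequency."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]                   # drop the DC component
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coef = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return best_k * fs / n

fs = 100.0                           # assumed 100 Hz sampling
x = [0.4 + 0.1 * math.sin(2 * math.pi * 2.0 * t / fs) for t in range(400)]
print(sum(x) / len(x))               # average void fraction, ~0.4
print(dominant_frequency(x, fs))     # structure frequency, ~2.0 Hz
```

Real void-fraction signals are far noisier than this, which is one reason simple frequency correlations struggle, as the abstract reports.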


2020, Vol 27 (1)
Author(s): E Afrifa‐Yamoah, U. A. Mueller, S. M. Taylor, A. J. Fisher

2017, Vol 12 (2), p. 151
Author(s): Yusuf Ali Al-Hroot, Laith Akram Muflih AL-Qudah, Faris Irsheid Audeh Alkharabsha

This paper investigates whether the 2008 financial crisis affected the level of accounting conservatism in Jordanian commercial banks before and during the crisis. The sample includes 78 observations based on the financial statements of all commercial banks in Jordan (the cross-sectional dimension) over the period 2005 to 2011 (the time-series dimension). The appropriate regression model for combining cross-sectional and time-series data in this case is pooled data regression (PDR) estimated by ordinary least squares (OLS). The results indicate that the level of accounting conservatism increased steadily over the three years from 2005 to 2007. They also indicate that accounting conservatism increased during the crisis period of 2009-2011 compared with the pre-crisis period of 2005-2007. An F-test was used to test for significant differences between the regression coefficients for the periods before and during the global financial crisis. The results indicate a positive impact on accounting conservatism during the global financial crisis compared with the period before it; the p-value of 0.040 indicates statistically significant differences between the two periods. These results are consistent with Sampaio (2015).
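The F-test comparison of regression coefficients across two periods can be sketched with a Chow-style statistic. The one-regressor OLS and the synthetic pre-/during-crisis numbers below are illustrative assumptions, not the paper's data or its exact specification:

```python
def ols(x, y):
    """One-regressor OLS: returns (intercept, slope, residual sum of squares)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def chow_f(x1, y1, x2, y2, k=2):
    """Chow-style F statistic: do the two sub-period regressions share
    the same k coefficients as the pooled regression?"""
    _, _, rss_pooled = ols(x1 + x2, y1 + y2)
    _, _, rss1 = ols(x1, y1)
    _, _, rss2 = ols(x2, y2)
    n = len(x1) + len(x2)
    return ((rss_pooled - rss1 - rss2) / k) / ((rss1 + rss2) / (n - 2 * k))

# invented data: the slope shifts from ~2 (pre-period) to ~3 (crisis period)
x = [1, 2, 3, 4, 5]
y_pre = [3.1, 4.9, 7.1, 8.9, 11.1]
y_crisis = [4.1, 6.9, 10.1, 12.9, 16.1]
print(chow_f(x, y_pre, x, y_crisis))   # large F: coefficients differ
```

A large F (compared against the F distribution with k and n − 2k degrees of freedom) rejects coefficient equality, which is the logic behind the paper's two-period comparison.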


1986, Vol 2 (3), pp. 331-349
Author(s): John J. Beggs

This article proposes the use of spectral methods to pool cross-sectional replications (N) of time series data (T) for time series analysis. Spectral representations readily suggest a weighting scheme for pooling the data. The asymptotically desirable properties of the resulting estimators seem to translate satisfactorily into samples as small as T = 25 with N = 5. Simulation results, Monte Carlo results, and an empirical example help confirm this finding. The article concludes that there are many empirical situations where spectral methods can be used where they were previously eschewed.
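A minimal sketch of spectral pooling, assuming the simplest equal-weight scheme (the article derives a more general weighting): compute the periodogram of each replication and average them, which concentrates power at cycles shared across the panel while averaging out replication-specific noise:

```python
import cmath
import math

def periodogram(x):
    """Raw periodogram I(k) = |DFT(x)|^2 / n at Fourier frequencies k/n,
    for k = 1 .. n//2 (the DC term is excluded)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(1, n // 2 + 1)]

def pooled_periodogram(panel):
    """Pool N cross-sectional replications by averaging their periodograms
    (equal weights; a placeholder for the article's weighting scheme)."""
    ps = [periodogram(x) for x in panel]
    return [sum(p[k] for p in ps) / len(ps) for k in range(len(ps[0]))]

# five replications (N = 5) of a T = 25 series sharing a cycle at
# Fourier index k = 5, each with a different phase
panel = [[math.sin(2 * math.pi * 5 * t / 25 + 0.3 * i) for t in range(25)]
         for i in range(5)]
pooled = pooled_periodogram(panel)
print(1 + pooled.index(max(pooled)))  # peak at Fourier index k = 5
```

The T = 25, N = 5 dimensions echo the small-sample setting the article reports as already satisfactory.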


2007, Vol 23 (4), pp. 227-237
Author(s): Thomas Kubiak, Cornelia Jonas

Abstract. Patterns of psychological variables over time have long been of interest to researchers. This is particularly true for ambulatory monitoring research, where large (cross-sectional) time-series datasets are often the object of investigation. Common methods for identifying cyclic variations include spectral analyses of time-series data and time-domain strategies, which also allow cyclic components to be modeled. Although the prerequisites of these sophisticated procedures, such as interval-scaled time-series variables, are seldom met, their use is common. In contrast to the time-series approach, methods from a different field of statistics, directional or circular statistics, offer another way to detect patterns in time while requiring fewer prerequisites. These approaches are commonly used in biology and geostatistics. They offer a wide range of analytical strategies for examining “circular data,” i.e., data whose period of measurement is rotationally invariant (e.g., directions on the compass, or daily hours ranging from 0 to 24, with 24 the same as 0). In psychology, however, circular statistics are hardly known at all. In the present paper, we give a succinct introduction to the rationale of circular statistics and describe how this approach can be used to detect patterns in time, contrasting it with time-series analysis. We report data from a monitoring study in which mood and social interactions were assessed for 4 weeks to illustrate the use of circular statistics. Both periodogram analyses and circular statistics-based results are reported. Advantages and possible pitfalls of the circular statistics approach are highlighted, concluding that ambulatory assessment research can benefit from strategies borrowed from circular statistics.
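The core circular-statistics idea for clock-time data can be sketched as below: map each hour to an angle on the circle, average the unit vectors, and map the mean angle back to hours. The example times are invented; the point is that a linear mean of times clustered around midnight is misleading while the circular mean is not:

```python
import math

def circular_mean_hours(hours):
    """Circular mean of clock times on a 0-24 h dial: hours become angles,
    the unit vectors are averaged, and the mean angle becomes an hour."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    return (math.atan2(s, c) * 24 / (2 * math.pi)) % 24

times = [23, 0, 1]                   # interactions clustered around midnight
print(sum(times) / len(times))       # linear mean: 8.0, clearly wrong
print(circular_mean_hours(times))    # circular mean: ~0 h, as expected
```

The length of the mean vector (sqrt of s² + c²) additionally measures how concentrated the times are, which is the circular analogue of low dispersion.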


2008, Vol 9 (1), pp. 1-19
Author(s): Kentaro Fukumoto

Abstract. Legislative scholars have debated which factors (e.g., divided government) account for the number of important laws a legislature passes per year. This paper presents a monopoly model of legislative production: it assumes that a legislature adjusts its law production so as to maximize its utility. The model predicts that socio-economic and political changes increase the marginal benefit of law production, whereas low negotiation costs and ample legislative resources decrease its marginal cost. The model is tested in two ways. The first approach compares the legislatures of 42 developed and developing countries. The second analyzes Japanese lawmaking from 1949 to 1990, using an appropriate method for event-count time series data. Both empirical investigations support the model's predictions for legislative production.


2011, Vol 14 (2), pp. 71-79
Author(s): Anh Tuan Duong

Time series data occur in many real-life applications, ranging from science and engineering to business. In many of these applications, searching a large time series database for sequences similar to a query is often desirable. Such similarity-based retrieval is also the basic subroutine in several advanced time series data mining tasks, such as clustering, classification, motif discovery, anomaly detection, rule discovery, and visualization. Although several different approaches have been developed, most are based on the common premise of dimensionality reduction combined with spatial access methods. This survey gives an overview of recent research and shows how the methods fit into a general framework of feature extraction.
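One widely used instance of the dimensionality-reduction premise is Piecewise Aggregate Approximation (PAA): replace a long series by a few segment means, then compare the short sketches. The sketch below is a minimal illustration (linear scan rather than a spatial index, and it assumes the series length divides evenly by the number of segments):

```python
def paa(x, m):
    """Piecewise Aggregate Approximation: reduce a length-n series to m
    segment means (assumes n is divisible by m for simplicity)."""
    n = len(x)
    w = n // m
    return [sum(x[j * w:(j + 1) * w]) / w for j in range(m)]

def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def nearest(query, database, m=4):
    """Return the index of the database series whose PAA sketch is closest
    to the query's; real systems index the sketches and then refine
    candidates with an exact distance."""
    q = paa(query, m)
    return min(range(len(database)),
               key=lambda i: euclid(q, paa(database[i], m)))

db = [[0.0] * 8, [1.0] * 8, [5.0] * 8]
print(nearest([1.1] * 8, db))  # 1: the query resembles the second series
```

Because PAA distance lower-bounds the true Euclidean distance, the sketch-based first pass never discards the true nearest neighbor, which is what makes this reduce-then-refine framework correct.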


2021, Vol 6 (1), pp. 1-4
Author(s): Bo Yuan Chang, Mohamed A. Naiel, Steven Wardell, Stan Kleinikkink, John S. Zelek

Over the past years, researchers have proposed various methods to discover causal relationships among time-series data, as well as algorithms to fill in missing entries in time-series data. Little to no work has been done on combining the two for the purpose of learning causal relationships from unevenly sampled multivariate time-series data. Obtaining causal relationships from a time series requires evenly sampled data, which suggests filling in the missing values before estimating the causal parameters. In this paper, we examine how the causal parameters learnt from unevenly sampled data (with missing entries) deviate from the parameters learnt from evenly sampled data (without missing entries). The proposed method applies a Gaussian process regression (GPR) model for missing data recovery, followed by several pairwise Granger causality equations in vector autoregressive form to fit the recovered data and obtain the causal parameters. Experimental results show that the causal parameters obtained with GPR data filling have much lower RMSE than those from a dummy model (fill with the last seen entry) at all missing-value percentages. This suggests that GPR data filling better preserves causal relationships than dummy data filling and should be considered when learning causality from unevenly sampled time series.

