Optimal number and allocation of data collection points for linear spline growth curve modeling

2016 ◽  
Vol 41 (4) ◽  
pp. 550-558 ◽  
Author(s):  
Wei Wu ◽  
Fan Jia ◽  
Richard Kinai ◽  
Todd D. Little

Spline growth modelling is a popular tool to model change processes with distinct phases and change points in longitudinal studies. Focusing on linear spline growth models with two phases and a fixed change point (the transition point from one phase to the other), we detail how to find optimal data collection designs that maximize the efficiency of detecting key parameters in the spline models, holding the total number of data points or sample size constant. We identify efficient designs for the cases where (a) the exact location of the change point is known (complete certainty), (b) only the interval that contains the change point is known (partial certainty), and (c) no prior knowledge on the location of the change point is available (zero certainty). We conclude with recommendations for optimal number and allocation of data collection points.
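As a rough illustration of why the allocation of measurement occasions matters for a two-phase linear spline, the sketch below compares how precisely the two slopes are estimated under different schedules. It is a deliberately simplified stand-in, not the authors' procedure: it assumes fixed effects only, independent errors with constant variance, and a known change point. The function names (`spline_design`, `slope_variances`) are hypothetical.

```python
def spline_design(times, knot):
    """Design rows [1, min(t - knot, 0), max(t - knot, 0)].

    Columns: level at the change point, phase-1 slope, phase-2 slope.
    """
    return [[1.0, min(t - knot, 0.0), max(t - knot, 0.0)] for t in times]


def cross_product(X):
    """Return X'X for a list-of-rows matrix."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design: a phase has too few distinct time points")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def slope_variances(times, knot):
    """Sampling variances of the two slopes, in units of the residual variance."""
    inv = invert(cross_product(spline_design(times, knot)))
    return inv[1][1], inv[2][2]


# Two hypothetical five-occasion schedules with the change point at t = 2.
evenly_spaced = slope_variances([0, 1, 2, 3, 4], knot=2)
late_heavy = slope_variances([0, 2, 3, 3.5, 4], knot=2)
```

Smaller values indicate a more efficient schedule for that slope under these simplifying assumptions; a full growth-model analysis would also account for random effects and their covariance.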

2020 ◽  
Vol 3 (4) ◽  
pp. 142-152
Author(s):  
Mohammad Waliul Hasanat ◽  
Kamna Anum ◽  
Ashikul Hoque ◽  
Mahmud Hamid ◽  
Sandy Francis Peris ◽  
...  

In developing countries, the role of women in the business sector is continuously improving. As a result, female enterprises have also been encouraged in Pakistan. This study is based on the life cycle development phases that women-owned enterprises have to go through in order to become successful. As the primary data source, face-to-face interviews with owners of successful women-owned enterprises were used. The data collection process was divided into two phases, Phase I and Phase II. After data collection, qualitative analysis was performed using NVivo. The findings identify both generic and specific factors involved in the life cycle development of women-owned enterprises. This study provides a detailed view of the life cycle development model followed by successful women-owned enterprises. The outcome of this research is a theoretical finding that entrepreneurs owning small-scale enterprises can use to improve their level of performance. The findings can also be helpful for talented women interested in setting up their own business.


Water ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 1633
Author(s):  
Elena-Simona Apostol ◽  
Ciprian-Octavian Truică ◽  
Florin Pop ◽  
Christian Esposito

Due to the exponential growth of Internet of Things networks and the massive amount of time series data collected from these networks, it is essential to apply efficient methods for Big Data analysis in order to extract meaningful information and statistics. Anomaly detection is an important part of time series analysis, improving the quality of further analysis, such as prediction and forecasting. Thus, detecting sudden change points that occur under normal behavior and using them to separate out abnormal behavior, i.e., outliers, is a crucial step used to minimize the false positive rate and to build accurate machine learning models for prediction and forecasting. In this paper, we propose a rule-based decision system that enhances anomaly detection in multivariate time series using change point detection. Our architecture uses a pipeline that automatically detects real anomalies and removes the false positives introduced by change points. We employ both traditional and deep learning unsupervised algorithms: in total, five anomaly detection and five change point detection algorithms. Additionally, we propose a new confidence metric based on the support for a time series point to be an anomaly and the support for the same point to be a change point. In our experiments, we use a large real-world dataset containing multivariate time series about water consumption collected from smart meters. As an evaluation metric, we use Mean Absolute Error (MAE). The low MAE values show that the algorithms accurately determine anomalies and change points. The experimental results strengthen our assumption that anomaly detection can be improved by determining and removing change points, and they validate the correctness of our proposed rules in real-world scenarios. Furthermore, the proposed rule-based decision support systems enable users to make informed decisions regarding the status of the water distribution network and to perform predictive and proactive maintenance effectively.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 740
Author(s):  
Hoshin V. Gupta ◽  
Mohammad Reza Ehsani ◽  
Tirthankar Roy ◽  
Maria A. Sans-Fuentes ◽  
Uwe Ehret ◽  
...  

We develop a simple Quantile Spacing (QS) method for accurate probabilistic estimation of one-dimensional entropy from equiprobable random samples, and compare it with the popular Bin-Counting (BC) and Kernel Density (KD) methods. In contrast to BC, which uses equal-width bins with varying probability mass, the QS method uses estimates of the quantiles that divide the support of the data generating probability density function (pdf) into equal-probability-mass intervals. And, whereas BC and KD each require optimal tuning of a hyper-parameter whose value varies with sample size and shape of the pdf, QS only requires specification of the number of quantiles to be used. Results indicate, for the class of distributions tested, that the optimal number of quantiles is a fixed fraction of the sample size (empirically determined to be ~0.25–0.35), and that this value is relatively insensitive to distributional form or sample size. This provides a clear advantage over BC and KD since hyper-parameter tuning is not required. Further, unlike KD, there is no need to select an appropriate kernel-type, and so QS is applicable to pdfs of arbitrary shape, including those with discontinuous slope and/or magnitude. Bootstrapping is used to approximate the sampling variability distribution of the resulting entropy estimate, and is shown to accurately reflect the true uncertainty. For the four distributional forms studied (Gaussian, Log-Normal, Exponential and Bimodal Gaussian Mixture), expected estimation bias is less than 1% and uncertainty is low even for samples of as few as 100 data points; in contrast, for KD the small sample bias can be as large as -10% and for BC as large as -50%. We speculate that estimating quantile locations, rather than bin-probabilities, results in more efficient use of the information in the data to approximate the underlying shape of an unknown data generating pdf.
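The core idea of replacing equal-width bins with equal-probability-mass intervals can be sketched as follows. This is a plain piecewise-constant plug-in written for illustration, not the authors' QS estimator: the quantile rule (linear interpolation between order statistics), the function name `quantile_spacing_entropy`, and the parameter `n_quantiles` are assumptions, and the bias figures and the recommended fraction quoted in the abstract should not be expected to carry over to this simplified version.

```python
import math
import random


def quantile_spacing_entropy(sample, n_quantiles):
    """Entropy estimate (in nats) from equal-probability-mass intervals.

    The density is treated as constant within each of the n_quantiles
    intervals, so each interval of width w contributes (1/k) * log(k * w).
    """
    xs = sorted(sample)
    n = len(xs)
    k = n_quantiles
    if k < 1 or n < 2:
        raise ValueError("need at least two points and one interval")

    # Interval edges: k + 1 quantile estimates from the sample minimum to the maximum.
    edges = []
    for j in range(k + 1):
        pos = j * (n - 1) / k
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        edges.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))

    entropy = 0.0
    for left, right in zip(edges, edges[1:]):
        width = right - left
        if width > 0:  # zero-width intervals (tied values) are skipped
            entropy += math.log(k * width) / k
    return entropy


# Usage on synthetic data (seeded so the run is repeatable).
rng = random.Random(0)
uniform_sample = [rng.random() for _ in range(4000)]
gaussian_sample = [rng.gauss(0.0, 1.0) for _ in range(4000)]
h_uniform = quantile_spacing_entropy(uniform_sample, n_quantiles=200)
h_gaussian = quantile_spacing_entropy(gaussian_sample, n_quantiles=200)
```

For reference, the true differential entropy is 0 nats for a uniform density on [0, 1] and 0.5 * log(2 * pi * e), about 1.419 nats, for a standard Gaussian. Because the estimate depends only on interval widths, rescaling the data by a factor c shifts it by log(c).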


2001 ◽  
Vol 38 (04) ◽  
pp. 1033-1054 ◽  
Author(s):  
Liudas Giraitis ◽  
Piotr Kokoszka ◽  
Remigijus Leipus

The paper studies the impact of a broadly understood trend, which includes a change point in mean and monotonic trends studied by Bhattacharya et al. (1983), on the asymptotic behaviour of a class of tests designed to detect long memory in a stationary sequence. Our results pertain to a family of tests which are similar to Lo's (1991) modified R/S test. We show that both long memory and nonstationarity (presence of trend or change points) can lead to rejection of the null hypothesis of short memory, so that further testing is needed to discriminate between long memory and some forms of nonstationarity. We provide quantitative description of trends which do or do not fool the R/S-type long memory tests. We show, in particular, that a shift in mean of a magnitude larger than N^(-1/2), where N is the sample size, affects the asymptotic size of the tests, whereas smaller shifts do not do so.


2016 ◽  
Vol 78 (1) ◽  
pp. 24-42 ◽  
Author(s):  
Martin Lytje

This study explores how Danish students experience returning to school following parental bereavement. Eighteen focus group interviews with 39 participants aged 9 to 17 years were conducted. All participants had experienced the loss of a primary caregiver. Data collection was divided into two phases. In Phase I, 22 participants from four grief groups were interviewed four times over the course of a year. During Phase II, confirmatory focus groups were undertaken with the 17 participants. This article explores findings related to the four themes of initial school response, long-term support, challenges within the class, and academic challenges. The study found that (a) students struggle to reconnect with classmates following the return to school and often feel alone, (b) schools lack guidelines for what students are allowed to do if they become sad in class, and (c) schools seem to forget the students' loss as time passes.


2019 ◽  
Author(s):  
Benedikt Ley ◽  
Komal Raj Rijal ◽  
Jutta Marfurt ◽  
Nabaraj Adhikari ◽  
Megha Banjara ◽  
...  

Objective: Electronic data collection (EDC) has become a suitable alternative to paper-based data collection (PBDC) in biomedical research, even in resource-poor settings. During a survey in Nepal, data were collected using both systems and data entry errors were compared between the two methods. Collected data were checked for completeness, values outside of realistic ranges, internal logic, and date variables for reasonable time frames. Variables were grouped into 5 categories and the number of discordant entries was compared between the two systems, overall and per variable category. Results: Data from 52 variables collected from 358 participants were available. Discrepancies between the two data sets were found in 12.6% of all entries (2,352/18,616). Differences between data points were identified in 18.0% (643/3,580) of continuous variables, 15.8% of time variables (113/716), 13.0% of date variables (140/1,074), 12.0% of text variables (86/716), and 10.9% of categorical variables (1,370/12,530). Overall, 64% (1,499/2,352) of all discrepancies were due to data omissions; 76.6% (1,148/1,499) of missing entries were among categorical data. Omissions in PBDC (n=1,002) were twice as frequent as in EDC (n=497, p<0.001). Data omissions, specifically among categorical variables, were identified as the greatest source of error. If designed accordingly, EDC can address this shortfall effectively.
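The reported figures are internally consistent, which can be checked with a few lines of arithmetic. The sketch below only re-derives percentages from the counts quoted in the abstract; the variable names are illustrative and no additional data are assumed.

```python
# (discordant entries, total entries) per variable category, as quoted above.
reported = {
    "continuous": (643, 3580),
    "time": (113, 716),
    "date": (140, 1074),
    "text": (86, 716),
    "categorical": (1370, 12530),
}
participants, variables = 358, 52
omissions = {"PBDC": 1002, "EDC": 497}


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_discordant = sum(d for d, _ in reported.values())
total_entries = sum(t for _, t in reported.values())
rates = {name: pct(d, t) for name, (d, t) in reported.items()}
overall_rate = pct(total_discordant, total_entries)
total_omissions = sum(omissions.values())
omission_ratio = omissions["PBDC"] / omissions["EDC"]

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"overall: {overall_rate}% of {total_entries} entries")
```

The category totals add up to the 18,616 entries implied by 358 participants and 52 variables, and the two omission counts add up to the 1,499 omissions quoted for the whole data set.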


Author(s):  
Carlos Henrique Nascimento ◽  
Ires Paula de Andrade Miranda

The purpose was to analyze problem-based learning (PBL) as a methodological alternative for primary school that favors learning about Amazonian ecosystems. This research is descriptive with a qualitative-quantitative approach. The study was carried out with students from the 9th year of primary school. The teaching methodology based on PBL was applied in two phases. In the first phase, a test of previous conceptions was carried out in order to learn the students' perceptions of topics related to some landscape units of the Amazonian ecosystems. The second phase consisted of the implementation of the learning methodology in the school environment. Four stages were established in the application: i) selection of topics; ii) problem formulation; iii) problem solving; iv) synthesis and evaluation. The data collection instruments used were a preconceptions test and a skills chart. The results showed that after the application of the PBL methodology, cognitive recognition of the Amazonian ecosystems can be perceived in the students, reaching additional goals that the PCN establish.


2013 ◽  
Author(s):  
Greg Jensen

Identifying discontinuities (or change-points) in otherwise stationary time series is a powerful analytic tool. This paper outlines a general strategy for identifying an unknown number of change-points using elementary principles of Bayesian statistics. Using a strategy of binary partitioning by marginal likelihood, a time series is recursively subdivided on the basis of whether adding divisions (and thus increasing model complexity) yields a justified improvement in the marginal model likelihood. When this approach is combined with the use of conjugate priors, it yields the Conjugate Partitioned Recursion (CPR) algorithm, which identifies change-points without computationally intensive numerical integration. Using the CPR algorithm, methods are described for specifying change-point models drawn from a host of familiar distributions, both discrete (binomial, geometric, Poisson) and continuous (exponential, Gaussian, uniform, and multiple linear regression), as well as multivariate distributions (multinomial, multivariate normal, and multivariate linear regression). Methods by which the CPR algorithm could be extended or modified are discussed, and several detailed applications to data published in psychology and biomedical engineering are described.
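The general idea of recursive binary partitioning with a conjugate prior can be sketched for the simplest case, a sequence of 0/1 outcomes with a Beta prior. This is a minimal illustration and not the published CPR algorithm: the decision rule used here (a uniform prior over split locations, splitting whenever the split model's marginal likelihood exceeds the unsplit one), the minimum segment length, and the function names are all assumptions.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(successes, failures, a=1.0, b=1.0):
    """Log marginal likelihood of a Bernoulli segment under a Beta(a, b) prior."""
    return log_beta(a + successes, b + failures) - log_beta(a, b)


def find_change_points(data, lo=0, hi=None, min_len=2):
    """Recursively split data[lo:hi]; return the sorted list of split indices."""
    if hi is None:
        hi = len(data)
    n = hi - lo
    if n < 2 * min_len:
        return []

    # Prefix sums give the success count of any candidate segment in O(1).
    prefix = [0]
    for x in data[lo:hi]:
        prefix.append(prefix[-1] + x)
    total = prefix[-1]
    unsplit = log_marginal(total, n - total)

    scores = {}
    for k in range(min_len, n - min_len + 1):
        left = log_marginal(prefix[k], k - prefix[k])
        right = log_marginal(total - prefix[k], (n - k) - (total - prefix[k]))
        scores[lo + k] = left + right

    # Marginal likelihood of "one change-point somewhere", averaged over
    # candidate locations (log-sum-exp for numerical stability).
    best = max(scores, key=scores.get)
    peak = scores[best]
    split = peak + math.log(sum(math.exp(s - peak) for s in scores.values()) / len(scores))

    if split <= unsplit:
        return []
    return (find_change_points(data, lo, best, min_len)
            + [best]
            + find_change_points(data, best, hi, min_len))


# Usage: a hypothetical series that switches from all 0s to all 1s at index 30.
series = [0] * 30 + [1] * 30
change_points = find_change_points(series)
```

Because the Beta prior is conjugate to the Bernoulli likelihood, each segment's marginal likelihood has a closed form and no numerical integration is needed, which is the property the abstract highlights. Other conjugate pairs would replace only `log_marginal`.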

