stat assoc
Recently Published Documents


TOTAL DOCUMENTS

19
(FIVE YEARS 10)

H-INDEX

2
(FIVE YEARS 0)

Trials ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fan Xia ◽  
James P. Hughes ◽  
Emily C. Voldal ◽  
Patrick J. Heagerty

Abstract Background Stepped-wedge designs (SWD) are increasingly used to evaluate the impact of changes to the process of care within health care systems. However, to generate definitive evidence, a correct sample size calculation is crucial to ensure such studies are properly powered. The seminal work of Hussey and Hughes (Contemp Clin Trials 28(2):182–91, 2007) provides an analytical formula for power calculations with normal outcomes using a linear model and simple random effects. However, minimal development and evaluation have been done for power calculation with non-normal outcomes on their natural scale (e.g., logit, log). For example, binary endpoints are common, and logistic regression is the natural multilevel model for such clustered data. Methods We propose a power calculation formula for SWD with either normal or non-normal outcomes in the context of generalized linear mixed models by adopting the Laplace approximation detailed in Breslow and Clayton (J Am Stat Assoc 88(421):9–25, 1993) to obtain the covariance matrix of the estimated parameters. Results We compare the performance of our proposed method with simulation-based sample size calculation and demonstrate its use on a study of patient-delivered partner therapy for STI treatment and a study that assesses the impact of providing additional benchmark prevalence information in a radiologic imaging report. To facilitate adoption of our methods, we also provide a function embedded in the R package “swCRTdesign” for sample size and power calculation for multilevel stepped-wedge designs. Conclusions Our method requires minimal computational power. Therefore, the proposed procedure facilitates rapid dynamic updates of sample size calculations and can be used to explore a wide range of design options or assumptions.
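A minimal sketch of the baseline calculation this abstract builds on: the Hussey and Hughes closed-form variance of the treatment-effect estimator for a cross-sectional stepped-wedge design with a normal outcome, turned into power. This is not the paper's Laplace-approximation method (which lives in the R package "swCRTdesign"); the design matrix, effect size, and variance components below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def hh_power(X, theta, sigma_e2, tau2, n_per, alpha=0.05):
    """X: clusters x periods binary treatment indicators; theta: effect size;
    sigma_e2: individual-level residual variance; tau2: between-cluster
    variance; n_per: individuals per cluster-period."""
    I, T = X.shape
    sigma2 = sigma_e2 / n_per          # variance of a cluster-period mean
    U = X.sum()
    W = (X.sum(axis=0) ** 2).sum()     # squared per-period treated counts
    V = (X.sum(axis=1) ** 2).sum()     # squared per-cluster treated counts
    var = (I * sigma2 * (sigma2 + T * tau2)) / (
        (I*U - W) * sigma2 + (U**2 + I*T*U - T*W - I*V) * tau2)
    return norm.cdf(abs(theta) / np.sqrt(var) - norm.ppf(1 - alpha / 2))

# Example: 12 clusters crossing over in 4 waves of 3 across 5 periods.
X = np.zeros((12, 5))
for wave in range(4):
    X[3*wave:3*wave + 3, wave + 1:] = 1
print(hh_power(X, theta=0.3, sigma_e2=1.0, tau2=0.02, n_per=20))
```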


2021 ◽  
Vol 31 (5) ◽  
Author(s):  
Jacob Vorstrup Goldman ◽  
Sumeetpal S. Singh

Abstract We propose a novel blocked version of the continuous-time bouncy particle sampler of Bouchard-Côté et al. (J Am Stat Assoc 113(522):855–867, 2018) which is applicable to any differentiable probability density. This alternative implementation is motivated by blocked Gibbs sampling for state-space models (Singh et al. in Biometrika 104(4):953–969, 2017) and leads to significant improvement in terms of effective sample size per second; furthermore, it allows for substantial parallelization of the resulting algorithm. The new algorithms are particularly efficient for latent-state inference in high-dimensional state-space models, where blocking in both space and time is necessary to avoid degeneracy of MCMC. The efficiency of our blocked bouncy particle sampler, in comparison with both the standard implementation of the bouncy particle sampler and the particle Gibbs algorithm of Andrieu et al. (J R Stat Soc Ser B Stat Methodol 72(3):269–342, 2010), is illustrated numerically for both simulated data and a challenging real-world financial dataset.
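For orientation, a minimal sketch of the standard (unblocked) bouncy particle sampler that the paper improves on, specialized to a standard Gaussian target where the inhomogeneous Poisson bounce times can be inverted in closed form; the paper's blocked variant partitions coordinates and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def bps_gaussian(d=10, horizon=200.0, lam_ref=1.0):
    """Bouncy particle sampler targeting N(0, I_d); returns event positions."""
    x = rng.standard_normal(d)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    t, path = 0.0, [x.copy()]
    while t < horizon:
        a, b = x @ v, v @ v              # bounce rate(s) = max(0, a + b*s)
        E = rng.exponential()
        if a >= 0:                       # invert the integrated rate
            t_bounce = (-a + np.sqrt(a*a + 2*b*E)) / b
        else:
            t_bounce = -a/b + np.sqrt(2*E/b)
        t_ref = rng.exponential(1.0 / lam_ref)
        dt = min(t_bounce, t_ref)
        x = x + dt * v
        if t_bounce < t_ref:             # bounce: reflect v off grad U(x) = x
            v = v - 2 * (v @ x) / (x @ x) * x
        else:                            # refreshment keeps the chain ergodic
            v = rng.standard_normal(d)
            v /= np.linalg.norm(v)
        t += dt
        path.append(x.copy())
    return np.array(path)

print(bps_gaussian().mean(axis=0))       # event-position means, near 0
```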


2021 ◽  
Author(s):  
Sergio Ibarra ◽  
Edmilson Dias de Freitas

Brazil has the highest number of COVID-19 cases and deaths in the southern hemisphere, and ranks third globally behind India and the U.S. Some studies have analyzed the relationship between mobility, meteorology, and air pollution, finding that staying out of home increases cases about 5 days after exposure and deaths about two weeks after exposure (Ibarra-Espinosa et al., 2021). In this work we extend the analyses presented by Ibarra-Espinosa et al. (2021) by including more Brazilian cities. Specifically, the metropolitan region of Rio de Janeiro, considered a megacity, monitors the meteorology and air pollution necessary for the analyses, as do the metropolitan regions of Porto Alegre, Belo Horizonte, and Curitiba. The method consists of applying a semiparametric model (Dominici et al., 2004), in this case controlling for all the environmental factors and their interactions, with mobility alone as the parameter of interest. We will compare local mobility indices, such as the Google Residential Mobility Index (RMI), as done by Ibarra-Espinosa et al. (2021). Due to the high dispersion of the data, COVID-19 will be modeled with quasi-Poisson and negative binomial distributions, using generalized additive models (Wood, 2017; Zeileis et al., 2008; R Core Team, 2021).

Ibarra-Espinosa, S., de Freitas, E.D., Ropkins, K., Dominici, F., Rehbein, A., 2021. Association between COVID-19, mobility and environment in São Paulo, Brazil. medRxiv. https://doi.org/10.1101/2021.02.08.21250113

Dominici, F., McDermott, A., Hastie, T.J., 2004. Improved semiparametric time series models of air pollution and mortality. J Am Stat Assoc 99:938–948.

R Core Team, 2021. R: A Language and Environment for Statistical Computing.

Wood, S., 2017. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.

Zeileis, A., Kleiber, C., Jackman, S., 2008. Regression Models for Count Data in R. J Stat Software 27:1–25. doi:10.18637/jss.v027.i08.
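A hedged sketch of the modeling strategy the abstract describes: a negative binomial GAM relating daily case counts to a residential mobility index, with smooth terms controlling for environmental covariates. Data are synthetic and the column names (rmi, temp, rh) are illustrative assumptions, not the study's variables; statsmodels is used in place of the authors' R workflow.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.gam.api import GLMGam, BSplines

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "rmi":  rng.uniform(0, 30, n),       # residential mobility index (%)
    "temp": rng.uniform(15, 35, n),      # air temperature (C)
    "rh":   rng.uniform(40, 95, n),      # relative humidity (%)
})
mu = np.exp(1.5 + 0.03*df["rmi"] + 0.001*(df["temp"] - 25)**2)
df["cases"] = rng.poisson(mu)

# Smooth terms for the environmental controls, linear term for mobility.
smoother = BSplines(df[["temp", "rh"]], df=[6, 6], degree=[3, 3])
X = sm.add_constant(df[["rmi"]])
model = GLMGam(df["cases"], exog=X, smoother=smoother,
               family=sm.families.NegativeBinomial(alpha=1.0))
res = model.fit()
print(res.params["rmi"])   # mobility effect on log(expected cases)
```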


Author(s):  
Gao-Fan Ha ◽  
Qiuyan Zhang ◽  
Zhidong Bai ◽  
You-Gan Wang

In this paper, a ridgelized Hotelling's T² test is developed for a hypothesis on a large-dimensional mean vector under certain moment conditions. It generalizes the main result of Chen et al. [A regularized Hotelling's T² test for pathway analysis in proteomic studies, J. Am. Stat. Assoc. 106(496) (2011) 1345–1360] by relaxing their Gaussian assumption. This is achieved by establishing an exact four-moment theorem that is a simplified version of Tao and Vu's [Random matrices: universality of local statistics of eigenvalues, Ann. Probab. 40(3) (2012) 1285–1315] work. Simulation results demonstrate the superiority of the proposed test over the traditional Hotelling's T² test and its several extensions in high-dimensional situations.
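A rough sketch of the statistic under discussion: Hotelling's T² with the sample covariance ridge-stabilized as S + λI, which stays well defined when the dimension exceeds the sample size. Calibration below is by naive Monte Carlo under normality, purely for illustration; the paper's contribution is precisely the non-Gaussian asymptotics, which this fragment does not implement.

```python
import numpy as np

def ridge_T2(X, mu0, lam):
    """Ridge-regularized Hotelling's T^2 for H0: mean = mu0."""
    n, p = X.shape
    d = X.mean(axis=0) - mu0
    S = np.cov(X, rowvar=False)
    return n * d @ np.linalg.solve(S + lam * np.eye(p), d)

rng = np.random.default_rng(0)
n, p, lam = 40, 100, 0.5                      # p > n: classic T^2 breaks down
X = rng.standard_normal((n, p)) + 0.15        # true mean shifted by 0.15
T_obs = ridge_T2(X, np.zeros(p), lam)

# Monte Carlo null under multivariate normality (illustration only).
null = [ridge_T2(rng.standard_normal((n, p)), np.zeros(p), lam)
        for _ in range(500)]
print("approx p-value:", np.mean([t >= T_obs for t in null]))
```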


2021 ◽  
Vol 25 (109) ◽  
pp. 71-79
Author(s):  
Freddy Carrasco Choque ◽  
Mario Villegas Yarleque ◽  
Janet Del Rocio Sanchez Castro

Agricultural activity in the Piura region is fundamental to its development, and forecasting is a useful tool for economic agents to plan and make sound decisions. Two results are of interest in this study: first, to identify, estimate, and validate a fitted model for forecasting banana production; and second, to forecast banana production for the period from October 2020 to October 2022. To achieve these objectives, a univariate analysis was performed with the Box-Jenkins methodology. The data come from the Banco Central de Reserva del Perú; monthly data from July 2000 through September 2020 were considered. After verifying the model assumptions, the best fitted model for representing banana production and generating forecasts is an autoregressive integrated moving average (ARIMA) model. The banana production forecast shows a decreasing trend for the coming years. Keywords: forecasting, time series, ARIMA models, agricultural production. References: [1] A. A. S. Syed, A. Sajad, and U. J. Arshad, “Growth, Variability and Forecasting of Wheat and Sugarcane Production in Khyber Pakhtunkhwa, Pakistan,” Agric. Res. Technol. Open Access J., 2018. [2] Instituto Nacional de Estadística e Informática, “Producción Nacional - INEI,” 2019. [3] M. Laberry, “III Foro Nacional del Cultivo de Arroz,” 2016. [4] L. Torres, “Análisis Económico del Cambio Climático en la Agricultura de la Región Piura. Caso: Principales Productos Agroexportables,” Consorc. Investig. Econ. y Soc. - CIES, 2010. [5] Instituto Nacional de Estadística e Informática, “Producto Bruto Interno Por Departamentos,” 2019. [6] D. Llico, “La minería, pesca y agricultura de Piura,” monografias.com, 2013. [7] H. Moyazzem, A. Faruq, and K. Ajit, “Forecasting of Banana Production in Bangladesh,” Am. J. Agric. Biol. Sci., 2016. [8] J. Ruiz, G. Hernández, and R. Zulueta, “Análisis de series de tiempo en el pronóstico de la producción de caña de azúcar,” Fac. Econ. - Univ. Veracruzana - Mex., 2010. [9] V. Erossa, Proyectos de inversión en ingeniería: su metodología, 2004. [10] A. Contreras, C. Atziry, M. José, and S. Diana, “Análisis de series de tiempo en el pronóstico de la demanda de almacenamiento de productos perecederos,” Estud. Gerenciales 32:387–396, 2016. [11] G. Mendoza, “Pronosticar y métodos de pronóstico,” 2003. [12] A. Muñoz and F. Parra, Econometría aplicada, Ediciones, 2007. [13] M. A. Hamjah, “Forecasting major fruit crops productions in Bangladesh using Box-Jenkins ARIMA model,” J. Econ. Sustain. Dev. 5(9), 2014. [14] M. Casinillo and I. Manching, “Modeling the monthly production of banana using the Box and Jenkins analysis,” Am. J. Agric. Biol. Sci., 2016. [15] N. Suleman and S. Sarpong, “Forecasting Milled Rice Production in Ghana Using Box-Jenkins Approach,” Int. J. Agric. Manag. Dev. (IJAMAD), 2011. [16] W. Merlin, “Modelo univariante de pronóstico del número de unidades de transfusión de sangre en el hospital regional Manuel Nuñez Butrón - Puno, periodo 2006-2015-I,” Universidad Nacional del Altiplano - Puno, 2015. [17] L. Laurente, “Proyección de la producción de papa en Puno. Una aplicación de la metodología de Box-Jenkins,” Semest. Econ. - FIE - UNA Puno, 2018. [18] Banco Central de Reserva del Perú, “Gerencia Central de Estudios Económicos,” 2019. [Online]. Available: https://estadisticas.bcrp.gob.pe/estadisticas/series/mensuales/resultados/PN01784AM/html. [19] R. Hernández, C. Fernández, and M. del P. Baptista, Metodología de la Investigación, 6th ed., 2014. [20] Banco Central de Reserva del Perú, “PIURA: Síntesis de Actividad Económica,” 2020. [Online]. Available: https://www.bcrp.gob.pe/estadisticas/informacion-regional/piura/piura.html. [21] I. Moumouni et al., “What happens between technico-institutional support and adoption of organic farming? A case study from Benin,” Org. Agric., DOI 10.1007/s13165-013-0039-x, 2013. [22] U. Yule, “On a Method of Investigating Periodicities in Disturbed Series, with Special Reference to Wolfer’s Sunspot Numbers,” Philos. Trans. R. Soc. London, 1926. [23] E. Slutsky, “The Summation of Random Causes as the Source of Cyclical Processes,” Econometrica 4:105–46, 1937. [24] H. Wold, “A Study in the Analysis of Stationary Time Series,” Uppsala: Almqvist and Wiksells, 1938. [25] G. Box and G. M. Jenkins, Time Series Analysis, Forecasting and Control, San Francisco: Holden-Day, 1976. [26] D. Gujarati and D. Porter, Econometría, 2010. [27] G. Box and D. Pierce, “Distribution of Residual Autocorrelations in Autoregressive Integrated Moving Average Time Series Models,” J. Am. Stat. Assoc., vol. 65, 1970. [28] G. Ljung and G. Box, “On a measure of lack of fit in time series models,” Biometrika 65:297–303, 1978. [29] C. Jarque and A. Bera, “A Test for Normality of Observations and Regression Residuals,” Int. Stat. Inst., vol. 55, 1978. [30] D. A. Dickey and W. A. Fuller, “Distribution of the Estimators for Autoregressive Time Series with a Unit Root,” J. Am. Stat. Assoc., vol. 74, 1979. [31] P. C. B. Phillips and P. Perron, “Testing for a Unit Root in Time Series Regression,” Biometrika 75:335–346, 1988.
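A hedged sketch of the Box-Jenkins workflow described above, in Python rather than the authors' software, on synthetic monthly data (the BCRP series is not bundled here): ADF unit-root check, ARIMA fit, Ljung-Box and Jarque-Bera residual diagnostics, then a 25-month forecast. The order (1,1,1)x(0,1,1,12) is an illustrative assumption, not the paper's fitted model.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import jarque_bera

rng = np.random.default_rng(0)
idx = pd.date_range("2000-07-01", "2020-09-01", freq="MS")
y = pd.Series(100 + np.cumsum(rng.normal(0.1, 3, len(idx))), index=idx)

print("ADF p-value:", adfuller(y)[1])        # fail to reject -> difference
fit = ARIMA(y, order=(1, 1, 1), seasonal_order=(0, 1, 1, 12)).fit()

resid = fit.resid.iloc[13:]                  # drop start-up residuals
print(acorr_ljungbox(resid, lags=[12]))      # whiteness of residuals
print("Jarque-Bera p-value:", jarque_bera(resid).pvalue)

forecast = fit.forecast(steps=25)            # Oct 2020 through Oct 2022
print(forecast.tail())
```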


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Juming Pan

Abstract Background Model averaging has attracted increasing attention in recent years for the analysis of high-dimensional data. By suitably weighting several competing statistical models, model averaging attempts to achieve stable and improved prediction. In this paper, we develop a two-stage model averaging procedure to enhance accuracy and stability in prediction for high-dimensional linear regression. First we employ a high-dimensional variable selection method such as LASSO to screen redundant predictors and construct a class of candidate models; then we apply jackknife cross-validation to optimize the model weights for averaging. Results In simulation studies, the proposed technique outperforms commonly used alternative methods in the high-dimensional regression setting, in terms of minimizing the mean squared prediction error. We apply the proposed method to a riboflavin dataset; the results show that the method is quite efficient in forecasting the riboflavin production rate when there are thousands of genes and only tens of subjects. Conclusions Compared with a recent high-dimensional model averaging procedure (Ando and Li in J Am Stat Assoc 109:254–65, 2014), the proposed approach enjoys three appealing features and thus has better predictive performance: (1) more suitable methods are applied for model construction and weighting; (2) computational flexibility is retained, since each candidate model and its corresponding weight are determined in the low-dimensional setting and quadratic programming is utilized in the cross-validation; (3) model selection and averaging are combined in the procedure, making full use of the strengths of both techniques. As a consequence, the proposed method can achieve stable and accurate predictions in high-dimensional linear models and can greatly help practical researchers analyze genetic data in medical research.
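A schematic sketch of the two-stage idea: a LASSO path orders predictors and defines nested candidate models, and leave-one-out (jackknife) predictions determine averaging weights on the simplex via constrained least squares. This mirrors the strategy, not the paper's exact estimator; the number of candidate models (8) is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import lasso_path, LinearRegression
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 60, 300
X = rng.standard_normal((n, p))
y = X[:, :4] @ np.array([2.0, -1.5, 1.0, 0.5]) + rng.standard_normal(n)

# Stage 1: order predictors by when they enter the LASSO path,
# then form nested candidate models from that ordering.
alphas, coefs, _ = lasso_path(X, y)
entry = np.argmax(coefs != 0, axis=1).astype(float)
entry[~(coefs != 0).any(axis=1)] = np.inf      # never-selected predictors
order = np.argsort(entry)[:8]
candidates = [order[:k] for k in range(1, 9)]

# Stage 2: jackknife (leave-one-out) predictions per candidate model.
loo = np.empty((n, len(candidates)))
for m, cols in enumerate(candidates):
    for i in range(n):
        mask = np.arange(n) != i
        fit = LinearRegression().fit(X[mask][:, cols], y[mask])
        loo[i, m] = fit.predict(X[i:i+1, cols])[0]

# Weights on the simplex minimizing jackknife squared error.
obj = lambda w: np.sum((y - loo @ w) ** 2)
w0 = np.full(len(candidates), 1.0 / len(candidates))
res = minimize(obj, w0, method="SLSQP",
               bounds=[(0, 1)] * len(candidates),
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("model weights:", np.round(res.x, 3))
```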


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
William R. P. Denault ◽  
Astanand Jugessur

Abstract Background We present here a computational shortcut to improve a powerful wavelet-based method by Shim and Stephens (Ann Appl Stat 9(2):665–686, 2015. 10.1214/14-AOAS776) called WaveQTL that was originally designed to identify DNase I hypersensitivity quantitative trait loci (dsQTL). Results WaveQTL relies on permutations to evaluate the significance of an association. We applied a recent method by Zhou and Guan (J Am Stat Assoc 113(523):1362–1371, 2017. 10.1080/01621459.2017.1328361) to boost computational speed, which involves calculating the distribution of Bayes factors and estimating the significance of an association by simulations rather than permutations. We called this simulation-based approach “fast functional wavelet” (FFW), and tested it on a publicly available DNA methylation (DNAm) dataset on colorectal cancer. The simulations confirmed a substantial gain in computational speed compared to the permutation-based approach in WaveQTL. Furthermore, we show that FFW controls the type I error satisfactorily and has good power for detecting differentially methylated regions. Conclusions Our approach has broad utility and can be applied to detect associations between different types of functions and phenotypes. As more and more DNAm datasets are being made available through public repositories, an attractive application of FFW would be to re-analyze these data and identify associations that might have been missed by previous efforts. The full R package for FFW is freely available at GitHub https://github.com/william-denault/ffw.
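A loose sketch of the general wavelet-screening idea behind WaveQTL/FFW: transform each curve to the wavelet domain, score every coefficient's association with the covariate, and calibrate the maximal score by Monte Carlo simulation from the null rather than by permutation (the speed-up FFW adopts from Zhou and Guan). Plain z-scores stand in for the Bayes factors of the real method, so treat this purely as an illustration of the simulate-don't-permute principle.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
n, L = 80, 64                                  # subjects, sites per region
g = rng.binomial(2, 0.3, n).astype(float)      # genotype / group covariate
Y = rng.standard_normal((n, L))
Y[:, 20:28] += 0.4 * g[:, None]                # signal in a sub-region

def max_z(Y, g):
    """Max absolute association z-score across wavelet coefficients."""
    W = np.array([np.concatenate(pywt.wavedec(y, "haar")) for y in Y])
    gc = (g - g.mean()) / g.std()
    z = gc @ (W - W.mean(0)) / (W.std(0) * np.sqrt(len(g)))
    return np.max(np.abs(z))

T_obs = max_z(Y, g)
# Null calibration by simulation: redraw independent Gaussian curves.
null = [max_z(rng.standard_normal((n, L)), g) for _ in range(300)]
print("approx p-value:", np.mean([t >= T_obs for t in null]))
```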


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Cong Li ◽  
Jianguo Sun

Abstract This paper discusses variable (covariate) selection for the high-dimensional quadratic Cox model. Although many variable selection methods have been developed for the standard Cox model, including its high-dimensional version, most of them cannot be applied directly since they cannot take into account the important, existing hierarchical model structure. For this problem, we present a penalized log partial likelihood-based approach; in particular, we generalize the regularization algorithm under the marginality principle (RAMP), proposed by Hao et al. (J Am Stat Assoc 2018;113:615–25) in the context of linear models. An extensive simulation study is conducted and suggests that the presented method works well in practical situations. It is then applied to an Alzheimer’s Disease study that motivated this investigation.
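A hedged sketch of the marginality principle in a quadratic Cox model, using lifelines rather than the authors' RAMP implementation: stage 1 runs an L1-penalized Cox fit on main effects, and stage 2 adds interaction/quadratic terms only among the survivors, so every selected second-order term has its parents in the model. Penalty values and data are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.standard_normal((n, p))
risk = 0.8*X[:, 0] - 0.6*X[:, 1] + 0.5*X[:, 0]*X[:, 1]
T = rng.exponential(np.exp(-risk))
E = (T < np.quantile(T, 0.8)).astype(int)       # ~20% censoring

cols = [f"x{j}" for j in range(p)]
df = pd.DataFrame(X, columns=cols)
df["T"], df["E"] = T, E

# Stage 1: L1-penalized Cox on main effects only.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="T", event_col="E")
kept = [c for c in cols if abs(cph.params_[c]) > 1e-6]

# Stage 2: add second-order terms only among survivors (marginality).
for i, a in enumerate(kept):
    for b in kept[i:]:
        df[f"{a}*{b}"] = df[a] * df[b]
cph2 = CoxPHFitter(penalizer=0.05, l1_ratio=1.0)
cph2.fit(df, duration_col="T", event_col="E")
print(cph2.params_[cph2.params_.abs() > 1e-6])
```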


2020 ◽  
Vol 54 (3) ◽  
pp. 781-806
Author(s):  
Sahar Sabbaghan ◽  
Cecil Eng Huang Chua ◽  
Lesley A. Gardner

Abstract Diagnostic theories are fundamental to Information Systems practice and are represented as trees. One way of creating diagnostic trees is to employ independent experts to construct such trees and then compare them. However, good measures of similarity for comparing diagnostic trees have not been identified. This paper presents an analysis of the suitability of various measures of association for determining the similarity of two diagnostic trees, using bootstrap simulations. We find that three measures of association, Goodman and Kruskal’s Lambda, Cohen’s Kappa, and Goodman and Kruskal’s Gamma (J Am Stat Assoc 49(268):732–764, 1954), each behave differently depending on what is inconsistent between the two trees, thus providing both measures for assessing the alignment of two trees developed by independent experts and a means of identifying the causes of the differences.
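A small sketch of the three measures compared above, applied to two categorical labelings (standing in for the leaves two diagnostic trees assign to the same cases) with a simple case-resampling bootstrap. Lambda and Gamma follow the Goodman and Kruskal (1954) definitions; Kappa comes from scikit-learn; the data are synthetic.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import contingency

def gk_lambda(a, b):
    """Proportional reduction in error when predicting b from a."""
    tab = contingency.crosstab(a, b).count
    return (tab.max(axis=1).sum() - tab.sum(axis=0).max()) / \
           (tab.sum() - tab.sum(axis=0).max())

def gk_gamma(a, b):
    """(concordant - discordant) / (concordant + discordant) pairs."""
    C = D = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            s = np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
            C += s > 0
            D += s < 0
    return (C - D) / (C + D)

rng = np.random.default_rng(0)
a = rng.integers(0, 4, 100)                    # leaf reached in tree 1
b = np.where(rng.random(100) < 0.7, a, rng.integers(0, 4, 100))

for name, f in [("lambda", gk_lambda), ("kappa", cohen_kappa_score),
                ("gamma", gk_gamma)]:
    boot = [f(a[idx], b[idx])
            for idx in (rng.integers(0, 100, 100) for _ in range(200))]
    print(name, round(f(a, b), 3), "boot sd:", round(np.std(boot), 3))
```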


2019 ◽  
Vol 14 (1) ◽  
Author(s):  
Qing Yang ◽  
Xinming An ◽  
Wei Pan

Abstract Background Any empirical data can be approximated by one of the Pearson distributions using the first four moments of the data (Elderton WP, Johnson NL. Systems of Frequency Curves. 1969; Pearson K. Philos Trans R Soc Lond Ser A. 186:343–414 1895; Solomon H, Stephens MA. J Am Stat Assoc. 73(361):153–60 1978). Thus, Pearson distributions make statistical analysis possible for data with unknown distributions. There are both extant, old-fashioned in-print tables (Pearson ES, Hartley HO. Biometrika Tables for Statisticians, vol. II. 1972) and contemporary computer programs (Amos DE, Daniel SL. Tables of percentage points of standardized pearson distributions. 1971; Bouver H, Bargmann RE. Tables of the standardized percentage points of the pearson system of curves in terms of β1 and β2. 1974; Bowman KO, Shenton LR. Biometrika. 66(1):147–51 1979; Davis CS, Stephens MA. Appl Stat. 32(3):322–7 1983; Pan W. J Stat Softw. 31(Code Snippet 2):1–6 2009) available for obtaining percentage points of Pearson distributions corresponding to certain pre-specified percentages (or probability values; e.g., 1.0%, 2.5%, 5.0%, etc.), but they are of little use in statistical analysis because one has to rely on unwieldy second-difference interpolation to calculate the probability value of a Pearson distribution corresponding to a given percentage point, such as an observed test statistic in hypothesis testing. Results The present study develops a macro program that identifies the appropriate type of Pearson distribution, based on either an input dataset or the values of the four moments, and then computes and graphs probability values of Pearson distributions for any given percentage points. Conclusions The SAS macro program returns accurate approximations to Pearson distributions and can efficiently help researchers conduct statistical analysis on data with unknown distributions.
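A rough Python analogue of the macro's first step, not the authors' SAS program: estimate β1 and β2 from the data and classify the Pearson family via the kappa criterion κ = β1(β2+3)² / [4(4β2−3β1)(2β2−3β1−6)]. Boundary handling here is deliberately coarse (tolerances absorb sampling noise), and the probability computation and plotting are omitted.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def pearson_type(x):
    """Classify via Pearson's kappa criterion from sample beta1, beta2."""
    b1 = skew(x) ** 2                    # squared skewness (beta_1)
    b2 = kurtosis(x, fisher=False)       # non-excess kurtosis (beta_2)
    if np.isclose(b1, 0, atol=0.01) and np.isclose(b2, 3, atol=0.2):
        return "approximately normal"
    kappa = b1 * (b2 + 3) ** 2 / (4 * (4*b2 - 3*b1) * (2*b2 - 3*b1 - 6))
    if kappa < 0:
        return "Type I (beta-like)"
    if np.isclose(kappa, 1, atol=0.05):
        return "Type V"
    return "Type IV" if kappa < 1 else "Type VI"

rng = np.random.default_rng(0)
print(pearson_type(rng.standard_normal(5000)))     # approximately normal
print(pearson_type(rng.beta(2, 5, size=5000)))     # Type I region
print(pearson_type(rng.standard_t(7, size=5000)))  # Type IV/VII region
```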

