Evaluating generalised additive mixed modelling strategies for dynamic speech analysis

Mapping Intimacies ◽

10.31234/osf.io/jb4wh ◽

2020 ◽

Author(s):

Marton Soskuthy

Keyword(s):

Type I Error ◽

Simulated Data ◽

Real Data ◽

Speech Analysis ◽

Data Sets ◽

Type I ◽

Data Set ◽

Modelling Strategies ◽

Additive Mixed Models ◽

Pitch Contours

Generalised additive mixed models (GAMMs) are increasingly popular in dynamic speech analysis, where the focus is on measurements with temporal or spatial structure such as formant, pitch or tongue contours. GAMMs provide a range of tools for dealing with the non-linear contour shapes and complex hierarchical organisation characteristic of such data sets. This, however, means that analysts are faced with non-trivial choices, many of which have a serious impact on the statistical validity of their analyses. This paper presents type I and type II error simulations to help researchers make informed decisions about modelling strategies when using GAMMs to analyse phonetic data. The simulations are based on two real data sets containing F2 and pitch contours, and a simulated data set modelled after the F2 data. They reflect typical scenarios in dynamic speech analysis. The main emphasis is on (i) dealing with dependencies within contours and higher-level units using random structures and other tools, and (ii) strategies for significance testing using GAMMs. The paper concludes with a small set of recommendations for fitting GAMMs, and provides advice on diagnosing issues and tailoring GAMMs to specific data sets. It is also accompanied by a GitHub repository including a tutorial on running type I error simulations for existing data sets: https://github.com/soskuthy/gamm_strategies.
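
This is a minimal stdlib-only Python sketch of a type I error simulation that shows why within-contour dependencies matter. It is not the GAMM-based procedure from the paper or its repository. It assumes null contours made of AR(1) noise (autocorrelation `phi`), two groups with no true difference, and a normal approximation to the two-sample test. All function names and parameter values are illustrative. Treating every measurement point as independent should reject far more often than the nominal 5%, while testing one summary value per contour should stay close to it.

```python
import math
import random
import statistics


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means, using a normal
    approximation to the Welch statistic (adequate for moderately large samples)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_contour(rng, n_points, phi):
    """One null 'contour': stationary AR(1) noise with unit marginal variance."""
    e = [rng.gauss(0.0, 1.0)]
    innov_sd = math.sqrt(1.0 - phi ** 2)
    for _ in range(n_points - 1):
        e.append(phi * e[-1] + rng.gauss(0.0, innov_sd))
    return e


def type_one_error(n_sims=300, n_contours=30, n_points=20, phi=0.8, alpha=0.05, seed=1):
    rng = random.Random(seed)
    naive_hits = per_contour_hits = 0
    for _ in range(n_sims):
        # Two groups with no true difference between them (the null hypothesis holds)
        g1 = [simulate_contour(rng, n_points, phi) for _ in range(n_contours)]
        g2 = [simulate_contour(rng, n_points, phi) for _ in range(n_contours)]
        # Naive: every measurement point treated as an independent observation
        flat1 = [x for c in g1 for x in c]
        flat2 = [x for c in g2 for x in c]
        naive_hits += two_sample_z_pvalue(flat1, flat2) < alpha
        # Contour-level: one summary value (the mean) per contour
        means1 = [statistics.fmean(c) for c in g1]
        means2 = [statistics.fmean(c) for c in g2]
        per_contour_hits += two_sample_z_pvalue(means1, means2) < alpha
    return naive_hits / n_sims, per_contour_hits / n_sims


naive, per_contour = type_one_error()
print(f"naive: {naive:.3f}, per-contour: {per_contour:.3f}")
```

Collapsing each contour to its mean is only the crudest way to respect the dependency structure. The paper's focus is on doing this within GAMMs, using random structures and related tools.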

A novel gene-set association test based on variance-gamma distribution

Statistical Methods in Medical Research ◽

10.1177/0962280218791205 ◽

2018 ◽

Vol 28 (9) ◽

pp. 2868-2875

Author(s):

Zhongxue Chen ◽

Qingzhong Liu ◽

Kai Wang

Keyword(s):

Gamma Distribution ◽

Type I Error ◽

Null Distribution ◽

Real Data ◽

Association Test ◽

P Value ◽

Type I ◽

Test Statistic ◽

Data Set ◽

Variance Gamma

Several gene- or set-based association tests have been proposed recently in the literature, but powerful statistical approaches are still highly desirable in this area. In this paper we propose a novel statistical association test that uses information from the burden component and its complement derived from the genotypes. The new test statistic has a simple null distribution, a special, simplified case of the variance-gamma distribution, so its p-value can be calculated easily. Through a comprehensive simulation study, we show that the new test controls the type I error rate and has greater power than several popular existing methods. We also apply the new approach to a real data set; the results demonstrate that the test is promising.

A comparative analysis of cell-type adjustment methods for epigenome-wide association studies based on simulated and real data sets

Briefings in Bioinformatics ◽

10.1093/bib/bby068 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2055-2065 ◽

Cited By ~ 1

Author(s):

Johannes Brägelmann ◽

Justo Lorenzo Bermejo

Keyword(s):

Statistical Power ◽

Type I Error ◽

Association Studies ◽

Real Data ◽

Error Rates ◽

Data Sets ◽

Type I ◽

Cell Type ◽

Type I Error Rates

Technological advances and reduced costs of high-density methylation arrays have led to an increasing number of association studies on the possible relationship between human disease and epigenetic variability. DNA samples from peripheral blood or other tissue types are analyzed in epigenome-wide association studies (EWAS) to detect methylation differences related to a particular phenotype. Since information on the cell-type composition of the sample is generally not available and methylation profiles are cell-type specific, statistical methods have been developed to adjust for cell-type heterogeneity in EWAS. In this study we systematically compared five popular adjustment methods: the factored spectrally transformed linear mixed model (FaST-LMM-EWASher), the sparse principal component analysis algorithm ReFACTor, surrogate variable analysis (SVA), independent SVA (ISVA) and an optimized version of SVA (SmartSVA). We used real data and applied a multilayered simulation framework to assess the type I error rate, the statistical power and the quality of estimated methylation differences according to major study characteristics. While all five adjustment methods improved false-positive rates compared with unadjusted analyses, FaST-LMM-EWASher resulted in the lowest type I error rate at the expense of low statistical power. SVA efficiently corrected for cell-type heterogeneity in EWAS with up to 200 cases and 200 controls, but did not control type I error rates in larger studies. Results based on real data sets confirmed the simulation findings, with the strongest control of type I error rates by FaST-LMM-EWASher and SmartSVA. Overall, ReFACTor, ISVA and SmartSVA showed the best, and broadly comparable, statistical power, quality of estimated methylation differences and runtime.

A Growth Model for Multilevel Ordinal Data

Journal of Educational and Behavioral Statistics ◽

10.3102/10769986030004369 ◽

2005 ◽

Vol 30 (4) ◽

pp. 369-396 ◽

Cited By ~ 8

Author(s):

Eisuke Segawa

Keyword(s):

Latent Variable ◽

Ordinal Data ◽

Linear Models ◽

Growth Models ◽

Simulated Data ◽

Real Data ◽

Analytic Structure ◽

Data Sets ◽

Data Set ◽

Time Points

Multi-indicator growth models were formulated as special three-level hierarchical generalized linear models to analyze the growth of a latent trait variable measured by ordinal items. Items are nested within time points, and time points are nested within subjects. These models are special because they include a factor-analytic structure. The model can analyze not only data with item- and time-level missing observations, but also data whose time points are specified freely across subjects. Furthermore, two features useful for longitudinal analyses were included: an "autoregressive error degree one" structure for the trait residuals, and estimated time scores. The approach is Bayesian, using Markov chain Monte Carlo, and the model is implemented in WinBUGS. The models are illustrated with two simulated data sets and one real data set with planned missing items within a scale.

Testing equality of means in partially paired data with incompleteness in single response

Statistical Methods in Medical Research ◽

10.1177/0962280218765007 ◽

2018 ◽

Vol 28 (5) ◽

pp. 1508-1522 ◽

Cited By ~ 1

Author(s):

Qianya Qi ◽

Li Yan ◽

Lili Tian

Keyword(s):

Type I Error ◽

Real Data ◽

The Cancer Genome Atlas ◽

P Value ◽

Type I ◽

Paired Data ◽

Data Set ◽

Equality Of Means ◽

Breast Cancer Study ◽

Single Response

In testing for differentially expressed genes between tumor and healthy tissues, data are usually collected in paired form. However, incomplete paired data often occur. While extensive statistical research exists for paired data with incompleteness in both arms, hardly any recent work can be found on paired data with incompleteness in a single arm. This paper aims to fill this gap by proposing new methods, namely P-value pooling methods and a nonparametric combination test. Simulation studies are conducted to investigate the performance of the proposed methods in terms of type I error and power at small to moderate sample sizes. A real data set from The Cancer Genome Atlas (TCGA) breast cancer study is analyzed using the proposed methods.
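
The abstract does not say which pooling rule the authors use. As an assumed illustration, Fisher's method is one classic way to pool independent p-values. In the partially paired setting, those could be a p-value from the complete pairs and one from the unpaired observations, since the two come from disjoint subjects. The function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under H0 if the k p-values are independent."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k, the chi-square survival function has the
    # closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


# e.g. a paired-test p-value of 0.04 and an unpaired-test p-value of 0.10
print(round(fisher_combined_pvalue([0.04, 0.10]), 4))  # 0.0261
```

With a single p-value the formula returns that p-value unchanged, which is a quick sanity check on the closed-form tail.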

ESTIMATION OF EXTREME QUANTILES: EMPIRICAL TOOLS FOR METHODS ASSESSMENT AND COMPARISON

International Journal of Reliability Quality and Safety Engineering ◽

10.1142/s0218539300000079 ◽

2000 ◽

Vol 07 (01) ◽

pp. 75-94 ◽

Cited By ~ 3

Author(s):

J. DIEBOLT ◽

M.-A. EL-AROUI ◽

V. DURBEC ◽

B. VILLAIN

Keyword(s):

Goodness Of Fit ◽

Simulated Data ◽

Real Data ◽

Data Sets ◽

Data Set ◽

Extreme Quantiles ◽

Maintenance Policies ◽

Simulated Data Sets ◽

Industrial Context

When extreme quantiles have to be estimated from a given data set, the classical parametric approach can lead to very poor estimates. This has led to the introduction of specific methods for estimating extreme quantiles (MEEQs) in a nonparametric spirit, e.g. Pickands' excess method, methods based on Hill's estimate of the Pareto index, and the exponential tail (ET) and quadratic tail (QT) methods. However, no practical technique is available for assessing and comparing these MEEQs when they are to be used on a given data set. This paper is a first attempt to provide such techniques. We first compare the estimates given by the main MEEQs on several simulated data sets. We then suggest goodness-of-fit (GoF) tests to assess the MEEQs by measuring the quality of their underlying approximations. It is shown that GoF techniques provide highly relevant tools for assessing and comparing the ET and excess methods. Other empirical criteria for comparing MEEQs are also proposed and studied through Monte Carlo analyses. Finally, these assessment and comparison techniques are applied to real data sets from an industrial context in which extreme quantiles are needed to define maintenance policies.
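
The following sketch illustrates the Hill-based family of MEEQs mentioned above. It assumes a Pareto-type tail and estimates the tail index γ (the reciprocal of the Pareto index) from the k largest observations. It then extrapolates to an extreme quantile with the Weissman estimator. The choice of k, the Weissman extrapolation and the function names are assumptions of this example, not details taken from the paper.

```python
import math
import random


def hill_estimate(sample, k):
    """Hill's estimate of the tail index gamma = 1 / (Pareto index) from the k largest values."""
    xs = sorted(sample, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value, X_(n-k)
    return sum(math.log(x / threshold) for x in xs[:k]) / k


def weissman_quantile(sample, k, p):
    """Quantile with exceedance probability p (p < k/n), extrapolated from the
    Hill estimate: x_p = X_(n-k) * (k / (n * p)) ** gamma."""
    n = len(sample)
    xs = sorted(sample, reverse=True)
    gamma = hill_estimate(sample, k)
    return xs[k] * (k / (n * p)) ** gamma


# Simulated Pareto data with index 2: true gamma = 0.5, true 1e-4 exceedance quantile = 100
rng = random.Random(3)
sample = [rng.paretovariate(2.0) for _ in range(20000)]
gamma_hat = hill_estimate(sample, k=500)
q_hat = weissman_quantile(sample, k=500, p=1e-4)
print(f"gamma_hat = {gamma_hat:.3f}, x_p estimate = {q_hat:.1f}")
```

The estimate is sensitive to k. Assessing that sensitivity on a given data set is the kind of question the paper's GoF-based comparison addresses.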

Goodness-of-fit test for skew normality based on energy statistics

Random Operators and Stochastic Equations ◽

10.1515/rose-2020-2042 ◽

2020 ◽

Vol 28 (3) ◽

pp. 227-236

Author(s):

Logan Opperman ◽

Wei Ning

Keyword(s):

Goodness Of Fit ◽

Type I Error ◽

Real Data ◽

Testing Procedure ◽

Data Sets ◽

Type I ◽

Goodness Of Fit Test ◽

Power Comparisons

In this paper, we propose a goodness-of-fit test for skew normality based on the energy statistic. Simulations indicate that the type I error of the proposed test is controlled reasonably well at the given nominal levels. Power comparisons with other existing methods under different settings show the advantage of the proposed test. The test is applied to two real data sets to illustrate the testing procedure.
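
The energy statistic compares a sample with a reference distribution through expected absolute distances, 2E|X-Y| - E|X-X'| - E|Y-Y'|, which is zero only when the two distributions coincide. The sketch below is not the paper's test. It assumes known skew-normal parameters and approximates the hypothesised distribution by a large Monte Carlo sample. It also omits parameter estimation and the calibration of critical values. Helper names are hypothetical.

```python
import math
import random


def _pairwise_abs_sum(values):
    """Sum of |a - b| over all ordered pairs, in O(n log n) via sorting."""
    s = sorted(values)
    n = len(s)
    # s[k] exceeds k values and is exceeded by n-1-k values
    return 2.0 * sum(v * (2 * k - n + 1) for k, v in enumerate(s))


def energy_distance(x, y):
    """Two-sample energy statistic 2E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic form)."""
    n, m = len(x), len(y)
    sxx = _pairwise_abs_sum(x)
    syy = _pairwise_abs_sum(y)
    # Cross term from the pooled sample: S(x+y) = S(x) + S(y) + 2 * sum|x_i - y_j|
    sxy = (_pairwise_abs_sum(list(x) + list(y)) - sxx - syy) / 2.0
    return 2.0 * sxy / (n * m) - sxx / n ** 2 - syy / m ** 2


def skew_normal_sample(rng, size, shape, loc=0.0, scale=1.0):
    """Draw from SN(loc, scale, shape) via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = shape / math.sqrt(1.0 + shape ** 2)
    out = []
    for _ in range(size):
        u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append(loc + scale * (delta * abs(u0) + math.sqrt(1.0 - delta ** 2) * u1))
    return out


rng = random.Random(42)
data = skew_normal_sample(rng, 300, shape=4.0)
reference = skew_normal_sample(rng, 5000, shape=4.0)  # stand-in for the hypothesised SN(0, 1, 4)
wrong_reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # a standard normal instead
e_match = energy_distance(data, reference)
e_wrong = energy_distance(data, wrong_reference)
print(f"matching reference: {e_match:.4f}, wrong reference: {e_wrong:.4f}")
```

Large values of the statistic indicate departure from the hypothesised distribution. A real test would compare them with a null distribution obtained analytically or by resampling.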

MARS: leveraging allelic heterogeneity to increase power of association testing

Genome Biology ◽

10.1186/s13059-021-02353-8 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 1

Author(s):

Farhad Hormozdiari ◽

Junghyun Jung ◽

Eleazar Eskin ◽

Jong Wha J. Joo

Keyword(s):

Type I Error ◽

Association Studies ◽

Simulated Data ◽

Real Data ◽

Association Test ◽

Type I ◽

Genome Wide Association Studies ◽

Association Testing ◽

Causal Status ◽

Causal Variants

In genome-wide association studies (GWAS), the standard association test is underpowered to detect associations at loci with multiple causal variants of small effect size. We propose a statistical method, Model-based Association test Reflecting causal Status (MARS), that finds associations between variants in risk loci and a phenotype while considering the causal status of variants, and that requires only existing summary statistics to detect associated risk loci. Using extensive simulated and real data, we show that MARS increases the power to detect truly associated risk loci compared with previous approaches that consider multiple variants, while controlling the type I error.

Assessing Treatment Effects with Pharmacometric Models: A New Method that Addresses Problems with Standard Assessments

The AAPS Journal ◽

10.1208/s12248-021-00596-8 ◽

2021 ◽

Vol 23 (3) ◽

Author(s):

Estelle Chasseloup ◽

Adrien Tessier ◽

Mats O. Karlsson

Keyword(s):

Multiple Testing ◽

Type I Error ◽

Drug Effect ◽

Model Averaging ◽

Clinical Trial Data ◽

Real Data ◽

Data Sets ◽

Type I ◽

Data Types ◽

Inflated Type

Longitudinal pharmacometric models offer many advantages in the analysis of clinical trial data. Their main drawbacks are a potentially inflated type I error and biased drug effect estimates, as a consequence of model misspecification and multiple testing. In this work, we used real data to compare these aspects for a standard approach (STD) and a new approach based on mixture models, called individual model averaging (IMA). Placebo-arm data sets were obtained from three clinical studies assessing ADAS-Cog scores, Likert pain scores and seizure frequency. By randomly assigning (1:1) the patients in these data sets to "treatment" or "placebo", we created data sets in which any significant drug effect was known to be a false positive. We repeated the random assignment and the test for a significant drug effect many times (N = 1000) for each of the 40 to 66 placebo-drug model combinations to obtain statistics on the type I error and drug effect bias. Across all models and the three data types, the 5th, 25th, 50th, 75th and 95th percentiles of the type I error (%) were 4.1, 11.4, 40.6, 100.0 and 100.0 for STD, and 1.6, 3.5, 4.3, 5.0 and 6.0 for IMA. IMA showed no bias in the drug effect estimates, whereas bias was frequently present with STD. In conclusion, STD is associated with inflated type I error and a risk of biased drug effect estimates, whereas IMA demonstrated controlled type I error and no bias.
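
The random-reassignment design described above can be sketched as follows. A simple two-sample z-test stands in for fitting the STD or IMA pharmacometric models. The placebo scores are simulated, hypothetical data. Only the 1:1 relabelling logic and the N = 1000 repetitions follow the abstract; the function names are illustrative.

```python
import math
import random
import statistics


def z_test_pvalue(a, b):
    """Two-sided p-value for a mean difference (normal approximation to Welch's test).
    Stand-in for fitting a pharmacometric drug-effect model."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def empirical_type_one_error(placebo_outcomes, test, n_reps=1000, alpha=0.05, seed=0):
    """Repeatedly split placebo-only patients 1:1 into 'treatment' and 'placebo' arms.
    No patient received a drug, so every significant result is a false positive."""
    rng = random.Random(seed)
    ids = list(range(len(placebo_outcomes)))
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(ids)
        half = len(ids) // 2
        treated = [placebo_outcomes[i] for i in ids[:half]]
        control = [placebo_outcomes[i] for i in ids[half:]]
        hits += test(treated, control) < alpha
    return hits / n_reps


# Hypothetical placebo-arm outcomes (skewed, score-like values), not real trial data
data_rng = random.Random(7)
scores = [data_rng.gammavariate(2.0, 3.0) for _ in range(200)]
rate = empirical_type_one_error(scores, z_test_pvalue)
print(f"empirical type I error: {rate:.3f}")  # should be near the nominal 0.05
```

A well-calibrated analysis keeps this rate near the nominal level. The abstract's STD percentiles show how far a misspecified model-based analysis can drift from it.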

Overcoming confounding plate effects in differential expression analyses of single-cell RNA-seq data

10.1101/073973 ◽

2016 ◽

Cited By ~ 3

Author(s):

Aaron T. L. Lun ◽

John C. Marioni

Keyword(s):

Single Cell ◽

Error Control ◽

Type I Error ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Simulated Data ◽

Real Data ◽

Type I ◽

Rna Seq ◽

Cell Counts

An increasing number of studies are using single-cell RNA sequencing (scRNA-seq) to characterize the gene expression profiles of individual cells. One common analysis applied to scRNA-seq data involves detecting differentially expressed (DE) genes between cells in different biological groups. However, many experiments are designed such that the cells to be compared are processed in separate plates or chips, meaning that the groupings are confounded with systematic plate effects. This confounding is frequently ignored in DE analyses of scRNA-seq data. In this article, we demonstrate that failing to account for plate effects in the statistical model results in a loss of type I error control. We propose a solution in which counts are summed over all cells in each plate and the count sums for all plates are used in the DE analysis. This restores type I error control in the presence of plate effects without compromising detection power in simulated data. Summation is also robust to variation in the number and library sizes of cells on each plate. Similar results are observed in DE analyses of real data, where using count sums instead of single-cell counts improves specificity and the ranking of relevant genes. This suggests that summation can help maintain statistical rigour in DE analyses of scRNA-seq data with plate effects.
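
Here is a minimal sketch of the summation step. It assumes counts are stored as nested dictionaries keyed by cell and gene, which is a hypothetical layout; real pipelines would typically use a sparse matrix. The resulting plate-level sums would then be analysed with a DE method in which plates, rather than cells, are the replicates. The choice of downstream tool is not specified here.

```python
from collections import defaultdict


def sum_counts_per_plate(cell_counts, cell_plate):
    """Collapse a cell-by-gene count table into a plate-by-gene table by summing
    the counts of all cells processed on the same plate."""
    plate_totals = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in cell_counts.items():
        plate = cell_plate[cell]
        for gene, count in gene_counts.items():
            plate_totals[plate][gene] += count
    # Convert the nested defaultdicts back to plain dicts for a clean result
    return {plate: dict(genes) for plate, genes in plate_totals.items()}


# Toy example with hypothetical cell, gene and plate identifiers
cells = {
    "cell1": {"GeneA": 3, "GeneB": 0},
    "cell2": {"GeneA": 1, "GeneB": 5},
    "cell3": {"GeneA": 2, "GeneB": 2},
}
plates = {"cell1": "plate1", "cell2": "plate1", "cell3": "plate2"}
print(sum_counts_per_plate(cells, plates))
# {'plate1': {'GeneA': 4, 'GeneB': 5}, 'plate2': {'GeneA': 2, 'GeneB': 2}}
```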

A Simplification and Implementation of Random-effects Meta-analyses Based on the Exact Distribution of Cochran’s Q

Methods of Information in Medicine ◽

10.3414/me13-01-0073 ◽

2014 ◽

Vol 53 (01) ◽

pp. 54-61 ◽

Cited By ~ 6

Author(s):

M. Preuß ◽

A. Ziegler

Keyword(s):

Distribution Function ◽

Cumulative Distribution Function ◽

Type I Error ◽

Meta Analysis ◽

Real Data ◽

R Package ◽

Cumulative Distribution ◽

Data Sets ◽

Type I ◽

Simulation Studies

Background: The random-effects (RE) model is the standard choice for meta-analysis in the presence of heterogeneity. The standard RE method is the DerSimonian and Laird (DSL) approach, in which the degree of heterogeneity is estimated with a moment estimator. The DSL approach does not take into account the variability of the estimated heterogeneity variance in the estimation of Cochran's Q. Biggerstaff and Jackson derived the exact cumulative distribution function (CDF) of Q to account for the variability of T².

Objectives: The first objective is to show that the explicit numerical computation of the density function of Cochran's Q is not required. The second objective is to develop an R package that makes it easy to calculate both the classical RE method and the new exact RE method.

Methods: The novel approach was validated in extensive simulation studies. The different approaches used in the simulation studies, including the exact weights RE meta-analysis and the I² and T² estimates together with their confidence intervals, were implemented in the R package metaxa.

Results: Compared with the classical DSL method, the exact weights RE meta-analysis kept the nominal type I error level better and had greater power in the case of many small studies and a single large study. The Hedges RE approach had inflated type I error levels. Another advantage of the exact weights RE meta-analysis is that an exact confidence interval for T² is readily available. The exact weights RE approach had greater power in the case of few studies, while the restricted maximum likelihood (REML) approach was superior in the case of a large number of studies. Differences between the exact weights RE meta-analysis and the DSL approach were observed in the re-analysis of real data sets: applying the exact weights RE meta-analysis, REML and the DSL approach to real data sets showed that the conclusions of these methods differed.

Conclusions: The simplification requires only the calculation of the cumulative distribution function of Cochran's Q, not of its density, whereas the previous approach required both the density and the cumulative distribution function. It thus reduces computation time, improves numerical stability and reduces the approximation error in meta-analysis. The different approaches, including the exact weights RE meta-analysis and the I² and T² estimates together with their confidence intervals, are available in the R package metaxa, which can be used in applications.
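
For reference, the classical DSL procedure that serves as the comparison point can be written in a few lines. This sketch implements only the moment-based DSL estimator of the between-study variance, Cochran's Q and I². It does not implement the exact weights method based on the CDF of Q, which is what metaxa provides. The function name is hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Classical DSL random-effects meta-analysis from study effects y_i and
    within-study variances v_i. Returns (pooled estimate, its SE, tau^2, Q, I^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, effects))
    # Moment estimator of the between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to every within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    y_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return y_re, se_re, tau2, q, i2


print(dersimonian_laird([0.0, 4.0], [1.0, 1.0]))  # (2.0, 2.0, 7.0, 8.0, 0.875)
```

The DSL estimate plugs in T² as if it were known. That is the source of the unaccounted variability that the exact weights approach addresses.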
