rGAI: An R package for fitting the generalised abundance index to seasonal count data

Author(s):  
Emily Dennis ◽  
Calliste Fagard-Jenkin ◽  
Byron Morgan

1. The Generalised Abundance Index (GAI) is a useful tool for estimating relative population sizes and trends of seasonal invertebrates from species’ count data. Through parametric descriptions of seasonal variation, it also offers potential for inferring which external factors may influence phenology and demography. 2. We provide an R package that extends previous software with the ability to include covariates when fitting parametric GAI models, where seasonal variation is described either by a mixture of Normal distributions or by a stopover model, which provides estimates of lifespan. The package also generalises the model to allow any number of broods/generations in the target population within a defined season. Bootstrapping, either parametric or non-parametric, is also available. 3. The new package allows far more flexible descriptions of seasonal variation, which may depend on site-specific environmental factors or consist of many, possibly overlapping, broods/generations, as demonstrated by two case studies. 4. Our open-source software, available at https://github.com/calliste-fagard-jenkin/rGAI, makes this extension widely and freely available, allowing the complexity of GAI models used by ecologists and applied statisticians to increase accordingly.
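As a rough illustration of the mixture-of-Normals description of seasonal variation mentioned in this abstract, the sketch below builds a two-brood seasonal curve and scales it by a site-level abundance. It is written in Python rather than R and does not use the rGAI API. The function names, the normalisation over the sampled weeks, and all parameter values are assumptions made for illustration only.

```python
import math


def normal_pdf(t, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution evaluated at t."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def seasonal_pattern(times, weights, means, sds):
    """Mixture-of-Normals seasonal curve with one component per brood.

    Assumption: the curve is normalised to sum to 1 over the sampled occasions.
    """
    raw = [
        sum(w * normal_pdf(t, m, s) for w, m, s in zip(weights, means, sds))
        for t in times
    ]
    total = sum(raw)
    return [r / total for r in raw]


def expected_counts(site_abundance, pattern):
    """Expected count on each occasion: site abundance times seasonal density."""
    return [site_abundance * a for a in pattern]


# Hypothetical two-brood species monitored over 26 weeks (illustrative values).
weeks = list(range(1, 27))
pattern = seasonal_pattern(weeks, weights=[0.4, 0.6], means=[7.0, 18.0], sds=[2.0, 2.5])
counts = expected_counts(100.0, pattern)
```

In a fitted model the weights, means and standard deviations would be estimated from the counts, and could be linked to covariates; here they are fixed by hand to show the shape of the calculation.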

2021 ◽  
Vol 53 (1) ◽  
pp. 162-188
Author(s):  
Krzysztof Bartoszek ◽  
Torkel Erhardsson

Explicit bounds are given for the Kolmogorov and Wasserstein distances between a mixture of normal distributions, by which we mean that the conditional distribution given some $\sigma$-algebra is normal, and a normal distribution with properly chosen parameter values. The bounds depend only on the first two moments of the first two conditional moments given the $\sigma$-algebra. The proof is based on Stein’s method. As an application, we consider the Yule–Ornstein–Uhlenbeck model, used in the field of phylogenetic comparative methods. We obtain bounds for both distances between the distribution of the average value of a phenotypic trait over n related species, and a normal distribution. The bounds imply and extend earlier limit theorems by Bartoszek and Sagitov.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Arnaud Liehrmann ◽  
Guillem Rigaill ◽  
Toby Dylan Hocking

Background: Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as the Poisson distribution, to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. Results: Our comparisons on seven reference datasets of histone modifications (H3K36me3 and H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect the peaks more accurately than algorithms that rely on natural assumptions. Conclusion: The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications.


2021 ◽  
Vol 2 (1) ◽  
Author(s):  
Anthony Orlando

Background: Results from a clinical trial can either support the efficacy and safety of a new compound or fail to provide such evidence. One reason for a ‘non-positive’ result is the underlying assumption of normality and homogeneity of variances, which is quite often violated when analyzing data from clinical trials, despite randomization. A question of interest is whether we can obtain more informative results by using mixtures of normal distributions or mixtures of linear models (MLMs) in such cases. Introduction: MLMs can be used when traditional methods fail. MLMs “search” within the variability in the data to identify components or subgroups of individuals (also known as latent classes) who have common intercepts and common slopes of change in a variable/endpoint of interest, but whose intercepts and slopes differ from those of other subsets of patients. Thus, MLMs can be used to identify subgroups of patients exhibiting differential response to treatment within each treatment arm. The purpose of our study was to examine the usefulness of MLM in such circumstances. Methods: Data from 155 subjects in a multicenter, randomized, double-blind, placebo-controlled trial that evaluated the efficacy of Cpn10, administered subcutaneously twice weekly to treat rheumatoid arthritis (RA), were used to evaluate the usefulness of MLM. The primary efficacy measure, ACR20, was analyzed using a 3-step process: first, MLM was used to estimate RA duration using a 3-component model. The second step took the results of the first step to inform the logistic model and its analyses. The model was fitted with an intercept, MLM components, treatment arm, RA duration (linear and quadratic), dose response (modeled as an interaction effect), age and baseline weight. Last observation carried forward (LOCF) was used to impute missing data. Data were analyzed using MLM and SAS v 9.0. Results: The model was a good fit to the data, with a likelihood ratio significant at p=0.026 and a significant increase in the -2 log L. We also observed low p-values for those variables that were non-normal. Overall and for the 75 mg dose, Cpn10 was efficacious relative to placebo, p<0.050. We also observed that dose response was significant at p<0.15. Conclusion: The use of MLM adds value because it can be used to understand the disease experience or the value of treatment when traditional statistical methods cannot. Key words: Mixture of linear models, normality, entropy.


2017 ◽  
Vol 28 (1) ◽  
pp. 309-320 ◽  
Author(s):  
Scott Powers ◽  
Valerie McGuire ◽  
Leslie Bernstein ◽  
Alison J Canchola ◽  
Alice S Whittemore

Personal predictive models for disease development play important roles in chronic disease prevention. The performance of these models is evaluated by applying them to the baseline covariates of participants in external cohort studies, with model predictions compared to subjects' subsequent disease incidence. However, the covariate distribution among participants in a validation cohort may differ from that of the population for which the model will be used. Since estimates of predictive model performance depend on the distribution of covariates among the subjects to which it is applied, such differences can cause misleading estimates of model performance in the target population. We propose a method for addressing this problem by weighting the cohort subjects to make their covariate distribution better match that of the target population. Simulations show that the method provides accurate estimates of model performance in the target population, while un-weighted estimates may not. We illustrate the method by applying it to evaluate an ovarian cancer prediction model targeted to US women, using cohort data from participants in the California Teachers Study. The methods can be implemented using open-source code for public use as the R-package RMAP (Risk Model Assessment Package) available at http://stanford.edu/~ggong/rmap/ .
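The weighting idea in this abstract can be sketched in a simplified form, assuming a single categorical covariate and weights equal to the ratio of target-population to cohort proportions in each stratum. This is an illustrative assumption, not the RMAP implementation. The function names and the toy cohort below are hypothetical.

```python
from collections import Counter


def stratum_weights(cohort_strata, target_props):
    """Weight each subject by target proportion / cohort proportion of its stratum."""
    n = len(cohort_strata)
    cohort_props = {s: c / n for s, c in Counter(cohort_strata).items()}
    return [target_props[s] / cohort_props[s] for s in cohort_strata]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy cohort: older subjects are under-represented relative to the target.
strata = ["young", "young", "young", "old"]
predicted_risk = [0.1, 0.1, 0.1, 0.4]   # model predictions (made up)
observed = [0, 0, 0, 1]                 # subsequent disease incidence (made up)
target = {"young": 0.5, "old": 0.5}     # assumed target-population proportions

weights = stratum_weights(strata, target)
mean_predicted = weighted_mean(predicted_risk, weights)
mean_observed = weighted_mean(observed, weights)
```

Comparing `mean_predicted` with `mean_observed` gives a crude calibration check in the reweighted cohort. The unweighted averages would instead reflect the cohort's own covariate mix.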


2020 ◽  
Vol 45 (4) ◽  
pp. 823-831 ◽  
Author(s):  
Søren Wichmann

The terms “language” and “dialect” are ingrained, but linguists nevertheless tend to agree that it is impossible to apply a non-arbitrary distinction such that two speech varieties can be identified as either distinct languages or two dialects of one and the same language. A database of lexical information for more than 7,500 speech varieties, however, unveils a strong tendency for linguistic distances to be bimodally distributed. For a given language group, the linguistic distances pertaining to either cluster can be teased apart by identifying a mixture of normal distributions within the data and then separating the components by fitting curves and finding the point where they cross. The thresholds identified are remarkably consistent across data sets, qualifying their mean as a universal criterion for distinguishing between language and dialect pairs. The mean of the thresholds identified translates into a temporal distance of around one to one-and-a-half millennia (1,075–1,635 years).
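The step of finding the point where two fitted curves cross can be sketched as below, assuming the two components are weighted normal densities and that the crossing of interest lies between their means. The bisection approach, the function names and the parameter values are illustrative assumptions, not the author's procedure or fitted values.

```python
import math


def log_weighted_normal(x, weight, mu, sigma):
    """Log of weight * Normal(mu, sigma^2) density at x."""
    return (
        math.log(weight)
        - math.log(sigma * math.sqrt(2.0 * math.pi))
        - 0.5 * ((x - mu) / sigma) ** 2
    )


def crossing_point(w1, mu1, sd1, w2, mu2, sd2, n_iter=100):
    """Find x between mu1 and mu2 where the two weighted densities are equal."""
    if mu1 >= mu2:
        raise ValueError("expected mu1 < mu2")

    def diff(x):
        return log_weighted_normal(x, w1, mu1, sd1) - log_weighted_normal(x, w2, mu2, sd2)

    lo, hi = mu1, mu2
    # Bisection needs component 1 to dominate at mu1 and component 2 at mu2.
    if diff(lo) <= 0 or diff(hi) >= 0:
        raise ValueError("no sign change between the two means")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical components: equal weights and spreads, so the curves cross midway.
threshold = crossing_point(0.5, 0.2, 0.05, 0.5, 0.6, 0.05)
```

With unequal weights or standard deviations the crossing shifts away from the midpoint, which is why it has to be solved for rather than read off the means.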


2015 ◽  
Author(s):  
Ethan Linck ◽  
Eli S Bridge ◽  
Jonah M Duckles ◽  
Adolfo G Navarro-Sigüenza ◽  
Sievert Rohwer

Natural history museum collections (NHCs) represent a rich and largely untapped source of data on demography and population movements. NHC specimen records can be corrected by a crude measure of collecting effort to reflect relative population densities, using a method known as abundance indices. We plot abundance index values from georeferenced NHC data in a 12-month series for the New World migratory passerine Passerina ciris across its molting and wintering range in Mexico and Central America. We illustrate a statistically significant change in abundance index values across regions and months that suggests a quasi-circular movement around its non-breeding range, and use enhanced vegetation index (EVI) analysis of remote sensing plots to demonstrate non-random association of specimen record density with areas of high primary productivity. We demonstrate how abundance indices from NHC specimen records can be applied to infer previously unknown migratory behavior, and be integrated with remote sensing data to allow for a deeper understanding of demography and behavioral ecology across space and time.


BMJ ◽  
2020 ◽  
pp. m4704 ◽  
Author(s):  
Wei Wang ◽  
Qianhui Wu ◽  
Juan Yang ◽  
Kaige Dong ◽  
Xinghui Chen ◽  
...  

Objective: To provide global, regional, and national estimates of target population sizes for coronavirus disease 2019 (covid-19) vaccination to inform country specific immunisation strategies on a global scale. Design: Descriptive study. Setting: 194 member states of the World Health Organization. Population: Target populations for covid-19 vaccination based on country specific characteristics and vaccine objectives (maintaining essential core societal services; reducing severe covid-19; reducing symptomatic infections and stopping virus transmission). Main outcome measure: Size of target populations for covid-19 vaccination. Estimates use country specific data on population sizes stratified by occupation, age, risk factors for covid-19 severity, vaccine acceptance, and global vaccine production. These data were derived from a multipronged search of official websites, media sources, and academic journal articles. Results: Target population sizes for covid-19 vaccination vary markedly by vaccination goal and geographical region. Differences in demographic structure, presence of underlying conditions, and number of essential workers lead to highly variable estimates of target populations at regional and country levels. In particular, Europe has the highest share of essential workers (63.0 million, 8.9%) and people with underlying conditions (265.9 million, 37.4%); these two categories are essential in maintaining societal functions and reducing severe covid-19, respectively. In contrast, South East Asia has the highest share of healthy adults (777.5 million, 58.9%), a key target for reducing community transmission. Vaccine hesitancy will probably impact future covid-19 vaccination programmes; based on a literature review, 68.4% (95% confidence interval 64.2% to 72.6%) of the global population is willing to receive covid-19 vaccination. Therefore, the adult population willing to be vaccinated is estimated at 3.7 billion (95% confidence interval 3.2 to 4.1 billion). Conclusions: The distribution of target groups at country and regional levels highlights the importance of designing an equitable and efficient plan for vaccine prioritisation and allocation. Each country should evaluate different strategies and allocation schemes based on local epidemiology, underlying population health, projections of available vaccine doses, and preference for vaccination strategies that favour direct or indirect benefits.


2017 ◽  
Vol 52 (3) ◽  
pp. 1081-1109 ◽  
Author(s):  
Yong Chen ◽  
Michael Cliff ◽  
Haibei Zhao

We develop an estimation approach based on a modified expectation-maximization (EM) algorithm and a mixture of normal distributions associated with skill groups to assess performance in hedge funds. By allowing luck to affect both skilled and unskilled funds, we estimate the number of skill groups, the fraction of funds from each group, and the mean and variability of skill within each group. For each individual fund, we propose a performance measure combining the fund’s estimated alpha with the cross-sectional distribution of fund skill. In out-of-sample tests, an investment strategy using our performance measure outperforms those using estimated alpha and t-statistic.
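For readers unfamiliar with the technique, the sketch below shows a plain textbook EM fit of a two-component univariate normal mixture. It is not the modified EM algorithm of the paper, it does not separate luck from skill, and it fixes the number of groups at two. The simulated data, function names and starting values are all assumptions for illustration.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_normals(data, n_iter=200):
    """Textbook EM for a two-component normal mixture; returns (props, means, sds)."""
    xs = sorted(data)
    half = len(xs) // 2
    # Crude starting values: split the sorted data at the median.
    mu = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    sd = [statistics.pstdev(xs)] * 2
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [pi[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: update proportions, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)  # guard against a collapsing component
    return pi, mu, sd


# Simulated toy data (not fund returns): 60% from N(0, 1), 40% from N(5, 1).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
props, means, sds = em_two_normals(data)
```

The fitted proportions play the role of the fraction of units in each group, and the component means and standard deviations describe the mean and variability within each group.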

