Recent and Planned Developments of the Program OxCal

Radiocarbon ◽  
2013 ◽  
Vol 55 (2) ◽  
pp. 720-730 ◽  
Author(s):  
Christopher Bronk Ramsey ◽  
Sharen Lee

OxCal is a widely used software package for the calibration of radiocarbon dates and the statistical analysis of 14C and other chronological information. The program aims to make statistical methods easily available to researchers and students working in a range of different disciplines. This paper looks at recent and planned developments of the package. The recent additions to the statistical methods are aimed primarily at providing more robust models, in particular through model averaging for deposition models and through different multiphase models. The paper examines how these new models have been implemented and explores the implications for researchers who might benefit from their use. In addition, a new approach to the evaluation of marine reservoir offsets is presented. As the quantity and complexity of chronological data increase, efficient methods for visualizing such extensive data sets become essential; the methods for presenting spatial and geographical data planned for future versions of OxCal are also discussed.
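The abstract mentions model averaging for deposition models without giving details. Purely as a loose illustration of the general idea, and not OxCal's actual implementation (which is built into its MCMC machinery), the hypothetical Python sketch below averages posterior calendar-age densities from rival models, weighting each by an approximate posterior model probability.

```python
import numpy as np

def model_average(age_grid, posteriors, log_marginal_likelihoods):
    """Average posterior age densities from rival deposition models.

    posteriors: list of arrays, each a posterior density evaluated on the
    common calendar-age grid `age_grid`; log_marginal_likelihoods: one value
    per model (equal prior model odds are assumed).
    """
    logw = np.asarray(log_marginal_likelihoods, dtype=float)
    weights = np.exp(logw - logw.max())
    weights /= weights.sum()                      # posterior model probabilities
    mixed = sum(w * p for w, p in zip(weights, posteriors))
    mixed /= np.trapz(mixed, age_grid)            # renormalize the averaged density
    return weights, mixed

# Toy example: two rival models for the same event, averaged with a
# 2:1 evidence ratio in favour of the first.
grid = np.linspace(4000, 5000, 1001)
p1 = np.exp(-0.5 * ((grid - 4500) / 40) ** 2)
p2 = np.exp(-0.5 * ((grid - 4560) / 80) ** 2)
p1, p2 = p1 / np.trapz(p1, grid), p2 / np.trapz(p2, grid)
weights, averaged = model_average(grid, [p1, p2], [np.log(2.0), 0.0])
```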

Radiocarbon ◽  
2010 ◽  
Vol 52 (3) ◽  
pp. 953-961 ◽  
Author(s):  
Christopher Bronk Ramsey ◽  
Michael Dee ◽  
Sharen Lee ◽  
Takeshi Nakagawa ◽  
Richard A Staff

Calibration is a core element of radiocarbon dating and is undergoing rapid development on a number of different fronts. This is most obvious in the area of 14C archives suitable for calibration purposes, which are now demonstrating much greater coherence over the earlier age range of the technique. Of particular significance to this end is the development of purely terrestrial archives such as those from the Lake Suigetsu sedimentary profile and Kauri tree rings from New Zealand, in addition to the groundwater records from speleothems. Equally important, however, is the development of statistical tools that can be used with, and help develop, such calibration data. In the context of sedimentary deposition, age-depth modeling provides a very useful way to analyze series of measurements from cores, with or without the presence of additional varve information. New methods are under development, making use of model averaging, that generate more robust age models. In addition, all calibration requires a coherent approach to outliers, for both single samples and where entire data sets might be offset relative to the calibration curve. This paper looks at current developments in these areas.
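For readers unfamiliar with age-depth modeling, the sketch below shows the simplest possible version of the idea: interpolating calendar age between dated depths in a core. It is a deliberately naive illustration with hypothetical numbers; the models discussed in the paper instead work with full calibrated probability distributions, allow for variable deposition rates, and incorporate outlier handling.

```python
import numpy as np

# Hypothetical tie points: modelled ages (cal BP) at four dated core depths.
dated_depth_cm = np.array([20.0, 85.0, 150.0, 240.0])
age_cal_bp = np.array([1150.0, 3400.0, 5900.0, 9800.0])

def age_at_depth(depth_cm):
    """Piecewise-linear age for any depth between the dated tie points
    (np.interp clamps queries outside the dated range)."""
    return np.interp(depth_cm, dated_depth_cm, age_cal_bp)

print(age_at_depth(100.0))   # roughly 4000 cal BP, between the 85 cm and 150 cm points
```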


2009 ◽  
Vol 4 (3) ◽  
pp. 308-309 ◽  
Author(s):  
Nicole A. Lazar

In their article, Vul, Harris, Winkielman, and Pashler (2009, this issue) raise the issue of nonindependent analysis in behavioral neuroimaging, whereby correlations are artificially inflated because the same data are used both to select voxels and to estimate the reported correlations. In this comment, I note that the phenomenon in question is a type of selection bias and hence is neither new nor unique to fMRI. The use of massive, complex data sets (common in modern applications) to answer increasingly intricate scientific questions presents many potential pitfalls to valid statistical analysis. Strong collaboration between statisticians and scientists and the development of statistical methods specific to the types of data encountered in practice can help researchers avoid these pitfalls.
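Although the comment itself is non-computational, the selection effect it describes is easy to reproduce. The hypothetical simulation below selects the voxels most correlated with a behavioral score in pure noise and then reports their average correlation on the same data; the result is far from zero even though the true correlation of every voxel is exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 5000

# Pure noise: no voxel is truly related to the behavioural score.
behaviour = rng.standard_normal(n_subjects)
voxels = rng.standard_normal((n_subjects, n_voxels))

# Correlate every voxel with behaviour on the same data used for selection.
b = (behaviour - behaviour.mean()) / behaviour.std()
v = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
corrs = b @ v / n_subjects

# "Non-independent" analysis: keep the top 1% of voxels, then report their
# mean correlation. It looks large despite a true correlation of zero.
top = np.sort(corrs)[-n_voxels // 100:]
print(round(top.mean(), 2))   # typically around 0.5-0.6 with these settings
```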


2020 ◽  
Vol 19 (6) ◽  
pp. 1047-1057 ◽  
Author(s):  
Yafeng Zhu ◽  
Lukas M. Orre ◽  
Yan Zhou Tran ◽  
Georgios Mermelekas ◽  
Henrik J. Johansson ◽  
...  

Quantitative proteomics by mass spectrometry is widely used in biomarker research and basic biology for the investigation of phenotype-level cellular events. Despite this wide application, the methodology for statistical analysis of differentially expressed proteins has not been unified. Various methods, such as the t test, linear models, and mixed-effects models, are used to define changes in proteomics experiments. However, none of these methods consider the specific structure of MS data. Choices between methods, often originally developed for other types of data, are based on compromises between features such as statistical power, general applicability, and user friendliness. Furthermore, whether to include proteins identified by a single peptide in statistical analysis of differential protein expression varies between studies. Here we present DEqMS, a robust statistical method developed specifically for differential protein expression analysis of mass spectrometry data. In all data sets investigated there is a clear dependence of variance on the number of PSMs or peptides used for protein quantification. DEqMS takes this feature into account when assessing differential protein expression, allowing a more accurate, data-dependent estimation of protein variance and the inclusion of single-peptide identifications without increasing false discoveries. The method was tested on several data sets, including E. coli proteome spike-in data, using both label-free and TMT-labeled quantification. DEqMS showed consistently better accuracy in detecting altered protein levels than previous statistical methods used in quantitative proteomics, in both label-free and labeled data. DEqMS is available as an R package in Bioconductor.
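DEqMS itself is an R/Bioconductor package built on limma-style empirical Bayes. Purely as an illustration of the central idea (variance estimates informed by how many PSMs or peptides back each protein), the hypothetical Python sketch below fits a trend of log variance against log PSM count and shrinks each protein's variance toward that trend; it is not the package's actual algorithm.

```python
import numpy as np

def psm_dependent_variance(log_ratios, psm_counts):
    """Toy illustration of variance moderation by PSM count.

    log_ratios : proteins x replicates array of log fold changes
    psm_counts : number of PSMs (or peptides) behind each protein
    Returns per-protein variances shrunk toward a trend fitted on log PSM count.
    """
    s2 = log_ratios.var(axis=1, ddof=1)          # raw per-protein variance
    x = np.log2(psm_counts)
    # Simple linear trend of log-variance against log PSM count
    # (DEqMS uses a spline fit combined with empirical Bayes moderation).
    slope, intercept = np.polyfit(x, np.log(s2 + 1e-12), 1)
    trend = np.exp(intercept + slope * x)
    w = 0.5                                      # arbitrary shrinkage weight for illustration
    return w * trend + (1.0 - w) * s2

# Hypothetical data: 1000 proteins, 4 replicate ratios, 1-50 PSMs each.
rng = np.random.default_rng(1)
counts = rng.integers(1, 51, size=1000)
ratios = rng.standard_normal((1000, 4)) / np.sqrt(counts)[:, None]
moderated = psm_dependent_variance(ratios, counts)
```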


2019 ◽  
Author(s):  
Stanley E. Lazic ◽  
Jack R. Mellor ◽  
Michael C. Ashby ◽  
Marcus R. Munafo

Abstract Pseudoreplication occurs when the number of measured values or data points exceeds the number of genuine replicates, and when the statistical analysis treats all data points as independent and thus fully contributing to the result. By artificially inflating the sample size, pseudoreplication contributes to irreproducibility, and it is a pervasive problem in biological research. In some fields, more than half of published experiments have pseudoreplication – making it one of the biggest threats to inferential validity. Researchers may be reluctant to use appropriate statistical methods if their hypothesis is about the pseudoreplicates and not the genuine replicates; for example, when an intervention is applied to pregnant female rodents (genuine replicates) but the hypothesis is about the effect on the multiple offspring (pseudoreplicates). We propose using a Bayesian predictive approach, which enables researchers to make valid inferences about biological entities of interest, even if they are pseudoreplicates, and show the benefits of this approach using two in vivo data sets.
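The inflation is easy to see in simulation. The sketch below (hypothetical numbers, and not the Bayesian predictive approach proposed in the paper) simulates a two-group rodent study with no true effect: treating every pup as an independent observation pushes the false-positive rate well above the nominal 5%, while analyzing litter means keeps it close to nominal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def one_null_experiment(n_litters=4, pups_per_litter=8, litter_sd=1.0, pup_sd=0.5):
    """Simulate a two-group study with no true treatment effect."""
    def group():
        litter_means = rng.normal(0.0, litter_sd, n_litters)
        return litter_means[:, None] + rng.normal(0.0, pup_sd, (n_litters, pups_per_litter))
    a, b = group(), group()
    # Naive analysis: every pup treated as independent (pseudoreplication).
    p_pups = stats.ttest_ind(a.ravel(), b.ravel()).pvalue
    # Analysis on genuine replicates: one value (the mean) per litter.
    p_litters = stats.ttest_ind(a.mean(axis=1), b.mean(axis=1)).pvalue
    return p_pups, p_litters

results = np.array([one_null_experiment() for _ in range(2000)])
print("false-positive rate, pup-level   :", (results[:, 0] < 0.05).mean())
print("false-positive rate, litter-level:", (results[:, 1] < 0.05).mean())
```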


Children ◽  
2021 ◽  
Vol 8 (2) ◽  
pp. 143
Author(s):  
Julie Sommet ◽  
Enora Le Roux ◽  
Bérengère Koehl ◽  
Zinedine Haouari ◽  
Damir Mohamed ◽  
...  

Background: Many pediatric studies describe the association between biological parameters (BP) and severity of sickle cell disease (SCD) using different methods to collect or to analyze BP. This article assesses the methods used for the collection and subsequent statistical analysis of BP, and how these impact prognostic results in cohort studies of children with SCD. Methods: First, we identified the collection and statistical methods used in published SCD cohort studies. Second, these methods were applied to our cohort of 375 SCD children to evaluate the association of BP with cerebral vasculopathy (CV). Results: In 16 cohort studies, BP were collected either once or several times during follow-up. The statistical methods identified were: (1) one baseline value per patient; (2) last known value; (3) mean of all values; (4) modelling of all values in a two-stage approach. When these four statistical methods were applied to our cohort, the results and the interpretation of the association between BP and CV differed depending on the method used. Conclusion: The prognostic value of BP depends on the chosen statistical analysis method. Appropriate statistical analyses of prognostic factors in cohort studies should be considered and should enable valuable and reproducible conclusions.
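To make the four analysis strategies concrete, the hypothetical pandas sketch below derives each summary from a toy longitudinal table of one biological parameter (all column names and values are invented); strategy (4) is shown only as its first stage, a per-patient intercept and slope, which would then enter the prognostic model for CV in a second stage.

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measurements of one biological parameter per patient.
long = pd.DataFrame({
    "patient": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "age_years": [2.0, 4.5, 7.0, 3.0, 6.0, 1.5, 3.5, 5.0, 8.0],
    "hb_g_dl": [8.1, 8.4, 8.0, 9.2, 9.0, 7.5, 7.8, 7.7, 7.9],
}).sort_values(["patient", "age_years"])

g = long.groupby("patient")
summary = pd.DataFrame({
    "baseline": g["hb_g_dl"].first(),     # (1) one baseline value per patient
    "last_known": g["hb_g_dl"].last(),    # (2) last known value
    "mean_value": g["hb_g_dl"].mean(),    # (3) mean of all values
})

# (4) two-stage approach, stage one: summarise each trajectory by a
# per-patient intercept and slope (stage two would carry these into the
# prognostic model for cerebral vasculopathy).
stage_one = {}
for pid, df in g:
    slope, intercept = np.polyfit(df["age_years"], df["hb_g_dl"], 1)
    stage_one[pid] = {"intercept": intercept, "slope": slope}
summary = summary.join(pd.DataFrame.from_dict(stage_one, orient="index"))
print(summary)
```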


2021 ◽  
pp. 000276422110216
Author(s):  
Kazimierz M. Slomczynski ◽  
Irina Tomescu-Dubrow ◽  
Ilona Wysmulek

This article proposes a new approach to analyze protest participation measured in surveys of uneven quality. Because single international survey projects cover only a fraction of the world’s nations in specific periods, researchers increasingly turn to ex-post harmonization of different survey data sets not a priori designed as comparable. However, very few scholars systematically examine the impact of the survey data quality on substantive results. We argue that the variation in source data, especially deviations from standards of survey documentation, data processing, and computer files—proposed by methodologists of Total Survey Error, Survey Quality Monitoring, and Fitness for Intended Use—is important for analyzing protest behavior. In particular, we apply the Survey Data Recycling framework to investigate the extent to which indicators of attending demonstrations and signing petitions in 1,184 national survey projects are associated with measures of data quality, controlling for variability in the questionnaire items. We demonstrate that the null hypothesis of no impact of measures of survey quality on indicators of protest participation must be rejected. Measures of survey documentation, data processing, and computer records, taken together, explain over 5% of the intersurvey variance in the proportions of the populations attending demonstrations or signing petitions.
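As a purely illustrative sketch of the kind of analysis described (all variables are synthetic and do not come from the SDR database), the Python code below regresses a survey-level protest indicator on a questionnaire-item control and then on the control plus data-quality indicators, reporting the increment in explained variance attributable to the quality measures.

```python
import numpy as np

rng = np.random.default_rng(3)
n_surveys = 500

# Synthetic survey-level data: a questionnaire-wording control plus three
# quality indicators (documentation, data processing, computer records).
item_wording = rng.integers(0, 3, n_surveys).astype(float)
quality = rng.standard_normal((n_surveys, 3))
prop_demonstrating = (0.12 + 0.02 * item_wording
                      + quality @ np.array([0.010, 0.008, 0.005])
                      + 0.05 * rng.standard_normal(n_surveys))

def r_squared(X, y):
    """R-squared of an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 - np.var(y - X @ beta) / np.var(y)

controls_only = r_squared(item_wording[:, None], prop_demonstrating)
with_quality = r_squared(np.column_stack([item_wording, quality]), prop_demonstrating)
print("intersurvey variance added by quality measures:", round(with_quality - controls_only, 3))
```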


Genetics ◽  
2000 ◽  
Vol 154 (1) ◽  
pp. 381-395
Author(s):  
Pavel Morozov ◽  
Tatyana Sitnikova ◽  
Gary Churchill ◽  
Francisco José Ayala ◽  
Andrey Rzhetsky

Abstract We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider here two types of functions, one derived from the cosine Fourier series and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed, and in their most parameter-rich versions both Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit the data better than the discrete gamma model, as judged by the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. They are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.
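The paper's exact parameterization is not reproduced here. As a hypothetical illustration of the general idea, the sketch below builds a relative-rate profile from a truncated cosine Fourier series over the site number and rescales it to average 1, so that adding more terms lets the profile approach fully site-specific (unrestricted) rates.

```python
import numpy as np

def cosine_rate_profile(n_sites, coeffs):
    """Relative replacement rates along a sequence from a truncated cosine series.

    Illustrative form only (the paper's exact parameterization may differ):
    r_i = 1 + sum_k c_k * cos(pi * k * (i - 0.5) / L), clipped to stay
    positive and rescaled so the rates average 1 across sites.
    """
    i = np.arange(1, n_sites + 1)
    profile = np.ones(n_sites, dtype=float)
    for k, c in enumerate(coeffs, start=1):
        profile += c * np.cos(np.pi * k * (i - 0.5) / n_sites)
    profile = np.clip(profile, 1e-6, None)   # keep rates positive
    return profile / profile.mean()          # relative rates, mean 1

# A smooth slow-to-fast gradient (k = 1) plus shorter-wavelength variation (k = 2, 3).
rates = cosine_rate_profile(n_sites=120, coeffs=[-0.6, 0.2, 0.1])
print(rates.min().round(2), rates.max().round(2), rates.mean().round(2))
```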

