A prior-based approach for hypothesis comparison and its utility to discern among temporal scenarios of divergence

2018
Author(s): Eugenia Zarza, Robert B. O’Hara, Annette Klussmann-Kolb, Markus Pfenninger

Abstract: One of the major problems in evolutionary biology is to elucidate the relationships between historical events and the tempo and mode of lineage divergence. The development of relaxed molecular clock models and the increasing availability of DNA sequences have resulted in more accurate estimates of taxon divergence times. However, linking competing historical events to divergence is still challenging. Here we investigate assigning constrained-age priors to nodes of interest in a time-calibrated phylogeny as a means of hypothesis comparison. These priors are equivalent to historical scenarios for lineage origin. The hypothesis that best explains the data can be selected by comparing the likelihood values of the competing hypotheses, modelled with different priors. A simulation approach was taken to evaluate the performance of the prior-based method and to compare it with an unconstrained approach. We explored the effect of DNA sequence length and of the temporal placement and span of competing hypotheses (i.e., historical scenarios) on selection of the correct hypothesis and the strength of the inference. Competing hypotheses were compared by applying a posterior simulation analogue of the Akaike Information Criterion and Bayes factors (obtained after calculation of the marginal likelihood with three estimators: Harmonic Mean, Stepping Stone, and Path Sampling). We illustrate the potential application of the prior-based method on an empirical data set to compare competing geological hypotheses explaining the biogeographic patterns in Pleurodeles newts. The correct hypothesis was selected, on average, 89% of the time. The best performance was observed with DNA sequence lengths of 3500-10000 bp. The prior-based method is most reliable when the compared hypotheses are not too close in time. The strongest inferences were obtained with the Stepping Stone and Path Sampling estimators. The prior-based approach proved effective in discriminating between competing hypotheses when used on empirical data. The unconstrained analysis also performed well, but it likely requires additional computational effort. Researchers applying this approach should rely only on inferences with moderate to strong support. The prior-based approach could be applied in biogeographical and phylogeographical studies, where robust methods for historical inference are still lacking.
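
As a rough illustration of the comparison step described above (a minimal sketch, not the authors' pipeline; all numeric values below are invented), the log Bayes factor between two prior scenarios is simply the difference of their estimated marginal log-likelihoods:

    # Hypothetical marginal log-likelihoods for two competing node-age priors,
    # e.g. as returned by stepping-stone or path-sampling estimation in a
    # Bayesian phylogenetics package (the values below are invented).
    log_ml_h1 = -15234.7   # scenario 1: older divergence event
    log_ml_h2 = -15241.2   # scenario 2: younger divergence event

    # The log Bayes factor of H1 over H2 is the difference of marginal
    # log-likelihoods; 2*ln(BF) > 10 is conventionally read as "very
    # strong" support (Kass & Raftery, 1995).
    log_bf = log_ml_h1 - log_ml_h2
    print(f"log BF = {log_bf:.1f}, 2 ln BF = {2 * log_bf:.1f}")
    if 2 * log_bf > 10:
        print("Very strong support for scenario 1 over scenario 2")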

Demography
2021
Vol 58 (1), pp. 51-74
Author(s): Lee Fiorio, Emilio Zagheni, Guy Abel, Johnathan Hill, Gabriel Pestre, ...

Abstract: Georeferenced digital trace data offer unprecedented flexibility in migration estimation. Because of their high temporal granularity, many migration estimates can be generated from the same data set by changing the definition parameters. Yet despite the growing application of digital trace data to migration research, strategies for taking advantage of their temporal granularity remain largely underdeveloped. In this paper, we provide a general framework for converting digital trace data into estimates of migration transitions and for systematically analyzing their variation along a quasi-continuous time scale, analogous to a survival function. From migration theory, we develop two simple hypotheses regarding how we expect our estimated migration transition functions to behave. We then test our hypotheses on simulated data and on empirical data from three platforms in two internal migration contexts: geotagged Tweets and Gowalla check-ins in the United States, and cell-phone call detail records in Senegal. Our results demonstrate the need to evaluate the internal consistency of migration estimates derived from digital trace data before using them in substantive research. At the same time, however, common patterns across our three empirical data sets point to an emergent research agenda that uses digital trace data to study the specific functional relationship between estimates of migration and time, and how this relationship varies by geography and population characteristics.
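
To make the estimation idea concrete, here is a minimal sketch with an invented toy record format (not the paper's actual pipeline): a user's modal location is computed in two adjacent time windows, and a transition is recorded when it changes; sweeping the window boundaries traces out the transition function the paper analyzes.

    from collections import Counter
    from datetime import datetime

    # Hypothetical trace records: (user_id, timestamp, region); real digital
    # traces (tweets, check-ins, CDRs) are far denser than this toy example.
    records = [
        ("u1", datetime(2020, 1, 5), "WA"), ("u1", datetime(2020, 2, 9), "WA"),
        ("u1", datetime(2020, 11, 3), "CA"), ("u1", datetime(2020, 12, 1), "CA"),
        ("u2", datetime(2020, 3, 2), "NY"), ("u2", datetime(2020, 10, 8), "NY"),
    ]

    def modal_region(recs, start, end):
        """Most frequently observed region within [start, end), or None."""
        regions = [r for _, t, r in recs if start <= t < end]
        return Counter(regions).most_common(1)[0][0] if regions else None

    def transition(user, recs, t0, t1, t2):
        """Did the user's modal region change between [t0, t1) and [t1, t2)?
        Returns None when the user is unobserved in either window."""
        recs = [rec for rec in recs if rec[0] == user]
        before, after = modal_region(recs, t0, t1), modal_region(recs, t1, t2)
        return None if before is None or after is None else before != after

    # Sweeping t1 (and the window widths) over a quasi-continuous grid yields
    # a migration transition function for the population.
    print(transition("u1", records, datetime(2020, 1, 1),
                     datetime(2020, 7, 1), datetime(2021, 1, 1)))   # True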


2020
pp. 002216782093422
Author(s): Tracey Woolrych, Michelle J. Eady, Corinne A. Green

Culture is important for the development of social skills in children, including empathy. Although empathy has long been linked with prosocial behaviors and attitudes, there is little research linking culture with the development of empathy in children. This project sought to investigate and identify specific culturally related empathy elements in a sample of Dene and Inuit children from Northern Canada. Across seven different grade (primary) schools, 92 children aged 7 to 9 years participated in the study. Children’s drawings, and interviews about those pictures, were employed as empirical data, giving researchers access to the children’s perspective on which aspects of culture were important to them. Using empathy as the theoretical framework, a thematic analysis was conducted in a top-down, deductive approach. The research paradigm elicited a rich data set revealing three major themes: sharing; knowledge of self and others; and acceptance of differences. The identified themes were found to have strong links with empathy constructs such as sharing, helping, perspective-taking, and self–other knowledge, revealing the important role that culture may play in the development of empathy. Findings from this study can help researchers explore and identify specific cultural elements that may contribute to the development of empathy in children.


2019
pp. 089443931988844
Author(s): Ranjith Vijayakumar, Mike W.-L. Cheung

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods with an empirical data set, assessing the replication success of a predictive accuracy measure, namely R², on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection is supported if the 95% confidence interval (CI) of the R² difference estimate between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide a priori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons. The 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced relative to the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures, while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while the support vector machine performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
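
The two interval tests can be sketched in a few lines. This assumes a normal approximation with a known (e.g., bootstrapped) standard error for the R² difference; the ±0.05 equivalence region and all numbers are invented for illustration, not taken from the study.

    from scipy import stats

    def diff_ci(d, se, level=0.95):
        """Normal-approximation CI for a replication-minus-original R^2
        difference, with standard error se assumed known (e.g. bootstrapped)."""
        z = stats.norm.ppf(0.5 + level / 2)
        return d - z * se, d + z * se

    # Hypothetical numbers: replication R^2 minus original R^2, and its SE.
    d, se = -0.08, 0.03

    # Test of inconsistency: a 95% CI excluding zero rejects the original.
    lo, hi = diff_ci(d, se, 0.95)
    print(f"95% CI: [{lo:.3f}, {hi:.3f}] -> inconsistent: {not (lo <= 0 <= hi)}")

    # Test of consistency: a 90% CI lying fully inside an a priori region of
    # equivalence (here +/- 0.05, a substantive choice) supports replication.
    lo, hi = diff_ci(d, se, 0.90)
    eq = 0.05
    print(f"90% CI: [{lo:.3f}, {hi:.3f}] -> equivalent: {-eq <= lo and hi <= eq}")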


2003
Vol 26 (4), pp. 482-483
Author(s): Gary Feng

Parameters in E-Z Reader models are estimated on the basis of a simple data set consisting of 30 means. Because of heavy aggregation, the data suffer from severe multicollinearity and are unable to adequately constrain parameter values. This could give the model more power than the empirical data warrant. Future models should exploit the richness of eye movement data and avoid excessive aggregation.
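
A toy demonstration of the point (not E-Z Reader itself; the data and model below are invented): with two nearly collinear predictors and only 30 observations, individual parameter estimates are poorly constrained even though the overall fit is.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for heavily aggregated data: 30 observations (echoing the
    # 30 condition means) and two nearly collinear predictors.
    n = 30
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.1, size=n)   # almost a copy of x1
    y = x1 + x2 + rng.normal(scale=0.5, size=n)
    X = np.column_stack([x1, x2])

    # Bootstrap the least-squares fit: individual coefficients swing wildly,
    # while the identified combination b1 + b2 stays stable.
    coefs = []
    for _ in range(1000):
        idx = rng.integers(0, n, size=n)
        b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        coefs.append(b)
    coefs = np.array(coefs)
    print("SD of b1, b2: ", coefs.std(axis=0))        # large
    print("SD of b1 + b2:", coefs.sum(axis=1).std())  # much smaller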


2018
Vol 2018, pp. 1-13
Author(s): Suk-Hwan Lee

Many studies have examined DNA storage as a way to hide information in DNA sequences using DNA computing technology. However, most data hiding methods are irreversible, in that the original DNA sequence cannot be recovered from the watermarked DNA sequence. This study presents reversible data hiding methods based on multilevel histogram shifting that prevent biological mutations, preserve sequence length, increase watermark capacity, and facilitate blind detection/recovery. The main features of our method are as follows. First, we encode a sequence of nucleotide bases, with its four-character alphabet, into integer values using the bases' numeric order. Second, we embed multiple bits in each integer value by multilevel histogram shifting of noncircular type (NHS) and circular type (CHS). Third, we prevent the generation of false start/stop codons by verifying whether a start/stop codon is included in an integer value or spans adjacent integer values. The results of our experiments confirmed that the NHS- and CHS-based methods have higher watermark capacities than conventional methods in terms of supplementary data used for decoding. Moreover, unlike conventional methods, our methods do not generate false start/stop codons.
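
A heavily simplified, single-level sketch of the two building blocks (base-to-integer encoding and histogram-shift embedding). The published method is multilevel, adds a circular variant, and checks for false start/stop codons, none of which is shown; the peak/zero bin choices below are illustrative only.

    BASES = "ACGT"

    def bases_to_ints(seq, k=3):
        """Encode k-base blocks as base-4 integers: 'ACG' -> 0*16 + 1*4 + 2 = 6."""
        vals = []
        for i in range(0, len(seq) - len(seq) % k, k):
            v = 0
            for b in seq[i:i + k]:
                v = v * 4 + BASES.index(b)
            vals.append(v)
        return vals

    def ints_to_bases(vals, k=3):
        """Inverse mapping, so watermarked values go back to a DNA string."""
        out = []
        for v in vals:
            block = ""
            for _ in range(k):
                block = BASES[v % 4] + block
                v //= 4
            out.append(block)
        return "".join(out)

    def embed(vals, bits, peak, zero):
        """Single-level histogram shift: values strictly between peak and zero
        move up by one to free the bin peak+1; each occurrence of peak then
        carries one bit (0 -> stays at peak, 1 -> becomes peak+1)."""
        out, bits = [], list(bits)
        for v in vals:
            if peak < v < zero:
                out.append(v + 1)
            elif v == peak and bits:
                out.append(v + bits.pop(0))
            else:
                out.append(v)
        return out

    vals = bases_to_ints("ACGACGTTTACG")             # [6, 6, 63, 6]; peak bin is 6
    marked = embed(vals, [1, 0, 1], peak=6, zero=7)  # 7 is an unused bin here
    print(ints_to_bases(marked))                     # same length as the original

Detection is symmetric: a value equal to peak decodes as bit 0, peak+1 as bit 1, and the shift is undone, recovering the original sequence exactly, which is the reversibility property the abstract emphasizes.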


2015
Author(s): Kassian Kobert, Leonidas Salichos, Antonis Rokas, Alexandros Stamatakis

Abstract: We present, implement, and evaluate an approach to calculate the internode certainty and tree certainty on a given reference tree from a collection of partial gene trees. Previously, the calculation of these values was only possible from a collection of gene trees with exactly the same taxon set as the reference tree. Applying them to sets of partial gene trees requires mathematical corrections in the internode certainty and tree certainty calculations. We implement our methods in RAxML and test them on empirical data sets. These tests imply that the inclusion of partial trees does matter. However, in order to provide meaningful measurements, any data set should also include trees containing the full species set.
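
For context, a minimal sketch of the uncorrected internode certainty of Salichos and Rokas (2013), computed from complete gene trees; the paper's contribution is the correction of these frequencies for partial trees, which this sketch does not include.

    import math

    def internode_certainty(n1, n2):
        """IC for one internode: n1 gene trees support the reference
        bipartition, n2 support the most prevalent conflicting one.
        IC = 1 means no conflict; IC = 0 means an even split."""
        p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)
        entropy = -sum(p * math.log2(p) for p in (p1, p2) if p > 0)
        return 1.0 - entropy

    print(internode_certainty(95, 5))   # ~0.71, little conflict
    print(internode_certainty(50, 50))  # 0.0, maximal conflict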


2016
Author(s): Matthew J Vavrek

Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approaches and underlying assumptions of these methods can be quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. To assess the effectiveness of the different clustering methods relative to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set so as to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data were less than ideal the linkage methods performed poorly compared to non-Euclidean-based k-means and the NERC method. Based on this analysis, the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and neighbor joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place.
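
A generic sketch of the iterative, similarity-based reassignment the abstract describes (not the published R implementation, whose details are not given here): each point moves to the cluster to which it is most similar on average, until assignments stabilize.

    import numpy as np

    def similarity_clusters(S, k, iters=100, seed=0):
        """Iteratively reassign each point to the cluster whose members it is,
        on average, most similar to (self-similarity excluded). S is any
        precomputed n x n similarity matrix, so non-Euclidean indices plug in."""
        rng = np.random.default_rng(seed)
        n = S.shape[0]
        labels = rng.integers(0, k, size=n)
        for _ in range(iters):
            new = labels.copy()
            for i in range(n):
                best_c, best_m = labels[i], -np.inf
                for c in range(k):
                    mask = labels == c
                    mask[i] = False          # exclude self-similarity
                    if mask.any() and S[i, mask].mean() > best_m:
                        best_m, best_c = S[i, mask].mean(), c
                new[i] = best_c
            if (new == labels).all():        # converged
                break
            labels = new
        return labels

    # Toy demo: two well-separated blobs; similarity = negative distance.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(8, 1, (5, 2))])
    S = -np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    print(similarity_clusters(S, 2))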


2020
Author(s): Yifei Yan, Ansley Gnanapragasam, Swneke Bailey

Abstract. Motivation: Chromatin immunoprecipitation sequencing (ChIP-seq) of histone post-translational modifications, coupled with de novo motif elucidation and enrichment analyses, can identify transcription factors responsible for orchestrating transitions between cell- and disease-states. However, the identified regulatory elements can span several kilobases (kb) in length, which complicates motif-based analyses. Restricting the length of the target DNA sequence(s) can reduce false positives. Therefore, we present HisTrader, a computational tool to identify the regions accessible to transcription factors, nucleosome-free regions (NFRs), within histone modification peaks, to reduce the DNA sequence length required for motif analyses. Results: HisTrader accurately identifies NFRs from H3K27ac ChIP-seq profiles of the lung cancer cell line A549, which are validated by the presence of DNaseI hypersensitivity. In addition, HisTrader reveals that multiple NFRs are common within individual regulatory elements; an easily overlooked feature that should be considered to improve the sensitivity of motif analyses using histone modification ChIP-seq data. Availability and implementation: The HisTrader script is open source and available on GitHub (https://github.com/SvenBaileyLab/Histrader) under a GNU general public license (GPLv3). HisTrader is written in Perl and can be run on any platform with Perl installed.
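
HisTrader's exact algorithm is not described in the abstract; as a generic illustration of the NFR idea, the sketch below finds "valleys" in a peak's per-base coverage (dips between nucleosome summits) with invented thresholds, which is one plausible way to shorten sequences for motif scanning.

    import numpy as np
    from scipy.signal import find_peaks

    def candidate_nfrs(coverage, max_depth_frac=0.5, min_width=50):
        """Report dips that fall below max_depth_frac of the peak maximum and
        are at least min_width bp wide, as candidate nucleosome-free regions."""
        coverage = np.asarray(coverage, dtype=float)
        # Valleys in coverage are peaks of the inverted signal.
        _, props = find_peaks(-coverage,
                              height=-max_depth_frac * coverage.max(),
                              width=min_width)
        return [(int(l), int(r))
                for l, r in zip(props["left_ips"], props["right_ips"])]

    # Toy bimodal peak: two nucleosome summits flanking a central dip.
    x = np.arange(1000)
    cov = np.exp(-(x - 300) ** 2 / 5e3) + np.exp(-(x - 700) ** 2 / 5e3)
    print(candidate_nfrs(cov))   # one candidate NFR spanning roughly 360-640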


2019
Author(s): Damiano Righetti, Meike Vogt, Niklaus E. Zimmermann, Nicolas Gruber

Abstract. Marine phytoplankton are responsible for half of the global net primary production and perform multiple other ecological functions and services in the global ocean. These photosynthetic organisms comprise more than 4300 marine species, but their biogeographic patterns and the resulting species diversity are poorly known, mostly owing to severe data limitations. Here, we compile, synthesize, and harmonize marine phytoplankton occurrence records from the two largest biological occurrence archives (the Ocean Biogeographic Information System, OBIS, and the Global Biodiversity Information Facility, GBIF) and three recent data collections. The resulting PhytoBase data set contains over 1.36 million phytoplankton occurrence records (1.28 million at the level of species) for a total of 1711 species, spanning the principal groups of the Bacillariophyceae, Dinoflagellata, and Haptophyta as well as several other groups. This compilation increases the number of marine phytoplankton records available through the single largest contributing archive (OBIS) by 65%. The data span all ocean basins and latitudes, and most seasons. Analyzing the oceanic inventory of sampled phytoplankton species richness at the broadest spatial scales possible, using a resampling procedure, we find that richness tends to saturate in the pantropics at ~93% of all species in our database, at ~64% in temperate waters, and at ~35% in the cold Northern Hemisphere, while the Southern Hemisphere remains underexplored. We provide metadata on the cruise, research institution, depth, and date of collection for each record, and we include cell counts for 195,339 records. We strongly recommend accounting for global spatiotemporal biases in sampling intensity and for the varying taxonomic sampling scopes of research cruises and institutions when analyzing the occurrence database. Incorporating such information into statistical analysis tools, such as species distribution models, may serve to project the diversity, niches, and distribution of species in the contemporary and future ocean, opening the door to a quantification of macroecological phytoplankton patterns. PhytoBase can be downloaded from PANGAEA: https://doi.org/10.1594/PANGAEA.904397 (Righetti et al., 2019a).
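
The saturation analysis rests on a resampling (species accumulation) procedure; a minimal sketch with an invented record format follows. The paper's actual procedure and regional definitions are not reproduced here.

    import random

    def accumulation_curve(records, n_draws, trials=100, seed=42):
        """Resampling estimate of sampled species richness: draw n_draws
        occurrence records at random and count distinct species, averaged
        over trials. Saturation of this curve with growing n_draws is the
        quantity behind the ~93% / ~64% / ~35% regional figures."""
        rng = random.Random(seed)
        total = 0
        for _ in range(trials):
            sample = rng.sample(records, n_draws)
            total += len({species for species, *_ in sample})
        return total / trials

    # Hypothetical occurrence records: (species, region) pairs.
    records = [("sp%d" % (i % 50), "tropics") for i in range(1000)]
    for n in (10, 100, 500):
        print(n, accumulation_curve(records, n))   # richness levels off near 50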

