null distribution
Recently Published Documents


Author(s):  
Carolina Barata ◽  
Rui Borges ◽  
Carolin Kosiol

For over a decade, experimental evolution has been combined with high-throughput sequencing techniques in so-called Evolve-and-Resequence (E&R) experiments. This allows testing for selection in populations kept in the laboratory under given experimental conditions. However, identifying signatures of adaptation in E&R datasets is far from trivial, and it is still necessary to develop more efficient and statistically sound methods for detecting selection in genome-wide data. Here, we present Bait-ER - a fully Bayesian approach based on the Moran model of allele evolution to estimate selection coefficients from E&R experiments. The model has overlapping generations, a feature that describes several experimental designs found in the literature. We tested our method under several different demographic and experimental conditions to assess its accuracy and precision, and it performs well in most scenarios. However, some care must be taken when analysing specific allele trajectories, particularly those where drift largely dominates and starting frequencies are low. We compare our method with other available software and report that ours has generally high accuracy even for very difficult trajectories. Furthermore, our approach avoids the computational burden of simulating an empirical null distribution, outperforming available software in terms of computational time and facilitating its use on genome-wide data. We implemented and released our method in a new open-source software package that can be accessed at https://github.com/mrborges23/Bait-ER.
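The abstract above notes that simulating an empirical null distribution is computationally costly. To illustrate what such a simulation involves, here is a minimal sketch of a forward Moran-type simulation with overlapping generations, where the neutral case (s = 0) is repeated to build an empirical null of final allele counts. This is not Bait-ER's inference procedure; the function names, parameters, and the specific fitness weighting (1 + s for the focal allele) are assumptions made for illustration.

```python
import random

def moran_trajectory(pop_size, start_count, s, n_steps, seed=0):
    """Simulate focal-allele counts under a simple Moran process.

    Assumption: one birth and one death per step; the focal allele has
    relative fitness 1 + s when choosing the reproducing individual.
    """
    rng = random.Random(seed)
    count = start_count
    trajectory = [count]
    for _ in range(n_steps):
        if count == 0 or count == pop_size:
            trajectory.append(count)  # loss or fixation is absorbing
            continue
        weight_focal = count * (1.0 + s)
        weight_other = pop_size - count
        birth_is_focal = rng.random() < weight_focal / (weight_focal + weight_other)
        death_is_focal = rng.random() < count / pop_size
        if birth_is_focal and not death_is_focal:
            count += 1
        elif death_is_focal and not birth_is_focal:
            count -= 1
        trajectory.append(count)
    return trajectory

def neutral_null_final_counts(pop_size, start_count, n_steps, n_rep=200, seed=0):
    """Empirical null of final counts: repeat the neutral (s = 0) simulation."""
    return [
        moran_trajectory(pop_size, start_count, 0.0, n_steps, seed=seed + rep)[-1]
        for rep in range(n_rep)
    ]

trajectory = moran_trajectory(pop_size=100, start_count=10, s=0.05, n_steps=1000)
null_counts = neutral_null_final_counts(pop_size=100, start_count=10, n_steps=1000)
print("final count with selection:", trajectory[-1])
print("largest final count under neutrality:", max(null_counts))
```

The cost grows with the number of replicates, steps, and loci, which is the burden a model-based approach sidesteps.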


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Junhui Qiu ◽  
Qi Zhou ◽  
Weicai Ye ◽  
Qianjun Chen ◽  
Yun-Juan Bao

Abstract Background The gene-specific sweep is a selection process in which an advantageous mutation, along with the nearby neutral sites in a gene region, increases in frequency in the population. It has been demonstrated to play important roles in ecological differentiation or phenotypic divergence in microbial populations. Therefore, identifying gene-specific sweeps in microorganisms will not only provide insights into the evolutionary mechanisms, but also unravel potential genetic markers associated with biological phenotypes. However, current methods were mainly developed for detecting selective sweeps in eukaryotic data of sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, some challenges have not been sufficiently addressed by these methods, such as the low spatial resolution of sweep regions and the lack of consideration of the spatial distribution of mutations. Results We proposed a novel gene-centric and spatial-aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a Python tool, SweepCluster. Our method searches for gene regions with a high level of spatial clustering of pre-selected polymorphisms in genotype datasets, assuming a null distribution model of neutral selection. The pre-selection of polymorphisms is based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulation data showed that the sensitivity and specificity of the clustering algorithm in SweepCluster are above 90%. The application of SweepCluster to two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that pre-selection had a dramatic impact and significantly reduced uninformative signals. We validated our method using the genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%.
We noted that the concordance rate could be underestimated due to distinct reference genomes and clustering strategies. The application to human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and is able to recover 80% of a catalog of known sweep regions. Conclusion SweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and will provide novel insights into adaptive evolution.


2021 ◽  
Vol 19 (1) ◽  
pp. 2-21
Author(s):  
Manish Goyal ◽  
Narinder Kumar

One of the fundamental problems in testing the equality of populations is testing the equality of scale parameters. Scale is also referred to as dispersion, spread, or variability. In this paper, we propose non-parametric tests based on U-statistics for testing the equality of scale parameters. The null distribution of the proposed tests is derived, and their Pitman efficiency is worked out to compare the proposed tests with some existing tests. A simulation study is carried out to compute the asymptotic power of the proposed tests. An illustrative example is also provided.


2021 ◽  
Author(s):  
Kenichi Imai ◽  
Ryo Ikeno ◽  
Hajime Tanaka ◽  
Norio Takada

The emergence of SARS-CoV-2 Delta variants has escalated COVID-19 cases globally due to their high transmissibility. Since saliva is crucial for SARS-CoV-2 transmission, we hypothesized that a higher viral load of Delta variants in saliva, compared with their parental wild-type strains, contributed to the high transmissibility in the first place. However, studies have not reported this particular comparison using viral copy numbers. Twenty-two genetically confirmed positive saliva samples for the wild-type strain and 32 for Delta variants were statistically compared for viral copy number per milliliter, determined by real-time qPCR combined with synthesized viral RNA and Poisson's null distribution equation, between the groups of wild-type and variant strains and between whole saliva and centrifugal supernatant in each group. We found that the copy number of the Delta variants was 15.1 times higher than that of wild-type strains in whole saliva. In addition, the viral load of both strains in whole saliva was higher than in the corresponding supernatant, indicating that most viruses in whole saliva are associated with host cells. Meanwhile, the viral load of the variants in the supernatants, more than a million virions per milliliter, was 4.0 times higher than that of wild-type strains, although the difference was not significant. These findings should be widely shared: the simple but concrete observation that Delta variant viral load is abundant in saliva is critical for preventing the spread of infection.


Author(s):  
A. V. Rubanovich ◽  
V. A. Saenko

Marginal screening (MS) is a computationally simple and commonly used dimension reduction procedure. In it, a linear model is constructed for several top predictors, chosen according to the absolute value of their marginal correlations with the dependent variable. Importantly, when k predictors out of m primary covariates are selected, the standard regression analysis may yield false-positive results if m >> k (Freedman's paradox). In this work, we provide analytical expressions describing the null distribution of the test statistics for model selection via MS. Using the theory of order statistics, we show that under MS, the common F-statistic is distributed as the mean of the k top variables out of m independent random variables having a χ² distribution with 1 degree of freedom. Based on this finding, we estimated critical p-values for multiple regression models after MS; comparing these with the p-values obtained in real studies will help researchers avoid false-positive results. The analytical solutions obtained in this work are implemented in a free Excel spreadsheet program.
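The distributional claim in this abstract can be explored numerically. The following minimal sketch uses Monte Carlo simulation rather than the authors' analytical expressions: it draws m independent chi-square(1) variables, averages the k largest, and repeats to approximate the screening-adjusted null. The function names and the values m = 100, k = 5 are illustrative assumptions.

```python
import random

def top_k_mean_null(m, k, n_sim=2000, seed=0):
    """Simulate the mean of the k largest of m independent chi-square(1) draws."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sim):
        # A squared standard normal variate has a chi-square(1) distribution.
        draws = [rng.gauss(0.0, 1.0) ** 2 for _ in range(m)]
        draws.sort(reverse=True)
        stats.append(sum(draws[:k]) / k)
    return stats

def empirical_quantile(values, q):
    """Return an empirical q-quantile (0 < q < 1) using a simple order-statistic rule."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]

null_stats = top_k_mean_null(m=100, k=5)
null_mean = sum(null_stats) / len(null_stats)
critical_95 = empirical_quantile(null_stats, 0.95)
print(f"mean of the statistic under the screening null: {null_mean:.2f}")
print(f"simulated 95% critical value: {critical_95:.2f}")
```

Without screening, a single chi-square(1) variable has mean 1, so a simulated mean well above 1 shows how selecting the top predictors inflates the statistic when m is much larger than k.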


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 12-13
Author(s):  
Daiane C Becker Scalez ◽  
Samir Id-Lahoucine ◽  
Pablo A S Fonseca ◽  
Joaquim Casellas ◽  
Angela Cánovas

Abstract Transmission ratio distortion (TRD) is a process in which one allele from either parent is preferentially transmitted to the offspring. The identification of genomic regions affected by TRD might help in the detection of lethal alleles or potential genes affecting reproduction. Here, we investigated TRD in a crossbred beef cattle population, aiming to identify genomic regions showing deviations in segregation that could be affecting reproductive performance. A total of 237 genotyped animals were used, including 46 sires, 80 dams, and 111 parent-offspring trios. The predominant breeds of these animals were Angus (61.83%), Simmental (18.99%), Gelbvieh (6.12%), Charolais (3.65%), Hereford (2.46%) and Limousin (1.57%). After excluding SNPs with a minor allele frequency lower than 0.05 and a call rate lower than 0.90, a total of 369,902 autosomal SNPs were retained for further analyses. The SNP-by-SNP analysis was performed within a Bayesian framework using TRDscan v.2.0 software, with 100,000 iterations, of which 10,000 were discarded as burn-in. As Table 1 shows, 33 SNPs were identified with TRD, considering a Bayes Factor (BF) ≥ 100 and the approximate empirical null distribution of TRD at a 0.01% margin of error. Among them, 26 SNPs were parent-unspecific and 7 SNPs were parent-specific TRD. For parent-specific TRD, 214 were identified for sire- and 162 for dam-TRD (BF ≥ 100). Among them, 4 SNPs were detected with sire- and dam-TRD in opposite directions of transmission preference. Preliminary functional and positional analysis was performed using the list of TRD regions with BF ≥ 100 and the approximate empirical null distribution of TRD at a 0.01% margin of error. For sire-TRD, 14% of the identified QTL (n = 254) were related to non-return rate. For dam-TRD, 21 regions related to conception rate were found (1.5%) and 13 regions related to stillbirth (0.93%).
Haplotype analysis is in progress to identify additional candidate regions and alleles with TRD to better understand this phenomenon in a crossbred beef population.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Olivier B. Simon ◽  
Isabelle Buard ◽  
Donald C. Rojas ◽  
Samantha K. Holden ◽  
Benzi M. Kluger ◽  
...  

Abstract Graph theory-based approaches are efficient tools for detecting clustering and group-wise differences in high-dimensional data across a wide range of fields, such as gene expression analysis and neural connectivity. Here, we examine data from a cross-sectional, resting-state magnetoencephalography study of 89 Parkinson’s disease patients, and use minimum-spanning tree (MST) methods to relate severity of Parkinsonian cognitive impairment to neural connectivity changes. In particular, we implement the two-sample multivariate-runs test of Friedman and Rafsky (Ann Stat 7(4):697–717, 1979) and find it to be a powerful paradigm for distinguishing highly significant deviations from the null distribution in high-dimensional data. We also generalize this test for use with greater than two classes, and show its ability to localize significance to particular sub-classes. We observe multiple indications of altered connectivity in Parkinsonian dementia that may be of future use in diagnosis and prediction.
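The idea behind a two-sample MST runs test can be sketched as follows: build a minimum-spanning tree on the pooled sample, count the edges that join points from different groups, and compare that count with its distribution under random relabelling. This is a minimal illustration, not the implementation used in the study; the Monte Carlo permutation null, the function names, and the synthetic five-dimensional data are all assumptions.

```python
import math
import random

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges

def cross_edge_count(edges, labels):
    """Number of MST edges whose endpoints carry different group labels."""
    return sum(1 for i, j in edges if labels[i] != labels[j])

def mst_runs_permutation_test(points, labels, n_perm=500, seed=0):
    """Few cross-group edges suggest the groups differ; p-value from label shuffles."""
    rng = random.Random(seed)
    edges = mst_edges(points)
    observed = cross_edge_count(edges, labels)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_counts.append(cross_edge_count(edges, shuffled))
    p_value = (1 + sum(c <= observed for c in null_counts)) / (n_perm + 1)
    return observed, p_value

# Synthetic example: two groups of 20 points in 5 dimensions with shifted means.
data_rng = random.Random(1)
group_a = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
group_b = [[data_rng.gauss(4.0, 1.0) for _ in range(5)] for _ in range(20)]
points = group_a + group_b
labels = [0] * 20 + [1] * 20
edges = mst_edges(points)
observed, p_value = mst_runs_permutation_test(points, labels)
print("cross-group MST edges:", observed, "permutation p-value:", round(p_value, 4))
```

Because the tree depends only on the pooled points, it is built once and reused for every relabelling, which keeps the permutation step cheap.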


2021 ◽  
Vol 9 (10) ◽  
pp. 1084
Author(s):  
Jean-Marc Guarini ◽  
Shawn Hinz ◽  
Jennifer Coston-Guarini

Early detection of environmental disturbances affecting shellfish stock condition is highly desirable for aquaculture activities. In this article, a new biophysical model-based early warning system (EWS) is described that assesses bivalve stock condition by diagnosing signs of persistent physiological dysfunction. The biophysical model represents valve gape dynamics, controlled by active contractions of the adductor muscle countering the passive action of the hinge ligament; the dynamics combine continuous convergence to a steady state interspersed with discrete closing events. A null simulation was introduced to describe undisturbed conditions. The diagnostic compares valve gape measurements and simulations. Indicators are inferred from the model parameters, and disturbances are flagged when their estimates deviate from their null distribution. Instead of focusing only on discrete events, our EWS exploits the complete observed dynamics within successive time intervals defined by the variation scales. When applied to a valvometry data series, collected in controlled conditions from scallops (Pecten maximus), the EWS indicated that one of four individuals exhibited signs that its physiological condition was degrading. This was detected neither during the experiments nor during the initial data analysis, suggesting the utility of an approach that quantifies physiological mechanisms underlying functional responses. Practical implementations of biological EWS at farming sites are then discussed.


2021 ◽  
Author(s):  
James R Whiting ◽  
Josephine R Paris ◽  
Mijke van der Zee ◽  
Bonnie A Fraser

The repeatability of evolution at the genetic level has been demonstrated to vary along a continuum from complete parallelism to divergence. In order to better understand why this continuum exists within and among systems, hypotheses must be tested using high-confidence sets of candidate loci for repeatability. Despite this, few methods have been developed to scan SNP data for signatures specifically associated with repeatability, as opposed to local adaptation. Here we present AF-vapeR (Allele Frequency Vector Analysis of Parallel Evolutionary Responses), an approach designed to identify genome regions exhibiting highly correlated allele frequency changes within haplotypes and among replicated allele frequency change vectors. The method divides the genome into windows of an equivalent number of SNPs, and within each window performs eigen decomposition over normalised allele frequency change vectors (AFV), each derived from a replicated pair of populations/species. Properties of the resulting eigenvalue distribution can be used to compare regions of the genome for those exhibiting strong parallelism, and can also be compared against a null distribution derived from randomly permuted AFV. We demonstrate the utility of this approach to detect different modes of parallel evolution using simulations, and also demonstrate a reduction in error rate compared with intersecting FST outliers. Lastly, we apply AF-vapeR to three previously published datasets (stickleback, guppies, and Galapagos finches) which comprise a range of sampling and sequencing strategies, and lineage ages. We highlight known parallel regions whilst also identifying novel candidates. The main benefits of this approach include a reduced false-negative rate under many conditions, an emphasis on signals associated specifically with repeatable evolution as opposed to local adaptation, and an opportunity to identify different modes of parallel evolution at the first instance.
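The window-level computation described above can be sketched in a few lines. This minimal example normalises each allele frequency change vector (AFV) to unit length, forms their Gram matrix, and reports the share of total variance on the leading eigenvalue, then compares it with a permutation null. It is not the AF-vapeR implementation: the power-iteration eigenvalue estimate, the permutation scheme (shuffling SNP order within each vector), the function names, and the synthetic data are all assumptions.

```python
import math
import random

def normalise(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def leading_eigenvalue(matrix, rng, n_iter=200):
    """Approximate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    v = normalise([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = normalise(w)
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * w[i] for i in range(n))  # Rayleigh quotient

def window_score(afvs, rng):
    """Share of total variance on the first eigenvector of the unit-AFV Gram matrix."""
    unit = [normalise(v) for v in afvs]
    r = len(unit)
    gram = [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(r)]
            for i in range(r)]
    # The trace of the Gram matrix equals r, so dividing by r gives a proportion.
    return leading_eigenvalue(gram, rng) / r

def permutation_null(afvs, n_perm=200, seed=0):
    """Compare the observed score with scores from independently shuffled AFVs."""
    rng = random.Random(seed)
    observed = window_score(afvs, rng)
    null_scores = []
    for _ in range(n_perm):
        permuted = [rng.sample(v, len(v)) for v in afvs]
        null_scores.append(window_score(permuted, rng))
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n_perm + 1)
    return observed, p_value

# Synthetic window: 4 replicate AFVs over 10 SNPs sharing one direction plus noise.
data_rng = random.Random(2)
shared = [data_rng.gauss(0.0, 1.0) for _ in range(10)]
afvs = [[s + 0.2 * data_rng.gauss(0.0, 1.0) for s in shared] for _ in range(4)]
observed, p_value = permutation_null(afvs)
print("leading eigenvalue share:", round(observed, 3), "p-value:", round(p_value, 4))
```

A share near 1 means the replicate vectors lie along a single axis; note that this score alone does not distinguish parallel from antiparallel changes, which would require inspecting the eigenvector loadings.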


2021 ◽  
Author(s):  
Zhuo Zhen Chen ◽  
Wei-Cheih Wang ◽  
Lloyd Johnson ◽  
Jaimie Dufresne ◽  
Peter Bowden ◽  
...  

Abstract INTRODUCTION There is an urgent need for a simple and sensitive method to elucidate the human plasma proteome to find markers of disease, or therapeutic factors. The human plasma proteome may be obtained from tryptic peptides that result from native digestion using commonly available, sensitive and robust analytical instruments such as linear quadrupole, tandem mass spectrometers. METHODS The human plasma proteome was elucidated from three independent human EDTA plasma populations analyzed by precipitation with acetonitrile (ACN) for quaternary amine (QA) micro-chromatography prior to native tryptic digestion for nano liquid chromatography, electrospray ionization and tandem mass spectrometry (LC-ESI-MS/MS). The LC-ESI-MS/MS results from authentic plasma and blank injection MS/MS noise controls were parsed into SQL Server along with the fit of the MS/MS spectra from the rigorous X!TANDEM for analysis with the R statistical system. A total of 13,408 gene symbols from tryptic (TRYP) and/or phospho/tryptic (STYP) peptides showed ≥ 10 peptides with an FDR q ≤ 0.01 from the fit of MS/MS spectra by X!TANDEM and were resolved from the null distribution of background noise, showing a chi-square value of χ2 ≥ 9 (p ≤ 0.005). RESULTS Native digestion of human EDTA plasma permitted the identification and quantification of ~ 13,408 protein gene symbols in plasma that showed low FDR (q ≤ 0.01) from the fit of peptide MS/MS spectra and where observation frequency was resolved from the null distribution of random MS/MS spectra of source noise from recordings of blank injections. There was good agreement between the orbital ion trap (OIT) and the sensitive linear ion trap (LIT) as well as between the tryptic and phospho/tryptic peptides. A distinct subset of human cellular proteins showed a variety of specific interaction domains that formed a highly interconnected network in the plasma.
DISCUSSION The agreement between the fit of the peptide MS/MS spectra by the rigorous X!TANDEM algorithm versus random MS/MS spectra controls from blank noise injections demonstrated the reliability of the experimental approach. The highly interconnected network in the plasma confirmed that digestion of plasma under native conditions permitted the identification and quantification of the proteins in a population of human plasma samples. CONCLUSION It was feasible to identify more than ten thousand proteins from human plasma with high confidence using a simple linear ion trap after precipitation, quaternary amine chromatography, native digestion and nano spray analysis with a linear quadrupole ion trap.

