Estimates of introgression as a function of pairwise distances

2017 ◽  
Author(s):  
Bastian Pfeifer ◽  
Durrell D Kapan

Abstract. Background: Research over the last 10 years highlights the increasing importance of hybridization between species as a major force structuring the evolution of genomes and potentially providing raw material for adaptation by natural and/or sexual selection. Fueled by research in a few model systems where phenotypic hybrids are easily identified, research into hybridization and introgression (the flow of genes between species) has exploded with the advent of whole-genome sequencing and emerging methods to detect the signature of hybridization at the whole-genome or chromosome level. Among these is a general class of methods that use patterns of single-nucleotide polymorphisms (SNPs) across a tree as markers of hybridization. These methods have been applied to detect introgression in a variety of genomic systems, ranging from butterflies to Neanderthals. However, when employed at a fine genomic scale, they do not perform well at quantifying introgression in small windows. Results: We introduce a novel method to detect introgression by combining two widely used statistics: pairwise nucleotide diversity dxy and Patterson's D. The resulting statistic, the Basic distance fraction (Bdf), accounts for genetic distance across possible topologies and is designed to simultaneously detect and quantify introgression. We also relate our new method to the recently published fd and incorporate these statistics into the genomics R package PopGenome, freely available on CRAN. The supplemental material contains a wide range of simulation studies and a detailed manual on how to compute the statistics within the PopGenome framework. Conclusion: We present a new distance-based statistic, Bdf, that avoids the pitfalls of Patterson's D when applied to small genomic regions and accurately quantifies the fraction of introgression (f) for a wide range of simulation scenarios.
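The two ingredient statistics named in the abstract can be sketched from per-site derived-allele frequencies. The following is a minimal illustration of the standard frequency-based Patterson's D and per-site dxy only; it is not the Bdf statistic, whose definition the abstract does not give, and it is not PopGenome code. The function names and the input layout (tuples of frequencies for P1, P2, P3 and the outgroup) are assumptions made for this example.

```python
import math


def patterson_d(sites):
    """Patterson's D from derived-allele frequencies.

    `sites` is an iterable of (p1, p2, p3, p4) tuples: derived-allele
    frequencies in populations P1, P2, P3 and the outgroup (hypothetical
    input layout chosen for this sketch).
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, p4 in sites:
        abba += (1 - p1) * p2 * p3 * (1 - p4)   # ABBA pattern weight
        baba += p1 * (1 - p2) * p3 * (1 - p4)   # BABA pattern weight
    total = abba + baba
    if total == 0:
        return math.nan  # no informative sites in this window
    return (abba - baba) / total


def dxy_site(px, py):
    """Per-site dxy for a biallelic SNP: probability that one allele drawn
    from population x differs from one allele drawn from population y."""
    return px * (1 - py) + py * (1 - px)


# Usage: one pure ABBA site and one pure BABA site cancel out.
window = [(0.0, 1.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0)]
d_value = patterson_d(window)
```

In a small window the denominator can rest on very few informative sites, which is one way to see why D becomes unstable at fine genomic scales.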

2019 ◽  
Vol 10 (2) ◽  
pp. 489-494 ◽  
Author(s):  
Chiao-Lin Chen ◽  
Jonathan Rodiger ◽  
Verena Chung ◽  
Raghuvir Viswanatha ◽  
Stephanie E. Mohr ◽  
...  

CRISPR-Cas9 is a powerful genome editing technology in which a single guide RNA (sgRNA) confers target site specificity to achieve Cas9-mediated genome editing. Numerous sgRNA design tools have been developed based on reference genomes for humans and model organisms. However, existing resources are not optimal as genetic mutations or single nucleotide polymorphisms (SNPs) within the targeting region affect the efficiency of CRISPR-based approaches by interfering with guide-target complementarity. To facilitate identification of sgRNAs (1) in non-reference genomes, (2) across varying genetic backgrounds, or (3) for specific targeting of SNP-containing alleles, for example, disease relevant mutations, we developed a web tool, SNP-CRISPR (https://www.flyrnai.org/tools/snp_crispr/). SNP-CRISPR can be used to design sgRNAs based on public variant data sets or user-identified variants. In addition, the tool computes efficiency and specificity scores for sgRNA designs targeting both the variant and the reference. Moreover, SNP-CRISPR provides the option to upload multiple SNPs and target single or multiple nearby base changes simultaneously with a single sgRNA design. Given these capabilities, SNP-CRISPR has a wide range of potential research applications in model systems and for design of sgRNAs for disease-associated variant correction.
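The idea of targeting a SNP-containing allele can be illustrated with a small sketch: substitute the variant base into the reference sequence, then list candidate protospacers followed by an NGG PAM that cover the variant position. This is not SNP-CRISPR's algorithm. The function name and parameters are hypothetical, only the forward strand is scanned, and no efficiency or specificity scores are computed.

```python
import re


def sgrnas_overlapping_snp(seq, snp_pos, alt_base, guide_len=20):
    """List forward-strand candidates whose protospacer covers the SNP.

    seq       reference sequence (uppercase A/C/G/T)
    snp_pos   0-based position of the SNP in `seq`
    alt_base  variant base to substitute at `snp_pos`
    Returns (start, protospacer, pam) tuples found in the variant allele.
    """
    variant = seq[:snp_pos] + alt_base + seq[snp_pos + 1:]
    # A lookahead lets overlapping candidates be reported as well.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for match in pattern.finditer(variant):
        start = match.start()
        if start <= snp_pos < start + guide_len:
            hits.append((start, match.group(1), match.group(2)))
    return hits


# Usage: a 20-nt protospacer followed by a TGG PAM, with a C>T change at index 5.
reference = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
candidates = sgrnas_overlapping_snp(reference, snp_pos=5, alt_base="T")
```

A fuller design would also scan the reverse complement and consider variants that create or destroy the PAM itself.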


2016 ◽  
Author(s):  
Brian J. Knaus ◽  
Niklaus J. Grünwald

Abstract: Software to call single nucleotide polymorphisms or related genetic variants has converged on the variant call format (VCF) as the output format of choice. This has created a need for tools to work with VCF files. While a growing number of programs can read VCF data, many only extract the genotypes without including the data associated with each genotype that describes its quality. We created the R package vcfR to address this issue. We developed a VCF file exploration tool implemented in the R language because R provides an interactive experience and an environment that is commonly used for genetic data analysis. Functions to read VCF files into R and write them back out, as well as functions to extract portions of the data and to plot summary statistics of the data, are implemented. VcfR further provides the ability to visualize how various parameterizations of the data affect the results. Additional tools are included to integrate sequence (FASTA) and annotation data (GFF) for visualization of genomic regions such as chromosomes. Conversion functions translate data from the vcfR data structure to formats used by other R genetics packages. Computationally intensive functions are implemented in C++ to improve performance. Use of these tools is intended to facilitate VCF data exploration, including intuitive methods for data quality control and easy export to other R packages for further analysis. VcfR thus provides essential, novel tools currently not available in R.
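The point about keeping per-genotype quality data alongside the genotype calls can be shown with a short sketch. It reads the FORMAT column of each VCF record and keeps the genotype (GT) together with its genotype quality (GQ) for every sample. This is plain Python written for illustration, not the vcfR API; the function name and the returned structure are assumptions of the example.

```python
def parse_vcf_genotypes(vcf_text, fields=("GT", "GQ")):
    """Extract selected FORMAT fields per sample from VCF text.

    Returns a list of records shaped like
    {"chrom": str, "pos": int, "samples": {sample_name: {field: value}}}.
    Fields absent from a record's FORMAT column are returned as None.
    """
    samples = []
    records = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the 9 fixed columns
            continue
        keys = cols[8].split(":")  # FORMAT column, e.g. GT:GQ:DP
        per_sample = {}
        for name, entry in zip(samples, cols[9:]):
            values = dict(zip(keys, entry.split(":")))
            per_sample[name] = {f: values.get(f) for f in fields}
        records.append(
            {"chrom": cols[0], "pos": int(cols[1]), "samples": per_sample}
        )
    return records


# Usage with a two-sample, one-record example.
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ:DP\t0/1:48:12\t1/1:20:3\n"
)
parsed = parse_vcf_genotypes(example)
```

Keeping GQ next to GT makes it straightforward to mask low-confidence genotypes before downstream analysis.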


2015 ◽  
Author(s):  
Keegan D. Korthauer ◽  
Li-Fang Chu ◽  
Michael A. Newton ◽  
Yuan Li ◽  
James Thomson ◽  
...  

Abstract: The ability to quantify cellular heterogeneity is a major advantage of single-cell technologies. Although understanding such heterogeneity is of primary interest in a number of studies, for convenience, statistical methods often treat cellular heterogeneity as a nuisance factor. We present a novel method to characterize differences in expression in the presence of distinct expression states within and among biological conditions. Using simulated and case study data, we demonstrate that the modeling framework is able to detect differential expression patterns of interest under a wide range of settings. Compared to existing approaches, scDD has higher power to detect subtle differences in gene expression distributions that are more complex than a mean shift, and is able to characterize those differences. The freely available R package scDD implements the approach.


Author(s):  
Frieder Hadlich ◽  
Henry Reyer ◽  
Michael Oster ◽  
Nares Trakooljul ◽  
Eduard Muráni ◽  
...  

Abstract: Commercial and customized microarrays are valuable tools for the analysis of holistic expression patterns, but require the integration of the latest genomic information. This study provides a comprehensive workflow implemented in an R package (rePROBE) to assign all probes and to annotate the probe sets based on up-to-date genomic and transcriptomic information. The rePROBE R package is freely available at https://github.com/friederhadlich/rePROBE. It can be applied to available gene expression microarray platforms and addresses both public and custom databases. The revised probe assignment and updated probe-set annotation were applied to commercial microarrays available for different livestock species, i.e. ChiGene-1_0-st (Gallus gallus, 443,579 probes; 18,530 probe sets), PorGene-1_1-st (Sus scrofa, 592,005; 25,779) and BovGene-1_0-st (Bos taurus, 530,717; 24,759), as well as human (Homo sapiens, HuGene-1_0-st) and mouse (Mus musculus, HT_MG-430_PM) microarrays. Using current species-specific transcriptomic information (RefSeq, Ensembl and partially non-redundant nucleotide sequences) and genomic information, the applied workflow revealed 297,574 probes for chickens (pig: 384,715; cattle: 363,077; human: 481,168; mouse: 324,942) assigned to 15,689 probe sets (pig: 21,673; cattle: 21,238; human: 23,495; mouse: 32,494). These are representative of 12,641 unique genes that were both annotated and positioned (pig: 15,758; cattle: 18,046; human: 20,167; mouse: 16,335). Additionally, the workflow collects information on the number of single nucleotide polymorphisms (SNPs) within the respective targeted genomic regions and thus provides a detailed basis for comprehensive analyses such as expression quantitative trait locus (eQTL) studies to identify quantitative and functional traits.


2020 ◽  
Author(s):  
Samuel B. Fernandes ◽  
Alexander E. Lipka

Abstract. Motivation: Advances in genotyping and phenotyping techniques have enabled the acquisition of a great amount of data. Consequently, there is an interest in multivariate statistical analyses that identify genomic regions likely to contain causal mutations affecting multiple traits (i.e., pleiotropy). As the demand for multivariate analyses increases, it is imperative that optimal tools are available to compare different implementations of these analyses. To facilitate the testing and validation of these multivariate approaches, we developed simplePHENOTYPES, an R package that simulates pleiotropy, partial pleiotropy, and spurious pleiotropy in a wide range of genetic architectures, including additive, dominance and epistatic models. Results: We illustrate simplePHENOTYPES' ability to simulate thousands of phenotypes in less than one minute. We then provide a vignette illustrating how to simulate a set of correlated traits in simplePHENOTYPES. Finally, we demonstrate the use of results from simplePHENOTYPES in standard GWAS software, as well as the numerical equivalence of simulated phenotypes from simplePHENOTYPES and other packages with similar capabilities. Conclusions: simplePHENOTYPES is a CRAN package that makes it possible to simulate multiple traits controlled by loci with varying degrees of pleiotropy. Its ability to interface with both commonly used marker data formats and downstream quantitative genetics software and packages should facilitate a rigorous assessment of both existing and emerging statistical GWAS and GS approaches. simplePHENOTYPES is also available at https://github.com/samuelbfernandes/simplePHENOTYPES.
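What simulating pleiotropy under an additive model involves can be sketched in a few lines: the same causal markers (QTNs) contribute to every trait, each trait has its own effect sizes, and residual noise is scaled to reach a target heritability. This is an illustration written for this listing, not the simplePHENOTYPES interface. The function name, argument layout, and the noise scaling var_e = var_g (1 - h2) / h2 are assumptions of the sketch, and dominance and epistatic effects are left out.

```python
import random


def simulate_pleiotropic_traits(genotypes, qtn_idx, effects, h2, seed=0):
    """Simulate several traits that share the same causal markers.

    genotypes  list of individuals, each a list of 0/1/2 marker dosages
    qtn_idx    indices of the shared causal markers (QTNs)
    effects    one list of additive effects per trait, aligned with qtn_idx
    h2         target heritability, 0 < h2 <= 1, applied to every trait
    Returns one list of phenotypes per trait.
    """
    rng = random.Random(seed)
    traits = []
    for trait_effects in effects:
        # Additive genetic value: dosage times effect, summed over QTNs.
        genetic = [
            sum(ind[i] * a for i, a in zip(qtn_idx, trait_effects))
            for ind in genotypes
        ]
        mean_g = sum(genetic) / len(genetic)
        var_g = sum((g - mean_g) ** 2 for g in genetic) / len(genetic)
        # Residual variance chosen so that var_g / (var_g + var_e) = h2.
        sd_e = (var_g * (1 - h2) / h2) ** 0.5
        traits.append([g + rng.gauss(0.0, sd_e) for g in genetic])
    return traits


# Usage: three individuals, two QTNs shared by two traits.
dosages = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
phenotypes = simulate_pleiotropic_traits(
    dosages, qtn_idx=[0, 2], effects=[[1.0, 0.5], [2.0, -1.0]], h2=0.7
)
```

Giving each trait its own `qtn_idx` subset would be one way to move from full pleiotropy toward partial pleiotropy.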


2016 ◽  
Author(s):  
Don Klinkenberg ◽  
Jantien Backer ◽  
Xavier Didelot ◽  
Caroline Colijn ◽  
Jacco Wallinga

Abstract: Whole-genome sequencing (WGS) of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and WGS data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but applications are tailored to specific datasets with matching model assumptions and code, or otherwise make simplifying assumptions that break up the dependency between the four processes. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with WGS data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation. We use Bayesian inference with MCMC, for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package, phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
Author Summary: It is becoming easier and cheaper to obtain whole genome sequences of pathogen samples during outbreaks of infectious diseases. If all hosts during an outbreak are sampled, and these samples are sequenced, the small differences between the sequences (single nucleotide polymorphisms, SNPs) give information on the transmission tree, i.e. who infected whom, and when. However, correctly inferring this tree is not straightforward, because SNPs arise from unobserved processes including infection events, as well as pathogen growth and mutation within the hosts. Several methods have been developed in recent years, but none so generic and accessible that it can easily be applied to new settings and datasets. We have developed a new model and method to infer transmission trees without putting prior limiting constraints on the order of unobserved events. The method is easily accessible in an R package implementation. We show that the method performs well on new and previously published simulated data. We illustrate applicability to a wide range of infectious diseases and settings by analysing five published datasets on densely sampled infectious disease outbreaks, confirming or improving the original results.


2019 ◽  
pp. 40-46 ◽  
Author(s):  
V.V. Savchenko ◽  
A.V. Savchenko

We consider the task of automated quality control of sound recordings containing voice samples of individuals. We show that the most acute problem in this task is the small sample size. To overcome this problem, we propose a novel method of acoustic measurement based on the relative stability of the pitch frequency within a voice sample of short duration. An example of its practical implementation using an inter-periodic accumulation of a speech signal is considered. An experimental study with specially developed software provides statistical estimates of the effectiveness of the proposed method in noisy environments. It is shown that this method rejects an audio recording as unsuitable for voice biometric identification with a probability of 0.95 or more for a signal-to-noise ratio below 15 dB. The obtained results are intended for use in developing new systems, and modifying existing ones, for the collection and automated quality control of biometric personal data. The article is intended for a wide range of specialists in the field of acoustic measurements and digital processing of speech signals, as well as for practitioners who organize the work of authorized organizations in preparing samples of biometric personal data for registration.


2015 ◽  
Vol 2 (1) ◽  
pp. 6-12
Author(s):  
Agus Sugiarta ◽  
Houtman P. Siregar ◽  
Dedy Loebis

Automation of process control in chemical plants is an inspiring application field of mechatronic engineering. Understanding the complexity of this automation and its application requires knowledge of chemical engineering, mechatronics, and numerous other interconnected disciplines. The background of this paper is an inherent problem of overheating caused by the lack of a level control system. The objective of this research is to control the dynamic process more tightly around the desired level, thereby stabilizing the raw material supply into the chemical plant system. The chemical plant operates within a wide range of feed compositions and flow rates, which makes process control difficult. For efficiency, this research models the process and analyzes the model under a PID control algorithm, with simulations in Matlab.
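A PID level loop of the kind the abstract describes can be sketched as a discrete-time simulation. The plant below is a hypothetical gravity-drained tank (outflow proportional to the square root of the level), chosen only for illustration; the paper's own plant model, tuning, and Matlab code are not given in the abstract. All parameter names and default values are assumptions.

```python
def simulate_pid_level(setpoint, kp, ki, kd, steps=2000, dt=0.1,
                       area=1.0, outflow_coeff=0.5, level0=0.0,
                       max_inflow=5.0):
    """Simulate PID control of tank level with explicit Euler steps.

    The controller output is the inflow rate, clamped to [0, max_inflow].
    Returns the level after each step.
    """
    level = level0
    integral = 0.0
    prev_error = setpoint - level
    history = []
    for _ in range(steps):
        error = setpoint - level
        integral += error * dt                  # integral term state
        derivative = (error - prev_error) / dt  # finite-difference derivative
        inflow = kp * error + ki * integral + kd * derivative
        inflow = min(max(inflow, 0.0), max_inflow)  # actuator limits
        outflow = outflow_coeff * level ** 0.5      # gravity drain (assumed)
        level = max(level + dt * (inflow - outflow) / area, 0.0)
        prev_error = error
        history.append(level)
    return history


# Usage: drive an empty tank to a level of 1.0.
levels = simulate_pid_level(setpoint=1.0, kp=2.0, ki=0.5, kd=0.1)
final_level = levels[-1]
```

The integral term is what removes the steady-state offset here: at the setpoint the proportional term is zero, so the accumulated integral must supply the inflow that balances the drain.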


2020 ◽  
Vol 2020 (17) ◽  
pp. 34-1-34-7
Author(s):  
Matthew G. Finley ◽  
Tyler Bell

This paper presents a novel method for accurately encoding 3D range geometry within the color channels of a 2D RGB image that allows the encoding frequency—and therefore the encoding precision—to be uniquely determined for each coordinate. The proposed method can thus be used to balance between encoding precision and file size by encoding geometry along a normal distribution; encoding more precisely where the density of data is high and less precisely where the density is low. Alternative distributions may be followed to produce encodings optimized for specific applications. In general, the nature of the proposed encoding method is such that the precision of each point can be freely controlled or derived from an arbitrary distribution, ideally enabling this method for use within a wide range of applications.

