diploid individual
Recently Published Documents

Estimating divergence times from DNA sequences

10.1101/2020.10.16.342600
2020

Author(s): Per Sjödin, James McKenna, Mattias Jakobsson

Keyword(s): DNA Sequences, Sequence Data, Divergence Time, Ancestral Population, Population Divergence, Model Parameters, Human Populations, Computationally Efficient, Evolutionary Forces, Diploid Individual

Abstract. The patterns of genetic variation within and among individuals and populations can be used to make inferences about the evolutionary forces that generated those patterns. Numerous population genetic approaches have been developed to infer evolutionary history. Here, we present the ‘Two-Two (TT)’ and the ‘Two-Two-outgroup (TTo)’ methods: two closely related approaches, based on coalescent theory, for estimating divergence times. They rely on sequence data from two haploid genomes (or a single diploid individual) from each of two populations. Under a simple population-divergence model, we derive the probabilities of the possible sample configurations. These probabilities form a set of equations that can be solved to obtain estimates of the model parameters, including population split times, directly from the sequence data. This transparent and computationally efficient approach makes it possible to estimate divergence time scaled in generations (assuming a mutation rate), rather than as a compound parameter of genetic drift. Using simulations under a range of demographic scenarios, we show that the method is relatively robust to migration and that the TTo method can alleviate biases arising from drastic changes in ancestral population size. We illustrate the utility of the approaches with several examples, including estimating split times for pairs of human populations and providing further evidence for the complex relationship among Neandertals, Denisovans and their ancestors.
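
The sample configurations for one diploid individual per population can be tallied directly from genotype data before any model is fitted. The sketch below is a minimal illustration under our own assumptions: sites are already polarized (for example with an outgroup, as in TTo), each individual's genotype is coded as its number of derived alleles (0, 1 or 2), and the function names are hypothetical. The equations the authors derive to turn these frequencies into split times are not reproduced here, and exactly which configurations the TT method uses is not stated in the abstract.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Config = Tuple[int, int]

def count_configurations(genotypes: Iterable[Config]) -> Counter:
    """Tally sites by (derived alleles in individual A, derived alleles in individual B).

    Each diploid individual carries 0, 1 or 2 copies of the derived allele,
    so there are 3 x 3 = 9 possible configurations per site.
    """
    counts: Counter = Counter()
    for a, b in genotypes:
        if a not in (0, 1, 2) or b not in (0, 1, 2):
            raise ValueError(f"genotype out of range: {(a, b)}")
        counts[(a, b)] += 1
    return counts

def configuration_frequencies(counts: Counter) -> Dict[Config, float]:
    """Convert counts to proportions of all sites, including (0, 0) sites."""
    total = sum(counts.values())
    return {config: n / total for config, n in counts.items()}

sites = [(0, 1), (1, 1), (2, 0), (0, 0), (1, 1), (2, 2)]
counts = count_configurations(sites)
print(counts[(1, 1)], configuration_frequencies(counts)[(1, 1)])
```

Invariant (0, 0) sites are kept in the total because converting a mutation-scaled estimate into generations needs the number of sites surveyed, not only the variable ones.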

Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele specific expression (BASE)

10.1101/2020.10.01.322362
2020

Author(s): Brecca Miller, Alison Morse, Jacqueline E. Borgert, Zihao Liu, Kelsey Sinclair, ...

Keyword(s): Hypothesis Test, Bioinformatics Pipeline, Specific Expression, Regulatory Variation, Reduction Techniques, Direct Cross, Allele Specific, Diploid Individual, Prohibitive Cost, Mouse Dataset

Abstract. Allelic imbalance (AI) occurs when the two alleles in a diploid individual are differentially expressed, and it indicates cis-acting regulatory variation. What is the distribution of allelic effects in a natural population? Are all alleles the same? Are all alleles distinct? Tests of allelic effects are performed by crossing individuals and comparing expression between alleles directly in the F1. However, a crossing scheme that compares alleles pairwise is prohibitively costly for more than a handful of alleles, because it requires at least (n² − n)/2 crosses, where n is the number of alleles. We show here that a testcross design followed by a hypothesis test of AI between testcrosses can be used to infer differences between non-tester alleles, allowing n alleles to be compared with n crosses. Using a mouse dataset in which both testcrosses and direct comparisons have been performed, we show that ∼75% of the predicted differences between non-tester alleles are validated, against a background of ∼10% differences in AI. Testing for AI involves several complex bioinformatics steps. BASE is a complete bioinformatics pipeline that incorporates state-of-the-art error-reduction techniques and a flexible Bayesian approach to estimating AI and formally comparing levels of AI between conditions. BASE is modular; it has been packaged in Galaxy and made available in Nextflow and sbatch (https://github.com/McIntyre-Lab/BASE_2020). In the mouse data, the direct test identifies more cis effects than the testcross. A possible explanation for the discrepancy is that, in the direct cross, cis-by-trans interactions with trans-acting factors on the X chromosome contribute to the cis effects observed in autosomal genes.

Inferring number of populations and changes in connectivity under the n-island model

10.1101/2020.09.03.282251
2020

Author(s): Armando Arredondo, Beatriz Mourato, Khoa Nguyen, Simon Boitard, Willy Rodríguez, ...

Keyword(s): Population Structure, Demographic History, A Priori, Simulated Data, Fixed Number, Ancestral Population, Demographic Model, Automated Method, History Of, Diploid Individual

Abstract. Inferring the demographic history of species is one of the greatest challenges in population genetics. This history is often represented as a history of population size changes, thus ignoring population structure. Alternatively, structure is defined a priori as a population tree rather than inferred. Here we propose a framework based on the IICR (Inverse Instantaneous Coalescence Rate), which can be estimated from a single diploid individual using the PSMC method of Li and Durbin (2011). For an isolated population, the IICR matches the population size history, which is how PSMC outputs are generally interpreted. However, it is increasingly acknowledged that the IICR is a function of both the demographic model and the sampling scheme. Our automated method fits the observed IICR curves of diploid individuals with IICR curves obtained under piecewise-stationary symmetrical island models, in which we assume a fixed number of time periods, during each of which gene flow is constant. We infer the number of islands, their sizes, the times at which connectivity changes and the corresponding rates of connectivity. Validation with simulated data showed that the method can accurately recover most of the scenario parameters. Applied to a set of five human PSMC curves, the method yielded demographic histories that agree with previous studies using similar methods and with recent research suggesting ancient human structure. These histories contrast with the widely accepted view of human evolution, in which a single ancestral population branched into three large, panmictic continental populations with varying degrees of connectivity and no population structure within each continent.
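
The IICR is defined from the distribution of the coalescence time T₂ of two lineages as IICR(t) = P(T₂ > t) / f(t), where f is the density of T₂. The sketch below evaluates it for a single stationary symmetrical island model (one time period, not the piecewise model the authors fit) for two lineages sampled in the same island. All modelling choices are our assumptions: time is in coalescent units in which two lineages in the same island coalesce at rate 1, each lineage migrates at rate M/2 to one of the other n − 1 islands chosen uniformly, and the function name `island_iicr` is hypothetical.

```python
import math

def island_iicr(t: float, n: int, M: float) -> float:
    """IICR(t) = P(T2 > t) / f(t) for two lineages sampled in the same island.

    States before coalescence: 'same island' and 'different islands'.
    Same island: coalesce at rate 1, separate by migration at rate M.
    Different islands: reunite at rate M / (n - 1).
    The survival function is a mixture of two exponentials whose rates are
    the eigenvalues of the 2x2 generator restricted to those two states.
    """
    if n < 2 or M <= 0:
        raise ValueError("need n >= 2 islands and M > 0")
    b = M / (n - 1)
    trace = -(1.0 + M + b)          # trace of the 2x2 generator
    det = b                         # its determinant: (1 + M) * b - M * b
    root = math.sqrt(trace * trace - 4.0 * det)
    lam_slow = (trace + root) / 2.0  # eigenvalue closest to zero
    lam_fast = (trace - root) / 2.0
    # Weights fixed by S(0) = 1 and S'(0) = -1 (coalescence rate 1 at t = 0).
    c_slow = (-1.0 - lam_fast) / (lam_slow - lam_fast)
    c_fast = 1.0 - c_slow
    survival = c_slow * math.exp(lam_slow * t) + c_fast * math.exp(lam_fast * t)
    density = -(c_slow * lam_slow * math.exp(lam_slow * t)
                + c_fast * lam_fast * math.exp(lam_fast * t))
    return survival / density

for t in (0.0, 1.0, 10.0, 100.0):
    print(t, round(island_iicr(t, n=10, M=1.0), 3))
```

Under these assumptions the curve starts at 1 (the island size in these units) and rises toward a plateau set by the slow eigenvalue, so a naive size-history reading would suggest a larger population in the past even though no island ever changed size.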

Fully-sensitive seed finding in sequence graphs using a hybrid index

Bioinformatics
10.1093/bioinformatics/btz341
2019
Vol 35 (14), pp. i81-i89
Cited By ~ 4

Author(s): Ali Ghaffaari, Tobias Marschall

Keyword(s): Simulated Data, Genome Project, Hybrid Index, Read Mapping, Combinatorial Explosion, Complex Graph, Seed Index, Diploid Individual, Index Size, Genome Graph

Abstract. Motivation: Sequence graphs are versatile data structures that can, for instance, represent the genetic variation found in a population and facilitate genome assembly. Read mapping to sequence graphs is an important step in many applications and is usually done by first finding exact seed matches, which are then extended by alignment. Existing methods for finding seed hits prune the graph in complex regions, leading to a loss of information, especially in highly polymorphic regions of the genome. While such complex graph structures can indeed lead to a combinatorial explosion of possible alleles, the query set of reads from a diploid individual realizes only two alleles per locus, a property that extant methods do not exploit. Results: We present the Pan-genome Seed Index (PSI), a fully-sensitive hybrid method for seed finding, which takes full advantage of this property by combining an index over selected paths in the graph with an index over the query reads. This enables PSI to find all seeds without pruning the graph. We demonstrate its performance with different parameter settings on both simulated data and a whole human genome graph constructed from variants in the 1000 Genomes Project dataset. On this graph, PSI outperforms GCSA2 in terms of index size, query time and sensitivity. Availability and implementation: The C++ implementation is publicly available at https://github.com/cartoonist/psi.
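
To see why complex regions are costly, the sketch below takes the brute-force route that PSI is designed to avoid: it enumerates every k-mer spelled along any walk of a small sequence graph and looks each one up in a k-mer index built over the reads. It is a toy under our own assumptions (a graph given as non-empty node sequences plus successor lists, exact k-mer seeds, hypothetical function names) and is not PSI's hybrid index. In a real graph the number of walks, and hence of distinct k-mers, can grow exponentially with the number of nearby variants, which is the combinatorial explosion described above.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

def read_kmer_index(reads: List[str], k: int) -> Dict[str, List[Tuple[int, int]]]:
    """Map every k-mer in the reads to its (read id, offset) occurrences."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for rid, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].append((rid, i))
    return index

def graph_kmers(nodes: Dict[int, str], edges: Dict[int, List[int]],
                k: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (k-mer, start node, offset) for every k-mer spelled along a walk."""
    for start, seq in nodes.items():
        for off in range(len(seq)):
            stack = [(start, off, "")]
            while stack:
                node, pos, prefix = stack.pop()
                spelled = prefix + nodes[node][pos:pos + k - len(prefix)]
                if len(spelled) == k:
                    yield spelled, start, off
                else:  # k-mer incomplete: branch into every successor node
                    for nxt in edges.get(node, []):
                        stack.append((nxt, 0, spelled))

def find_seeds(nodes, edges, reads, k):
    """Return (read id, read offset, graph node, node offset) for every exact seed hit."""
    index = read_kmer_index(reads, k)
    return [(rid, roff, node, off)
            for kmer, node, off in graph_kmers(nodes, edges, k)
            for rid, roff in index.get(kmer, [])]

# A bubble: ACG followed by either T or C (a SNP), then GA.
nodes = {1: "ACG", 2: "T", 3: "C", 4: "GA"}
edges = {1: [2, 3], 2: [4], 3: [4]}
print(sorted(find_seeds(nodes, edges, ["CGTG"], k=3)))
# [(0, 0, 1, 1), (0, 1, 1, 2)]
```

The read "CGTG" carries the T allele, so only k-mers along the T branch produce seeds; a diploid read set touches at most two branches per locus, which is the property PSI exploits instead of enumerating all walks.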

Fully-sensitive Seed Finding in Sequence Graphs Using a Hybrid Index

10.1101/587717
2019
Cited By ~ 1

Author(s): Ali Ghaffaari, Tobias Marschall

Keyword(s): Simulated Data, Genome Project, Hybrid Index, Read Mapping, Data Set, Combinatorial Explosion, Complex Graph, Diploid Individual, Project Data, Genome Graph

Abstract. Motivation: Sequence graphs are versatile data structures that can, for instance, represent the genetic variation found in a population and facilitate genome assembly. Read mapping to sequence graphs is an important step in many applications and is usually done by first finding exact seed matches, which are then extended by alignment. Existing methods for finding seed hits prune the graph in complex regions, leading to a loss of information, especially in highly polymorphic regions of the genome. While such complex graph structures can indeed lead to a combinatorial explosion of possible alleles, the query set of reads from a diploid individual realizes only two alleles per locus, a property that extant methods do not exploit. Results: We present the Pan-genome Seed Index (PSI), a fully-sensitive hybrid method for seed finding, which takes full advantage of this property by combining an index over selected paths in the graph with an index over the query reads. This enables PSI to find all seeds without pruning the graph. We demonstrate its performance with different parameter settings on both simulated data and a whole human genome graph constructed from variants in the 1000 Genomes Project data set. On this graph, PSI outperforms GCSA2 in terms of index size, query time and sensitivity. Availability: The C++ implementation is publicly available at https://github.com/cartoonist/psi.

A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing

Genes
10.3390/genes10010062
2019
Vol 10 (1), pp. 62
Cited By ~ 49

Author(s): Sarah Kingan, Haynes Heaton, Juliana Cudini, Christine Lambert, Primo Baybayan, ...

Keyword(s): Single Molecule, Genome Assembly, Population Genomics, De Novo, Anopheles Coluzzii, High Quality, De Novo Genome Assembly, Core Technology, Conserved Genes, Diploid Individual

A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for the standard library protocol) have placed PacBio out of reach for many projects on small organisms with lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating on average 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly places 667 (>90%) of the formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. Because the material sequenced and assembled came from a single diploid individual, only two haplotypes were present, which simplified the assembly process compared with samples from multiple pooled individuals. The method presented here can be applied to samples with as little as 100 ng of starting DNA per 1 Gb of genome size.
This new low-input approach puts PacBio-based assemblies within reach for small, highly heterozygous organisms that make up much of the diversity of life.
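
Contig N50, the contiguity statistic quoted above, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch of that standard definition; the function name is ours, and the contig lengths in the usage line are made up for illustration, not taken from the A. coluzzii assembly.

```python
from typing import Iterable

def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # longest contigs now cover at least half the bases
            return length
    return 0

print(n50([2, 3, 4, 5, 6]))  # 20 bases in total; 6 + 5 = 11 >= 10, so N50 = 5
```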

A High-Quality De Novo Genome Assembly from a Single Mosquito using PacBio Sequencing

10.1101/499954
2018

Author(s): Sarah B. Kingan, Haynes Heaton, Juliana Cudini, Christine C. Lambert, Primo Baybayan, ...

Keyword(s): Single Molecule, Genome Assembly, Population Genomics, De Novo, Anopheles Coluzzii, High Quality, De Novo Genome Assembly, Core Technology, Conserved Genes, Diploid Individual

Abstract. A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (∼5 µg for the standard library protocol) have placed PacBio out of reach for many projects on small organisms with lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating on average 25 Gb of sequence per SMRT Cell with 20 hour movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly places 667 (>90%) of the formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. Because the material sequenced and assembled came from a single diploid individual, only two haplotypes were present, which simplified the assembly process compared with samples from multiple pooled individuals. The method presented here can be applied to samples with as little as 100 ng of starting DNA per 1 Gb of genome size.
This new low-input approach puts PacBio-based assemblies within reach for small, highly heterozygous organisms that make up much of the diversity of life.

A scientific note on an anomalous diploid individual of Euglossa melanotricha (Apidae, Euglossini) with both female and male phenotypes

Apidologie
10.1007/s13592-014-0339-5
2014
Vol 46 (4), pp. 495-498
Cited By ~ 3

Author(s): Karen M. Suzuki, Douglas C. Giangarelli, Dhiego G. Ferreira, Wilson Frantine-Silva, Solange C. Augusto, ...

Keyword(s): Scientific Note, Diploid Individual

A FAST AND ACCURATE ALGORITHM FOR DIPLOID INDIVIDUAL HAPLOTYPE RECONSTRUCTION

Journal of Bioinformatics and Computational Biology
10.1142/s0219720013500108
2013
Vol 11 (04), pp. 1350010
Cited By ~ 3

Author(s): Jingli Wu, Binbin Liang

Keyword(s): High Efficiency, Chromosome 1, International HapMap Project, Haplotype Reconstruction, Significant Information, Sequencing Technologies, Research Fields, Diploid Individual, Reconstruction Rate, Experimental Comparisons

Haplotypes can provide significant information in many research fields, including molecular biology and medical therapy. However, haplotyping by purely biological techniques is much more difficult than genotyping. With the development of sequencing technologies, it has become possible to obtain haplotypes by combining sequence fragments. The haplotype reconstruction problem for a diploid individual has received considerable attention in recent years: the two haplotypes of a chromosome must be assembled from a collection of fragments sequenced from those two haplotypes. Fragment errors significantly increase the difficulty of the problem, which has been shown to be NP-hard. In this paper, a fast and accurate algorithm, named FAHR, is proposed for haplotyping a single diploid individual. FAHR reconstructs the SNP sites of a pair of haplotypes one after another. The SNP fragments that cover a given SNP site are partitioned into two groups according to their alleles at that site, and the SNP values of the pair of haplotypes are determined using the fragments in the larger group. Experimental comparisons among FAHR, Fast Hare and DGS were conducted using the chromosome 1 haplotypes of 60 individuals in the CEPH samples released by the International HapMap Project. Experimental results under different parameter settings indicate that FAHR achieves a higher reconstruction rate and a shorter running time than both Fast Hare and DGS. Moreover, FAHR remains efficient even for the reconstruction of long haplotypes and is practical for realistic applications.
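
The comparison above is reported as a reconstruction rate, which the abstract does not define. One definition commonly used in the haplotype-assembly literature, assumed for this sketch, counts the SNP sites at which the reconstructed pair disagrees with the true pair (using whichever of the two possible matchings of estimated to true haplotypes gives fewer errors) and divides by the total number of sites, 2 × the number of SNPs. The function names and example haplotypes are hypothetical.

```python
from typing import Sequence, Tuple

Haplotype = Sequence[int]

def hamming(a: Haplotype, b: Haplotype) -> int:
    """Number of SNP sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def reconstruction_rate(true_pair: Tuple[Haplotype, Haplotype],
                        estimated_pair: Tuple[Haplotype, Haplotype]) -> float:
    """1 - (errors under the best matching of estimated to true haplotypes) / (2 * n_snps)."""
    h1, h2 = true_pair
    e1, e2 = estimated_pair
    errors = min(hamming(h1, e1) + hamming(h2, e2),
                 hamming(h1, e2) + hamming(h2, e1))
    return 1.0 - errors / (2 * len(h1))

truth = ([0, 1, 0, 1], [1, 0, 1, 0])
print(reconstruction_rate(truth, ([1, 0, 1, 0], [0, 1, 0, 1])))  # 1.0 (labels swapped, no errors)
print(reconstruction_rate(truth, ([0, 1, 0, 0], [1, 0, 1, 0])))  # 0.875 (one wrong site out of 8)
```

Taking the minimum over both matchings matters because the two haplotypes of a diploid individual carry no intrinsic labels, so an output with h1 and h2 swapped is still a perfect reconstruction.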
