diploid individual
Recently Published Documents


TOTAL DOCUMENTS

15
(FIVE YEARS 7)

H-INDEX

4
(FIVE YEARS 2)

Genetics ◽  
2021 ◽  
Author(s):  
Per Sjödin ◽  
James McKenna ◽  
Mattias Jakobsson

Abstract The patterns of genetic variation within and among individuals and populations can be used to make inferences about the evolutionary forces that generated those patterns. Numerous population genetic approaches have been developed in order to infer evolutionary history. Here, we present the ‘Two-Two (TT)’ and the ‘Two-Two-outgroup (TTo)’ methods; two closely related approaches for estimating divergence time based in coalescent theory. They rely on sequence data from two haploid genomes (or a single diploid individual) from each of two populations. Under a simple population-divergence model, we derive the probabilities of the possible sample configurations. These probabilities form a set of equations that can be solved to obtain estimates of the model parameters, including population split-times, directly from the sequence data. This transparent and computationally efficient approach to infer population divergence time makes it possible to estimate time scaled in generations (assuming a mutation rate), and not as a compound parameter of genetic drift. Using simulations under a range of demographic scenarios, we show that the method is relatively robust to migration and that the TTo-method can alleviate biases that can appear from drastic ancestral population size changes. We illustrate the utility of the approaches with some examples, including estimating split times for pairs of human populations as well as providing further evidence for the complex relationship among Neandertals and Denisovans and their ancestors.


2020 ◽  
Author(s):  
Per Sjödin ◽  
James McKenna ◽  
Mattias Jakobsson

ABSTRACTThe patterns of genetic variation within and among individuals and populations can be used to make inferences about the evolutionary forces that generated those patterns. Numerous population genetic approaches have been developed in order to infer evolutionary history. Here, we present the ‘Two-Two (TT)’ and the ‘Two-Two-outgroup (TTo)’ methods; two closely related approaches for estimating divergence time based in coalescent theory. They rely on sequence data from two haploid genomes (or a single diploid individual) from each of two populations. Under a simple population-divergence model, we derive the probabilities of the possible sample configurations. These probabilities form a set of equations that can be solved to obtain estimates of the model parameters, including population split-times, directly from the sequence data. This transparent and computationally efficient approach to infer population divergence time makes it possible to estimate time scaled in generations (assuming a mutation rate), and not as a compound parameter of genetic drift. Using simulations under a range of demographic scenarios, we show that the method is relatively robust to migration and that the TTo-method can alleviate biases that can appear from drastic ancestral population size changes. We illustrate the utility of the approaches with some examples, including estimating split times for pairs of human populations as well as providing further evidence for the complex relationship among Neandertals and Denisovans and their ancestors.


2020 ◽  
Author(s):  
Brecca Miller ◽  
Alison Morse ◽  
Jacqueline E. Borgert ◽  
Zihao Liu ◽  
Kelsey Sinclair ◽  
...  

ABSTRACTAllelic imbalance (AI) occurs when alleles in a diploid individual are differentially expressed and indicates cis acting regulatory variation. What is the distribution of allelic effects in a natural population? Are all alleles the same? Are all alleles distinct? Tests of allelic effect are performed by crossing individuals and comparing expression between alleles directly in the F1. However, a crossing scheme that compares alleles pairwise is a prohibitive cost for more than a handful of alleles as the number of crosses is at least (n2-n)/2 where n is the number of alleles. We show here that a testcross design followed by a hypothesis test of AI between testcrosses can be used to infer differences between non-tester alleles, allowing n alleles to be compared with n crosses. Using a mouse dataset where both testcrosses and direct comparisons have been performed, we show that ∼75% of the predicted differences between non-tester alleles are validated in a background of ∼10% differences in AI. The testing for AI involves several complex bioinformatics steps. BASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating AI and formally comparing levels of AI between conditions. The modular structure of BASE has been packaged in Galaxy, made available in Nextflow and sbatch. (https://github.com/McIntyre-Lab/BASE_2020). In the mouse data, the direct test identifies more cis effects than the testcross. Cis-by-trans interactions with trans-acting factors on the X contributing to observed cis effects in autosomal genes in the direct cross remains a possible explanation for the discrepancy.


2020 ◽  
Author(s):  
Armando Arredondo ◽  
Beatriz Mourato ◽  
Khoa Nguyen ◽  
Simon Boitard ◽  
Willy Rodríguez ◽  
...  

AbstractInferring the demographic history of species is one of the greatest challenges in populations genetics. This history is often represented as a history of size changes, thus ignoring population structure. Alternatively, structure is defined a priori as a population tree and not inferred. Here we propose a framework based on the IICR (Inverse Instantaneous Coalescence Rate), which can be estimated using the PSMC method of Li and Durbin (2011) for a single diploid individual. For an isolated population, the IICR matches the population size history, which is how the PSMC outputs are generally interpreted. However, it is increasingly acknowledged that the IICR is a function of the demographic model and sampling scheme. Our automated method fits observed IICR curves of diploid individuals with IICR curves obtained under piecewise-stationary symmetrical island models, in which we assume a fixed number of time periods during which gene flow is constant. We infer the number of islands, their sizes, the periods at which connectivity changes and the corresponding rates of connectivity. Validation with simulated data showed that the method can accurately recover most of the scenario parameters. Our application to a set of five human PSMCs yielded demographic histories that are in agreement with previous studies using similar methods and with recent research suggesting ancient human structure. They are in contrast with the widely accepted view of human evolution consisting of one ancestral population branching into three large continental and panmictic populations with varying degrees of connectivity and no population structure within each continent.


2019 ◽  
Vol 35 (14) ◽  
pp. i81-i89 ◽  
Author(s):  
Ali Ghaffaari ◽  
Tobias Marschall

Abstract Motivation Sequence graphs are versatile data structures that are, for instance, able to represent the genetic variation found in a population and to facilitate genome assembly. Read mapping to sequence graphs constitutes an important step for many applications and is usually done by first finding exact seed matches, which are then extended by alignment. Existing methods for finding seed hits prune the graph in complex regions, leading to a loss of information especially in highly polymorphic regions of the genome. While such complex graph structures can indeed lead to a combinatorial explosion of possible alleles, the query set of reads from a diploid individual realizes only two alleles per locus—a property that is not exploited by extant methods. Results We present the Pan-genome Seed Index (PSI), a fully-sensitive hybrid method for seed finding, which takes full advantage of this property by combining an index over selected paths in the graph with an index over the query reads. This enables PSI to find all seeds while eliminating the need to prune the graph. We demonstrate its performance with different parameter settings on both simulated data and on a whole human genome graph constructed from variants in the 1000 Genome Project dataset. On this graph, PSI outperforms GCSA2 in terms of index size, query time and sensitivity. Availability and implementation The C++ implementation is publicly available at: https://github.com/cartoonist/psi.


2019 ◽  
Author(s):  
Ali Ghaffaari ◽  
Tobias Marschall

AbstractMotivationSequence graphs are versatile data structures that are, for instance, able to represent the genetic variation found in a population and to facilitate genome assembly. Read mapping to sequence graphs constitutes an important step for many applications and is usually done by first finding exact seed matches, which are then extended by alignment. Existing methods for finding seed hits prune the graph in complex regions, leading to a loss of information especially in highly polymorphic regions of the genome. While such complex graph structures can indeed lead to a combinatorial explosion of possible alleles, the query set of reads from a diploid individual realizes only two alleles per locus—a property that is not exploited by extant methods.ResultsWe present thePan-genomeSeedIndex (PSI), a fully-sensitive hybrid method for seed finding, which takes full advantage of this property by combining an index over selected paths in the graph with an index over the query reads. This enables PSI to find all seeds while eliminating the need to prune the graph. We demonstrate its performance with different parameter settings on both simulated data and on a whole human genome graph constructed from variants in the 1000 Genome Project data set. On this graph, PSI outperforms GCSA2 in terms of index size, query time, and sensitivity.AvailabilityThe C++ implementation is publicly available at:https://github.com/cartoonist/psi.


Genes ◽  
2019 ◽  
Vol 10 (1) ◽  
pp. 62 ◽  
Author(s):  
Sarah Kingan ◽  
Haynes Heaton ◽  
Juliana Cudini ◽  
Christine Lambert ◽  
Primo Baybayan ◽  
...  

A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives, however, relatively high DNA input requirements (~5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.


2018 ◽  
Author(s):  
Sarah B. Kingan ◽  
Haynes Heaton ◽  
Juliana Cudini ◽  
Christine C. Lambert ◽  
Primo Baybayan ◽  
...  

AbstractA high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives, however, relatively high DNA input requirements (∼5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 hour movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes are present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes are present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.


Apidologie ◽  
2014 ◽  
Vol 46 (4) ◽  
pp. 495-498 ◽  
Author(s):  
Karen M. Suzuki ◽  
Douglas C. Giangarelli ◽  
Dhiego G. Ferreira ◽  
Wilson Frantine-Silva ◽  
Solange C. Augusto ◽  
...  

2013 ◽  
Vol 11 (04) ◽  
pp. 1350010 ◽  
Author(s):  
JINGLI WU ◽  
BINBIN LIANG

Haplotypes can provide significant information in many research fields, including molecular biology and medical therapy. However, haplotyping is much more difficult than genotyping by using only biological techniques. With the development of sequencing technologies, it becomes possible to obtain haplotypes by combining sequence fragments. The haplotype reconstruction problem of diploid individual has received considerable attention in recent years. It assembles the two haplotypes for a chromosome given the collection of fragments coming from the two haplotypes. Fragment errors significantly increase the difficulty of the problem, and which has been shown to be NP-hard. In this paper, a fast and accurate algorithm, named FAHR, is proposed for haplotyping a single diploid individual. Algorithm FAHR reconstructs the SNP sites of a pair of haplotypes one after another. The SNP fragments that cover some SNP site are partitioned into two groups according to the alleles of the corresponding SNP site, and the SNP values of the pair of haplotypes are ascertained by using the fragments in the group that contains more SNP fragments. The experimental comparisons were conducted among the FAHR, the Fast Hare and the DGS algorithms by using the haplotypes on chromosome 1 of 60 individuals in CEPH samples, which were released by the International HapMap Project. Experimental results under different parameter settings indicate that the reconstruction rate of the FAHR algorithm is higher than those of the Fast Hare and the DGS algorithms, and the running time of the FAHR algorithm is shorter than those of the Fast Hare and the DGS algorithms. Moreover, the FAHR algorithm has high efficiency even for the reconstruction of long haplotypes and is very practical for realistic applications.


Sign in / Sign up

Export Citation Format

Share Document