A method for allocating low-coverage sequencing resources by targeting haplotypes rather than individuals

2017 ◽  
Author(s):  
Roger Ros-Freixedes ◽  
Serap Gonen ◽  
Gregor Gorjanc ◽  
John M Hickey

Abstract
Background: This paper describes a heuristic method for allocating low-coverage sequencing resources by targeting haplotypes rather than individuals. Low-coverage sequencing assembles high-coverage sequence information for every individual by accumulating data from the genome segments that they share with many other individuals into consensus haplotypes. Deriving the consensus haplotypes accurately is critical for achieving high phasing and imputation accuracy. To enable accurate phasing and imputation of sequence information for the whole population, we allocate the available sequencing resources among individuals with existing phased genomic data by targeting the sequencing coverage of their haplotypes.
Results: Our method, called AlphaSeqOpt, prioritizes haplotypes using a score function based on the frequency of the haplotypes in the sequencing set relative to the target coverage. AlphaSeqOpt has two steps: (1) selection of an initial set of individuals by iteratively choosing the individual with the maximum score conditional on the current set, and (2) refinement of the set through several rounds of exchanges of individuals. AlphaSeqOpt is very effective at distributing a fixed amount of sequencing resources evenly across haplotypes, which reduces the proportion of haplotypes sequenced below the target coverage. Compared with methods whose score function is based on the population frequency of haplotypes, AlphaSeqOpt can provide a greater proportion of haplotypes sequenced at the target coverage while sequencing fewer individuals. Refining the initially selected set can yield a larger, more diverse set with more unique individuals, which is beneficial in the context of low-coverage sequencing. We extend the method with an approach that filters rare haplotypes based on their flanking haplotypes, so that only those likely to derive from a recombination event are targeted.
Conclusions: We present a method for allocating sequencing resources so that a greater proportion of haplotypes is sequenced at a coverage high enough for population-based imputation with low-coverage sequencing. The haplotype score function, the refinement step, and the new approach to filtering rare haplotypes make AlphaSeqOpt more effective for this purpose than previously reported methods for reducing sequencing redundancy.
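The two steps described above (greedy selection conditional on the current set, then refinement by exchanging individuals) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published AlphaSeqOpt code. Each individual is modelled as a hypothetical list of haplotype IDs, one entry per copy. Sequencing an individual adds one unit of coverage to each of those haplotypes. The "score" is assumed to be the gain in coverage counted only up to the target; the paper's actual score function may differ.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical data model (not from the paper): individual ID -> haplotype IDs it
# carries, one entry per copy. Sequencing an individual adds one unit of coverage
# to each haplotype copy it carries.
Haplotypes = Dict[str, List[str]]


def objective(selected: Sequence[str], haps: Haplotypes, target: int) -> int:
    """Coverage summed over haplotypes, counted only up to `target` (surplus earns nothing)."""
    counts = Counter(h for ind in selected for h in haps[ind])
    return sum(min(c, target) for c in counts.values())


def greedy_select(haps: Haplotypes, target: int, budget: int) -> List[str]:
    """Step 1: repeatedly add the individual with the highest score given the current set."""
    selected: List[str] = []
    for _ in range(budget):
        base = objective(selected, haps, target)
        best, best_gain = None, -1
        for ind in haps:
            if ind in selected:
                continue
            # Assumed score: coverage this individual adds to haplotypes still below target.
            gain = objective(selected + [ind], haps, target) - base
            if gain > best_gain:
                best, best_gain = ind, gain
        if best is None:  # every individual is already selected
            break
        selected.append(best)
    return selected


def refine(selected: Sequence[str], haps: Haplotypes, target: int, rounds: int = 10) -> List[str]:
    """Step 2: swap a selected individual for an unselected one whenever that raises the objective."""
    current_set = list(selected)
    current = objective(current_set, haps, target)
    for _ in range(rounds):
        improved = False
        for i in range(len(current_set)):
            for cand in haps:
                if cand in current_set:
                    continue
                trial = current_set[:i] + [cand] + current_set[i + 1:]
                score = objective(trial, haps, target)
                if score > current:
                    current_set, current, improved = trial, score, True
                    break  # move to the next position using the updated set
        if not improved:  # no single exchange helps: a local optimum
            break
    return current_set
```

For example, consider `haps = {"A": ["h1", "h1"], "B": ["h1", "h2"], "C": ["h2", "h3"], "D": ["h3", "h4"]}` with `target=1` and `budget=2`. The greedy step picks B and then D, which covers all four haplotypes. A poor starting set such as `["A", "B"]` is improved by `refine` to the same pair.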

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Ruoyun Hui ◽  
Eugenia D’Atanasio ◽  
Lara M. Cassidy ◽  
Christiana L. Scheib ◽  
Toomas Kivisild

Abstract Although ancient DNA data have become increasingly important in studies of past populations, it is often not feasible or practical to obtain high-coverage genomes from poorly preserved samples. While accurate genotype imputation from >1× coverage data has recently become routine, a large proportion of ancient samples remain unusable for downstream analyses because of their low coverage. Here, we evaluate a two-step pipeline for the imputation of common variants in ancient genomes at 0.05–1× coverage. We use the genotype-likelihood input mode in Beagle, then filter for confident genotypes and use them as the input to impute the missing genotypes. When tested on ancient genomes, this procedure outperforms single-step imputation from genotype likelihoods. This suggests that current genotype callers do not fully account for errors in ancient sequences and that additional quality control can be beneficial. We compare the effects of various genotype-likelihood calling methods, of post-calling, pre-imputation and post-imputation filters, of different reference panels, and of different imputation tools. In a Neolithic Hungarian genome, we obtain ~90% imputation accuracy for heterozygous common variants at 0.05× coverage and >97% accuracy at 0.5× coverage. We show that imputation can mitigate, though not eliminate, reference bias in ultra-low-coverage ancient genomes.
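The "filter for confident genotypes" step between the two imputation passes can be sketched as below. This is a minimal illustration that assumes each site comes with genotype probabilities for 0, 1 and 2 alternative alleles, as produced by a first Beagle pass. It also uses an illustrative threshold of 0.99; the abstract does not state the threshold the authors used. The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def filter_confident(gp: Sequence[Sequence[float]], threshold: float = 0.99) -> List[Optional[int]]:
    """Keep the most probable genotype (0, 1 or 2 alt alleles) only if its probability >= threshold.

    Sites that fail the threshold become None (missing), so a second
    imputation pass can fill them in from the reference panel.
    """
    calls: List[Optional[int]] = []
    for probs in gp:
        best = max(range(len(probs)), key=lambda g: probs[g])
        calls.append(best if probs[best] >= threshold else None)
    return calls
```

For instance, `filter_confident([[0.995, 0.005, 0.0], [0.3, 0.6, 0.1], [0.0, 0.001, 0.999]])` keeps the first and third sites as genotypes 0 and 2 and marks the uncertain middle site as missing.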


Animals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 899
Author(s):  
Fotis Pappas ◽  
Christos Palaiokostas

Incorporation of genomic technologies into fish breeding programs is a modern reality, promising substantial advances in selection accuracy, genetic-diversity monitoring and pedigree verification. Single nucleotide polymorphism (SNP) arrays are the most commonly used genomic tool, but the investment they require makes them unsustainable for emerging species such as Arctic charr (Salvelinus alpinus), where production volume is low. The need to genotype large numbers of animals for breeding calls for cost-effective genotyping approaches. In the current study, we used double-digest restriction site-associated DNA (ddRAD) sequencing at either high or low coverage to genotype Arctic charr from the Swedish national breeding program, and assessed the utility of each dataset for a range of tasks. The identified SNPs were used to decipher the genetic structure of the studied population, estimate genomic relationships and perform an association study for growth-related traits. Missing information and underestimation of heterozygosity in the low-coverage set were limiting factors in the genetic-diversity and genomic-relationship analyses, where high coverage performed notably better. On the other hand, the low-coverage dataset proved valuable for identifying loci associated with phenotypic traits of interest. In general, both genotyping strategies offer sustainable alternatives to hybridization-based genotyping platforms and show potential for applications in aquaculture selective breeding.
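One simple way to see why low coverage underestimates heterozygosity is a binomial argument. This is a textbook simplification, not an analysis from the study. It assumes error-free reads, each drawn from either allele of a heterozygote with probability 0.5. Under those assumptions, a heterozygote is only recognizable if reads of both alleles are observed:

```python
def p_detect_het(depth: int) -> float:
    """Probability that a heterozygote shows reads of both alleles at a given read depth.

    Assumes error-free reads, each sampled from either allele with probability 0.5.
    The complementary event is that all `depth` reads come from the same allele,
    which has probability 2 * 0.5**depth.
    """
    if depth < 2:
        return 0.0  # zero or one read can never reveal both alleles
    return 1.0 - 2.0 * 0.5 ** depth
```

At depth 2 only half of true heterozygotes are visible as such (0.5). This rises to 0.9375 at depth 5 and 0.998046875 at depth 10, so low-depth calls are biased toward homozygotes.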


2021 ◽  
Author(s):  
Anastasia Malashina

Abstract We estimate the n-gram entropies of English-language texts, using dictionaries and taking punctuation into account, and find a heuristic method for estimating the marginal entropy. We propose a method for evaluating the coverage of empirically generated dictionaries and an approach to address the disadvantage of low coverage. In addition, we evaluate the probability of obtaining a meaningful text by directly iterating through all possible n-grams of the alphabet and conclude that this is feasible only for very short text segments.
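A minimal sketch of an empirical n-gram entropy estimate is given below. It assumes overlapping character n-grams and plain maximum-likelihood (plug-in) frequencies. The paper's dictionary-based procedure and punctuation handling are not reproduced, and the function name is illustrative.

```python
import math
from collections import Counter


def ngram_entropy(text: str, n: int) -> float:
    """Shannon entropy (bits) of the empirical distribution of overlapping n-grams in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

Dividing the result by `n` gives a per-character estimate. Because the plug-in estimator only sees the n-grams present in the sample, it tends to underestimate entropy when the empirical dictionary has low coverage of the true n-gram distribution. This is one reason the coverage of such dictionaries matters.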


2021 ◽  
Author(s):  
Michael Schneider ◽  
Asis Shrestha ◽  
Agim Ballvora ◽  
Jens Leon

Abstract
Background: The identification of environmentally specific alleles and the observation of evolutionary processes are goals of conservation genomics. By tracking changes in allele frequencies across generations, questions about effective population size, gene flow, drift and selection can be addressed. Observing such effects often involves a trade-off between cost and resolution, because a reasonably large sample of genotypes must be genotyped at many loci. Pool genotyping approaches can achieve high resolution and precision in allele-frequency estimation when high-coverage sequencing is used. Still, high-coverage pool sequencing of large genomes is costly.
Results: Here we present a reliable method for estimating a barley population's allele frequencies from low-coverage sequencing. Three hundred genotypes were sampled from a barley backcross population to estimate the entire population's allele frequencies. The accuracy and yield of allele-frequency estimation were compared for three next-generation sequencing methods. To obtain accurate allele-frequency estimates at low sequencing coverage, a haplotyping approach was applied. Low-coverage allele frequencies of positionally linked single-nucleotide polymorphisms were aggregated into a single haplotype allele frequency, resulting in 2 to 271 times higher depth and increased precision. We compared different haplotyping strategies, showing that gene-based and chip-marker-based haplotypes perform on par with or better than simple contig haplotype windows. Comparing multiple pool samples against an individual sequencing approach revealed that whole-genome pool resequencing had the highest correlation with individual genotyping (up to 0.97). Transcriptomics and genotyping by sequencing showed higher error rates and lower correlations.
Conclusion: The proposed method allows the allele frequencies of populations to be identified with high accuracy at low cost. This is particularly interesting for conservation genomics in species with large genomes, such as barley or wheat. Whole-genome low-coverage resequencing at 10x coverage can deliver highly accurate allele-frequency estimates when a locus-based haplotyping approach is applied. Using annotated haplotypes makes it possible to draw on both biological background and statistical robustness.
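The depth gain from haplotyping comes from pooling reads across linked SNPs. A minimal sketch follows, under two simplifying assumptions. First, the SNPs in a window are fully linked. Second, the counted allele at each SNP has already been polarized so that it tags the same haplotype allele. The study's gene-, chip-marker- and contig-window definitions are not modelled, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def haplotype_frequency(snp_counts: Sequence[Tuple[int, int]]) -> Tuple[float, int]:
    """Pool (reads carrying the haplotype allele, total reads) over the SNPs of one window.

    Returns the pooled frequency estimate and the pooled depth. With k SNPs of
    similar depth, the pooled depth is roughly k times the per-SNP depth.
    """
    hap_reads = sum(a for a, _ in snp_counts)
    depth = sum(d for _, d in snp_counts)
    if depth == 0:
        return float("nan"), 0
    return hap_reads / depth, depth
```

For three SNPs with depths 3, 4 and 2, `haplotype_frequency([(1, 3), (2, 4), (1, 2)])` returns a frequency of 4/9 at a pooled depth of 9, about three times the depth of any single SNP.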


2016 ◽  
Vol 879 ◽  
pp. 2170-2174 ◽  
Author(s):  
Junko Yamashita ◽  
Norio Nunomura

Density functional theory (DFT) calculations of the adsorption of chlorine atoms onto the perfect Al(111) surface have been performed. The structural and electronic properties of chlorine atoms adsorbed on the surface are investigated within a supercell approach for chlorine coverages of 0.25, 0.33, 0.5 and 1 ML. At low coverage, the adsorbates prefer on-top sites over bridge, hcp and fcc sites, whereas at high coverage they prefer fcc sites. The binding energy decreases with increasing coverage owing to interactions between chlorine atoms. Geometric and electronic analyses based on differential charge density distributions and projected densities of states (PDOS) are presented and discussed.
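For reference, a common definition of the average binding energy per adsorbed Cl atom in such supercell calculations is given below. The abstract does not state which Cl reference state (isolated atom or half a Cl₂ molecule) the authors used, so the isolated-atom reference here is an assumption:

```latex
E_b(\theta) = \frac{1}{N_{\mathrm{Cl}}}\Big[\,E_{\mathrm{Al(111)}} + N_{\mathrm{Cl}}\,E_{\mathrm{Cl}} - E_{\mathrm{Cl/Al(111)}}\Big],
\qquad
\theta = \frac{N_{\mathrm{Cl}}}{N_{\mathrm{surf}}}
```

Here $E_{\mathrm{Cl/Al(111)}}$ is the total energy of the slab with adsorbates, $E_{\mathrm{Al(111)}}$ that of the clean slab, $E_{\mathrm{Cl}}$ that of an isolated Cl atom, and $N_{\mathrm{surf}}$ the number of surface Al atoms in the supercell. For example, $\theta = 0.25$ ML corresponds to one Cl atom per four surface Al atoms. With this sign convention a larger positive $E_b$ means stronger binding, so a binding energy that decreases with coverage means weaker binding per atom as Cl–Cl interactions grow.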


2018 ◽  
Author(s):  
Susanne Tilk ◽  
Alan Bergland ◽  
Aaron Goodman ◽  
Paul Schmidt ◽  
Dmitri Petrov ◽  
...  

Abstract Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. Previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, but this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally pooled and simulated samples of Drosophila melanogaster, we show three things. Haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination. It is robust to moderate levels of missing data. It is also robust to different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby increased power to detect adaptive alleles.
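The core idea works in two stages. First, estimate how much each known founder haplotype contributes to a pooled sample within a window. Then read off allele frequencies from the founders' genotypes. This can be sketched with a simple expectation-maximization (EM) estimator. It is an illustrative stand-in, not HAF-pipe's implementation. It assumes error-free reads, homozygous founders coded 0/1 at each SNP, and no recombination within the window. The function names are hypothetical.

```python
from typing import List, Sequence


def founder_proportions(founders: Sequence[Sequence[int]], alt: Sequence[int],
                        depth: Sequence[int], iters: int = 200) -> List[float]:
    """EM estimate of founder mixture weights in one window.

    founders[i][j] is 1 if founder i carries the alt allele at SNP j, else 0.
    alt[j] and depth[j] are the pooled alt-read count and total read depth at SNP j.
    """
    k = len(founders)
    w = [1.0 / k] * k  # start from equal founder proportions
    total = sum(depth)
    eps = 1e-12
    for _ in range(iters):
        new = [0.0] * k
        for j in range(len(depth)):
            p = sum(w[i] * founders[i][j] for i in range(k))  # expected alt frequency
            p = min(max(p, eps), 1.0 - eps)
            for i in range(k):
                f = founders[i][j]
                # E-step: expected alt and ref reads attributed to founder i at SNP j.
                new[i] += alt[j] * w[i] * f / p + (depth[j] - alt[j]) * w[i] * (1 - f) / (1.0 - p)
        w = [x / total for x in new]  # M-step: share of all reads per founder
    return w


def allele_frequencies(founders: Sequence[Sequence[int]], w: Sequence[float]) -> List[float]:
    """Frequency at every SNP of the window, including SNPs that received no reads."""
    n_snps = len(founders[0])
    return [sum(w[i] * founders[i][j] for i in range(len(w))) for j in range(n_snps)]
```

Consider three founders with genotypes `[[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]`, observed alt counts `[5, 3, 8, 2]` and depth 10 at every SNP. The estimated proportions converge to about `[0.5, 0.3, 0.2]`. Once the proportions are known, frequencies can be computed for any SNP on the founder haplotypes, including ones with missing reads.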


2012 ◽  
Vol 3 ◽  
pp. 285-293 ◽  
Author(s):  
Laurent Nony ◽  
Franck Bocquet ◽  
Franck Para ◽  
Frédéric Chérioux ◽  
Eric Duverger ◽  
...  

We investigated the adsorption of 4-methoxy-4′-(3-sulfonatopropyl)stilbazolium (MSPS) on different ionic (001) crystal surfaces by means of noncontact atomic force microscopy. MSPS is a zwitterionic molecule with a strong electric dipole moment. When deposited onto the substrates at room temperature, MSPS diffuses to step edges and defect sites and forms disordered assemblies of molecules. Subsequent annealing induces two different processes. First, at high coverage, the molecules assemble into a well-organized quadratic lattice that is perfectly aligned with the <110> directions of the substrate surface (i.e., rows of equal charges) and produces a Moiré pattern due to coincidences with the substrate lattice constant. Second, at low coverage, we observe step edges that run along the <110> direction and are decorated with MSPS molecules. These polar steps most probably minimize the surface energy, as they counterbalance the molecular dipole by presenting oppositely charged ions on the rearranged step edge.


2000 ◽  
Vol 11 (05) ◽  
pp. 1067-1076
Author(s):  
ŞAKIR ERKOÇ ◽  
ŞENAY KATIRCIOĞLU

We have investigated the decomposition of C60 molecules at low and high coverage on the Si(100)(2×1) surface at elevated temperatures, as well as the decomposition of an isolated C60 molecule. We employed molecular-dynamics simulations using a model potential. At low coverage (0.11), C60 decomposes on the Si(100) surface above 1000 K, whereas at high coverage (0.67), C60 molecules decompose above 900 K. By contrast, an isolated C60 molecule decomposes only above 7500 K; interestingly, it shows a phase change from 3D to 2D at higher temperatures.


PLoS ONE ◽  
2011 ◽  
Vol 6 (9) ◽  
pp. e25244 ◽  
Author(s):  
Katharina Kranzer ◽  
Nienke van Schaik ◽  
Unice Karmue ◽  
Keren Middelkoop ◽  
Elaine Sebastian ◽  
...  
