Evaluation of consensus strategies for haplotype phasing

Author(s):  
Ziad Al Bkhetan ◽  
Gursharan Chana ◽  
Kotagiri Ramamohanarao ◽  
Karin Verspoor ◽  
Benjamin Goudey

Abstract Haplotype phasing is a critical step for many genetic applications, but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, the effect of specific constituent tools across several datasets with different characteristics, and the impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools, and voting across multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces switch error (SE) by an average of 10% compared to any constituent tool when applied to European populations, and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is freely available at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
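The voting idea behind the multi-tool consensus can be illustrated with a minimal sketch. This is not consHap's actual implementation: the function names, the global-flip alignment heuristic and the per-site majority rule are illustrative assumptions about how such a voter could work.

```python
def align(est, ref):
    """Flip an estimate's haplotype orientation if it disagrees with the
    reference at more than half of the heterozygous sites (haplotype
    labelling is arbitrary per tool, so a global flip may be needed)."""
    disagree = sum(a != b for a, b in zip(est, ref))
    return [1 - a for a in est] if disagree > len(est) / 2 else list(est)

def consensus_phase(estimates):
    """Majority vote across phasing estimates.

    Each estimate is a list of 0/1 phase indicators at heterozygous sites
    (which parental haplotype carries the alternate allele). Estimates are
    first aligned to the first one, then voted on per site.
    """
    ref = estimates[0]
    aligned = [align(e, ref) for e in estimates]
    n = len(aligned)
    return [1 if sum(col) * 2 > n else 0 for col in zip(*aligned)]
```

In this sketch, a tool whose output is a globally flipped copy of another's still votes with it after alignment, which is the behaviour a switch-error-reducing consensus needs.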

Author(s):  
Ziad Al Bkhetan ◽  
Gursharan Chana ◽  
Kotagiri Ramamohanarao ◽  
Karin Verspoor ◽  
Benjamin Goudey

Abstract Motivation Haplotype phasing is a critical step for many genetic applications, but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. As such a strategy is yet to be thoroughly explored, this study provides a comprehensive evaluation of consensus strategies for haplotype phasing, exploring their performance, along with that of their constituent tools, across a range of real and simulated datasets with different data characteristics, as well as on the downstream task of genotype imputation. Results Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools, and voting across multiple outputs of a single non-deterministic tool. We find the consensus approach from multiple tools reduces switch error by an average of 10% compared to any constituent tool when applied to European populations, and has the highest accuracy regardless of population ethnicity, sample size, SNP density or SNP frequency. Furthermore, a consensus indirectly provides a small improvement in the downstream task of genotype imputation, regardless of which genotype imputation tool is used. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Availability Our implementation of consensus haplotype phasing, consHap, is freely available at https://github.com/ziadbkh/consHap.


2021 ◽  
Vol 14 (5) ◽  
pp. 785-798
Author(s):  
Daokun Hu ◽  
Zhiwen Chen ◽  
Jianbing Wu ◽  
Jianhua Sun ◽  
Hao Chen

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially with the recent release of Intel Optane DC Persistent Memory Modules. However, most of them have been evaluated on DRAM-based emulators under unrealistic assumptions, or focus on specific metrics while sidestepping important properties. Thus, it is essential to understand how well the proposed hash indexes perform on real PM and how they differ from each other when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we focus on the evaluation of six state-of-the-art hash tables, including Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, on real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. With our in-depth analysis, we identify design trade-offs and good paradigms in prior art, and suggest desirable optimizations and directions for the future development of PM-based hash tables.
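The unified-benchmark-plus-representative-workloads approach can be sketched in miniature. This hedged Python stand-in uses a built-in dict in place of a PM-resident hash table; the workload shape (load phase, then a mixed read/update phase with a tunable read ratio), not the hardware, is what it illustrates.

```python
import random
import time

def run_workload(table, n_keys, read_ratio, seed=42):
    """Time a mixed read/write workload against a hash-table-like object.

    Loads n_keys entries, then performs n_keys operations drawn uniformly
    over the key space, each a positive lookup with probability read_ratio
    or an in-place update otherwise. Returns throughput in ops/sec.
    """
    rng = random.Random(seed)
    keys = list(range(n_keys))
    for k in keys:              # load phase: insert all keys
        table[k] = k
    start = time.perf_counter()
    for _ in range(n_keys):     # mixed phase
        k = rng.choice(keys)
        if rng.random() < read_ratio:
            _ = table.get(k)    # positive lookup
        else:
            table[k] = k + 1    # in-place update
    elapsed = time.perf_counter() - start
    return n_keys / elapsed
```

Sweeping `read_ratio` from 0.0 to 1.0 reproduces the familiar write-heavy to read-heavy spectrum along which the surveyed PM hash tables differ most.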


2008 ◽  
Vol 125 (2) ◽  
pp. 163-171 ◽  
Author(s):  
Michael Nothnagel ◽  
David Ellinghaus ◽  
Stefan Schreiber ◽  
Michael Krawczak ◽  
Andre Franke

2020 ◽  
Vol 7 ◽  
Author(s):  
Sondre A. Engebraaten ◽  
Jonas Moen ◽  
Oleg A. Yakimenko ◽  
Kyrre Glette

Multi-function swarms are swarms that solve multiple tasks at once. For example, a quadcopter swarm could be tasked with exploring an area of interest while simultaneously functioning as ad-hoc relays. With this multi-functionality comes the challenge of handling potentially conflicting requirements simultaneously. Using the Quality-Diversity algorithm MAP-Elites in combination with a suitable controller structure, a framework for automatic behavior generation in multi-function swarms is proposed. The framework is tested on a scenario with three simultaneous tasks: exploration, communication network creation, and geolocation of Radio Frequency (RF) emitters. A repertoire is evolved, consisting of a wide range of controllers, or behavior primitives, with different characteristics and trade-offs in the different tasks. This repertoire enables the swarm to transition online between behaviors featuring different trade-offs among the tasks, depending on the situational requirements. Furthermore, the effect of noise on the behavior characteristics in MAP-Elites is investigated. A moderate number of re-evaluations is found to increase robustness while keeping the computational requirements relatively low. A few selected controllers are examined, and the dynamics of transitioning between these controllers are explored. Finally, the study investigates the importance of individual sensor or controller inputs. This is done through ablation, where individual inputs are disabled and their impact on the performance of the swarm controllers is assessed and analyzed.
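As a rough illustration of how MAP-Elites builds such a repertoire, here is a minimal generic sketch. The evaluation function, mutation operator and behavior binning below are placeholders, not the paper's swarm controllers or descriptors; the point is only the archive mechanism that keeps the best solution per behavior cell.

```python
import random

def map_elites(evaluate, random_genome, mutate, bins, iterations, seed=0):
    """Minimal MAP-Elites loop.

    evaluate(genome) -> (fitness, behavior_descriptor)
    bins(descriptor) -> discrete archive cell (hashable)
    The archive keeps the best genome seen per behavior cell, yielding a
    repertoire of diverse behaviors rather than a single optimum.
    """
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, genome)
    for i in range(iterations):
        if archive and i >= 10:   # after a short random bootstrap,
            parent = rng.choice(list(archive.values()))[1]
            genome = mutate(parent, rng)   # mutate a random elite
        else:
            genome = random_genome(rng)
        fitness, desc = evaluate(genome)
        cell = bins(desc)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genome)
    return archive
```

The re-evaluation strategy discussed in the abstract would slot in at the `evaluate` call, averaging several noisy evaluations before the archive update.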


Policy Papers ◽  
2010 ◽  
Vol 19 (25) ◽  
Author(s):  

Reform package. Comprehensive reform of Fund governance—encompassing quotas, ministerial engagement and oversight, the size and composition of the Executive Board, voting rules, management selection, and staff diversity—is essential to enhancing the Fund’s long-term legitimacy and effectiveness. Although the elements of such a reform are being discussed sequentially, and some could be taken up sooner, most will need to be decided as a package, given the linkages and trade-offs.


2021 ◽  
Author(s):  
Quanshun Mei ◽  
Chuanke Fu ◽  
Jieling Li ◽  
Shuhong Zhao ◽  
Tao Xiang

Abstract Summary Genetic analysis is a systematic and complex procedure in animal and plant breeding. With the fast development of high-throughput genotyping techniques and algorithms, animal and plant breeding has entered the genomic era. However, routine animal and plant breeding programs lack software that can perform comprehensive genetic analyses. To make the whole genetic analysis in animal and plant breeding straightforward, we developed a powerful, robust and fast R package that includes genomic data format conversion; genomic data quality control and genotype imputation; breed composition analysis; pedigree tracing, analysis and visualization; pedigree-based and genomic relationship matrix construction; and genomic evaluation. In addition, to simplify the application of this package, we also developed a shiny toolkit for users. Availability and implementation blupADC is developed primarily in R with core functions written in C++. The development version is maintained at https://github.com/TXiang-lab/blupADC. Supplementary information Supplementary data are available online.
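As an illustration of one of the listed capabilities, pedigree-based relationship matrix construction, the classic tabular method can be sketched as follows. This is a plain Python sketch for exposition; blupADC itself is an R/C++ package and its internals may differ.

```python
def a_matrix(pedigree):
    """Numerator relationship matrix A via the tabular method.

    pedigree: list of (animal, sire, dam) tuples with parents listed
    before their offspring; unknown parents are None. Returns A as a
    nested dict keyed by animal id.
    """
    ids = [a for a, _, _ in pedigree]
    parents = {a: (s, d) for a, s, d in pedigree}
    A = {i: {j: 0.0 for j in ids} for i in ids}
    for i, animal in enumerate(ids):
        s, d = parents[animal]
        # diagonal: 1 + half the relationship between the parents
        A[animal][animal] = 1.0 + (0.5 * A[s][d] if s and d else 0.0)
        for other in ids[:i]:
            # off-diagonal: half the sum of relationships to the parents
            r = 0.0
            if s:
                r += 0.5 * A[other][s]
            if d:
                r += 0.5 * A[other][d]
            A[animal][other] = A[other][animal] = r
    return A
```

For two unrelated founders and their full-sib offspring, this yields the textbook values: parent-offspring relationship 0.5 and full-sib relationship 0.5.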


2020 ◽  
Author(s):  
Shabbeer Hassan ◽  
Ida Surakka ◽  
Marja-Riitta Taskinen ◽  
Veikko Salomaa ◽  
Aarno Palotie ◽  
...  

Abstract Founder population size and demographic changes (e.g. population bottlenecks or rapid expansion) can lead to variation in recombination rates across different populations. Previous research has shown that using population-specific reference panels has a significant effect on downstream population genomic analyses like haplotype phasing, genotype imputation and association, especially in the context of population isolates. Here, we developed a high-resolution recombination rate map at 10 kb and 50 kb scale using high-coverage (20-30x) whole-genome sequence data from 55 family trios from Finland and compared it to recombination rates of non-Finnish Europeans (NFE). We tested the downstream effects of the population-specific recombination rates in statistical phasing and genotype imputation in Finns, as compared to the same analyses performed using the NFE-based recombination rates. We found that Finnish recombination rates have a moderately high correlation (Spearman’s ρ = 0.67-0.79) with non-Finnish Europeans, although on average (across all autosomal chromosomes), Finnish rates (2.268 ± 0.4209 cM/Mb) are 12-14% lower than NFE rates (2.641 ± 0.5032 cM/Mb). The Finnish recombination map was found to have no significant effect on haplotype phasing accuracy (switch error rates ~2%) or average imputation concordance rates (97-98% for common, 92-96% for low-frequency and 78-90% for rare variants). Our results suggest that downstream population genomic analyses like haplotype phasing and genotype imputation mostly depend on population-specific contexts like appropriate reference panels and their sample size, but not on population-specific recombination maps or effective population sizes. Currently available HapMap recombination maps seem robust for population-specific phasing and imputation pipelines, even in the context of relatively isolated populations like Finland.


Author(s):  
Shabbeer Hassan ◽  
Ida Surakka ◽  
Marja-Riitta Taskinen ◽  
Veikko Salomaa ◽  
Aarno Palotie ◽  
...  

Abstract Previous research has shown that using population-specific reference panels has a significant effect on downstream population genomic analyses like haplotype phasing, genotype imputation, and association, especially in the context of population isolates. Here, we developed a high-resolution recombination rate map at 10 and 50 kb scale using high-coverage (20–30×) whole-genome sequenced data of 55 family trios from Finland and compared it to recombination rates of non-Finnish Europeans (NFE). We tested the downstream effects of the population-specific recombination rates in statistical phasing and genotype imputation in Finns, as compared to the same analyses performed using the NFE-based recombination rates. We found that Finnish recombination rates have a moderately high correlation (Spearman’s ρ = 0.67–0.79) with NFE, although on average (across all autosomal chromosomes), Finnish rates (2.268 ± 0.4209 cM/Mb) are 12–14% lower than NFE rates (2.641 ± 0.5032 cM/Mb). The Finnish recombination map was found to have no significant effect on haplotype phasing accuracy (switch error rates ~2%) or average imputation concordance rates (97–98% for common, 92–96% for low-frequency and 78–90% for rare variants). Our results suggest that haplotype phasing and genotype imputation mostly depend on population-specific contexts like appropriate reference panels and their sample size, but not on population-specific recombination maps. Even though recombination rate estimates differed somewhat between the Finnish and NFE populations, haplotyping and imputation were not noticeably affected by the recombination map used. Therefore, the currently available HapMap recombination maps seem robust for population-specific phasing and imputation pipelines, even in the context of relatively isolated populations like Finland.
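The switch error rate used as the phasing-accuracy metric above can be computed with a short sketch. The definition assumed here, a switch counted wherever agreement with the truth flips between consecutive heterozygous sites, is the common one, but tool-specific conventions may differ slightly.

```python
def switch_error_rate(truth, inferred):
    """Switch error rate between true and inferred phase.

    Both inputs are 0/1 phase indicators at heterozygous sites. Because
    the labelling of the two haplotypes is arbitrary, per-site mismatches
    are not errors by themselves; errors are changes in agreement between
    consecutive sites (switches), divided by the number of site pairs.
    """
    agree = [t == p for t, p in zip(truth, inferred)]
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)
```

Note that a globally flipped but otherwise perfect haplotype scores 0, as it should: it is the same phasing under the opposite labelling.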


2019 ◽  
Author(s):  
Zhenhua Yu ◽  
Fang Du ◽  
Xuehong Sun ◽  
Ao Li

Abstract Motivation Allele dropout (ADO) and unbalanced amplification of alleles are the main technical issues of single-cell sequencing (SCS), and effectively emulating these issues is necessary for reliably benchmarking SCS-based bioinformatics tools. Unfortunately, currently available sequencing simulators do not model the whole-genome amplification involved in the SCS technique and are therefore not suited to generating SCS datasets. We develop a new software package (SCSsim) that can efficiently simulate SCS datasets in a parallel fashion with minimal user intervention. SCSsim first constructs the genome sequence of a single cell by mimicking a complement of genomic variations in a user-controlled manner, then amplifies the genome according to the MALBAC technique, and finally yields sequencing reads from the amplified products based on inferred sequencing profiles. Comprehensive evaluation in simulating different ADO rates, variation detection efficiency and genome coverage demonstrates that SCSsim is a very useful tool for mimicking single-cell sequencing data with high efficiency. Availability and implementation SCSsim is freely available at https://github.com/qasimyu/scssim. Supplementary information Supplementary data are available at Bioinformatics online.
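The allele dropout being emulated can be illustrated with a minimal sketch. This is illustrative only: SCSsim models ADO within the MALBAC amplification process itself, whereas the toy model below treats it as a simple per-site coin flip.

```python
import random

def simulate_ado(genotypes, ado_rate, seed=1):
    """Toy simulation of allele dropout in single-cell genotypes.

    genotypes: list of (allele1, allele2) pairs. With probability
    ado_rate, one randomly chosen allele fails to amplify and the site
    is observed as homozygous for the remaining allele, which is how
    ADO turns true heterozygotes into apparent homozygotes.
    """
    rng = random.Random(seed)
    observed = []
    for a1, a2 in genotypes:
        if rng.random() < ado_rate:
            kept = a1 if rng.random() < 0.5 else a2
            observed.append((kept, kept))  # dropout: one allele unseen
        else:
            observed.append((a1, a2))
    return observed
```

Running this over a panel of heterozygous sites shows why ADO is so damaging to variant calling: the dropout fraction translates directly into heterozygotes miscalled as homozygotes.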


BMJ Open ◽  
2019 ◽  
Vol 9 (1) ◽  
pp. e023938 ◽  
Author(s):  
Radboud J Duintjer Tebbens ◽  
Dominika A Kalkowska ◽  
Kimberly M Thompson

Objective To explore the extent to which undervaccinated subpopulations may influence the confidence about no circulation of wild poliovirus (WPV) after the last detected case. Design and participants We used a hypothetical model to examine the extent to which the existence of an undervaccinated subpopulation influences the confidence about no WPV circulation after the last detected case, as a function of different characteristics of the subpopulation (eg, size, extent of isolation). We also used the hypothetical population model to inform bounds on the maximum possible time required to reach high confidence about no circulation in a completely isolated and unvaccinated subpopulation, starting either at the endemic equilibrium or with a single infection in an entirely susceptible population. Results It may take over 3 years to reach 95% confidence about no circulation for this hypothetical population, despite high surveillance sensitivity and high vaccination coverage in the surrounding general population, if: (1) the ability to detect cases in the undervaccinated subpopulation remains exceedingly small, (2) the undervaccinated subpopulation remains small and highly isolated from the general population and (3) the coverage in the undervaccinated subpopulation remains very close to the minimum needed to eradicate. Fully isolated hypothetical populations of 4000 people or fewer cannot sustain endemic transmission for more than 5 years, with at least 20 000 people required for a 50% chance of at least 5 years of sustained transmission in a population without seasonality that starts at the endemic equilibrium. Notably, however, the population size required for persistent transmission increases significantly for realistic populations that include some vaccination and seasonality and/or that do not begin at the endemic equilibrium. Conclusions Significant trade-offs remain inherent in global polio certification decisions, which underscores the need for making and valuing investments to maximise population immunity and surveillance quality in all remaining possible WPV reservoirs.
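The kind of persistence question asked here, how long transmission can sustain itself in a small closed population, can be explored with a toy Monte Carlo. The sketch below is a deliberately simplified stochastic SIR model (weekly steps, births but no deaths, no seasonality, no surveillance), not the authors' model; all parameter names are illustrative.

```python
import random

def persistence_probability(pop, r0, infectious_weeks, birth_rate_weekly,
                            horizon_weeks, runs=100, seed=0):
    """Monte Carlo estimate of the chance transmission persists.

    Simple stochastic SIR with weekly time steps in a closed population,
    initialised near the endemic equilibrium. Returns the fraction of
    runs that still have at least one infectious person after
    horizon_weeks; small populations tend to fade out by chance.
    """
    rng = random.Random(seed)
    recovery = 1.0 / infectious_weeks
    beta = r0 * recovery / pop
    persisted = 0
    for _ in range(runs):
        s = int(pop / r0)   # endemic-equilibrium susceptibles
        i = max(1, int(pop * birth_rate_weekly * (1 - 1 / r0) / recovery))
        for _ in range(horizon_weeks):
            p_inf = 1 - (1 - beta) ** i  # per-susceptible infection risk
            new_inf = sum(rng.random() < p_inf for _ in range(s))
            new_rec = sum(rng.random() < recovery for _ in range(i))
            births = sum(rng.random() < birth_rate_weekly for _ in range(pop))
            s = s - new_inf + births
            i = i + new_inf - new_rec
            if i <= 0:      # stochastic fade-out
                break
        if i > 0:
            persisted += 1
    return persisted / runs
```

Even this crude sketch reproduces the qualitative result in the abstract: below some population size, the susceptible pool replenishes too slowly for chains of transmission to outlast random fade-out.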

