Hybrid peeling for fast and accurate calling, phasing, and imputation with sequence data of any coverage in pedigrees

2017 ◽  
Author(s):  
Andrew Whalen ◽  
Roger Ros-Freixedes ◽  
David L Wilson ◽  
Gregor Gorjanc ◽  
John M Hickey

Abstract In this paper we extend multi-locus iterative peeling to be a computationally efficient method for calling, phasing, and imputing sequence data of any coverage in small or large pedigrees. Our method, called hybrid peeling, uses multi-locus iterative peeling to estimate shared chromosome segments between parents and their offspring, and then uses single-locus iterative peeling to aggregate genomic information across multiple generations. Using a synthetic dataset, we first analysed the performance of hybrid peeling for calling and phasing alleles in disconnected families, that is, families that contained only a focal individual, its parents, and its grandparents. Second, we analysed the performance of hybrid peeling for calling and phasing alleles in the context of the full pedigree. Third, we analysed the performance of hybrid peeling for imputing whole genome sequence data to the remaining individuals in the population. We found that hybrid peeling substantially increased the number of genotypes that were called and phased by leveraging sequence information on related individuals. The calling rate and accuracy increased when the full pedigree was used compared to a reduced pedigree of just parents and grandparents. Finally, hybrid peeling accurately imputed whole genome sequence information to non-sequenced individuals. We believe that this algorithm will enable the generation of low-cost, high-accuracy whole genome sequence data in many pedigreed populations. We are making this algorithm available as a standalone program called AlphaPeel.
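To illustrate why low-coverage sequence data needs information from relatives, the following minimal sketch calls a genotype at one biallelic locus from the read counts of a single individual. It is not AlphaPeel's code: the function names, the 1% sequencing error rate, the 0.98 calling threshold, and the uniform prior are all assumptions made for illustration. In a peeling method, the uniform prior would be replaced by probabilities derived from the pedigree.

```python
import math

def genotype_posteriors(ref_reads, alt_reads, error_rate=0.01):
    """Posterior over genotypes 0, 1, 2 (copies of the alternative allele).

    Assumes a binomial read model and a uniform genotype prior
    (both are illustrative assumptions, not taken from the abstract).
    """
    # Probability that a single read shows the alternative allele, per genotype.
    p_alt = (error_rate, 0.5, 1.0 - error_rate)
    # Work in log space so that high read counts do not underflow to zero.
    log_lik = [alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
               for p in p_alt]
    peak = max(log_lik)
    weights = [math.exp(v - peak) for v in log_lik]
    total = sum(weights)
    return [w / total for w in weights]

def call_genotype(ref_reads, alt_reads, threshold=0.98, error_rate=0.01):
    """Return the most probable genotype, or None if it is not confident enough."""
    post = genotype_posteriors(ref_reads, alt_reads, error_rate)
    best = max(range(3), key=post.__getitem__)
    return best if post[best] >= threshold else None

# One reference read alone cannot separate genotype 0 from genotype 1.
low_coverage_call = call_genotype(ref_reads=1, alt_reads=0)
# Ten reference reads make genotype 0 overwhelmingly likely.
high_coverage_call = call_genotype(ref_reads=10, alt_reads=0)
```

Under these assumptions a single read leaves the genotype uncalled, whereas ten concordant reads yield a call, which is the gap that pedigree information is meant to close.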


2021 ◽  
Vol 12 ◽  
Author(s):  
Hao Cheng ◽  
Keyu Xu ◽  
Jinghui Li ◽  
Kuruvilla Joseph Abraham

Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, the use of whole-genome sequence data generated by next-generation sequencing (NGS) technologies has great potential in livestock populations. However, sequencing a large number of animals to exploit the full potential of whole-genome sequence data is not feasible. Thus, novel strategies are required for the allocation of sequencing resources in genotyped livestock populations such that the entire population can be imputed, maximizing the efficiency of whole genome sequencing budgets. We present two applications of linear programming for the efficient allocation of sequencing resources. The first application is to identify the minimum number of animals for sequencing subject to the criterion that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second application is the selection of animals whose haplotypes include the largest possible proportion of common haplotypes present in the population, assuming a limited sequencing budget. Both applications are available in an open source program, LPChoose. In both applications, LPChoose has similar or better performance than some other methods, suggesting that linear programming methods offer great potential for the efficient allocation of sequencing resources. The utility of these methods can be increased through the development of improved heuristics.
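The first application is an instance of the set cover problem. The sketch below uses the classic greedy heuristic rather than the linear programming formulation that LPChoose implements, so it shows only the shape of the problem, not the authors' method. The function name, the animal identifiers, and the haplotype labels are hypothetical.

```python
def greedy_cover(haplotypes_by_animal):
    """Pick animals until every haplotype is carried by a chosen animal.

    Greedy set-cover heuristic (a stand-in for the LP approach):
    at each step take the animal that covers the most uncovered haplotypes.
    """
    uncovered = set().union(*haplotypes_by_animal.values())
    chosen = []
    while uncovered:
        # Iterate over sorted identifiers so that ties are broken deterministically.
        best = max(sorted(haplotypes_by_animal),
                   key=lambda a: len(haplotypes_by_animal[a] & uncovered))
        chosen.append(best)
        uncovered -= haplotypes_by_animal[best]
    return chosen

# Hypothetical toy population: four animals carrying four distinct haplotypes.
toy_population = {
    "A": {"h1", "h2"},
    "B": {"h2", "h3"},
    "C": {"h3", "h4"},
    "D": {"h1", "h2", "h3"},
}
selected = greedy_cover(toy_population)
```

The greedy choice is not guaranteed to be minimal in general, which is one reason an optimization-based formulation can be attractive.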



2020 ◽  
Author(s):  
Hao Cheng ◽  
Keyu Xu ◽  
Kuruvilla Joseph Abraham

Abstract Background Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, the use of whole-genome sequence data generated by next-generation sequencing (NGS) technologies has great potential in livestock populations. However, a large number of animals must be sequenced to exploit the full potential of whole-genome sequence data. Thus, novel strategies are desired to allocate sequencing resources in genotyped livestock populations such that the entire population can be sequenced or imputed efficiently. Methods We present two applications of linear programming models, called LPChoose, for allocating sequencing resources. The first application is to identify the minimum number of animals for sequencing while meeting the criterion that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second is to sequence a fixed number of animals whose haplotypes include as large a proportion as possible of the haplotypes present in the population given a limited sequencing budget. In both cases, we assume that all animals have been haplotyped. We present results from approximation algorithms, and motivate the use of approximations through the correspondence of the problems we address with problems in computer science for which there are no known efficient algorithms. Results In both applications LPChoose performed consistently better than some existing methods making similar assumptions.
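The second application, choosing a fixed number of animals to cover as many haplotypes as possible, corresponds to the maximum coverage problem. The following sketch is a greedy approximation under a hypothetical budget, not the LPChoose linear program; the function name and the toy data are assumptions, and every haplotype is weighted equally.

```python
def greedy_budgeted_selection(haplotypes_by_animal, budget):
    """Choose at most `budget` animals and report the share of haplotypes covered.

    Greedy maximum-coverage heuristic (a stand-in for the LP approach).
    """
    all_haplotypes = set().union(*haplotypes_by_animal.values())
    candidates = sorted(haplotypes_by_animal)  # deterministic tie-breaking
    covered = set()
    chosen = []
    for _ in range(min(budget, len(candidates))):
        remaining = [a for a in candidates if a not in chosen]
        best = max(remaining,
                   key=lambda a: len(haplotypes_by_animal[a] - covered))
        if not haplotypes_by_animal[best] - covered:
            break  # no remaining animal adds a new haplotype
        chosen.append(best)
        covered |= haplotypes_by_animal[best]
    proportion = len(covered) / len(all_haplotypes) if all_haplotypes else 0.0
    return chosen, proportion

# Hypothetical toy population with a budget of one sequenced animal.
toy_animals = {
    "A": {"h1", "h2"},
    "B": {"h2", "h3"},
    "C": {"h3", "h4"},
    "D": {"h1", "h2", "h3"},
}
picked, share = greedy_budgeted_selection(toy_animals, budget=1)
```

Weighting haplotypes by their population frequency, rather than counting each once, would be a natural variation on this sketch.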



Author(s):  
Amnon Koren ◽  
Dashiell J Massey ◽  
Alexa N Bracci

Abstract Motivation Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information Supplementary data are available at Bioinformatics online.



Data in Brief ◽  
2021 ◽  
pp. 107240
Author(s):  
Wael Ali Mohammed Hadi ◽  
Boby T Edwin ◽  
A Jayakumaran Nair


Data in Brief ◽  
2020 ◽  
Vol 33 ◽  
pp. 106416
Author(s):  
Asset Daniyarov ◽  
Askhat Molkenov ◽  
Saule Rakhimova ◽  
Ainur Akhmetova ◽  
Zhannur Nurkina ◽  
...  


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Lynsey K. Whitacre ◽  
Jesse L. Hoff ◽  
Robert D. Schnabel ◽  
Sara Albarella ◽  
Francesca Ciotola ◽  
...  


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 25-25
Author(s):  
Muhammad Yasir Nawaz ◽  
Rodrigo Pelicioni Savegnago ◽  
Cedric Gondro

Abstract In this study, we detected genome-wide footprints of selection in Hanwoo and Angus beef cattle using different allele frequency and haplotype-based methods based on imputed whole genome sequence data. Our dataset included 13,202 Angus and 10,437 Hanwoo animals with 10,057,633 and 13,241,550 imputed SNPs, respectively. A subset of data with 6,873,624 common SNPs between the two populations was used to estimate signatures of selection parameters, both within (runs of homozygosity and extended haplotype homozygosity) and between (allele fixation index, extended haplotype homozygosity) the breeds in order to infer evidence of selection. We observed that correlations between various measures of selection ranged from 0.01 to 0.42. Assuming these parameters were complementary to each other, we combined them into a composite selection signal to identify regions under selection in both beef breeds. The composite signal was based on the average of fractional ranks of individual selection measures for every SNP. We identified some selection signatures that were common between the breeds while others were independent. We also observed that more genomic regions were selected in Angus as compared to Hanwoo. Candidate genes within significant genomic regions may help explain mechanisms of adaptation, domestication history and loci for important traits in Angus and Hanwoo cattle. In the future, we will use the top SNPs under selection for genomic prediction of carcass traits in both breeds.
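The composite signal described above, an average of fractional ranks across selection measures, can be sketched as follows. The abstract does not specify how ranks are scaled or how ties are handled, so dividing the rank by the number of SNPs and averaging tied ranks are assumptions, as are the function names and the toy values.

```python
def fractional_ranks(values):
    """Rank values from 1 to n (ties share their average rank), then divide by n."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank / n
        i = j + 1
    return ranks

def composite_signal(measures):
    """Average the fractional ranks of every measure, SNP by SNP."""
    per_measure = [fractional_ranks(v) for v in measures.values()]
    return [sum(column) / len(column) for column in zip(*per_measure)]

# Hypothetical values for three SNPs and two selection measures.
toy_measures = {
    "fst": [0.1, 0.5, 0.3],
    "xpehh": [1.0, 3.0, 2.0],
}
composite = composite_signal(toy_measures)
```

Because ranks discard the scale of each statistic, measures with very different units can be averaged directly, which suits weakly correlated measures like those reported here.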



BMC Genomics ◽  
2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Shuto Hayashi ◽  
Rui Yamaguchi ◽  
Shinichi Mizuno ◽  
Mitsuhiro Komura ◽  
Satoru Miyano ◽  
...  

