Investigating the impact of reference assembly choice on genomic analyses in a cattle breed

2021 ◽  
Author(s):  
Audald Lloret-Villas ◽  
Meenu Bhati ◽  
Naveen Kumar Kadri ◽  
Ruedi Fries ◽  
Hubert Pausch

Abstract Background Reference-guided read alignment and variant genotyping are prone to reference allele bias, particularly for samples that are greatly divergent from the reference genome. A Hereford-based assembly is the widely accepted bovine reference genome. Haplotype-resolved genomes that exceed the current bovine reference genome in quality and continuity have been assembled for different breeds of cattle. Using whole genome sequencing data of 161 Brown Swiss cattle, we compared the accuracy of read mapping and sequence variant genotyping as well as downstream genomic analyses between the bovine reference genome (ARS-UCD1.2) and a highly continuous Angus-based assembly (UOA_Angus_1). Results Read mapping accuracy did not differ notably between the ARS-UCD1.2 and UOA_Angus_1 assemblies. We discovered 22,744,517 and 22,559,675 high-quality variants from ARS-UCD1.2 and UOA_Angus_1, respectively. The concordance between sequence- and array-called genotypes was high and the number of variants deviating from Hardy-Weinberg proportions was low at segregating sites for both assemblies. More artefactual INDELs were genotyped from UOA_Angus_1 than ARS-UCD1.2 alignments. Using the composite likelihood ratio test, we detected 40 and 33 signatures of selection from ARS-UCD1.2 and UOA_Angus_1, respectively, but the overlap between both assemblies was low. Using the 161 sequenced Brown Swiss cattle as a reference panel, we imputed sequence variant genotypes into a mapping cohort of 30,499 cattle that had microarray-derived genotypes. The accuracy of imputation (Beagle R2) was very high (0.87) for both assemblies. Genome-wide association studies between imputed sequence variant genotypes and six dairy traits as well as stature produced almost identical results from both assemblies. Conclusions The ARS-UCD1.2 and UOA_Angus_1 assemblies are suitable for reference-guided genome analyses in Brown Swiss cattle.
Although differences in read mapping and genotyping accuracy between both assemblies are negligible, the choice of the reference genome has a large impact on detecting signatures of selection using the composite likelihood ratio test. We developed a workflow that can be adapted and reused to compare the impact of reference genomes on genome analyses in various breeds, populations and species.

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Audald Lloret-Villas ◽  
Meenu Bhati ◽  
Naveen Kumar Kadri ◽  
Ruedi Fries ◽  
Hubert Pausch

Abstract Background Reference-guided read alignment and variant genotyping are prone to reference allele bias, particularly for samples that are greatly divergent from the reference genome. A Hereford-based assembly is the widely accepted bovine reference genome. Haplotype-resolved genomes that exceed the current bovine reference genome in quality and continuity have been assembled for different breeds of cattle. Using whole genome sequencing data of 161 Brown Swiss cattle, we compared the accuracy of read mapping and sequence variant genotyping as well as downstream genomic analyses between the bovine reference genome (ARS-UCD1.2) and a highly continuous Angus-based assembly (UOA_Angus_1). Results Read mapping accuracy did not differ notably between the ARS-UCD1.2 and UOA_Angus_1 assemblies. We discovered 22,744,517 and 22,559,675 high-quality variants from ARS-UCD1.2 and UOA_Angus_1, respectively. The concordance between sequence- and array-called genotypes was high and the number of variants deviating from Hardy-Weinberg proportions was low at segregating sites for both assemblies. More artefactual INDELs were genotyped from UOA_Angus_1 than ARS-UCD1.2 alignments. Using the composite likelihood ratio test, we detected 40 and 33 signatures of selection from ARS-UCD1.2 and UOA_Angus_1, respectively, but the overlap between both assemblies was low. Using the 161 sequenced Brown Swiss cattle as a reference panel, we imputed sequence variant genotypes into a mapping cohort of 30,499 cattle that had microarray-derived genotypes using a two-step imputation approach. The accuracy of imputation (Beagle R2) was very high (0.87) for both assemblies. Genome-wide association studies between imputed sequence variant genotypes and six dairy traits as well as stature produced almost identical results from both assemblies. Conclusions The ARS-UCD1.2 and UOA_Angus_1 assemblies are suitable for reference-guided genome analyses in Brown Swiss cattle. 
Although differences in read mapping and genotyping accuracy between both assemblies are negligible, the choice of the reference genome has a large impact on detecting signatures of selection that already reached fixation using the composite likelihood ratio test. We developed a workflow that can be adapted and reused to compare the impact of reference genomes on genome analyses in various breeds, populations and species.
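
The abstract reports the number of variants deviating from Hardy-Weinberg proportions without stating which test was applied. As a minimal sketch only, assuming a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts at a biallelic site (the function name `hwe_chi_square` and its arguments are hypothetical and not taken from the authors' workflow), the check could look like this:

```python
import math


def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square test for deviation from Hardy-Weinberg proportions.

    Hypothetical helper for illustration; takes counts of the three
    genotypes at one biallelic site and returns (chi2, p_value).
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no called genotypes")

    # Reference allele frequency estimated from the genotype counts.
    p = (2 * n_ref_hom + n_het) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ref_hom, n_het, n_alt_hom)

    # Classes with zero expectation (monomorphic sites) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# A site with no heterozygotes at all deviates strongly.
chi2, p_value = hwe_chi_square(50, 0, 50)
print(chi2, p_value)
```

The chi-square approximation is used here only because it needs nothing beyond the standard library; exact tests are often preferred when genotype counts are small.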


Author(s):  
Adrien Oliva ◽  
Raymond Tobler ◽  
Alan Cooper ◽  
Bastien Llamas ◽  
Yassine Souilmi

Abstract The current standard practice for assembling individual genomes involves mapping millions of short DNA sequences (also known as DNA ‘reads’) against a pre-constructed reference genome. Mapping vast amounts of short reads in a timely manner is a computationally challenging task that inevitably produces artefacts, including biases against alleles not found in the reference genome. This reference bias and other mapping artefacts are expected to be exacerbated in ancient DNA (aDNA) studies, which rely on the analysis of low quantities of damaged and very short DNA fragments (~30–80 bp). Nevertheless, the current gold-standard mapping strategies for aDNA studies have effectively remained unchanged for nearly a decade, during which time new software has emerged. In this study, we used simulated aDNA reads from three different human populations to benchmark the performance of 30 distinct mapping strategies implemented across four different read mapping software—BWA-aln, BWA-mem, NovoAlign and Bowtie2—and quantified the impact of reference bias in downstream population genetic analyses. We show that specific NovoAlign, BWA-aln and BWA-mem parameterizations achieve high mapping precision with low levels of reference bias, particularly after filtering out reads with low mapping qualities. However, unbiased NovoAlign results required the use of an IUPAC reference genome. While relevant only to aDNA projects where reference population data are available, the benefit of using an IUPAC reference demonstrates the value of incorporating population genetic information into the aDNA mapping process, echoing recent results based on graph genome representations.


Author(s):  
Danang Crysnanto ◽  
Hubert Pausch

Abstract Background The current bovine genomic reference sequence was assembled from the DNA of a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation. Lack of diversity is a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals by hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references. Results We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that read mapping is more accurate to variation-aware than linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs enable similar mapping accuracy improvements over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. We show that our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels. Conclusions We developed the first variation-aware reference graph for an agricultural animal: https://doi.org/10.5281/zenodo.3759712. Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
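
The variant pre-selection described above keeps alleles whose alternate allele frequency exceeds 0.03. The following is a rough sketch of that filtering step only, not the vg toolkit and not the authors' pipeline. It assumes genotypes are stored as alternate allele counts (0, 1 or 2, with `None` for missing calls); the names `alt_allele_frequency`, `select_variants` and the example data are hypothetical.

```python
def alt_allele_frequency(genotypes):
    """Alternate allele frequency from diploid alt-allele counts (0/1/2).

    Missing calls (None) are ignored; returns None if nothing was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def select_variants(variants, min_af=0.03):
    """Return IDs of variants whose alt allele frequency exceeds min_af."""
    selected = []
    for variant_id, genotypes in variants.items():
        af = alt_allele_frequency(genotypes)
        if af is not None and af > min_af:
            selected.append(variant_id)
    return selected


# Made-up genotypes for illustration only.
example = {
    "var_common": [0, 1, 1, 2, None],  # 4 alt alleles / 8 called alleles
    "var_rare": [0] * 49 + [1],        # 1 alt allele / 100 called alleles
    "var_uncalled": [None, None],      # no called genotypes
}
print(select_variants(example))
```

In practice such a filter would more likely be applied to a VCF file with a dedicated tool; plain Python lists are used here only to keep the threshold logic visible.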


Author(s):  
Reuben M. Buckley ◽  
Brian W. Davis ◽  
Wesley A. Brashear ◽  
Fabiana H. G. Farias ◽  
Kei Kuroki ◽  
...  

Abstract The domestic cat (Felis catus) numbers over 94 million in the USA alone, occupies households as a companion animal, and, like humans, suffers from cancer and common and rare diseases. However, genome-wide sequence variant information is limited for this species. To empower trait analyses, a new cat genome reference assembly was developed from PacBio long sequence reads that significantly improve sequence representation and assembly contiguity. The whole genome sequences of 54 domestic cats were aligned to the reference to identify single nucleotide variants (SNVs) and structural variants (SVs). Across all cats, 16 SNVs predicted to have deleterious impacts and in a singleton state were identified as high priority candidates for causative mutations. One candidate was a stop gain in the tumor suppressor FBXW7. The SNV is found in cats segregating for feline mediastinal lymphoma and is a candidate for inherited cancer susceptibility. SV analysis revealed a complex deletion coupled with a nearby potential duplication event that was shared privately across three unrelated dwarfism cats and is found within a known dwarfism associated region on cat chromosome B1. This SV interrupted UDP-glucose 6-dehydrogenase (UGDH), a gene involved in the biosynthesis of glycosaminoglycans. Importantly, UGDH has not yet been associated with human dwarfism and should be screened in undiagnosed patients. The new high-quality cat genome reference and the compilation of sequence variation demonstrate the importance of these resources when searching for disease causative alleles in the domestic cat and for identification of feline biomedical models. Author summary The practice of genomic medicine is predicated on the availability of a high quality reference genome and an understanding of the impact of genome variation.
Such resources have led to countless discoveries in humans; however, by working exclusively within the framework of human genetics, our potential for understanding disease biology is limited, as similar analyses in other species have often led to novel insights. The generation of Felis_catus_9.0, a new high quality reference genome for the domestic cat, helps facilitate the expansion of genomic medicine into the Felis lineage. Using Felis_catus_9.0 we analyze the landscape of genomic variation from a collection of 54 cats within the context of human gene constraint. The distribution of variant impacts in cats is correlated with patterns of gene constraint in humans, indicating the utility of this reference for identifying novel mutations that cause phenotypes relevant to human and cat health. Moreover, structural variant analysis revealed a novel variant for feline dwarfism in UGDH, a gene that has not been associated with dwarfism in any other species, suggesting a role for UGDH in cases of undiagnosed dwarfism in humans.


2015 ◽  
Vol 177 (6) ◽  
pp. 152.1-152 ◽  
Author(s):  
Th. Mock ◽  
E. Hehenberger ◽  
A. Steiner ◽  
J. Hüsler ◽  
G. Hirsbrunner

2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Farzaneh Salari ◽  
Fatemeh Zare-Mirakabad ◽  
Mehdi Sadeghi ◽  
Hassan Rokni-Zadeh

Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. SCI-13-SCI-13
Author(s):  
Scott W. Lowe

p53 action and the consequences of p53 mutation in acute myeloid leukemia
TP53 mutations are common in treatment-associated myeloid neoplasia (tMN) and complex karyotype acute myeloid leukemia (CK-AML), where they are associated with chemoresistance and one of the worst prognoses of any leukemia genotype. To understand the impact of TP53 mutations on AML biology, we are performing large-scale genomic analyses of p53 mutant AML and have produced a series of animal models that appear to faithfully reflect molecular and biological features of the human disease. We have gone on to explore how particular TP53 mutational configurations drive AML initiation and maintenance, and to identify and understand the events that cooperate with p53 mutations during leukemogenesis.
Disclosures Lowe: Blueprint Medicines: Consultancy, Equity Ownership; ORIC pharmaceuticals: Consultancy, Equity Ownership; Mirimus: Consultancy, Equity Ownership; Constellation Pharma: Consultancy, Equity Ownership; Petra Pharmaceuticals: Consultancy, Equity Ownership; PMV Pharmaceuticals: Consultancy, Equity Ownership; Faeth Therapeutics: Consultancy, Equity Ownership.


2020 ◽  
Vol 103 (9) ◽  
pp. 8541-8553
Author(s):  
A. Maggiolino ◽  
G.E. Dahl ◽  
N. Bartolomeo ◽  
U. Bernabucci ◽  
A. Vitali ◽  
...  

2020 ◽  
Author(s):  
Brendan N. Reid ◽  
Rachel L. Moran ◽  
Christopher J. Kopack ◽  
Sarah W. Fitzpatrick

Abstract Researchers studying non-model organisms have an increasing number of methods available for generating genomic data. However, the applicability of different methods across species, as well as the effect of reference genome choice on population genomic inference, are still difficult to predict in many cases. We evaluated the impact of data type (whole-genome vs. reduced representation) and reference genome choice on data quality and on population genomic and phylogenomic inference across several species of darters (subfamily Etheostomatinae), a highly diverse radiation of freshwater fish. We generated a high-quality reference genome and developed a hybrid RADseq/sequence capture (Rapture) protocol for the Arkansas darter (Etheostoma cragini). Rapture data from 1900 individuals spanning four darter species showed recovery of most loci across darter species at high depth and consistent estimates of heterozygosity regardless of reference genome choice. Loci with baits spanning both sides of the restriction enzyme cut site performed especially well across species. For low-coverage whole-genome data, choice of reference genome affected read depth and inferred heterozygosity. For similar amounts of sequence data, Rapture performed better at identifying fine-scale genetic structure compared to whole-genome sequencing. Rapture loci also recovered an accurate phylogeny for the study species and demonstrated high phylogenetic informativeness across the evolutionary history of the genus Etheostoma. Low cost and high cross-species effectiveness regardless of reference genome suggest that Rapture and similar sequence capture methods may be worthwhile choices for studies of diverse species radiations.


Author(s):  
Karol Konaszewski ◽  
Małgorzata Niesiobędzka

The purpose of the study is to determine the role of the sense of coherence and ego-resiliency as buffers for maladaptive coping among juveniles with different levels of delinquency. The study included 561 juveniles referred by a family court to youth education or probation centers throughout Poland. We used SEM to search for relations between variables and the critical ratio test for differences between groups. The results demonstrate that in both groups, the relationships between the components of the sense of coherence and the emotional style were negative. In both groups, the sense of comprehensibility was significantly associated with the search for social contacts. The impact of ego-resiliency on social-diversion coping was significantly stronger for the group with high demoralization than for the group with low demoralization. The study demonstrates that juveniles with a high degree of delinquency are more prone to emotion-oriented coping. Both groups of juveniles use two types of avoidance style to a similar extent. The results show that the stronger the sense of coherence, the less often juveniles cope with stress by reducing emotional tension and by escaping into substitute activities. Furthermore, our findings reveal the dark side of ego-resiliency.

