Genomic Prediction Using LD-Based Haplotypes Inferred From High-Density Chip and Imputed Sequence Variants in Chinese Simmental Beef Cattle

2021 ◽  
Vol 12 ◽  
Author(s):  
Hongwei Li ◽  
Bo Zhu ◽  
Ling Xu ◽  
Zezhao Wang ◽  
Lei Xu ◽  
...  

A haplotype is defined as a combination of alleles at adjacent loci on the same chromosome that can be transmitted as a unit. In this study, we used both the Illumina BovineHD chip (HD chip) and imputed whole-genome sequence (WGS) data to explore haploblocks and assess haplotype effects, with haploblocks defined at different LD thresholds. The accuracies of genomic prediction (GP) for dressing percentage (DP), meat percentage (MP), and rib eye roll weight (RERW) based on haplotypes were investigated and compared for both data sets in Chinese Simmental beef cattle. The accuracies of GP using the entire imputed WGS data were lower than those using the HD chip data in all cases. For DP and MP, GP using haploblock approaches was more accurate than the individual single nucleotide polymorphism (SNP) approach (GBLUP_In_Block) at specific LD levels. Hotelling’s test confirmed that GP using LD-based haplotypes from WGS data can significantly increase the accuracies of GP for RERW, compared with the individual SNP approach (∼1.4 and 1.9% for GHBLUP and GHBLUP+GBLUP, respectively). We found that the accuracies of the haploblock approach varied with the LD threshold. LD thresholds of r² ≥ 0.5 were optimal for most scenarios. Our results suggest that the LD-based haploblock approach can improve the accuracy of genomic prediction for carcass traits using both HD chip and imputed WGS data under the optimal LD thresholds in Chinese Simmental beef cattle.
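The idea of defining haploblocks by an LD threshold can be illustrated with a minimal sketch. The abstract does not state how blocks were built, so the rule below is an assumption made for illustration: LD is approximated by the squared Pearson correlation of allele dosages (0/1/2), and a block is extended greedily while adjacent SNPs meet the threshold. The function names `r2` and `haploblocks` are hypothetical.

```python
def r2(x, y):
    """Squared Pearson correlation between two SNP dosage vectors (0/1/2).

    Used here as a genotype-based approximation of LD r^2 (an assumption).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: LD is undefined, treat as 0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def haploblocks(snps, threshold=0.5):
    """Greedy partition of ordered SNPs into blocks (illustrative rule only).

    snps: list of SNPs in map order, each a list of dosages across animals.
    A SNP joins the current block if its r^2 with the previous SNP
    is at least `threshold`; otherwise it starts a new block.
    """
    if not snps:
        return []
    blocks = [[0]]
    for j in range(1, len(snps)):
        if r2(snps[j - 1], snps[j]) >= threshold:
            blocks[-1].append(j)
        else:
            blocks.append([j])
    return blocks


snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],  # identical to the first SNP
    [2, 1, 0, 1, 2, 0],  # same SNP with alleles flipped
    [0, 2, 0, 2, 1, 1],  # uncorrelated with the previous SNP
]
print(haploblocks(snps, threshold=0.5))
```

Raising the threshold gives shorter blocks and lowering it gives longer ones, which is one way to read the reported dependence of accuracy on the LD threshold.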

2017 ◽  
Author(s):  
Roberto Lozano ◽  
Dunia Pino del Carpio ◽  
Teddy Amuge ◽  
Ismail Siraj Kayondo ◽  
Alfred Ozimati Adebo ◽  
...  

Abstract Background Genomic prediction models were, in principle, developed to include all the available marker information; with this approach, these models have shown in various crops moderate to high predictive accuracies. Previous studies in cassava have demonstrated that, even with relatively small training populations and low-density GBS markers, prediction models are feasible for genomic selection. In the present study, we prioritized SNPs in close proximity to genome regions with biological importance for a given trait. We used a number of strategies to select variants that were then included in single and multiple kernel GBLUP models. Specifically, our sources of information were transcriptomics, GWAS, and immunity-related genes, with the ultimate goal to increase predictive accuracies for Cassava Brown Streak Disease (CBSD) severity. Results We used single and multi-kernel GBLUP models with markers imputed to whole genome sequence level to accommodate various sources of biological information; fitting more than one kinship matrix allowed for differential weighting of the individual marker relationships. We applied these GBLUP approaches to CBSD phenotypes (i.e., root infection and leaf severity three and six months after planting) in a Ugandan Breeding Population (n = 955). Three means of exploiting an established RNAseq experiment of CBSD-infected cassava plants were used. Compared to the biology-agnostic GBLUP model, the accuracy of the informed multi-kernel models increased the prediction accuracy only marginally (1.78% to 2.52%). Conclusions Our results show that markers imputed to whole genome sequence level do not provide enhanced prediction accuracies compared to using standard GBS marker data in cassava. The use of transcriptomics data and other sources of biological information resulted in prediction accuracies that were nominally superior to those obtained from traditional prediction models.
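The abstract gives no equations, so the following is only the generic form a multi-kernel GBLUP model usually takes, written under that assumption. Each marker set k (for example, SNPs near differentially expressed genes versus all remaining SNPs) gets its own genomic relationship matrix G_k and its own variance component, which is what lets the sets be weighted differently. All symbols are introduced here for illustration.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{k=1}^{K} \mathbf{g}_k + \mathbf{e},
\qquad
\mathbf{g}_k \sim N\!\left(\mathbf{0},\, \mathbf{G}_k \sigma^2_{g_k}\right),
\qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, \mathbf{I}\sigma^2_{e}\right)
```

With K = 1 and G_1 built from all markers, this reduces to the single-kernel, biology-agnostic GBLUP used as the baseline.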


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Arthur W. Pightling ◽  
James B. Pettengill ◽  
Yu Wang ◽  
Hugh Rand ◽  
Errol Strain

Abstract Although it is assumed that contamination in bacterial whole-genome sequencing causes errors, the influences of contamination on clustering analyses, such as single-nucleotide polymorphism discovery, phylogenetics, and multi-locus sequencing typing, have not been quantified. By developing and analyzing 720 Listeria monocytogenes, Salmonella enterica, and Escherichia coli short-read datasets, we demonstrate that within-species contamination causes errors that confound clustering analyses, while between-species contamination generally does not. Contaminant reads mapping to references or becoming incorporated into chimeric sequences during assembly are the sources of those errors. Contamination sufficient to influence clustering analyses is present in public sequence databases.


2014 ◽  
Vol 92 (8) ◽  
pp. 3258-3269 ◽  
Author(s):  
M. Gunia ◽  
R. Saintilan ◽  
E. Venot ◽  
C. Hozé ◽  
M. N. Fouilloux ◽  
...  

2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Theo Meuwissen ◽  
Irene van den Berg ◽  
Mike Goddard

Abstract Background Whole-genome sequence (WGS) data are increasingly available on large numbers of individuals in animal and plant breeding and in human genetics through second-generation resequencing technologies, 1000 genomes projects, and large-scale genotype imputation from lower marker densities. Here, we present a computationally fast implementation of a variable selection genomic prediction method, that could handle WGS data on more than 35,000 individuals, test its accuracy for across-breed predictions and assess its quantitative trait locus (QTL) mapping precision. Methods The Monte Carlo Markov chain (MCMC) variable selection model (Bayes GC) fits simultaneously a genomic best linear unbiased prediction (GBLUP) term, i.e. a polygenic effect whose correlations are described by a genomic relationship matrix (G), and a Bayes C term, i.e. a set of single nucleotide polymorphisms (SNPs) with large effects selected by the model. Computational speed is improved by a Metropolis–Hastings sampling that directs computations to the SNPs, which are, a priori, most likely to be included into the model. Speed is also improved by running many relatively short MCMC chains. Memory requirements are reduced by storing the genotype matrix in binary form. The model was tested on a WGS dataset containing Holstein, Jersey and Australian Red cattle. The data contained 4,809,520 genotypes on 35,549 individuals together with their milk, fat and protein yields, and fat and protein percentage traits. Results The prediction accuracies of the Jersey individuals improved by 1.5% when using across-breed GBLUP compared to within-breed predictions. Using WGS instead of 600 k SNP-chip data yielded on average a 3% accuracy improvement for Australian Red cows. QTL were fine-mapped by locating the SNP with the highest posterior probability of being included in the model. Various QTL known from the literature were rediscovered, and a new SNP affecting milk production was discovered on chromosome 20 at 34.501126 Mb. Due to the high mapping precision, it was clear that many of the discovered QTL were the same across the five dairy traits. Conclusions Across-breed Bayes GC genomic prediction improved prediction accuracies compared to GBLUP. The combination of across-breed WGS data and Bayesian genomic prediction proved remarkably effective for the fine-mapping of QTL.
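The abstract says only that the genotype matrix is stored "in binary form"; the exact layout is not given. One common way to do this, shown here purely as an assumed illustration, is to pack each genotype code into 2 bits so that four genotypes share one byte, a fourfold saving over one byte per genotype. The codes (0, 1, 2 for allele dosage, 3 for missing) and the function names are hypothetical.

```python
def pack_genotypes(codes):
    """Pack genotype codes (0, 1, 2 = dosage; 3 = missing) at 2 bits each.

    Genotype i is stored in byte i // 4, at bit offset 2 * (i % 4).
    This layout is an assumption, not the one used by the authors.
    """
    packed = bytearray((len(codes) + 3) // 4)
    for i, g in enumerate(codes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"unsupported genotype code: {g}")
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


codes = [0, 1, 2, 2, 1, 0, 3]
packed = pack_genotypes(codes)
print(len(codes), "genotypes stored in", len(packed), "bytes")
print(unpack_genotypes(packed, len(codes)) == codes)
```

The number of genotypes has to be stored alongside the bytes, because the last byte may be only partly used.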


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 266
Author(s):  
Hossein Mehrban ◽  
Masoumeh Naserkheil ◽  
Deuk Hwan Lee ◽  
Chungil Cho ◽  
Taejeong Choi ◽  
...  

The weighted single-step genomic best linear unbiased prediction (GBLUP) method has been proposed to exploit information from genotyped and non-genotyped relatives, allowing the use of weights for single-nucleotide polymorphism in the construction of the genomic relationship matrix. The purpose of this study was to investigate the accuracy of genetic prediction using the following single-trait best linear unbiased prediction methods in Hanwoo beef cattle: pedigree-based (PBLUP), un-weighted (ssGBLUP), and weighted (WssGBLUP) single-step genomic methods. We also assessed the impact of alternative single and window weighting methods according to their effects on the traits of interest. The data comprised 15,796 phenotypic records for yearling weight (YW) and 5622 records for carcass traits (backfat thickness: BFT, carcass weight: CW, eye muscle area: EMA, and marbling score: MS). The genotypic data included 6616 animals for YW and 5134 for carcass traits on the 43,950 single-nucleotide polymorphisms. The ssGBLUP showed significant improvement in genomic prediction accuracy for carcass traits (71%) and yearling weight (99%) compared to the pedigree-based method. The window weighting procedures performed better than single SNP weighting for CW (11%), EMA (11%), MS (3%), and YW (6%), whereas no gain in accuracy was observed for BFT. In addition, the improvement in accuracy between window WssGBLUP and the un-weighted method was low for BFT and MS, whereas CW, EMA, and YW showed gains of 22%, 15%, and 20%, respectively, which indicates the presence of relevant quantitative trait loci for these traits. These findings indicate that WssGBLUP is an appropriate method for traits with a large quantitative trait loci effect.
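How SNP weights enter the genomic relationship matrix can be sketched as follows. This assumes the widely used form G = Z D Z' / Σ 2p(1−p), where Z holds allele dosages centred by twice the allele frequency and D is a diagonal matrix of SNP weights rescaled to sum to the number of SNPs; the abstract does not confirm that this exact form or scaling was used. The function name `weighted_g` is hypothetical, and the single-step blending with pedigree relationships is not shown.

```python
def weighted_g(genotypes, weights=None):
    """Genomic relationship matrix with optional per-SNP weights.

    genotypes: list of animals, each a list of 0/1/2 dosages per SNP.
    weights:   per-SNP weights (assumed non-negative, not all zero);
               None gives the unweighted matrix (all weights equal to 1).
    """
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = sum(2 * pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    w = [1.0] * m if weights is None else list(weights)
    total = sum(w)
    w = [wj * m / total for wj in w]  # rescale weights to sum to m
    # Centre dosages by 2p so each SNP column has mean zero.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(z[a][j] * w[j] * z[b][j] for j in range(m)) / scale
             for b in range(n)] for a in range(n)]


genotypes = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
g_plain = weighted_g(genotypes)
g_weighted = weighted_g(genotypes, weights=[2.0, 1.0, 0.0])
print(g_plain[0])
print(g_weighted[0])
```

Under this form, a window weighting scheme would only change how the weights are derived (for example, one shared value for all SNPs in a window); the construction of the matrix stays the same.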


Genome ◽  
2005 ◽  
Vol 48 (1) ◽  
pp. 12-17 ◽  
Author(s):  
L D Chaves ◽  
J A Rowe ◽  
K M Reed

Genome characterization and analysis is an imperative step in identifying and selectively breeding for improved traits of agriculturally important species. Expressed sequence tags (ESTs) represent a transcribed portion of the genome and are an effective way to identify genes within a species. Downstream applications of EST projects include DNA microarray construction and interspecies comparisons. In this study, 694 ESTs were sequenced and analyzed from a library derived from a 24-day-old turkey embryo. The 437 unique sequences identified were divided into 76 assembled contigs and 361 singletons. The majority of significant comparative matches occurred between the turkey sequences and sequences reported from the chicken. Whole genome sequence from the chicken was used to identify potential exon–intron boundaries for selected turkey clones and intron-amplifying primers were developed for sequence analysis and single nucleotide polymorphism (SNP) discovery. Identified SNPs were genotyped for linkage analysis on two turkey reference populations. This study significantly increases the number of EST sequences available for the turkey. Key words: turkey, cDNA, expressed sequence tag, single nucleotide polymorphism.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2003 ◽  
Author(s):  
Michael P. Heaton ◽  
Timothy P.L. Smith ◽  
Jacky K. Carnahan ◽  
Veronica Basnayake ◽  
Jiansheng Qiu ◽  
...  

The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, existing bovine WGS databases do not show data in a form conducive to protein variant analysis, and tend to underrepresent the breadth of genetic diversity in global beef cattle. Thus, our first aim was to use 96 beef sires, sharing minimal pedigree relationships, to create a searchable and publicly viewable set of mapped genomes relevant for 19 popular breeds of U.S. cattle. Our second aim was to identify protein variants encoded by the bovine endothelial PAS domain-containing protein 1 gene (EPAS1), a gene associated with pulmonary hypertension in Angus cattle. The identity and quality of genomic sequences were verified by comparing WGS genotypes to those derived from other methods. The average read depth, genotype scoring rate, and genotype accuracy exceeded 14, 99%, and 99%, respectively. The 96 genomes were used to discover four amino acid variants encoded by EPAS1 (E270Q, P362L, A671G, and L701F) and confirm two variants previously associated with disease (A606T and G610S). The six EPAS1 missense mutations were verified with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry assays, and their frequencies were estimated in a separate collection of 1154 U.S. cattle representing 46 breeds. A rooted phylogenetic tree of eight polypeptide sequences provided a framework for evaluating the likely order of mutations and potential impact of EPAS1 alleles on the adaptive response to chronic hypoxia in U.S. cattle. This public, whole genome resource facilitates in silico identification of protein variants in diverse types of U.S. beef cattle, and provides a means of translating WGS data into a practical biological and evolutionary context for generating and testing hypotheses.


2014 ◽  
Vol 80 (7) ◽  
pp. 2125-2132 ◽  
Author(s):  
Narjol Gonzalez-Escalona ◽  
Ruth Timme ◽  
Brian H. Raphael ◽  
Donald Zink ◽  
Shashi K. Sharma

ABSTRACT Clostridium botulinum is a genetically diverse Gram-positive bacterium producing extremely potent neurotoxins (botulinum neurotoxins A through G [BoNT/A-G]). The complete genome sequences of three strains harboring only the BoNT/A1 nucleotide sequence are publicly available. Although these strains contain a toxin cluster (HA+OrfX−) associated with hemagglutinin genes, little is known about the genomes of subtype A1 strains (termed HA−OrfX+) that lack hemagglutinin genes in the toxin gene cluster. We sequenced the genomes of three BoNT/A1-producing C. botulinum strains: two strains with the HA+OrfX− cluster (69A and 32A) and one strain with the HA−OrfX+ cluster (CDC297). Whole-genome phylogenic single-nucleotide-polymorphism (SNP) analysis of these strains along with other publicly available C. botulinum group I strains revealed five distinct lineages. Strains 69A and 32A clustered with the C. botulinum type A1 Hall group, and strain CDC297 clustered with the C. botulinum type Ba4 strain 657. This study reports the use of whole-genome SNP sequence analysis for discrimination of C. botulinum group I strains and demonstrates the utility of this analysis in quickly differentiating C. botulinum strains harboring identical toxin gene subtypes. This analysis further supports previous work showing that strains CDC297 and 657 likely evolved from a common ancestor and independently acquired separate BoNT/A1 toxin gene clusters at distinct genomic locations.

