Accuracy of genomic prediction using mixed low-density marker panels

2020 ◽  
Vol 60 (8) ◽  
pp. 999
Author(s):  
Lianjie Hou ◽  
Wenshuai Liang ◽  
Guli Xu ◽  
Bo Huang ◽  
Xiquan Zhang ◽  
...  

A low-density single-nucleotide polymorphism (LD-SNP) panel is one effective way to reduce the cost of genomic selection in animal breeding. The present study proposes a new type of LD-SNP panel, called a mixed low-density (MLD) panel, which simultaneously considers SNPs with substantial effects estimated by Bayes method B (BayesB) across many traits and an evenly spaced distribution of markers. Simulated and real data were used to compare the imputation accuracy and genomic-selection accuracy of the two types of LD-SNP panels. For simulated data, the genotype imputation results showed that the number of quantitative trait loci (QTL) had a limited influence on imputation accuracy, and only for MLD panels; the evenly spaced (ELD) panel was not affected by the number of QTL. For real data, ELD performed slightly better than MLD when the panel contained 500 or 1000 SNPs, but this advantage vanished quickly as density increased. For genomic selection on simulated data using BayesB, MLD performed much better than ELD when the number of QTL was 100. For real data, MLD also outperformed ELD for growth and carcass traits when using BayesB. In conclusion, the MLD strategy is superior to ELD for genomic selection in most situations.
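The MLD idea of combining large-effect SNPs with evenly spaced SNPs can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes per-SNP, per-trait effect estimates (for example from BayesB) are already available, ranks SNPs by their largest absolute effect across traits, and fills the rest of the panel with SNPs spread evenly by map order. The split between the two groups (`effect_fraction`) is a hypothetical parameter.

```python
def evenly_spaced(indices, k):
    """Pick k entries spread evenly across an ordered list of SNP indices."""
    if k <= 0 or not indices:
        return []
    if k >= len(indices):
        return list(indices)
    step = len(indices) / k  # step > 1, so the picked positions are distinct
    return [indices[int(i * step)] for i in range(k)]


def mixed_low_density_panel(effects, panel_size, effect_fraction=0.5):
    """Select an MLD-style panel.

    effects: one list of per-trait effect estimates per SNP, ordered by map position.
    effect_fraction: hypothetical share of the panel reserved for large-effect SNPs.
    Returns the sorted indices of the selected SNPs.
    """
    n_effect = min(panel_size, int(round(panel_size * effect_fraction)))
    # Rank SNPs by their largest absolute effect over all traits (stable for ties).
    ranked = sorted(range(len(effects)),
                    key=lambda i: max(abs(e) for e in effects[i]),
                    reverse=True)
    chosen = set(ranked[:n_effect])
    # Fill the remaining slots with evenly spaced SNPs among those not yet chosen.
    remaining = [i for i in range(len(effects)) if i not in chosen]
    chosen.update(evenly_spaced(remaining, panel_size - len(chosen)))
    return sorted(chosen)
```

Setting `effect_fraction=0` reduces the sketch to a plain ELD panel, which makes the two strategies easy to compare on the same data.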

2021 ◽  
Vol 12 ◽  
Author(s):  
Yang Yang ◽  
Hongli Tian ◽  
Rui Wang ◽  
Lu Wang ◽  
Hongmei Yi ◽  
...  

Molecular marker technology is widely used in plant variety discrimination, molecular breeding, and other fields. To lower testing costs and improve the efficiency of data analysis, molecular marker screening is very important. Screening usually involves two phases: the first controls locus quality and the second reduces the number of loci. To reduce the number of loci, an appraisal index that is highly sensitive to the specific scenario is needed to select loci combinations. In this study, we focused on screening loci combinations for plant variety discrimination. A loci combination appraisal index, variety discrimination power (VDP), is proposed, and three statistical methods for computing it, probability-based VDP (P-VDP), comparison-based VDP (C-VDP), and ratio-based VDP (R-VDP), are described and compared. Results on simulated data showed that VDP was sensitive to statistical populations converging toward the same variety, whereas the total probability of discrimination power (TDP) method was effective only for some populations. R-VDP was more sensitive than P-VDP and C-VDP (which had the same sensitivity) to statistical populations converging toward various varieties; TDP was not sensitive at all. With the real data, R-VDP values for the sorghum, wheat, maize, and rice data began to show a downward trend when the number of loci was 20, 7, 100, and 100, respectively; for P-VDP and C-VDP (which gave the same results) the corresponding numbers were 6, 4, 9, and 19, and for TDP they were 6, 4, 4, and 11. For the variety threshold setting, R-VDP values of loci combinations with different numbers of loci responded evenly to different thresholds. C-VDP values responded unevenly to different thresholds, and the extent of the response increased as the number of loci decreased. All methods produced underestimates when data were missing, with systematic errors increasing in the order TDP, C-VDP, R-VDP.
We concluded that VDP is a better loci combination appraisal index than TDP for plant variety discrimination and that the three VDP methods have different applications. We developed software called VDPtools, which can calculate TDP, P-VDP, C-VDP, and R-VDP values. VDPtools is publicly available at https://github.com/caurwx1/VDPtools.git.
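To make the TDP baseline concrete, the sketch below computes per-locus discrimination power as DP = 1 - Σ p_i² over genotype frequencies and combines loci as TDP = 1 - Π(1 - DP_l). This is the conventional formulation; we assume, without having checked VDPtools, that its TDP is equivalent. The `pairwise_discrimination_rate` function is a hypothetical, simplified comparison-based measure (the share of variety pairs that differ at one or more loci), included only to convey the intuition behind a comparison-based index such as C-VDP; it is not the authors' formula, and it ignores missing data and variety thresholds.

```python
from collections import Counter
from itertools import combinations
from math import prod


def discrimination_power(genotypes):
    """DP for one locus: 1 minus the sum of squared genotype frequencies."""
    counts = Counter(genotypes)
    n = len(genotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def total_discrimination_power(loci):
    """TDP = 1 - prod(1 - DP_l); loci is a list of per-locus genotype lists."""
    return 1.0 - prod(1.0 - discrimination_power(g) for g in loci)


def pairwise_discrimination_rate(profiles):
    """Hypothetical comparison-based measure: fraction of variety pairs
    whose multi-locus profiles (tuples of genotypes) differ at >= 1 locus."""
    pairs = list(combinations(profiles, 2))
    distinct = sum(1 for a, b in pairs if a != b)
    return distinct / len(pairs)
```

Note how TDP only looks at allele/genotype frequencies locus by locus, whereas a comparison-based measure looks at whole profiles, which is one way to see why the two can disagree when many samples converge on the same variety.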


Author(s):  
Achmad Syahrul Choir ◽  
Nur Iriawan ◽  
Brodjol Sutijo Suprih Ulama ◽  
Mohammad Dokhi

The MSNBurr and MSTBurr distributions have been developed as Neo-Normal distributions that represent a relaxation of normality. The difference between them is that the MSTBurr's peak is below the MSNBurr's. In this paper, we propose the MSEPBurr distribution, whose peak can be either lower or higher than that of the MSNBurr. Furthermore, we study several properties of the MSEPBurr, such as its mean, variance, skewness, kurtosis, and quantiles. The MSEPBurr parameters are estimated using a Bayesian approach, with the BUGS language used for computation. We employ a simulation study and existing data to illustrate the application of the regression model. On real data, the MSEPBurr performs similarly to the MSNBurr and MSTBurr: all three outperform the Normal and Student-t distributions on the Australian athlete data because their skewness accommodates the long left tail well. However, they perform worse than the Student-t model on the chemical reaction rate data because their skewness cannot fully accommodate the long right tail. Although their overall performance is similar, we observe that the MSEPBurr performs better than the MSNBurr and the MSTBurr on some simulated data.
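The "peak" comparison refers to the height of the density at its mode. As a reference point, the sketch below implements the MSNBurr density in the form commonly stated in the Neo-Normal literature, f(x) = (ω/σ) e^{-z} (1 + e^{-z}/α)^{-(α+1)} with z = ω(x - μ)/σ and ω = (1 + 1/α)^{α+1} / √(2π); treat this parameterization as an assumption to check against the paper. With this ω, the mode sits at μ and the peak height equals the Normal peak 1/(σ√(2π)) for every α, which is the baseline that the MSTBurr falls below and the proposed MSEPBurr can exceed. The MSEPBurr density itself is not reproduced here.

```python
import math


def _softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def msnburr_pdf(x, mu=0.0, sigma=1.0, alpha=1.0):
    """MSNBurr density (assumed parameterization; alpha controls skewness)."""
    omega = (1.0 + 1.0 / alpha) ** (alpha + 1.0) / math.sqrt(2.0 * math.pi)
    z = omega * (x - mu) / sigma
    # log(1 + exp(-z) / alpha) rewritten as softplus(-z - log(alpha)) to avoid overflow
    log_f = (math.log(omega / sigma) - z
             - (alpha + 1.0) * _softplus(-z - math.log(alpha)))
    return math.exp(log_f)
```

For example, `msnburr_pdf(0.0, alpha=a)` returns about 0.3989 for any `a > 0`, matching the standard Normal peak, while the shape of the tails changes with `a`.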


2015 ◽  
Author(s):  
Jonathan T. L. Kang ◽  
Peng Zhang ◽  
Sebastian Zöllner ◽  
Noah A. Rosenberg

Imputation of genotypes in a study sample can make use of sequenced or densely genotyped external reference panels consisting of individuals that are not from the study sample. It can also employ internal reference panels, incorporating a subset of individuals from the study sample itself. Internal panels offer an advantage over external panels, as they can reduce imputation errors arising from genetic dissimilarity between a population of interest and a second, distinct population from which the external reference panel has been constructed. As the cost of next-generation sequencing decreases, internal reference panel selection is becoming increasingly feasible. However, it is not clear how best to select individuals to include in such panels. We introduce a new method for selecting an internal reference panel, minimizing the average distance to the closest leaf (ADCL), and compare its performance with that of an earlier algorithm, maximizing phylogenetic diversity (PD). Employing both simulated data and sequences from the 1000 Genomes Project, we show that ADCL provides a significant improvement in imputation accuracy, especially for imputation of sites with low-frequency alleles. This improvement in imputation accuracy is robust to changes in reference panel size, marker density, and length of the imputation target region.
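The ADCL objective can be illustrated with a small greedy heuristic. This sketch is not the authors' algorithm: it assumes a precomputed matrix of pairwise distances between sampled individuals (for example, tree distances between leaves of an estimated genealogy) and adds, one at a time, the individual whose inclusion most lowers the average distance from every sampled individual to its closest panel member. Ties are broken by the lower index.

```python
def average_distance_to_closest(dist, selected):
    """Mean over all individuals of the distance to the nearest selected one."""
    if not selected:
        raise ValueError("at least one individual must be selected")
    n = len(dist)
    return sum(min(dist[i][j] for j in selected) for i in range(n)) / n


def greedy_adcl_panel(dist, k):
    """Greedily choose k reference individuals that minimize ADCL.

    dist: symmetric n x n matrix (list of lists) of pairwise distances.
    Returns the chosen indices in the order they were added.
    """
    if not 0 < k <= len(dist):
        raise ValueError("k must be between 1 and the number of individuals")
    selected = []
    candidates = set(range(len(dist)))
    for _ in range(k):
        # Pick the candidate giving the lowest ADCL; break ties by index.
        best = min(candidates,
                   key=lambda c: (average_distance_to_closest(dist, selected + [c]), c))
        selected.append(best)
        candidates.remove(best)
    return selected
```

On two well-separated clusters, the greedy choice places one reference individual in each cluster, which is the intuition behind ADCL: every study individual should have a close relative in the panel. A diversity-maximizing criterion such as PD instead rewards spreading the panel across long branches.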


Author(s):  
Xi Zeng ◽  
Linghao Zhao ◽  
Chenhang Shen ◽  
Yi Zhou ◽  
Guoliang Li ◽  
...  

Abstract Motivation Virus integration in the host genome is frequently reported to be closely associated with many human diseases, and the detection of virus integration is a critically challenging task. However, most existing tools show limited specificity and sensitivity. Therefore, the objective of this study is to develop a method for accurate detection of virus integration into host genomes. Results Herein, we report a novel method termed HIVID2 that is a significant upgrade of HIVID. HIVID2 performs a paired-end combination (PE-combination) for potentially integrated reads. The resulting sequences are then remapped onto the reference genomes, and both split and discordant chimeric reads are used to identify accurate integration breakpoints with high confidence. Compared with existing methods, HIVID2 greatly improves specificity and sensitivity and predicts breakpoints closer to the real integration sites. The advantage of our method was demonstrated using both simulated and real datasets. HIVID2 uncovered novel integration breakpoints in well-known cervical cancer-related genes, including FHIT and LRP1B, which were verified using protein expression data. In addition, HIVID2 allows the user to decide whether to automatically perform advanced analysis using the identified virus integrations. By analyzing simulated and real datasets, we demonstrated that HIVID2 is not only more accurate than HIVID but also better than other existing programs with respect to both sensitivity and specificity. We believe that HIVID2 will help in enhancing future research associated with virus integration. Availability and implementation HIVID2 can be accessed at https://github.com/zengxi-hada/HIVID2/. Supplementary information Supplementary data are available at Bioinformatics online.
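Split reads locate a breakpoint at the exact base where the alignment to one genome stops and the soft-clipped remainder (which may map to the other genome) begins. The generic sketch below, which is not HIVID2's code, derives those reference coordinates from a SAM-style 1-based `POS` and CIGAR string: a leading soft clip places the junction at the first aligned base, and a trailing soft clip places it at the last aligned base. Only reference-consuming operations (`M`, `D`, `N`, `=`, `X`) advance the reference coordinate; hard clips are ignored.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")


def soft_clip_breakpoints(pos, cigar):
    """Return [(side, ref_position), ...] for soft-clipped ends of one alignment.

    pos: 1-based leftmost aligned reference position (SAM POS field).
    cigar: CIGAR string, e.g. "30S70M".
    """
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # hard clips carry no bases
    if not core:
        return []
    ref_len = sum(n for n, op in core if op in _REF_CONSUMING)
    breakpoints = []
    if core[0][1] == "S":
        breakpoints.append(("left", pos))  # clipped sequence precedes the first aligned base
    if core[-1][1] == "S":
        breakpoints.append(("right", pos + ref_len - 1))  # clipped sequence follows the last aligned base
    return breakpoints
```

A read aligned to the host as `70M30S` at position 1000 therefore supports a junction after host base 1069; in practice, a tool would also require the 30 clipped bases to align to the viral genome and would pool many such reads before calling a breakpoint.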


2020 ◽  
Vol 60 (15) ◽  
pp. 1769
Author(s):  
J. B. S. Ferraz ◽  
X. -L. Wu ◽  
H. Li ◽  
J. Xu ◽  
R. Ferretti ◽  
...  

Context Genomic selection has been of increasing interest in the genetic improvement of Zebu cattle, particularly for quantitative traits that are difficult or expensive to measure, such as carcass traits and meat tenderness. The success of genomic selection depends on several factors, and at its core is the availability of single-nucleotide polymorphism (SNP) chips that are appropriately designed for Bos indicus cattle. However, the currently available commercial bovine SNP chips are mostly designed for Bos taurus cattle. There are two commercial Bos indicus SNP chips, namely the GeneSeek genomic profiler high-density Bos indicus (GGP-HDi) SNP chip and a low-density (LD) Bos indicus SNP chip (Z chip), but both were built with mixed contents of SNPs for Bos indicus and Bos taurus cattle, owing to the limited availability of genotype data from Bos indicus cattle. Aims To develop a new GGP indicus 35000 SNP chip specifically for Bos indicus cattle, with low cost but high accuracy of imputation to Illumina BovineHD chips. Methods The chip design consisted of 34000 optimally selected SNPs plus 1000 SNPs pre-reserved for those on the Y chromosome, ‘causative’ mutations for a variety of economically relevant traits, genetic health conditions, and International Society for Animal Genetics globally recognised parentage markers for those breeds of cattle. Key results The results showed that this new indicus LD SNP chip had considerably higher minor allele frequencies in indicus breeds than the previous Z chip. It showed high imputation accuracy to HD SNP genotypes in five indicus breeds and considerable predictive ability for 14 growth and reproduction traits in Nellore cattle.
Conclusions This new indicus LD chip represented a successful effort to leverage existing knowledge and genotype resources towards the public release of a cost-effective LD SNP chip specifically for Bos indicus cattle, which is expected to replace the previous GGP indicus LD chip and to supplement the existing GGP-HDi 80000 SNP chip. Implications A new SNP chip specifically designed for Bos indicus, with high power of imputation to Illumina BovineHD technology and with excellent coverage of the whole genome, is now available on the market for Bos indicus cattle, and Bos indicus and Bos taurus crosses.
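Two quantities in this abstract are easy to compute from genotype calls: the minor allele frequency (MAF) of each SNP, and how well imputed genotypes match observed high-density genotypes. The sketch below assumes genotypes coded as 0/1/2 copies of one allele with `None` for missing calls. Genotype concordance is used here as one common imputation-accuracy measure; the abstract does not state which metric the authors used (correlation-based measures are also common).

```python
def minor_allele_frequency(genotypes):
    """MAF at one SNP from 0/1/2 allele dosages; missing calls are None."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1.0 - p)


def genotype_concordance(true_genotypes, imputed_genotypes):
    """Share of animals whose imputed genotype equals the observed one."""
    pairs = [(t, i) for t, i in zip(true_genotypes, imputed_genotypes)
             if t is not None]
    return sum(t == i for t, i in pairs) / len(pairs)
```

SNPs with higher MAF in the target breeds carry more information per marker, which is why a chip designed from indicus genotype data is expected to outperform a chip whose content was partly chosen from taurus data.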


BMC Genetics ◽  
2013 ◽  
Vol 14 (1) ◽  
pp. 38 ◽  
Author(s):  
Jose L Gualdrón Duarte ◽  
Ronald O Bates ◽  
Catherine W Ernst ◽  
Nancy E Raney ◽  
Rodolfo JC Cantet ◽  
...  

Animals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1890
Author(s):  
Ling Xu ◽  
Qunhao Niu ◽  
Yan Chen ◽  
Zezhao Wang ◽  
Lei Xu ◽  
...  

Chinese Simmental beef cattle play a key role in the Chinese beef industry due to their great adaptability and marketability. To achieve efficient genetic gain at a low breeding cost, it is crucial to develop a customized, cost-effective low-density SNP panel for this cattle population. Thirteen growth, carcass, and meat quality traits and BovineHD BeadChip genotypes of 1346 individuals were used to select trait-associated variants and variants explaining a large share of the genetic variance. In addition, highly informative SNPs with high MAF in each 500 kb sliding window and in each genic region were included separately. A low-density SNP panel of 30,684 SNPs was developed, with an imputation accuracy of 97.4% when imputed to the 770 K level. Across the 13 traits, the average prediction accuracies evaluated by genomic best linear unbiased prediction (GBLUP) and BayesA/B/Cπ were 0.22–0.47 and 0.18–0.60 for the ~30 K array and the BovineHD BeadChip, respectively. The predictive performance of the ~30 K array was generally trait-dependent, with reduced prediction accuracies for seven traits. Although prediction accuracy differed among the 13 traits, the low-density SNP panel achieved moderate to high accuracies for most traits and even improved accuracies for some.
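The window-based step (keeping a highly informative, high-MAF SNP in each 500 kb window) can be sketched as below. This is a simplified illustration under stated assumptions: it uses non-overlapping 500 kb windows (a sliding window whose step equals its width) and keeps only the single highest-MAF SNP per window, whereas the authors may have used overlapping windows, additional informativeness criteria, or more than one SNP per window.

```python
def top_maf_per_window(snps, window_bp=500_000):
    """Keep the highest-MAF SNP in each non-overlapping window.

    snps: iterable of (snp_id, chrom, pos_bp, maf) tuples.
    Returns SNP ids ordered by chromosome and window.
    """
    best = {}
    for snp_id, chrom, pos, maf in snps:
        key = (chrom, pos // window_bp)  # window index on this chromosome
        if key not in best or maf > best[key][1]:
            best[key] = (snp_id, maf)
    return [best[k][0] for k in sorted(best)]
```

Window-based selection guarantees genome-wide coverage for imputation, while the trait-associated and high-variance SNPs selected separately target prediction accuracy; the final panel is the union of these sets.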


2021 ◽  
Vol 12 ◽  
Author(s):  
Li Xu ◽  
Yin Xu ◽  
Tong Xue ◽  
Xinyu Zhang ◽  
Jin Li

Motivation: The emergence of single-cell RNA sequencing (scRNA-seq) technology has made it possible to measure RNA levels at single-cell resolution to study precise biological functions. However, the large number of missing values in scRNA-seq data affects downstream analysis. This paper presents AdImpute, an imputation method based on semi-supervised autoencoders. The method uses the results of another imputation method (DrImpute, as an example) as imputation weights for the autoencoder, and applies a cost function with these imputation weights to learn the latent information in the data and achieve more accurate imputation. Results: As shown in clustering experiments on simulated and real data sets, AdImpute is more accurate than four other publicly available scRNA-seq imputation methods and minimally modifies biologically silent genes. Overall, AdImpute is an accurate and robust imputation method.
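The idea of a cost function with imputation weights can be illustrated with a weighted reconstruction loss. This is a hypothetical sketch, not AdImpute's published cost function: it assumes zero entries in the expression matrix are replaced by a first-pass imputer's estimates (DrImpute in the paper) and down-weighted by an assumed factor `fill_weight`, while observed nonzero entries keep weight 1. The autoencoder itself is omitted; the loss would be minimized over its output.

```python
def build_target_and_weights(raw, first_pass, fill_weight=0.5):
    """Hypothetical scheme: zeros take the first-pass imputed value with a
    reduced weight; observed nonzero values are kept with weight 1."""
    target, weights = [], []
    for row_raw, row_fp in zip(raw, first_pass):
        target.append([fp if x == 0 else x for x, fp in zip(row_raw, row_fp)])
        weights.append([fill_weight if x == 0 else 1.0 for x in row_raw])
    return target, weights


def weighted_reconstruction_loss(target, reconstructed, weights):
    """Mean of w * (target - reconstruction)^2 over all matrix entries."""
    total, count = 0.0, 0
    for row_t, row_r, row_w in zip(target, reconstructed, weights):
        for t, r, w in zip(row_t, row_r, row_w):
            total += w * (t - r) ** 2
            count += 1
    return total / count
```

Down-weighting the filled-in entries lets the first-pass imputation guide the autoencoder without forcing it to reproduce those estimates exactly; setting `fill_weight` to 0 would recover a loss that ignores the zeros entirely.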


2019 ◽  
Vol 58 (1) ◽  
pp. 1-12 ◽  
Author(s):  
D.P. Berry ◽  
N. McHugh ◽  
E. Wall ◽  
K. McDermott ◽  
A.C. O’Brien

Abstract The generally low usage of artificial insemination and single-sire mating in sheep, compounded by mob lambing (and lambing outdoors), makes parentage assignment in sheep challenging. The objective here was to develop a low-density panel of single nucleotide polymorphisms (SNPs) for accurate parentage verification and discovery in sheep. Of particular interest was the case in which SNP selection was limited to a subset of chromosomes, which eliminates the ability to accurately impute genome-wide denser marker panels. Data consisted of 10,933 candidate SNPs on 9,390 purebred sheep, including 1,876 validated genotyped sire–offspring pairs and 2,784 validated genotyped dam–offspring pairs. The SNP panels developed ranged from 87 to 500 SNPs. Parentage verification and discovery were undertaken using 1) exclusion, based on the sharing of at least one allele between candidate parent–offspring pairs, and 2) a likelihood-based approach. Based on exclusion, and allowing for one discordant offspring–parent genotype, a minimum of 350 SNPs was required to unambiguously identify the true sire or dam from all possible candidates. Results suggest that, if SNPs are selected across the entire genome, a minimum of 250 carefully selected SNPs is required to ensure that the most likely parent (based on the likelihood approach) is, in fact, the true parent. If SNPs are restricted to a subset of chromosomes, the recommendation is to use a panel of at least 300 SNPs from at least six chromosomes, with approximately equal numbers of SNPs per chromosome.
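Exclusion works because a true parent and its offspring must share at least one allele at every SNP. With genotypes coded 0/1/2 (copies of one allele), the only pair that shares no allele is an opposing homozygote (0 vs 2). The sketch below counts such conflicts and keeps candidates with at most one, mirroring the "one discordant genotype" tolerance described above. Function names and the candidate dictionary are illustrative assumptions, and the likelihood-based approach is not shown.

```python
def opposing_homozygotes(parent, offspring):
    """Count SNPs where parent and offspring share no allele (0 vs 2).

    Genotypes are 0/1/2 allele dosages; None marks a missing call and is skipped.
    """
    return sum(1 for p, o in zip(parent, offspring)
               if p is not None and o is not None and {p, o} == {0, 2})


def is_compatible(parent, offspring, max_discordant=1):
    """Parent is not excluded if conflicts do not exceed the tolerance."""
    return opposing_homozygotes(parent, offspring) <= max_discordant


def candidate_parents(offspring, candidates, max_discordant=1):
    """Return names of candidates not excluded as parents of `offspring`.

    candidates: dict mapping candidate name to its genotype list.
    """
    return [name for name, geno in candidates.items()
            if is_compatible(geno, offspring, max_discordant)]
```

With few SNPs, several unrelated candidates can pass this test by chance; the panel-size thresholds reported above reflect how many markers are needed before only the true parent survives.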

