99 Breed authentication of five lines comparing Structure, Admixture, and an allele frequency method

2020 ◽  
Vol 98 (Supplement_3) ◽  
pp. 25-25
Author(s):  
Austin M Putz ◽  
Patrick Charagu ◽  
Abe Huisman

Abstract Two commonly used population structure software packages are freely available for breed authentication, Structure and Admixture. Structure uses a Bayesian approach to model population structure, while Admixture uses a frequentist approach. More recently, an allele frequency method has been updated to use quadratic programming to constrain the multiple linear regression coefficients of the regression of genotype count (divided by two) on the matrix of allele frequencies for each known breed or line. The constraints force the coefficients to sum to one and each to lie between 0 and 1. The goal of this research was to compare and contrast these three methods to determine the breed/line authenticity for each of five genetic lines. These five lines included Large White, Landrace, a lean Duroc, a meat quality Duroc, and a Pietrain line. Only animals genotyped with a 50K SNP panel were used in this analysis. Analyses were run five times for Structure and Admixture to check repeatability. The allele frequency method did not need to be repeated because its result remains the same as long as the reference allele frequency matrix stays constant. For Structure, results of breed composition were inconsistent across replicates. Only 500 animals could be utilized in each run of Structure due to computational constraints. With those 500 animals, Structure separated at least one of the maternal lines in three out of the five replicates and kept the Duroc lines together as one population. Admixture was very consistent across runs for each animal, but also failed to separate the two Duroc lines, instead splitting one of the two maternal lines. Finally, the allele frequency method split all five lines correctly and was 100% reproducible as long as the reference allele frequency matrix stays the same across runs.
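The constrained regression described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `breed_composition`, the use of SciPy's general-purpose SLSQP solver in place of a dedicated quadratic programming routine, and the simulated data are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize


def breed_composition(genotypes, ref_freqs):
    """Estimate line proportions for one animal (hypothetical helper).

    genotypes: length-m array of allele counts (0, 1, 2), one per SNP.
    ref_freqs: (m, k) array of reference allele frequencies, one column per line.
    Returns a length-k vector b with 0 <= b_j <= 1 and sum(b) == 1.
    """
    y = np.asarray(genotypes, dtype=float) / 2.0  # genotype count divided by two
    X = np.asarray(ref_freqs, dtype=float)
    k = X.shape[1]

    def loss(b):
        r = y - X @ b
        return r @ r  # residual sum of squares

    def grad(b):
        return -2.0 * X.T @ (y - X @ b)

    result = minimize(
        loss,
        x0=np.full(k, 1.0 / k),          # start from equal proportions
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,         # each coefficient in [0, 1]
        constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}],
    )
    return result.x


# Simulated example (assumed data): 2000 SNPs, 5 lines, animal drawn from line 2.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.05, 0.95, size=(2000, 5))
animal = rng.binomial(2, freqs[:, 2])
print(np.round(breed_composition(animal, freqs), 3))
```

Because the estimate depends only on the animal's genotypes and the reference frequency matrix, rerunning it with the same inputs gives the same answer, which is the reproducibility property the abstract highlights.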

Forests ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 681 ◽  
Author(s):  
Huiquan Zheng ◽  
Dehuo Hu ◽  
Ruping Wei ◽  
Shu Yan ◽  
Runhui Wang

Knowledge of population diversity and structure is of fundamental importance for conifer breeding programs. In this study, we concentrated on the development and application of high-density single nucleotide polymorphism (SNP) markers through a high-throughput sequencing technique termed specific-locus amplified fragment sequencing (SLAF-seq) for the economically important conifer tree species, Chinese fir (Cunninghamia lanceolata). Based on SLAF-seq, we successfully established a high-density SNP panel consisting of 108,753 genomic SNPs from Chinese fir. This SNP panel allowed us to gain insight into the genetic base of the Chinese fir advanced breeding population of 221 genotypes, covering its genetic variation, relationships and diversity, and population structure. Overall, the present population appears to have considerable genetic variability. Most (94.15%) of the variability was attributed to the genetic differentiation of genotypes; very limited (5.85%) variation occurred at the population (sub-origin set) level. Correspondingly, low FST (0.0285–0.0990) values were seen for the sub-origin sets. When viewing the genetic structure of the population regardless of its sub-origin sets, the SNP data revealed a new picture in which the advanced Chinese fir breeding population could be divided into four genetic sets, as evidenced by phylogenetic tree and population structure analysis results, albeit with some differences in membership of the corresponding sets (cluster vs. group). The results also suggested that all the genetic sets were admixed clades, revealing a complex relationship among the genotypes of this population. With a stepwise pruning procedure, we captured a core collection (core 0.650) harboring 143 genotypes that maintains all the alleles, diversity, and specific genetic structure of the whole population. This generalist core is valuable for the Chinese fir advanced breeding program and further genetic/genomic studies.


2015 ◽  
Vol 12 (1) ◽  
Author(s):  
José Díaz-García ◽  
Francisco Caro-Lopera

This paper obtains an explicit form for the effect of perturbing the matrix of regression coefficients on the optimal solution in multiresponse surface methodology. The sensitivity analysis of the optimal solution is then studied, and the critical point characterisation of the convex program associated with the optimum of a multiresponse surface is also analysed. Finally, the asymptotic normality of the optimal solution is derived by standard methods.


2014 ◽  
Author(s):  
Debora Yoshihara Caldeira Brandt ◽  
Vitor Rezende da Costa Aguiar ◽  
Bárbara Domingues Bitarello ◽  
Kelly Nunes ◽  
Jérôme Goudet ◽  
...  

Next Generation Sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, such as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the Human Leukocyte Antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the SNPs reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, -DQB1). We take advantage of the availability of HLA Sanger sequencing of 930 of the 1,092 1000G samples, and use this as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect, and that allele frequencies are estimated with an error higher than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias towards overestimation of reference allele frequency for the 1000G data, indicating mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites that have poor allele frequency estimates, and discuss the outcomes of including those sites in different kinds of analyses. Since the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using NGS data at other genomic regions of high diversity.
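The kind of benchmark described here, comparing test genotype calls against a gold standard, can be illustrated with a short sketch. The function name `benchmark_calls`, the 0/1/2 alternate-allele coding, and the toy matrices are assumptions for the example; the sketch does not handle missing calls and is not the authors' pipeline.

```python
import numpy as np


def benchmark_calls(test_gt, gold_gt):
    """Compare genotype calls with a gold standard (hypothetical helper).

    test_gt, gold_gt: (n_samples, n_snps) integer arrays holding the number
    of alternate alleles (0, 1, or 2) for each sample at each SNP.
    Returns the overall genotype mismatch rate and, per SNP, the error in
    the estimated reference allele frequency (test minus gold).
    """
    test_gt = np.asarray(test_gt)
    gold_gt = np.asarray(gold_gt)

    # Fraction of individual genotype calls that disagree with the gold standard.
    mismatch_rate = float(np.mean(test_gt != gold_gt))

    # Reference allele frequency = 1 - (mean alternate allele count / 2).
    ref_freq_test = 1.0 - test_gt.mean(axis=0) / 2.0
    ref_freq_gold = 1.0 - gold_gt.mean(axis=0) / 2.0

    # Positive values mean the reference allele frequency is overestimated.
    freq_error = ref_freq_test - ref_freq_gold
    return mismatch_rate, freq_error


# Toy data (assumed): 4 samples, 2 SNPs.
gold = [[0, 1], [2, 1], [1, 0], [2, 2]]
test = [[0, 1], [1, 1], [1, 0], [0, 2]]
rate, err = benchmark_calls(test, gold)
flagged = np.abs(err) > 0.1  # SNPs whose frequency error exceeds 0.1 in absolute value
print(rate, err, flagged)
```

A positive mean of `freq_error` across SNPs would correspond to the overestimation of reference allele frequency that the abstract attributes to mapping bias.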


2019 ◽  
Author(s):  
Emil Jørsboe ◽  
Anders Albrechtsen

Abstract

Introduction: Association studies using genetic data from SNP-chip based imputation or low depth sequencing data provide a cost-efficient design for large scale studies. However, these approaches provide genetic data with uncertainty of the observed genotypes. Here we explore association methods that can be applied to data where the genotype is not directly observed. We investigate how using different priors when estimating genotype probabilities affects the association results in different scenarios, such as studies with population structure and varying depth sequencing data. We also suggest a method (ANGSD-asso) that is computationally feasible for analysing large scale low depth sequencing data sets, such as those generated by non-invasive prenatal testing (NIPT) with low-pass sequencing.

Methods: ANGSD-asso’s EM model works by modelling the unobserved genotype as a latent variable in a generalised linear model framework. The software is implemented in C/C++ and can be run multi-threaded, enabling the analysis of big data sets. ANGSD-asso is based on genotype probabilities, which can be estimated in various ways, such as using the sample allele frequency as a prior, using the individual allele frequencies as a prior, or using haplotype frequencies from haplotype imputation. Using simulations of sequencing data, we explore how the genotype-probability-based method compares to using genetic dosages in large association studies with genotype uncertainty.

Results & Discussion: Our simulations show that in a structured population, using the individual allele frequency prior has better power than the sample allele frequency prior. If there is a correlation between genotype uncertainty and phenotype, then the individual allele frequency prior also helps control the false positive rate. In the absence of population structure, the sample allele frequency prior and the individual allele frequency prior perform similarly. In scenarios where sequencing depth and phenotype are correlated, ANGSD-asso’s EM model has better statistical power and less bias compared to using dosages. Lastly, when adding additional covariates to the linear model, ANGSD-asso’s EM model has more statistical power and provides less biased effect sizes than other methods that accommodate genotype uncertainty, while also being much faster. This makes it possible to properly account for genotype uncertainty in large scale association studies.


2019 ◽  
Author(s):  
Alan Le Moan ◽  
Dorte Bekkevold ◽  
Jakob Hemmer-Hansen

Abstract Changing environmental conditions can lead to population diversification through differential selection on standing genetic variation. Structural variant (SV) polymorphisms provide examples of ancient alleles that in time become associated with novel environmental gradients. The European plaice (Pleuronectes platessa) is a marine flatfish showing large allele frequency differences at two putative SVs associated with environmental variation. In this study, we explored the contribution of these SVs to population structure across the North East Atlantic. We compared genome wide population structure using sets of RAD sequencing SNPs with the spatial structure of the SVs. We found that in contrast to the rest of the genome, the SVs were only weakly associated with an isolation-by-distance pattern. Indeed, both SVs showed important allele frequency differences associated with two different environmental gradients, with the same allele increasing both along the salinity gradient of the Baltic Sea, and the latitudinal gradient along the Norwegian coast. Nevertheless, both SVs were found to be polymorphic across most sampling sites, even in the Icelandic population inferred to originate from a different glacial refuge than the remaining populations from the European continental shelf. Phylogenetic analyses suggested that the SV alleles are much older than the age of the Baltic Sea itself. These results suggest that the SVs are older than the age of the environmental gradients with which they currently co-vary. Interestingly, both SVs shared similar phylogenetic and genetic diversity, suggesting that they have a common origin. Altogether, our results suggest that the plaice SVs were shaped by evolutionary processes occurring at two time-frames, firstly following their common origin and secondly related to their current association with more recent environmental gradients such as those found in the North Sea − Baltic Sea transition zone.


Genome ◽  
2020 ◽  
Vol 63 (2) ◽  
pp. 103-114 ◽  
Author(s):  
Amrita Mahabir ◽  
Lambert A. Motilal ◽  
David Gopaulchan ◽  
Saila Ramkissoon ◽  
Antoinette Sankar ◽  
...  

Single nucleotide polymorphisms (SNPs) are preferred markers for DNA fingerprinting and diversity studies in cacao (Theobroma cacao L.). Yet, a consensus SNP panel with a minimum number of SNPs for optimal identity analysis is unavailable for cacao. An initial set of 146 SNP panels of varying sizes were assembled based on heterozygosity, linkage disequilibrium (LD), linkage group (LG) distribution, major allele frequency, minor allele frequency (MiAF), polymorphism information content (PIC), and random distribution. These panels were assessed to determine their ability to distinguish among a training set of 155 accessions. The panels with the best separation ability were supplemented with additional SNPs to create 16 designer panels, which separated all 155 accessions. The 16 designer SNP panels were then assessed on a dataset of 1220 accessions coming from 10 ancestral groups. Increasing the number of SNPs generally yielded improved resolution of genetic identities with concomitant reduction of synonymous groups. The number and choice of SNPs were critical factors with LD, MiAF, and PIC being important selection attributes but an even LG distribution was unnecessary. A robust set of 96 SNPs is recommended as a minimal core SNP panel for cacao DNA fingerprinting to the international cacao community.
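Two of the per-SNP selection attributes named above, minor allele frequency and polymorphism information content, can be computed directly from an allele frequency. The sketch below assumes biallelic SNPs and uses the commonly cited PIC formula; the abstract does not state how the authors computed these values, and the function names are hypothetical.

```python
def minor_allele_frequency(p):
    """Minor allele frequency for a biallelic SNP with one allele at frequency p."""
    return min(p, 1.0 - p)


def pic_biallelic(p):
    """Polymorphism information content for a biallelic SNP.

    Uses PIC = 1 - (p^2 + q^2) - 2 * p^2 * q^2 with q = 1 - p, the
    two-allele case of the general PIC formula.
    """
    q = 1.0 - p
    return 1.0 - (p ** 2 + q ** 2) - 2.0 * (p ** 2) * (q ** 2)


# Rank a few assumed allele frequencies by informativeness.
for p in (0.05, 0.2, 0.5):
    print(p, minor_allele_frequency(p), round(pic_biallelic(p), 4))
```

For a biallelic marker, PIC is largest when both alleles are equally frequent, which is why panels built on high MiAF and high PIC tend to favour the same SNPs.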


1981 ◽  
Vol 37 (3) ◽  
pp. 253-263 ◽  
Author(s):  
Sirkka-Liisa Varvio-Aho ◽  
Pekka Pamilo

SUMMARY Spatial and temporal differentiation in Gerris lacustris and G. odontogaster (Heteroptera, Gerridae) were studied in restricted areas, where one population of each species was subdivided into several subpopulations. The aim of the study was to relate genetic population parameters to ecological population structure studied by mark-recapture methods in the same pond systems. Marked differentiation between the subpopulations and significant enzyme allele frequency changes between subsequent generations were found at several loci. It is suggested that non-selective forces are a sufficient explanation for the observed differences.


2019 ◽  
Vol 50 (3) ◽  
pp. 242-249 ◽  
Author(s):  
H. Berihulay ◽  
Y. Li ◽  
X. Liu ◽  
G. Gebreselassie ◽  
R. Islam ◽  
...  
