PIP-SNP: a pipeline for processing SNP data featured as linkage disequilibrium bin mapping, genotype imputing and marker synthesizing

2021 ◽  
Vol 3 (3) ◽  
Author(s):  
Wenchao Zhang ◽  
Yun Kang ◽  
Xinbin Dai ◽  
Shizhong Xu ◽  
Patrick X Zhao

Abstract Genome-wide association study data analyses often face two significant challenges: (i) high dimensionality of single-nucleotide polymorphism (SNP) genotypes and (ii) imputation of missing values. SNPs are not independent due to physical linkage and natural selection. The correlation of nearby SNPs is known as linkage disequilibrium (LD), which can be used for LD conceptual SNP bin mapping, missing genotype inferencing and SNP dimension reduction. We used a stochastic process to describe the SNP signals and proposed two types of autocorrelations to measure nearby SNPs’ information redundancy. Based on the calculated autocorrelation coefficients, we constructed LD bins. We adopted a k-nearest neighbors algorithm (kNN) to impute the missing genotypes. We proposed several novel methods to find the optimal synthetic marker to represent the SNP bin. We also proposed methods to evaluate the information loss or information conservation between using the original genome-wide markers and using dimension-reduced synthetic markers. Our performance assessments on the real-life SNP data from a rice recombinant inbred line (RIL) population and a rice HapMap project show that the new methods produce satisfactory results. We implemented these functional modules in C/C++ and streamlined them into a web-based pipeline named PIP-SNP (https://bioinfo.noble.org/PIP_SNP/) for processing SNP data.
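The binning and imputation steps described in this abstract can be illustrated with a minimal Python sketch. It is not the PIP-SNP implementation (which is written in C/C++), and the abstract does not define its two autocorrelation measures, so the sketch assumes a simple stand-in: the squared Pearson correlation between neighbouring SNPs. The function names `ld_bins` and `knn_impute`, the 0/1/2 genotype coding, the `r2_threshold` value and the distance measure are all hypothetical choices made for illustration.

```python
import numpy as np

def ld_bins(genotypes, r2_threshold=0.8):
    """Group consecutive SNP columns into bins while adjacent r^2 stays high.

    genotypes: (n_samples, n_snps) array coded 0/1/2 with no missing values.
    Returns a list of bins, each a list of column indices.
    """
    g = np.asarray(genotypes, dtype=float)
    bins = [[0]]
    for j in range(1, g.shape[1]):
        a, b = g[:, j - 1], g[:, j]
        if a.std() == 0 or b.std() == 0:
            r2 = 0.0  # a monomorphic SNP carries no correlation signal
        else:
            # Assumed redundancy measure: squared correlation of neighbours.
            r2 = np.corrcoef(a, b)[0, 1] ** 2
        if r2 >= r2_threshold:
            bins[-1].append(j)   # extend the current bin
        else:
            bins.append([j])     # start a new bin
    return bins

def knn_impute(genotypes, k=3):
    """Fill NaN genotypes with the rounded mean of the k most similar samples."""
    g = np.asarray(genotypes, dtype=float)
    filled = g.copy()
    for i, j in zip(*np.where(np.isnan(g))):
        candidates = []
        for other in range(g.shape[0]):
            if other == i or np.isnan(g[other, j]):
                continue
            # Compare the two samples only at SNPs observed in both.
            shared = ~np.isnan(g[i]) & ~np.isnan(g[other])
            if not shared.any():
                continue
            dist = np.mean(np.abs(g[i, shared] - g[other, shared]))
            candidates.append((dist, g[other, j]))
        if candidates:
            candidates.sort(key=lambda t: t[0])
            nearest = [value for _, value in candidates[:k]]
            filled[i, j] = float(np.rint(np.mean(nearest)))
    return filled
```

In a pipeline of this kind, imputation would typically run before binning so that the correlation is computed on complete columns; the ordering used by PIP-SNP itself is not stated in the abstract.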

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shamseldeen Eltaher ◽  
P. Stephen Baenziger ◽  
Vikas Belamkar ◽  
Hamdy A. Emara ◽  
Ahmed A. Nower ◽  
...  

Abstract Background Improving grain yield in cereals, especially in wheat, is a main objective for plant breeders. One of the main constraints on improving this trait is the G × E interaction (GEI), which affects the performance of wheat genotypes in different environments. Selecting high-yielding genotypes that can be used for a target set of environments is needed. Phenotypic selection can be misleading due to the environmental conditions. Incorporating information from phenotypic and genomic analyses can be useful in selecting the higher-yielding genotypes for a group of environments. Results A set of 270 F3:6 wheat genotypes in the Nebraska winter wheat breeding program was tested for grain yield in nine environments. High genetic variation for grain yield was found among the genotypes. G × E interaction was also highly significant. The highest-yielding genotype differed in each environment. The correlation for grain yield among the nine environments was low (0 to 0.43). A genome-wide association study revealed 70 marker-trait associations (MTAs) associated with increased grain yield. The analysis of linkage disequilibrium revealed 16 genomic regions with highly significant linkage disequilibrium (LD). The candidate parents’ genotypes for improving grain yield in a group of environments were selected based on three criteria: number of alleles associated with increased grain yield in each selected genotype, genetic distance among the selected genotypes, and number of different alleles between each two selected parents. Conclusion Although G × E interaction was present, the advances in DNA technology provided very useful tools and analyses. Such features helped to genetically select the highest-yielding genotypes that can be used to cross grain production in a group of environments.


2018 ◽  
Vol 14 (5) ◽  
pp. e1006105 ◽  
Author(s):  
Aaditya V. Rangan ◽  
Caroline C. McGrouther ◽  
John Kelsoe ◽  
Nicholas Schork ◽  
Eli Stahl ◽  
...  

2021 ◽  
Author(s):  
Alexandra Ficht ◽  
Robert W. Bruce ◽  
Davoud Torkamaneh ◽  
Christopher Grainger ◽  
Milad Eskandari ◽  
...  

Abstract Soybean (Glycine max (L.) Merr.) is a crop of global importance for both human and animal consumption, which was domesticated in China more than 6000 years ago. A concern about losing genetic diversity as a result of decades of breeding has been expressed by soybean researchers. In order to develop new cultivars, it is critical for breeders to understand the genetic variability present for traits of interest in their program germplasm. Sucrose concentration is becoming an increasingly important trait for the production of soy-food products. The objective of this study was to use a genome-wide association study (GWAS) to identify putative QTL for sucrose concentration in soybean seed. A GWAS panel consisting of 266 historic and current soybean accessions was genotyped with 76k genotyping-by-sequencing (GBS) SNP data and phenotyped in four field locations in Ontario (Canada) from 2015 to 2017. Seven putative QTL were identified on chromosomes 1, 6, 8, 9, 10, 13 and 14. A key gene related to sucrose synthase (Glyma.06g182700) was found to be associated with the QTL found on chromosome 6. This information will facilitate efforts to increase the available genetic variability for sucrose concentration in soybean breeding programs and develop new and improved high-sucrose soybean cultivars suitable for the soy-food industry.


2016 ◽  
Vol 47 (5) ◽  
pp. 971-980 ◽  
Author(s):  
S. H. Gage ◽  
H. J. Jones ◽  
S. Burgess ◽  
J. Bowden ◽  
G. Davey Smith ◽  
...  

Background Observational associations between cannabis and schizophrenia are well documented, but ascertaining causation is more challenging. We used Mendelian randomization (MR), utilizing publicly available data as a method for ascertaining causation from observational data. Method We performed bi-directional two-sample MR using summary-level genome-wide data from the International Cannabis Consortium (ICC) and the Psychiatric Genomics Consortium (PGC2). Single nucleotide polymorphisms (SNPs) associated with cannabis initiation (p < 10⁻⁵) and schizophrenia (p < 5 × 10⁻⁸) were combined using an inverse-variance-weighted fixed-effects approach. We also used height and education genome-wide association study data, representing negative and positive control analyses. Results There was some evidence consistent with a causal effect of cannabis initiation on risk of schizophrenia [odds ratio (OR) 1.04 per doubling odds of cannabis initiation, 95% confidence interval (CI) 1.01–1.07, p = 0.019]. There was strong evidence consistent with a causal effect of schizophrenia risk on likelihood of cannabis initiation (OR 1.10 per doubling of the odds of schizophrenia, 95% CI 1.05–1.14, p = 2.64 × 10⁻⁵). Findings were as predicted for the negative control (height: OR 1.00, 95% CI 0.99–1.01, p = 0.90) but weaker than predicted for the positive control (years in education: OR 0.99, 95% CI 0.97–1.00, p = 0.066) analyses. Conclusions Our results provide some evidence that cannabis initiation increases the risk of schizophrenia, although the size of the causal estimate is small. We find stronger evidence that schizophrenia risk predicts cannabis initiation, possibly as genetic instruments for schizophrenia are stronger than for cannabis initiation.
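The inverse-variance-weighted fixed-effects step named in the Method can be sketched as follows. This is a generic illustration of the standard IVW estimator for two-sample summary data, not the authors' analysis code; the function name `ivw_fixed_effect` and the input numbers in the usage note are hypothetical and are not taken from the ICC or PGC2 data. It assumes uncorrelated variants and ignores uncertainty in the SNP-exposure estimates.

```python
import math

def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Combine per-SNP summary statistics into one causal estimate.

    beta_exposure: SNP-exposure effect sizes
    beta_outcome:  SNP-outcome effect sizes (e.g. log odds ratios)
    se_outcome:    standard errors of the SNP-outcome effects
    Returns (estimate, standard_error) on the outcome's effect scale.
    """
    # Each SNP's ratio estimate by/bx gets weight bx^2 / se^2.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    total_weight = sum(weights)
    estimate = numerator / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error
```

With made-up inputs `ivw_fixed_effect([0.1, 0.2], [0.02, 0.04], [0.01, 0.01])`, both SNPs have a ratio of 0.2, so the pooled estimate is 0.2. When the outcome effects are log odds ratios, exponentiating the estimate gives an odds ratio of the kind reported in the Results.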


2011 ◽  
Vol 131 (1-3) ◽  
pp. 43-51 ◽  
Author(s):  
Jingchun Chen ◽  
Grace Lee ◽  
Ayman H. Fanous ◽  
Zhongming Zhao ◽  
Peilin Jia ◽  
...  

2018 ◽  
Vol 19 (1) ◽  
pp. 303-327 ◽  
Author(s):  
Stephen Burgess ◽  
Christopher N. Foley ◽  
Verena Zuber

An observational correlation between a suspected risk factor and an outcome does not necessarily imply that interventions on levels of the risk factor will have a causal impact on the outcome (correlation is not causation). If genetic variants associated with the risk factor are also associated with the outcome, then this increases the plausibility that the risk factor is a causal determinant of the outcome. However, if the genetic variants in the analysis do not have a specific biological link to the risk factor, then causal claims can be spurious. We review the Mendelian randomization paradigm for making causal inferences using genetic variants. We consider monogenic analysis, in which genetic variants are taken from a single gene region, and polygenic analysis, which includes variants from multiple regions. We focus on answering two questions: When can Mendelian randomization be used to make reliable causal inferences, and when can it be used to make relevant causal inferences?

