Finding genetic variants in plants without complete genomes

2019 ◽  
Author(s):  
Yoav Voichek ◽  
Detlef Weigel

Abstract Structural variants and presence/absence polymorphisms are common in plant genomes, yet they are routinely overlooked in genome-wide association studies (GWAS). Here, we expand the genetic variants detected in GWAS to include major deletions, insertions, and rearrangements. We first use raw sequencing data directly to derive short sequences, k-mers, that mark a broad range of polymorphisms independently of a reference genome. We then link k-mers associated with phenotypes to specific genomic regions. Using this approach, we re-analyzed 2,000 traits measured in Arabidopsis thaliana, tomato, and maize populations. Associations identified with k-mers recapitulate those found with single-nucleotide polymorphisms (SNPs), but with stronger statistical support. Moreover, we identified new associations with structural variants and with regions missing from reference genomes. Our results demonstrate the power of performing GWAS before linking sequence reads to specific genomic regions, which allows detection of a wider range of genetic variants responsible for phenotypic variation.
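The reference-free step described in this abstract (deriving k-mers from raw reads and recording which samples carry them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy reads, and the choice of k = 3 are all assumptions made for illustration, and real pipelines typically use much longer k-mers, handle reverse complements, and filter sequencing errors.

```python
def kmers(read: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of one read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def presence_table(samples: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of sample names whose reads contain it."""
    table: dict[str, set[str]] = {}
    for name, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                table.setdefault(kmer, set()).add(name)
    return table


def variable_kmers(table: dict[str, set[str]], n_samples: int) -> dict[str, set[str]]:
    """Keep only k-mers present in some, but not all, samples.

    These presence/absence patterns are the candidate markers that could
    be tested for association with a phenotype.
    """
    return {kmer: names for kmer, names in table.items() if len(names) < n_samples}


# Hypothetical toy data: two accessions that differ at a single base.
samples = {
    "acc1": ["ACGTAC"],
    "acc2": ["ACGAAC"],
}
table = presence_table(samples, k=3)
variable = variable_kmers(table, n_samples=len(samples))
```

In this toy case the shared k-mer `ACG` is dropped, while the k-mers overlapping the differing base are kept for only one accession each. No reference genome is consulted at any point.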

2019 ◽  
Author(s):  
Sarah J. C. Craig ◽  
Ana M. Kenney ◽  
Junli Lin ◽  
Ian M. Paul ◽  
Leann L. Birch ◽  
...  

Abstract Obesity is highly heritable, yet only a small fraction of its heritability has been attributed to specific genetic variants. These variants are traditionally ascertained from genome-wide association studies (GWAS), which utilize samples with tens or hundreds of thousands of individuals for whom a single summary measurement (e.g., BMI) is collected. An alternative approach is to focus on a smaller, more deeply characterized sample in conjunction with advanced statistical models that leverage detailed phenotypes. Here we use novel functional data analysis (FDA) techniques to capitalize on longitudinal growth information and construct a polygenic risk score (PRS) for obesity in children followed from birth to three years of age. This score, comprised of 24 single nucleotide polymorphisms (SNPs), is significantly higher in children with (vs. without) rapid infant weight gain, a predictor of obesity later in life. Using two independent cohorts, we show that genetic variants identified in early childhood are also informative in older children and in adults, consistent with early childhood obesity being predictive of obesity later in life. In contrast, PRSs based on SNPs identified by adult obesity GWAS are not predictive of weight gain in our cohort of children. Our research provides an example of a successful application of FDA to GWAS. We demonstrate that a deep, statistically sophisticated characterization of a longitudinal phenotype can provide increased statistical power to studies with relatively small sample sizes. This study shows how FDA approaches can be used as an alternative to the traditional GWAS.
Author Summary: Finding genetic variants that confer an increased risk of developing a particular disease has long been a focus of modern genetics. Genome-wide association studies (GWAS) have catalogued single nucleotide polymorphisms (SNPs) associated with a variety of complex diseases in humans, including obesity, but by and large have done so using increasingly large samples of tens or even hundreds of thousands of individuals, whose phenotypes are thus often only superficially characterized. This, in turn, may hide the intricacies of the genetic influence on disease. GWAS findings are also usually study-population dependent. We found that genetic risk scores based on SNPs from large adult obesity studies are not predictive of the propensity to gain weight in very young children. However, using a small cohort of a few hundred children deeply characterized with growth trajectories between birth and two years, and leveraging such trajectories through novel functional data analysis (FDA) techniques, we were able to produce a strong childhood obesity genetic risk score.


2020 ◽  
Vol 65 (No. 12) ◽  
pp. 445-453
Author(s):  
Anita Klímová ◽  
Eva Kašná ◽  
Karolína Machová ◽  
Michaela Brzáková ◽  
Josef Přibyl ◽  
...  

The inclusion of animal genotype data has contributed to the development of genomic selection. Animals are selected not only based on pedigree and phenotypic data but also on the basis of information about their genotypes. Genomic information helps to increase the accuracy of selection of young animals and thus enables a reduction of the generation interval. Obtaining information about genotypes in the form of SNPs (single nucleotide polymorphisms) has led to the development of new chips for genotyping. Several methods of genomic comparison have been developed as a result. One of the methods is data imputation, which allows the missing SNPs to be calculated using low-density chips to high-density chips. Through imputations, it is possible to combine information from diverse sets of chips and thus obtain more information about genotypes at a lower cost. Increasing the amount of data helps increase the reliability of predicting genomic breeding values. Imputation methods are increasingly used in genome-wide association studies. When classical genotyping and genome-wide sequencing data are combined, this option helps to increase the chances of identifying loci that are associated with economically significant traits.


Author(s):  
Maria K. Smatti ◽  
Yasser Al-Sarraj ◽  
Omar Albagha ◽  
Hadi M. Yassine

Background: Clinical outcomes of Coronavirus Disease 2019 (COVID-19), caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), showed enormous inter-individual and inter-population differences, possibly due to differences in host genetics. Earlier studies identified single nucleotide polymorphisms (SNPs) associated with SARS-CoV-1 in Eastern Asian (EAS) populations. In this report, we aimed to explore the frequency of a set of genetic polymorphisms that could affect SARS-CoV-2 susceptibility or severity, including those that were previously associated with SARS-CoV-1. Methods: We extracted the list of SNPs that could potentially modulate SARS-CoV-2 from genome-wide association studies (GWAS) on SARS-CoV-1 and other viruses. We also collected the expression data of these SNPs from expression quantitative trait loci (eQTL) databases. Sequences from the Qatar Genome Programme (QGP, n=6,054) and the 1000 Genomes Project were used to calculate and compare allelic frequencies (AF). Results: A total of 74 SNPs, located in 10 genes (ICAM3, IFN-γ, CCL2, CCL5, AHSG, MBL, Furin, TMPRSS2, IL4, and CD209 promoter), were identified. Analysis of Qatari genomes revealed significantly lower AF of risk variants linked to SARS-CoV-1 severity (CCL2, MBL, CCL5, AHSG, and IL4) compared to that of 1000 Genomes and/or the EAS population (up to 25-fold change). Conversely, SNPs in TMPRSS2, IFN-γ, ICAM3, and Furin were more common among Qataris (average 2-fold change). Inter-population analysis showed that the distribution of risk alleles among Europeans differs substantially from that among Africans and EASs. Remarkably, Africans seem to carry much lower frequencies of SARS-CoV-1 susceptibility alleles, reaching up to a 32-fold decrease compared to other populations. Conclusion: Multiple genetic variants, which could potentially modulate SARS-CoV-2 infection, vary significantly between populations, with the lowest frequency observed among Africans. Our results highlight the importance of exploring population genetics to understand and predict COVID-19 outcomes. Further studies are needed to validate these findings as well as to identify new genetic determinants linked to SARS-CoV-2.
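The allele-frequency comparison summarized in the Methods and Results can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the study's pipeline: the function names are hypothetical, and the allele counts and the reference cohort size are invented toy values. Only the QGP sample size (n=6,054) comes from the abstract.

```python
def allele_frequency(allele_count: int, n_individuals: int) -> float:
    """Frequency of an allele in a diploid cohort.

    Each individual carries two copies of an autosomal locus, so the
    denominator is 2 * n_individuals.
    """
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return allele_count / (2 * n_individuals)


def fold_change(af_a: float, af_b: float) -> float:
    """Ratio of two allele frequencies (population A relative to B)."""
    if af_b == 0:
        raise ValueError("reference allele frequency must be non-zero")
    return af_a / af_b


# Toy values: the counts and the reference cohort size are assumptions.
af_qgp = allele_frequency(allele_count=1211, n_individuals=6054)
af_ref = allele_frequency(allele_count=400, n_individuals=1000)
ratio = fold_change(af_ref, af_qgp)  # how many times more common in the reference
```

A ratio above 1 here would mean the allele is rarer in the first cohort than in the reference, which is the direction the abstract reports for the SARS-CoV-1 severity variants.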


2021 ◽  
Vol 12 ◽  
Author(s):  
Hye-Won Cho ◽  
Hyun-Seok Jin ◽  
Yong-Bin Eom

Most previous genome-wide association studies (GWAS) have identified genetic variants associated with anthropometric traits. However, most of the evidence was reported in European populations. Anthropometric traits such as height and body fat distribution are significantly affected by gender and genetic factors. Here we performed GWAS involving 64,193 Koreans to identify the genetic factors associated with anthropometric phenotypes including height, weight, body mass index, waist circumference, hip circumference, and waist-to-hip ratio. We found nine novel single-nucleotide polymorphisms (SNPs) and 59 independent genetic signals in genomic regions that were reported previously. Of the 19 SNPs reported previously, eight genetic variants at RP11-513I15.6 and one genetic variant at the RP11-977G19.10 region and six Asian-specific genetic variants were newly found. We compared our findings with those of previous studies in other populations. Five overlapping genetic regions (PAN2, ANKRD52, RNF41, HGMA1, and C6orf106) had been reported previously but none of the SNPs were independently identified in the current study. Seven of the nine novel loci associated with height in women revealed a statistically significant skeletal expression of quantitative trait loci. Our study provides additional insight into the genetic effects of anthropometric phenotypes in East Asians.


Author(s):  
Tom Burr

The genetic basis for some human diseases, in which one or a few genome regions increase the probability of acquiring the disease, is fairly well understood. For example, the risk for cystic fibrosis is linked to particular genomic regions. Identifying the genetic basis of more common diseases such as diabetes has proven to be more difficult, because many genome regions apparently are involved, and genetic effects are thought to depend in unknown ways on other factors, called covariates, such as diet and other environmental factors (Goldstein and Cavalleri, 2005). Genome-wide association studies (GWAS) aim to discover the genetic basis for a given disease. The main goal in a GWAS is to identify genetic variants, single nucleotide polymorphisms (SNPs) in particular, that show association with the phenotype, such as “disease present” or “disease absent”, either because they are causal or, more likely, because they are statistically correlated with an unobserved causal variant (Goldstein and Cavalleri, 2005). A GWAS can analyze “by DNA site” or “by multiple DNA sites.” In either case, data mining tools (Tachmazidou, Verzilli, and De Lorio, 2007) are proving to be quite useful for understanding the genetic causes for common diseases.


2020 ◽  
Author(s):  
Abigail L Pfaff ◽  
Vivien J. Bubb ◽  
John P. Quinn ◽  
Sulev Koks

Abstract Background: The development of Parkinson’s disease (PD) involves a complex interaction of genetic and environmental factors. The majority of studies investigating the genetic component of complex diseases, including PD, have focused on single nucleotide polymorphisms as this enables genome wide analysis of a large number of samples. Genome wide association studies have been crucial in identifying PD risk variants; however, a large proportion of the heritability of PD remains to be identified. To investigate the component of PD that may involve complex genetic variants we characterised SINE-VNTR-Alus (SVAs), a retrotransposon known to affect gene expression, in the Parkinson’s Progression Markers Initiative (PPMI) cohort. Results: Utilising whole genome sequencing from the PPMI cohort that consisted of 179 healthy controls, 371 individuals with PD and 58 individuals classified as SWEDD (scans without evidence of dopaminergic deficit), we genotyped SVAs in the reference genome for their presence or absence, identifying 81 such SVAs. Seven of these SVAs were associated with progression of the disease, including four whose specific genotypes were linked to an increase in the gradient of dopaminergic loss when comparing the caudate to putamen from DaTscan imaging analysis. These seven SVAs also demonstrated regulatory properties as they were associated with differential gene expression in whole blood RNA sequencing data. Conclusion: This study highlights the importance of addressing variation of SVAs and potentially other types of retrotransposons in PD genetics. Furthermore, these SVA elements should be considered as regulatory domains that could play a role in disease progression.


2021 ◽  
Vol 12 ◽  
Author(s):  
Robert E. Weber ◽  
Stephan Fuchs ◽  
Franziska Layer ◽  
Anna Sommer ◽  
Jennifer K. Bender ◽  
...  

Background: As next generation sequencing (NGS) technologies have experienced a rapid development over the last decade, the investigation of the bacterial genetic architecture reveals a high potential to dissect causal loci of antibiotic resistance phenotypes. Although genome-wide association studies (GWAS) have been successfully applied for investigating the basis of resistance traits, complex resistance phenotypes have been omitted so far. For S. aureus this especially refers to antibiotics of last resort like daptomycin (DAP) and ceftaroline (CPT). Therefore, we aimed to perform GWAS for the identification of genetic variants associated with DAP and CPT resistance in clinical S. aureus isolates. Materials/methods: To conduct microbial GWAS, we selected cases and controls according to their clonal background, date of isolation, and geographical origin. Association testing was performed with PLINK and SEER analysis. By using in silico analysis, we also searched for rare genetic variants in candidate loci that have previously been described to be involved in the development of corresponding resistance phenotypes. Results: GWAS revealed MprF P314L and L826F to be significantly associated with DAP resistance. These mutations were found to be homogeneously distributed among clonal lineages, suggesting convergent evolution. Additionally, rare and yet undescribed single nucleotide polymorphisms could be identified within mprF and putative candidate genes. Finally, we could show that each DAP resistant isolate exhibited at least one amino acid substitution within the open reading frame of mprF. Due to the presence of strong population stratification, no genetic variants could be associated with CPT resistance. However, the investigation of the staphylococcal cassette chromosome mec (SCCmec) revealed various mecA SNPs to be putatively linked with CPT resistance. Additionally, some CPT resistant isolates revealed no mecA mutations, supporting the hypothesis that further and still unknown resistance determinants are crucial for the development of CPT resistance in S. aureus. Conclusion: We hereby confirmed the potential of GWAS to identify genetic variants that are associated with antibiotic resistance traits in S. aureus. However, precautions need to be taken to prevent the detection of spurious associations. In addition, the implementation of different approaches is still essential to detect multiple forms of variations and mutations that occur with a low frequency.


Author(s):  
Ebrahim Mahmoudi ◽  
Joshua R Atkins ◽  
Yann Quidé ◽  
William R Reay ◽  
Heath M Cairns ◽  
...  

Abstract Genome-wide association studies (GWAS) of schizophrenia have strongly implicated a risk locus in close proximity to the gene for miR-137. While there are candidate single-nucleotide polymorphisms (SNPs) with functional implications for the microRNA’s expression encompassed by the common haplotype tagged by rs1625579, there are likely to be others, such as the variable number tandem repeat (VNTR) variant rs58335419, that have no proxy on the SNP genotyping platforms used in GWAS to date. Using whole-genome sequencing data from schizophrenia patients (n = 299) and healthy controls (n = 131), we observed that the MIR137 4-repeats VNTR (VNTR4) variant was enriched in a cognitive deficit subtype of schizophrenia and associated with altered brain morphology, including thicker left inferior temporal gyrus and deeper right postcentral sulcus. These findings suggest that the MIR137 VNTR4 may impact neuroanatomical development that may, in turn, influence the expression of more severe cognitive symptoms in patients with schizophrenia.


2020 ◽  
Vol 22 (Supplement_C) ◽  
pp. C34-C45 ◽  
Author(s):  
Florian Thibord ◽  
Gaëlle Munsch ◽  
Claire Perret ◽  
Pierre Suchon ◽  
Maguelonne Roux ◽  
...  

Abstract MicroRNAs (miRNAs) are small regulatory RNAs that participate in several biological processes and are known to be involved in various pathologies. Measurable in body fluids, miRNAs have been proposed to serve as efficient biomarkers for diseases and/or associated traits. Here, we performed a next-generation-sequencing based profiling of plasma miRNAs in 344 patients with venous thrombosis (VT) and assessed the association of plasma miRNA levels with several haemostatic traits and the risk of VT recurrence. Among the most significant findings, we detected an association between hsa-miR-199b-3p and haematocrit levels (P = 0.0016), both of these markers having been independently reported to associate with VT risk. We also observed suggestive evidence for association of hsa-miR-370-3p (P = 0.019), hsa-miR-27b-3p (P = 0.016) and hsa-miR-222-3p (P = 0.049) with VT recurrence, the observations for the latter two miRNAs confirming the recent findings of Wang et al. In addition, by conducting genome-wide association studies on miRNA levels and meta-analyzing our results with publicly available ones, we identified 21 new associations of single nucleotide polymorphisms with plasma miRNA levels at the statistical significance threshold of P < 5 × 10⁻⁸, some of these associations pertaining to thrombosis-associated mechanisms. In conclusion, this study provides novel data about the impact of miRNA variability in haemostasis and new arguments supporting the association of a few miRNAs with the risk of recurrence in patients with venous thrombosis.


2017 ◽  
Author(s):  
Yue Li ◽  
Alvin Houze Shi ◽  
Ryan Tewhey ◽  
Pardis C. Sabeti ◽  
Jason Ernst ◽  
...  

Massively-parallel reporter assays (MPRA) enable unprecedented opportunities to test for regulatory activity of thousands of regulatory sequences. However, MPRA only assay a subset of the genome, thus limiting their applicability for genome-wide functional annotations. To overcome this limitation, we have used existing MPRA datasets to train a machine learning model that uses DNA sequence information, regulatory motif annotations, evolutionary conservation, and epigenomic information to predict genomic regions that show enhancer activity when tested in MPRA. We used the resulting model to generate global predictions of regulatory activity at single-nucleotide resolution across 14 million common variants. We find that genetic variants with stronger predicted regulatory activity show significantly lower minor allele frequency, indicative of evolutionary selection within the human population. They also show higher overlap with eQTL annotations across multiple tissues relative to the background SNPs, indicating that their perturbations in vivo more frequently result in changes in gene expression. In addition, they are more frequently associated with trait-associated SNPs from genome-wide association studies (GWAS), enabling us to prioritize genetic variants that are more likely to be causal based on their predicted regulatory activity. Lastly, we use our model to compare MPRA inferences across cell types and platforms and to prioritize the assays most predictive of MPRA results, including cell-dependent DNase hypersensitivity sites and transcription factors known to be active in the tested cell types. Our results indicate that high-throughput testing of thousands of putative regions, coupled with regulatory predictions across millions of sites, presents a powerful strategy for systematic annotation of genomic regions and genetic variants.

