An enrichment method for mapping ambiguous reads to the reference genome for NGS analysis

Mapping short reads to a reference genome is an essential step in many next-generation sequencing (NGS) analyses. In plants with large genomes, a large fraction of the reads can align to multiple locations of the genome with equally good alignment scores. How to map these ambiguous reads to the genome is a challenging problem with a large impact on downstream analysis. Traditionally, the default method is to assign an ambiguous read randomly to one of its potential locations. In this study, we explore two alternative methods based on the hypothesis that the probability of an ambiguous read being generated by a location is proportional to the total number of reads produced by that location: (1) the enrichment method, which assigns an ambiguous read to the location that has produced the most reads among all the potential locations, and (2) the probability method, which assigns an ambiguous read to a location with probability proportional to the number of reads the location produces. We systematically compared the performance of the proposed methods with that of the default random method. Our results showed that the enrichment method produced better results than both the default random method and the probability method in the discovery of single nucleotide polymorphisms (SNPs). Not only did it produce more SNP markers, but it also produced SNP markers of better quality, as demonstrated by multiple mainstay genomic analyses, including genome-wide association studies (GWAS), minor allele distribution, population structure, and genomic prediction.
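
The three assignment rules described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the location labels, the use of a plain dictionary of read counts, and the tie-breaking and zero-count fallbacks are all assumptions made for the example.

```python
import random
from collections import Counter


def assign_ambiguous_read(candidates, read_counts, method="enrichment", rng=None):
    """Pick one location for a read that aligns equally well to several.

    candidates  -- list of candidate locations (hypothetical labels)
    read_counts -- dict mapping location -> reads already attributed to it
    method      -- "random", "enrichment", or "probability"
    """
    if rng is None:
        rng = random.Random()
    if method == "random":
        # Default rule: every candidate location is equally likely.
        return rng.choice(candidates)

    counts = [read_counts.get(loc, 0) for loc in candidates]
    if method == "enrichment":
        # Location that has produced the most reads.
        # Assumption: ties go to the first candidate in the list.
        return candidates[counts.index(max(counts))]
    if method == "probability":
        # Sample with probability proportional to the read count.
        # Assumption: fall back to a uniform choice if all counts are zero.
        if sum(counts) == 0:
            return rng.choice(candidates)
        return rng.choices(candidates, weights=counts, k=1)[0]
    raise ValueError(f"unknown method: {method}")


# Made-up counts for two candidate locations.
counts = {"chr1:100": 8, "chr3:500": 2}
print(assign_ambiguous_read(["chr1:100", "chr3:500"], counts))  # chr1:100
```

Under the probability rule the same read would land on `chr1:100` about 80% of the time, whereas the enrichment rule always picks it.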

An Enrichment Method For Mapping Ambiguous Reads To Reference Genome For NGS Analysis

10.29007/kw3c ◽

2019 ◽

Author(s):

Yuan Liu ◽

Yongchao Ma ◽

Evan Salsman ◽

Frank Manthey ◽

Elias Elias ◽

...

Keyword(s):

Reference Genome ◽

Association Studies ◽

Large Fraction ◽

Snp Markers ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Enrichment Method ◽

The Many ◽

Downstream Analysis ◽

Next Generation Sequencing Ngs

Mapping short reads to a reference genome is an essential step in many next-generation sequencing (NGS) analyses. In plants with large genomes, a large fraction of the reads can align to multiple locations of the genome with equally good alignment scores. How to map these ambiguous reads to the genome is a challenging problem with a large impact on downstream analysis. Traditionally, the default method is to assign an ambiguous read randomly to one of its potential locations. In this study, we explore an enrichment method that assigns an ambiguous read to the location that has produced the most reads among all the potential locations. Our results show that the enrichment method produced better results than the default random method in the discovery of single nucleotide polymorphisms (SNPs). Not only did it produce more SNP markers, but it also produced markers of better quality, as demonstrated by higher trait-marker correlation in genome-wide association studies (GWAS).

Genome wide association analyses to understand genetic basis of flowering and plant height under three levels of nitrogen application in Brassica juncea (L.) Czern & Coss

Scientific Reports ◽

10.1038/s41598-021-83689-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Javed Akhatar ◽

Anna Goyal ◽

Navneet Kaur ◽

Chhaya Atri ◽

Meenakshi Mittal ◽

...

Keyword(s):

Plant Height ◽

Indian Subcontinent ◽

Association Studies ◽

Snp Markers ◽

Genome Wide Association ◽

Strong Interactions ◽

N Availability ◽

Oilseed Crop ◽

Genome Wide Association Studies ◽

Genome Wide

Timely transition to flowering, maturity and plant height are important for agronomic adaptation and productivity of Indian mustard (B. juncea), which is a major edible oilseed crop of low-input ecologies in the Indian subcontinent. Breeding manipulation for these traits is difficult because of the involvement of multiple interacting genetic and environmental factors. Here, we report a genetic analysis of these traits using a population comprising 92 diverse genotypes of mustard. These genotypes were evaluated under deficient (N75), normal (N100) or excess (N125) conditions of nitrogen (N) application. Lower N availability induced early flowering and maturity in most genotypes, while high N conditions delayed both. A genotyping-by-sequencing approach helped to identify 406,888 SNP markers and undertake genome-wide association studies (GWAS). A total of 282 significant marker-trait associations (MTAs) were identified. We detected strong interactions between GWAS loci and nitrogen levels. Though some trait-associated SNPs were detected repeatedly across fertility gradients, the majority were identified under deficient or normal levels of N application. Annotation of the genomic region(s) within ±50 kb of the peak SNPs facilitated prediction of 30 candidate genes belonging to light perception, circadian, floral meristem identity, flowering regulation, gibberellic acid pathways and plant development. These included over one copy each of AGL24, AP1, FVE, FRI, GID1A and GNC. FLC and CO were predicted on chromosomes A02 and B08, respectively. CDF1, CO, FLC, AGL24, GNC and FAF2 appeared to influence the variation for plant height. Our findings may help in improving phenotypic plasticity of mustard across fertility gradients through marker-assisted breeding strategies.
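
The step of looking for candidate genes within ±50 kb of a peak SNP amounts to an interval-overlap query. The sketch below shows one way to express it. The function name, the tuple layout for gene annotations, and all gene names and coordinates are hypothetical placeholders, not values from the study.

```python
def genes_near_snp(snp_chrom, snp_pos, genes, flank=50_000):
    """Return names of genes overlapping the window snp_pos +/- flank.

    genes -- iterable of (name, chromosome, start, end) tuples
    """
    window_start = snp_pos - flank
    window_end = snp_pos + flank
    return [
        name
        for name, chrom, start, end in genes
        # Same chromosome, and the gene interval overlaps the window.
        if chrom == snp_chrom and start <= window_end and end >= window_start
    ]


# Made-up annotation records for illustration only.
annotation = [
    ("geneA", "A02", 100_000, 102_000),
    ("geneB", "A02", 400_000, 405_000),
    ("geneC", "B08", 120_000, 121_000),
]
print(genes_near_snp("A02", 140_000, annotation))  # ['geneA']
```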

Application of Genetic Studies to Flow Cytometry Data and Its Impact on Therapeutic Intervention for Autoimmune Disease

Frontiers in Immunology ◽

10.3389/fimmu.2021.714461 ◽

2021 ◽

Vol 12 ◽

Author(s):

Valeria Orrù ◽

Maristella Steri ◽

Francesco Cucca ◽

Edoardo Fiorillo

Keyword(s):

Flow Cytometry ◽

Autoimmune Disease ◽

Autoimmune Diseases ◽

Immune Cell ◽

Disease Risk ◽

Association Studies ◽

Large Fraction ◽

Surface Protein ◽

Genome Wide Association Studies ◽

Human Immune System

In recent years, systematic genome-wide association studies of quantitative immune cell traits, represented by circulating levels of cell subtypes established by flow cytometry, have revealed numerous association signals, a large fraction of which overlap perfectly with genetic signals associated with autoimmune diseases. By identifying further overlaps with association signals influencing gene expression and cell surface protein levels, it has also been possible, in several cases, to identify causal genes and infer candidate proteins affecting immune cell traits linked to autoimmune disease risk. Overall, these results provide a more detailed picture of how genetic variation affects the human immune system and autoimmune disease risk. They also highlight druggable proteins in the pathogenesis of autoimmune diseases; predict the efficacy and side effects of existing therapies; provide new indications for use for some of them; and optimize the research and development of new, more effective and safer treatments for autoimmune diseases. Here we review the genetic-driven approach that couples systematic multi-parametric flow cytometry with high-resolution genetics and transcriptomics to identify endophenotypes of autoimmune diseases for the development of new therapies.

Genome-wide association study reveals candidate genes for flowering time in cowpea (Vigna unguiculata [L.] Walp)

10.1101/2021.04.01.438123 ◽

2021 ◽

Author(s):

Dev Paudel ◽

Rocheteau Dareus ◽

Julia Rosenwald ◽

Maria Munoz-Amatriain ◽

Esteban Rios

Keyword(s):

Flowering Time ◽

Candidate Genes ◽

Vigna Unguiculata ◽

Association Studies ◽

Snp Markers ◽

Genome Wide Association ◽

Human Consumption ◽

Phenotypic Variance ◽

Genome Wide Association Studies ◽

Genome Wide

Cowpea (Vigna unguiculata [L.] Walp., diploid, 2n = 22) is a major crop used as a protein source for human consumption as well as a quality feed for livestock. It is drought and heat tolerant and has been bred to develop varieties that are resilient to changing climates. Plant adaptation to new climates and their yield are strongly affected by flowering time. Therefore, understanding the genetic basis of flowering time is critical to advance cowpea breeding. The aim of this study was to perform genome-wide association studies (GWAS) to identify marker trait associations for flowering time in cowpea using single nucleotide polymorphism (SNP) markers. A total of 367 accessions from a cowpea mini-core collection were evaluated in Ft. Collins, CO in 2019 and 2020, and 292 accessions were evaluated in Citra, FL in 2018. These accessions were genotyped using the Cowpea iSelect Consortium Array that contained 51,128 SNPs. GWAS revealed seven reliable SNPs for flowering time that explained 8-12% of the phenotypic variance. Candidate genes including FT, GI, CRY2, LSH3, UGT87A2, LIF2, and HTA9 that are associated with flowering time were identified for the significant SNP markers. Further efforts to validate these loci will help to understand their role in flowering time in cowpea, and it could facilitate the transfer of some of this knowledge to other closely related legume species.

Identification of molecular markers for starch content in barley (Hordeum vulgare L.) by genome-wide association studies based on bulked samples

Plant Genetic Resources ◽

10.1017/s1479262120000143 ◽

2020 ◽

Vol 18 (3) ◽

pp. 111-119

Author(s):

Yinghu Zhang ◽

Haiye Luan ◽

Hui Zang ◽

Hongyan Yang ◽

Xiao Xu ◽

...

Keyword(s):

Molecular Markers ◽

Association Studies ◽

Starch Content ◽

Principal Component ◽

Mixed Linear Model ◽

High Linkage Disequilibrium ◽

Snp Markers ◽

Genome Wide Association Studies ◽

Hordeum Vulgare L ◽

Growing Seasons

Starch content is an important trait in barley. To evaluate the genetic diversity and identify molecular markers of starch content in barley, 40 cultivated barley genotypes collected from different regions, including genotypes whose starch content is at either the high or low end of the spectrum (15), were used in this study. All the genotypes were re-sequenced by the double-digest-restriction associated DNA sequencing method, and a total of 299,103 single-nucleotide polymorphism (SNP) markers were obtained. The genotypes were divided into four sub-populations based on FASTSTRUCTURE, principal component analysis and neighbour-joining tree analysis. All four sub-populations had a high linkage disequilibrium, especially group 3, whose members were recently bred for malting in the Jiangsu coastal area. The starch content of the barley lines was evaluated during three growing seasons (2014–2017), and the average values of starch content across the three growing seasons at the low and high ends were 51.5 and 55.0%, respectively. The starch content was affected by population structure: the barley in group 2 had a low starch content, while the barley in group 4 had a high starch content. Twenty-six SNP markers were identified as being significantly associated with starch content (P ≤ 0.001) based on the average values across the three growing seasons using the mixed linear model method. These SNP markers were located on chromosomes 1H and 4H, and were considered loci of qSC1-1 and qSC4-1, respectively. The major identified QTLs for starch content are helpful for further research on carbohydrates and for barley breeding.

Genome-wide association study in a Korean population identifies six novel susceptibility loci for rheumatoid arthritis

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-217663 ◽

2020 ◽

Vol 79 (11) ◽

pp. 1438-1445

Author(s):

Young-Chang Kwon ◽

Jiwoo Lim ◽

So-Young Bang ◽

Eunji Ha ◽

Mi Yeong Hwang ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Genome Wide Association Study ◽

Association Studies ◽

Large Fraction ◽

Genome Wide Association ◽

Chromatin Interaction ◽

Genome Wide Association Studies ◽

Bioinformatics Analyses ◽

Specific Expression ◽

Genome Wide

Objective: Genome-wide association studies (GWAS) in rheumatoid arthritis (RA) have discovered over 100 RA loci, explaining patient-relevant RA pathogenesis but showing a large fraction of missing heritability. As a continuous effort, we conducted GWAS in a large Korean RA case–control population. Methods: We newly generated genome-wide variant data in two independent Korean cohorts comprising 4068 RA cases and 36 487 controls, followed by a whole-genome imputation and a meta-analysis of the disease association results in the two cohorts. By integrating publicly available omics data with the GWAS results, a series of bioinformatic analyses were conducted to prioritise the RA-risk genes in RA loci and to dissect biological mechanisms underlying disease associations. Results: We identified six new RA-risk loci (SLAMF6, CXCL13, SWAP70, NFKBIA, ZFP36L1 and LINC00158) with p_meta < 5×10⁻⁸ and consistent disease effect sizes in the two cohorts. A total of 122 genes were prioritised from the 6 novel and 13 replicated RA loci based on physical distance, regulatory variants and chromatin interaction. Bioinformatics analyses highlighted potentially RA-relevant tissues (including immune tissues, lung and small intestine) with tissue-specific expression of RA-associated genes and suggested the immune-related gene sets (such as CD40 pathway, IL-21-mediated pathway and citrullination) and the risk-allele sharing with other diseases. Conclusion: This study identified six new RA-associated loci that contributed to better understanding of the genetic aetiology and biology in RA.

Exome resequencing and GWAS for growth, ecophysiology, and chemical and metabolomic composition of wood of Populus trichocarpa

BMC Genomics ◽

10.1186/s12864-019-6160-9 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Fernando P. Guerra ◽

Haktan Suren ◽

Jason Holliday ◽

James H. Richards ◽

Oliver Fiehn ◽

...

Keyword(s):

Biomass Production ◽

Complex Traits ◽

Association Studies ◽

Populus Trichocarpa ◽

Significant Snps ◽

Snp Markers ◽

Exome Capture ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Improvement Programs

Background: Populus trichocarpa is an important forest tree species for the generation of lignocellulosic ethanol. Understanding the genomic basis of biomass production and chemical composition of wood is fundamental in supporting genetic improvement programs. Considerable variation has been observed in this species for complex traits related to growth, phenology, ecophysiology and wood chemistry. Those traits are influenced by both polygenic control and environmental effects, and their genome architecture and regulation are only partially understood. Genome-wide association studies (GWAS) represent an approach to advance that aim using thousands of single nucleotide polymorphisms (SNPs). Genotyping using exome capture methodologies represents an efficient approach to identify specific functional regions of genomes underlying phenotypic variation. Results: We identified 813 K SNPs, which were utilized for genotyping 461 P. trichocarpa clones, representing 101 provenances collected from Oregon and Washington, and established in California. A GWAS performed on 20 traits, considering single SNP-marker tests, identified a variable number of significant SNPs (p-value < 6.1479E-8) in association with diameter, height, leaf carbon and nitrogen contents, and δ¹⁵N. The number of significant SNPs ranged from 2 to 220 per trait. Additionally, multiple-marker analyses by sliding-window tests detected between 6 and 192 significant windows for the analyzed traits. The significant SNPs resided within genes that encode proteins belonging to different functional classes such as protein synthesis, energy/metabolism and DNA/RNA metabolism, among others. Conclusions: SNP markers within genes associated with traits of importance for biomass production were detected. They contribute to characterizing the genomic architecture of P. trichocarpa biomass required to support the development and application of marker breeding technologies.
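
The significance threshold quoted in this abstract (6.1479E-8) is close to what a Bonferroni correction at α = 0.05 over roughly 813 K tests would give. The abstract does not say how the threshold was derived, so the sketch below is only an assumed reading; the function name and the rounded SNP count are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: alpha = 0.05 and about 813,000 single-SNP tests.
threshold = bonferroni_threshold(0.05, 813_000)
print(f"{threshold:.3e}")  # 6.150e-08
```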

Multi-SNP mediation intersection-union test

Bioinformatics ◽

10.1093/bioinformatics/btz285 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4724-4729 ◽

Cited By ~ 4

Author(s):

Wujuan Zhong ◽

Cassandra N Spracklen ◽

Karen L Mohlke ◽

Xiaojing Zheng ◽

Jason Fine ◽

...

Keyword(s):

Association Studies ◽

R Package ◽

Alternative Methods ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mediation Effects ◽

Coding Regions ◽

Genome Wide ◽

Plasma Adiponectin Level ◽

Intersection Union Test

Summary: Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose the multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains, up to 92%, over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation: The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information: Supplementary data are available at Bioinformatics online.
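
The general logic of an intersection-union test can be sketched in a few lines: the composite null is rejected only if every component null is rejected, so the overall p-value is the largest component p-value. This is a generic illustration and not the SMUT package's code; how SMUT obtains the component p-values is not described in the abstract, and the function and parameter names here are hypothetical.

```python
def intersection_union_pvalue(p_values):
    """Overall p-value of an intersection-union test: the maximum component p-value."""
    p_values = list(p_values)
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    return max(p_values)


def mediation_detected(p_snps_to_mediator, p_mediator_to_outcome, alpha=0.05):
    """Declare mediation only if both component effects are significant."""
    overall = intersection_union_pvalue([p_snps_to_mediator, p_mediator_to_outcome])
    return overall <= alpha


# Made-up p-values: strong SNP-to-mediator effect, weak mediator-to-outcome effect.
print(mediation_detected(0.001, 0.2))  # False
```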

MR-LDP: a two-sample Mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa028 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 2

Author(s):

Qing Cheng ◽

Yi Yang ◽

Xingjie Shi ◽

Kar-Fu Yeung ◽

Can Yang ◽

...

Keyword(s):

Risk Factors ◽

Linkage Disequilibrium ◽

Genetic Variants ◽

Mendelian Randomization ◽

Association Studies ◽

Alternative Methods ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Causal Relationships ◽

Disease Outcomes

The proliferation of genome-wide association studies (GWAS) has prompted the use of two-sample Mendelian randomization (MR) with genetic variants as instrumental variables (IVs) for drawing reliable causal relationships between health risk factors and disease outcomes. However, the unique features of GWAS demand that MR methods account for both linkage disequilibrium (LD) and ubiquitously existing horizontal pleiotropy among complex traits, which is the phenomenon wherein a variant affects the outcome through mechanisms other than exclusively through the exposure. Statistical methods that fail to consider LD and horizontal pleiotropy can therefore lead to biased estimates and false-positive causal relationships. To overcome these limitations, we proposed a probabilistic model for MR analysis (MR-LDP) that identifies causal effects between risk factors and disease outcomes using GWAS summary statistics in the presence of LD while properly accounting for horizontal pleiotropy among genetic variants, and we developed a computationally efficient algorithm to make the causal inference. We then conducted comprehensive simulation studies to demonstrate the advantages of MR-LDP over the existing methods. Moreover, we used two real exposure–outcome pairs to validate the results from MR-LDP compared with alternative methods, showing that our method is more efficient in using all-instrumental variants in LD. By further applying MR-LDP to lipid traits and body mass index (BMI) as risk factors for complex diseases, we identified multiple pairs of significant causal relationships, including a protective effect of high-density lipoprotein cholesterol on peripheral vascular disease and a positive causal effect of BMI on hemorrhoids.

Contributions of common genetic variants to risk of schizophrenia among individuals of African and Latino ancestry

Molecular Psychiatry ◽

10.1038/s41380-019-0517-y ◽

2019 ◽

Vol 25 (10) ◽

pp. 2455-2467 ◽

Cited By ~ 12

Author(s):

Tim B. Bigdeli ◽

Giulio Genovese ◽

Penelope Georgakopoulos ◽

Jacquelyn L. Meyers ◽

...

Keyword(s):

Genetic Variants ◽

Association Studies ◽

Large Fraction ◽

African Ancestry ◽

Common Variant ◽

Human Populations ◽

Polygenic Risk Score ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Common Genetic Variants

Schizophrenia is a common, chronic and debilitating neuropsychiatric syndrome affecting tens of millions of individuals worldwide. While rare genetic variants play a role in the etiology of schizophrenia, most of the currently explained liability is within common variation, suggesting that variation predating the human diaspora out of Africa harbors a large fraction of the common variant attributable heritability. However, common variant association studies in schizophrenia have concentrated mainly on cohorts of European descent. We describe genome-wide association studies of 6152 cases and 3918 controls of admixed African ancestry, and of 1234 cases and 3090 controls of Latino ancestry, representing the largest such study in these populations to date. Combining results from the samples with African ancestry with summary statistics from the Psychiatric Genomics Consortium (PGC) study of schizophrenia yielded seven newly genome-wide significant loci, and we identified an additional eight loci by incorporating the results from samples with Latino ancestry. Leveraging population differences in patterns of linkage disequilibrium, we achieve improved fine-mapping resolution at 22 previously reported and 4 newly significant loci. Polygenic risk score profiling revealed improved prediction based on trans-ancestry meta-analysis results for admixed African (Nagelkerke’s R² = 0.032; liability R² = 0.017; P < 10⁻⁵²), Latino (Nagelkerke’s R² = 0.089; liability R² = 0.021; P < 10⁻⁵⁸), and European individuals (Nagelkerke’s R² = 0.089; liability R² = 0.037; P < 10⁻¹¹³), further highlighting the advantages of incorporating data from diverse human populations.
