Describing the Local Structure of Sequence Graphs

2017
Author(s): Yohei Rosen, Jordan Eizenga, Benedict Paten

Abstract
Analysis of genetic variation using graph structures is an emerging paradigm in genomics. However, defining genetic sites on sequence graphs remains an open problem. Paten's introduction of the ultrabubble and the snarl, special subgraphs of sequence graphs that can be identified with efficient algorithms, represents an important first step toward segregating graphs into genetic sites. We extend the theory of ultrabubbles to a special subclass in which every detail of the ultrabubble can be described as a series and parallel arrangement of genetic sites. We further introduce the concept of bundle structures, which allows us to recognize the graph motifs created by additional combinations of variation in the graph, including but not limited to runs of abutting single nucleotide variants. We demonstrate linear-time identification of bundles in a bidirected graph. These two advances build on initial work on ultrabubbles in bidirected graphs and define a more granular concept of genetic site.
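The idea of describing a site as a series and parallel arrangement can be illustrated with the classic two-terminal series-parallel reduction. This is a minimal sketch and not the authors' algorithm. It makes several simplifying assumptions:

- The graph is directed and acyclic, not bidirected.
- The graph is given as hypothetical (from, to) edge pairs between node IDs.
- Node sequences are ignored.

The function `is_series_parallel` is an illustrative name. It repeatedly merges parallel edges and contracts internal nodes that have exactly one incoming and one outgoing edge. A subgraph counts as series-parallel if it reduces to a single source-to-sink edge.

```python
from collections import Counter


def is_series_parallel(edges, source, sink):
    """Return True if the DAG given by `edges` reduces to a single source->sink edge.

    `edges` is an iterable of (u, v) pairs; duplicates are treated as parallel
    edges. Assumes an acyclic graph without self-loops (a simplification of
    real bidirected sequence graphs).
    """
    multiset = Counter(edges)
    changed = True
    while changed:
        changed = False
        # Parallel reduction: collapse repeated edges between the same pair of nodes.
        for edge, count in list(multiset.items()):
            if count > 1:
                multiset[edge] = 1
                changed = True
        # Series reduction: contract one internal node with exactly one
        # incoming and one outgoing edge into a single edge.
        nodes = {u for u, _ in multiset} | {v for _, v in multiset}
        for node in nodes - {source, sink}:
            incoming = [e for e in multiset if e[1] == node]
            outgoing = [e for e in multiset if e[0] == node]
            if len(incoming) == 1 and len(outgoing) == 1:
                (u, _), (_, v) = incoming[0], outgoing[0]
                del multiset[incoming[0]]
                del multiset[outgoing[0]]
                multiset[(u, v)] += 1
                changed = True
                break
    return dict(multiset) == {(source, sink): 1}


# A single SNV bubble: s -> {A, G} -> t.
snv_bubble = [("s", "A"), ("s", "G"), ("A", "t"), ("G", "t")]
# A bridge-shaped motif that cannot be decomposed into series/parallel pieces.
bridge = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]

print(is_series_parallel(snv_bubble, "s", "t"))  # True
print(is_series_parallel(bridge, "s", "t"))      # False
```

Nested sites also reduce. For example, an SNV bubble inside a region that has a parallel deletion path reduces to a single edge, because the inner bubble collapses first.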


2021
Author(s): Morteza M Saber, Jannik Donner, Inès Levade, Nicole Acosta, Michael D Perkins, ...

Complex polymicrobial communities inhabit the lungs of individuals with cystic fibrosis (CF) and contribute to the decline in lung function. However, the severity of lung disease and its progression in CF patients are highly variable and imperfectly predicted by host clinical factors at baseline, CFTR mutations in the host genome, or sputum polymicrobial community variation. The opportunistic pathogen Pseudomonas aeruginosa (Pa) dominates airway infections in the majority of adults with CF. Here we hypothesized that genetic variation within Pa populations would be predictive of lung disease severity. To quantify Pa genetic variation within whole CF sputum samples, we used deep amplicon sequencing with a newly developed custom Ion AmpliSeq panel of 209 Pa genes previously associated with host pathoadaptation and the pathogenesis of CF infection. We trained machine learning models using Pa single nucleotide variants (SNVs), clinical data, and microbiome diversity data to classify lung disease severity at the time of sputum sampling and to predict future lung function decline over five years in a cohort of 54 adult CF patients with chronic Pa infection. The models using Pa SNVs alone classified baseline lung disease with good sensitivity and specificity, with an area under the receiver operating characteristic curve (AUROC) of 0.87. Although the models were less predictive of future lung function decline, they still achieved an AUROC of 0.74. Adding clinical data to the models, but not microbiome community data, yielded modest improvements (baseline lung function: AUROC = 0.92; lung function decline: AUROC = 0.79), highlighting the predictive value of the AmpliSeq data. Together, our work provides proof of principle that Pa genetic variation in sputum is strongly associated with baseline lung disease and moderately predicts future lung function decline, and it offers insight into the pathobiology of Pa's effect on CF.
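The AUROC values above summarize how well a model ranks severe cases above non-severe ones. The sketch below computes AUROC with the pairwise (Mann-Whitney) formulation. Every positive/negative pair contributes 1 if the positive case receives the higher score, 0.5 for a tie, and 0 otherwise.

The function name and the scores are hypothetical and are not taken from the study. The pairwise loop is quadratic in cohort size, which is acceptable for a cohort of a few dozen patients.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    `labels` are 1 for the positive class (e.g. severe disease) and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four patients (not data from the study).
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking, and 1.0 corresponds to perfect separation of the two classes.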



Genes, 2020, Vol 11 (9), pp. 1102
Author(s): Satishkumar Ranganathan Ganakammal, Emil Alexov

Single-nucleotide variants (SNVs) are a major form of genetic variation in the human genome and contribute to various disorders. Coding SNVs are of two types: non-synonymous (missense) variants (nsSNVs) and synonymous variants (sSNVs), the latter predominantly involved in RNA processing or gene regulation. Unlike missense variants, sSNVs do not alter the amino acid sequence, which makes them challenging candidates for downstream functional studies. Numerous computational methods have been developed to evaluate the clinical impact of nsSNVs, but very few are available for understanding the effects of sSNVs. For this analysis, we downloaded sSNVs from the ClinVar database together with various features, such as conservation, DNA-RNA, and splicing properties. We performed feature selection and implemented an ensemble random forest (RF) classification algorithm to build a classifier that predicts the pathogenicity of sSNVs. We demonstrate that the ensemble predictor with 20 selected features enhances the classification of sSNVs into two categories, pathogenic and benign, with high accuracy (87%), precision (79%), and recall (91%). Furthermore, we used this prediction model to reclassify sSNVs of unknown clinical significance. Finally, the method is robust and can be used to predict the effect of other unknown sSNVs.
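Accuracy, precision, and recall as reported above follow directly from confusion-matrix counts, treating "pathogenic" as the positive class. The helper name and the counts below are hypothetical and were chosen only to show the arithmetic. They are not the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion-matrix counts.

    "Positive" here means a variant predicted (or labeled) pathogenic.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0   # fraction of pathogenic calls that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # fraction of pathogenic variants recovered
    return accuracy, precision, recall


# Hypothetical counts, not taken from the study.
acc, prec, rec = classification_metrics(tp=9, fp=1, tn=6, fn=4)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# accuracy=0.750 precision=0.900 recall=0.692
```

The gap between precision and recall in a result like this shows why all three metrics are usually reported together.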



2018
Author(s): Cristian Groza, Tony Kwan, Nicole Soranzo, Tomi Pastinen, Guillaume Bourque

Abstract
Background: Epigenomic studies that use next-generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesized that using a generic reference could lead to incorrectly mapped reads and bias downstream results.
Results: We show that accounting for genetic variation using a modified reference genome (MPG) or a de novo assembled genome (DPG) can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls, either by creating new personal peaks or by causing the loss of reference peaks. MPGs alter approximately 1% of peak calls, while DPGs alter up to 5%. We also show statistically significant differences in the number of reads observed in regions associated with the new, altered and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. A counterbalancing factor is peak width, with wider calls being less likely to be altered. Next, because high-quality DPGs remain hard to obtain, we show that using a graph personalized genome (GPG) represents a reasonable compromise between MPGs and DPGs and alters about 2.5% of peak calls. Finally, we demonstrate that altered peaks have a genomic distribution typical of other peaks. For instance, for H3K4me1, 518 personal-only peaks were replicated using at least two of the three approaches, 394 of which were inside or within 10 kb of a gene.
Conclusions: Analysing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant to the study of various human phenotypes.
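A modified genome of the kind described above incorporates an individual's SNVs and indels into the reference sequence. The sketch below applies variants to a reference string. It assumes a hypothetical tuple form (0-based position, reference allele, alternate allele) instead of real VCF parsing, and it assumes the variants do not overlap.

Variants are applied from right to left, so earlier indels do not shift the coordinates of variants still waiting to be applied. Because indels change the sequence length, coordinates on the personalized sequence no longer match reference coordinates. This is one reason indels can affect where reads align.

```python
def apply_variants(reference, variants):
    """Build a personalized sequence by applying variants to a reference string.

    `variants` is a list of (pos, ref_allele, alt_allele) with 0-based `pos`,
    assumed non-overlapping (a simplifying assumption for this sketch).
    """
    sequence = reference
    # Right-to-left order keeps the remaining positions valid after indels.
    for pos, ref_allele, alt_allele in sorted(variants, reverse=True):
        if sequence[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        sequence = sequence[:pos] + alt_allele + sequence[pos + len(ref_allele):]
    return sequence


# Hypothetical reference and variants: one SNV, one insertion, one deletion.
ref = "ACGTACGT"
variants = [(1, "C", "T"), (4, "A", "AGG"), (6, "GT", "G")]
print(apply_variants(ref, variants))  # ATGTAGGCG
```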



2021, Vol 27 (Supplement_1), pp. S30-S30
Author(s): Allyson Hodgkins, R Alan Harris, Justin Qian, Richard Kellermayer

Abstract
Background: Non-caseating granulomas are a hallmark histopathological finding of Crohn's disease (CD). Studies have suggested that the presence of granulomas may indicate a more aggressive CD phenotype associated with a complicated clinical course, including stricturing and/or penetrating disease, need for biologic therapy, and need for surgery. Identifying genetic associations of granulomatous CD (GCD) may therefore help elucidate disease pathogenesis, which in turn may optimize treatments and guide novel therapeutics against CD complications. Relatively little genetic information is known about CD subtypes, especially GCD. The aim of this study was to determine the extent of genetic variation between pediatric CD patients with and without a pathognomonic submucosal granuloma detected at the time of diagnosis.
Methods: Whole-exome next-generation sequencing (WES) was performed on peripheral-blood-derived DNA from patients with GCD and non-GCD (NGCD). PLINK analysis was used to identify single nucleotide polymorphisms (SNPs) that were overrepresented in comparisons between groups, and subgroup allele frequencies were also compared with publicly available, large population-based genomic data in gnomAD. The potential deleteriousness of single nucleotide variants was assessed with the CADD scoring tool.
Results: WES was completed for 17 patients with GCD and 19 with NGCD. There were no significant differences in baseline clinical characteristics, treatments, or 1-year outcomes between the groups. Overlap analyses between PLINK- and gnomAD-generated SNPs revealed significant enrichment of SNPs associated with HLA-DQA1, LILRA1, SAA2, PCDHB, HLA-B, NOD2, and IGH shared by the GCD and NGCD groups. Meanwhile, GCD-specific SNPs (top candidates linked with HLA-B and MUC4) and NGCD-specific SNPs (top candidates linked with HLA-B and ACOT9) were sparse.
Conclusions: We are the first to examine WES-based genetic variation between treatment-naïve pediatric CD patients separated purely by the presence of a submucosal epithelioid granuloma. Although very few SNPs consistently differentiated GCD from NGCD patients in our cohort using two distinct methodologies, we identified several novel SNPs that were significantly enriched in pediatric CD patients compared with the control human genome. Our findings will require confirmation in larger, similarly scrutinized patient cohorts.
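One common way to ask whether a SNP is overrepresented in one group is a 2×2 test on allele counts. The sketch below implements a two-sided Fisher's exact test from scratch. Whether this matches the PLINK settings used in the study is an assumption; the abstract does not say which test was run.

The table layout and the counts are hypothetical. Integer weights are compared exactly, which avoids floating-point ties when the test sums all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Rows could be groups (e.g. GCD vs NGCD) and columns allele counts
    (alternate vs reference). Sums the probabilities of all tables with the
    same margins that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Number of ways to obtain x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in (weight(x) for x in range(low, high + 1)) if w <= observed)
    return extreme / comb(n, col1)


# Hypothetical allele counts: [[alt, ref] in group 1, [alt, ref] in group 2].
print(round(fisher_exact_two_sided([[1, 9], [11, 3]]), 5))  # 0.00276
```

With cohorts as small as 17 and 19 patients, an exact test avoids the large-sample approximation of a chi-square test. Testing many SNPs still requires a multiple-testing correction.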



Author(s): Renata Parissi Buainain, Matheus Negri Boschiero, Bruno Camporeze, Paulo Henrique Pires de Aguiar, Fernando Augusto Lima Marson, ...


2021, Vol 11 (1), pp. 33
Author(s): Nayoung Han, Jung Mi Oh, In-Wha Kim

Predicting phenotypes and practicing precision medicine require combined analysis of single nucleotide variant (SNV) genotypes and copy number variations (CNVs). The aim of this study was to discover SNVs and common CNVs and to examine the combined frequencies of SNVs and CNVs in pharmacogenes using the Korean Genome and Epidemiology Study (KoGES), a consortium project. The genotype data (N = 72,299) and CNV data (N = 1000) were provided by the Korean National Institute of Health, Korea Centers for Disease Control and Prevention. The allele frequencies of SNVs, CNVs, and combined SNVs with CNVs were calculated, and haplotype analysis was performed. CYP2D6 rs1065852 (c.100C>T, p.P34S) was the most common variant allele (48.23%). A total of 8454 haplotype blocks in 18 pharmacogenes were estimated. DMD ranked highest in frequency of gene gain (64.52%), while TPMT ranked highest in frequency of gene loss (51.80%). Copy number gain of CYP4F2 was observed in 22 subjects, 13 of whom carried a CYP4F2*3 gain. In the case of TPMT, approximately one-half of the participants (N = 308) had loss of the TPMT*1*1 diplotype. The frequencies of SNVs and CNVs in pharmacogenes were determined using the Korean cohort-based genome-wide association study.
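Combining SNV genotypes with CNV data means a gene is not always present in exactly two copies. The sketch below counts variant alleles over the actual number of gene copies per subject. It is a hypothetical, simplified calculation for illustration and is not the KoGES method.

Each subject is represented as (copy_number, variant_copies). A whole-gene deletion (copy number 0) contributes no alleles at all.

```python
def allele_frequency(subjects):
    """Variant allele frequency when copy number varies between subjects.

    Each subject is (copy_number, variant_copies): the total number of gene
    copies the subject carries and how many of them carry the variant allele.
    With copy number fixed at 2 this reduces to (2*hom + het) / (2*N).
    """
    if any(v > cn for cn, v in subjects):
        raise ValueError("variant copies cannot exceed copy number")
    total_copies = sum(cn for cn, _ in subjects)
    variant_copies = sum(v for _, v in subjects)
    return variant_copies / total_copies if total_copies else 0.0


# Hypothetical subjects: diploid het, diploid hom, a gain (3 copies),
# a single-copy loss, and a whole-gene deletion.
print(allele_frequency([(2, 1), (2, 2), (3, 1), (1, 0), (0, 0)]))  # 0.5
```

Ignoring copy number would force every subject into the diploid denominator of 2N, which misstates frequencies in genes with frequent gains or losses.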



Pathogens, 2021, Vol 10 (3), pp. 363
Author(s): Sulochana K. Wasala, Dana K. Howe, Louise-Marie Dandurand, Inga A. Zasada, Dee R. Denver

Globodera pallida is among the most significant plant-parasitic nematodes worldwide, causing major damage to potato production. Since its discovery in Idaho in 2006, efforts have aimed to contain and eradicate G. pallida through phytosanitary action and soil fumigation. In this study, we investigated genome-wide patterns of G. pallida genetic variation across Idaho fields to evaluate whether the infestation resulted from a single introduction or multiple introductions and to investigate potential evolutionary responses since the time of infestation. A total of 53 G. pallida samples (~1,042,000 individuals) were collected and analyzed, representing five different fields in Idaho, a greenhouse population, and a field in Scotland that was used for external comparison. According to genome-wide allele frequency and fixation index (Fst) analyses, most of the genetic variation was shared among the pre-fumigation G. pallida populations in Idaho fields, indicating that the infestation likely resulted from a single introduction. Temporal comparisons of genome-wide polymorphisms, involving (1) pre-fumigation field samples collected in 2007 and 2014 and (2) pre- and post-fumigation samples, revealed single-nucleotide polymorphisms (SNPs) with significantly differentiated allele frequencies, indicating genetic differentiation. This study provides insights into the genetic origins and adaptive potential of G. pallida invading new environments.
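The fixation index mentioned above measures how much of the total genetic diversity lies between populations rather than within them. Because each sample here pools many individuals, allele frequencies are the natural input.

The sketch below uses a simple textbook, Nei-style estimator for one biallelic SNP. It applies no sample-size or pool-size correction. The exact estimator used in the study is not stated in the abstract, so this is an illustrative stand-in.

```python
def fst_two_pops(p1, p2):
    """Nei-style F_ST for one biallelic SNP from two population allele frequencies.

    F_ST = (H_T - H_S) / H_T, where H_S is the mean within-population expected
    heterozygosity and H_T is the expected heterozygosity of the pooled
    allele frequency. No sample-size or pool-size correction is applied.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    # A monomorphic site has no diversity to partition.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(fst_two_pops(0.5, 0.5))            # 0.0 (identical populations)
print(round(fst_two_pops(0.2, 0.8), 3))  # 0.36 (strongly differentiated)
```

Values near 0 across most SNPs are consistent with shared ancestry, such as a single introduction. High values at particular SNPs flag candidates for differentiation over time.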



Author(s): Pauline Arnaud, Hélène Morel, Olivier Milleron, Laurent Gouya, Christine Francannet, ...

Abstract
Purpose: Individuals with mosaic pathogenic variants in the FBN1 gene are mainly described in the course of familial screening, and in the literature almost all of these mosaic individuals are asymptomatic. In this study, we report our team's experience with more than 5,000 Marfan syndrome (MFS) probands.
Methods: Next-generation sequencing (NGS) capture technology allowed us to identify five MFS probands who harbored a mosaic pathogenic variant in the FBN1 gene.
Results: These five sporadic mosaic probands displayed classical features of Marfan syndrome. Together with cases reported in the literature, these rare findings involved both single-nucleotide variants and copy-number variations.
Conclusion: This underestimated finding should not be overlooked in the molecular diagnosis of MFS patients and warrants adapting the parameters used in bioinformatics analyses. The five present cases of symptomatic MFS probands harboring a mosaic FBN1 pathogenic variant reinforce the point that apparently asymptomatic mosaic parents should undergo a complete clinical examination and regular cardiovascular follow-up. We advise that individuals with typical MFS in whom no single-nucleotide pathogenic variant or exon deletion/duplication has been identified be tested with an NGS capture panel using an adapted variant calling analysis.
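In a germline heterozygous variant, about half of the reads are expected to carry the variant allele. A mosaic variant can sit well below that fraction. The sketch below flags such variants with an exact binomial lower-tail probability under the heterozygous expectation.

The function names and the `min_vaf` and `alpha` thresholds are hypothetical and illustrative. They are not parameters from the study or from any specific variant caller.

```python
from math import comb


def het_lower_tail(alt_reads, depth):
    """P(X <= alt_reads) for X ~ Binomial(depth, 0.5), the germline heterozygous expectation."""
    return sum(comb(depth, i) for i in range(alt_reads + 1)) / 2 ** depth


def looks_mosaic(alt_reads, depth, min_vaf=0.02, alpha=1e-3):
    """Flag a variant whose allele fraction is real but well below 50%.

    `min_vaf` filters likely sequencing noise; `alpha` is the cut-off for
    rejecting the heterozygous (50%) expectation. Both are illustrative.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    vaf = alt_reads / depth
    return vaf >= min_vaf and het_lower_tail(alt_reads, depth) < alpha


print(looks_mosaic(10, 100))  # True: 10% VAF at 100x is far below 50%
print(looks_mosaic(48, 100))  # False: consistent with a heterozygous variant
print(looks_mosaic(1, 200))   # False: below the minimum allele fraction
```

A pipeline tuned only for germline calls may filter out exactly the low-fraction variants this test would flag.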



2021, Vol 6 (1)
Author(s): Fatao Liu, Yongsheng Li, Dongjian Ying, Shimei Qiu, Yong He, ...

Abstract
Neuroendocrine carcinoma (NEC) of the gallbladder (GB-NEC) is a rare but extremely malignant subtype of gallbladder cancer (GBC). The genetic and molecular signatures of GB-NEC are poorly understood; thus, molecular targeting is currently unavailable. In the present study, we applied whole-exome sequencing (WES) to detect gene mutations and to predict somatic single-nucleotide variants (SNVs) in 15 cases of GB-NEC and 22 cases of general GBC. In the 15 GB-NECs, the C>T mutation was predominant among the 6 types of SNVs. TP53 showed the highest mutation frequency (73%, 11/15). Compared with neuroendocrine carcinomas of other organs, the significantly mutated genes (SMGs) in GB-NECs were more similar to those in pulmonary large-cell neuroendocrine carcinomas (LCNECs), with driver roles for TP53 and RB1. Among the cancer-related genes in the COSMIC database, 211 were mutated. Strikingly, RB1 (4/15, 27%) and NAB2 (3/15, 20%) mutations were found specifically in GB-NECs; in contrast, mutations in 29 genes, including ERBB2 and ERBB3, were identified exclusively in GBC. According to immunohistochemical (IHC) data, mutations in RB1 and NAB2 were significantly related to downregulation of the RB1 and NAB2 proteins, respectively (p = 0.0453 and p = 0.0303). Twenty-three of the mutated genes, including ALK, BRCA1, and BRCA2, were clinically actionable. In addition, potential somatic SNVs predicted by ISOWN and SomVarIUS constituted 6 primary COSMIC mutational signatures (1, 3, 30, 6, 7, and 13) in GB-NEC. Genes carrying somatic SNVs were enriched mainly in oncogenic signaling pathways, including the Notch, WNT, Hippo, and RTK-RAS pathways. In summary, we have systematically identified the mutation landscape of GB-NEC, and these findings may provide mechanistic insights into the specific pathogenesis of this deadly disease.
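The "6 types of SNVs" above follow the usual convention of reporting each substitution relative to the pyrimidine (C or T) base of the Watson-Crick pair. A G>A call is therefore counted as C>T, because it is the same event read on the opposite strand.

The sketch below shows that mapping. The helper name and the calls are hypothetical and do not come from the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snv_class(ref, alt):
    """Map a substitution to one of the 6 pyrimidine-referenced SNV classes."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":  # purine reference: report the complementary strand instead
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


# Hypothetical calls (not data from the study).
calls = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A"), ("G", "T")]
counts = Counter(snv_class(r, a) for r, a in calls)
print(counts)  # C>T: 2, C>A: 2, T>C: 1
```

Mutational signature analyses typically go one step further and also record the bases flanking each substitution.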


