Reference-agnostic representation and visualization of pan-genomes

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Qihua Liang ◽  
Stefano Lonardi

Abstract: Background. The pan-genome of a species is the union of the genes and non-coding sequences present in all individuals (cultivars, accessions, or strains) within that species. Results. Here we introduce PGV, a reference-agnostic representation of the pan-genome of a species based on the notion of consensus ordering. Our experimental results demonstrate that PGV enables an intuitive, effective and interactive visualization of a pan-genome by providing a genome browser that can elucidate complex structural genomic variations. Conclusions. The PGV software can be installed via conda or downloaded from https://github.com/ucrbioinfo/PGV. The companion PGV browser at http://pgv.cs.ucr.edu can be tested using example bed tracks available from the GitHub page.
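The definition of a pan-genome as a union can be illustrated with a minimal sketch. It is not PGV's consensus-ordering representation; it only shows the set-theoretic idea. The accession names and gene labels are hypothetical toy data, and the "core genome" (elements shared by every individual) is a standard related notion that the abstract itself does not mention.

```python
from typing import Dict, Set


def pan_genome(gene_sets: Dict[str, Set[str]]) -> Set[str]:
    """Return the union of all elements seen in at least one individual."""
    result: Set[str] = set()
    for genes in gene_sets.values():
        result |= genes
    return result


def core_genome(gene_sets: Dict[str, Set[str]]) -> Set[str]:
    """Return the elements shared by every individual (empty if no input)."""
    sets = list(gene_sets.values())
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical toy data: three accessions with partially overlapping genes.
accessions = {
    "acc1": {"geneA", "geneB", "geneC"},
    "acc2": {"geneA", "geneC", "geneD"},
    "acc3": {"geneA", "geneB", "geneD"},
}

pan = pan_genome(accessions)
core = core_genome(accessions)
print(sorted(pan))
print(sorted(core))
```

Because the union is taken over all individuals, no single accession acts as the reference, which is the sense in which such a representation is reference-agnostic.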

2017 ◽  
Author(s):  
Benedict Paten ◽  
Adam M Novak ◽  
Erik Garrison ◽  
Glenn Hickey

Abstract: A superbubble is a type of directed acyclic subgraph with single distinct source and sink vertices. In genome assembly and genetics, the possible paths through a superbubble can be considered to represent the set of possible sequences at a location in a genome. Bidirected and biedged graphs are a generalization of digraphs that are increasingly being used to more fully represent genome assembly and variation problems. Here we define snarls and ultrabubbles, generalizations of superbubbles for bidirected and biedged graphs, and give an efficient algorithm for the detection of these more general structures. Key to this algorithm is the cactus graph, which we show encodes the nested decomposition of a graph into snarls and ultrabubbles within its structure. We propose and demonstrate empirically that this decomposition on bidirected and biedged graphs solves a fundamental problem by defining genetic sites for any collection of genomic variations, including complex structural variations, without need for any single reference genome coordinate system. Furthermore, the nesting of the decomposition gives a natural way to describe and model variations contained within large variations, a case not currently dealt with by existing formats, e.g. VCF.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Huihui Li ◽  
Mingzhe Xie ◽  
Yan Wang ◽  
Ludong Yang ◽  
Zhi Xie ◽  
...  

Abstract: riboCIRC is a translatome data-oriented circRNA database specifically designed for hosting, exploring, analyzing, and visualizing translatable circRNAs from multiple species. The database provides a comprehensive repository of computationally predicted ribosome-associated circRNAs; a manually curated collection of experimentally verified translated circRNAs; an evaluation of cross-species conservation of translatable circRNAs; a systematic de novo annotation of putative circRNA-encoded peptides, including sequence, structure, and function; and a genome browser to visualize the context-specific occupant footprints of circRNAs. It represents a valuable resource for the circRNA research community and is publicly available at http://www.ribocirc.com.


Author(s):  
Suchart Chanama ◽  
Chanwit Suriyachadkun ◽  
Manee Chanama

A novel actinomycete, strain SMC 257T, was isolated from a soil sample collected from a mountain forest in Nan Province, Thailand. Strain SMC 257T formed tightly closed spiral spore chains on aerial mycelia. A polyphasic approach was used for the taxonomic study of this strain. Phylogenetic analysis based on 16S rRNA gene sequences indicated that strain SMC 257T belonged to the genus Nonomuraea, and the closest phylogenetically related species were Nonomuraea roseoviolacea subsp. carminata JCM 9946T (98.9 % 16S rRNA gene sequence similarity), Nonomuraea rhodomycinica TBRC 6557T (98.4 %), and Nonomuraea roseoviolacea subsp. roseoviolacea JCM 3145T (98.3 %). Genome sequencing revealed a genome size of 9.76 Mbp and a G+C content of 72.3 mol%. The genome average nucleotide identity (ANI) and the digital DNA–DNA hybridization (dDDH) values that distinguished this novel strain from its closest related species were below the species boundaries of 95–96 % and 70 %, respectively. The cell wall peptidoglycan contained meso-diaminopimelic acid. The whole-cell sugars were glucose, ribose, madurose and mannose. The major menaquinone was MK-9(H4). The polar lipid profile consisted of phosphatidylethanolamine, hydroxyphosphatidylethanolamine, lysophosphatidylethanolamine, diphosphatidylglycerol, N-phosphatidylglycerol, phosphatidylinositol and phosphatidylinositol mannosides. The predominant cellular fatty acids were C17 : 0 10-methyl and iso-C16 : 0. Based on comparative analysis of phenotypic, chemotaxonomic and genotypic data, strain SMC 257T is considered to represent a novel species of the genus Nonomuraea, for which the name Nonomuraea montanisoli is proposed. The type strain is SMC 257T (=TBRC 13065T=NBRC 114772T).
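The species-boundary comparison described above amounts to a simple threshold check, sketched below. The function name and the input values are hypothetical; the cutoffs default to the lower ANI boundary (95 %) and the dDDH boundary (70 %) quoted in the abstract. Real species delineation relies on a polyphasic assessment, so this is only an illustration of the arithmetic.

```python
def below_species_boundary(
    ani_percent: float,
    dddh_percent: float,
    ani_cutoff: float = 95.0,   # lower end of the 95-96 % ANI boundary
    dddh_cutoff: float = 70.0,  # dDDH boundary
) -> bool:
    """Return True when both values fall below the species boundaries,
    i.e. the two genomes are consistent with belonging to different species."""
    return ani_percent < ani_cutoff and dddh_percent < dddh_cutoff


# Hypothetical genome-to-genome comparisons (not values from this abstract).
print(below_species_boundary(80.1, 21.5))  # well below both cutoffs
print(below_species_boundary(97.0, 75.0))  # above both cutoffs
```

Requiring both values to fall below their cutoffs is a conservative choice made for this sketch; a value inside the 95–96 % ANI band would need closer inspection rather than an automatic decision.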


Cells ◽  
2018 ◽  
Vol 7 (12) ◽  
pp. 269 ◽  
Author(s):  
Meenakshi Agarwal ◽  
Ashish Pathak ◽  
Rajesh Rathore ◽  
Om Prakash ◽  
Rakesh Singh ◽  
...  

Two Burkholderia spp. (strains SRS-25 and SRS-46) were isolated from soils containing high concentrations of uranium (U) at the U.S. Department of Energy (DOE)-managed Savannah River Site (SRS). SRS contains soil gradients that remain co-contaminated by heavy metals from previous nuclear weapons production activities. Uranium is one of the dominant contaminants within the SRS-impacted soils and can be microbially transformed into less toxic forms. We established microcosms containing strains SRS-25 and SRS-46 spiked with U and evaluated the microbially mediated depletion with concomitant genomic and proteomic analysis. Both strains showed a rapid depletion of U. Draft genome sequences revealed the SRS-25 genome to be approximately 8,152,324 bp, with a G + C content of 66.5 %, containing a total of 7604 coding sequences and 77 RNA genes. Similarly, strain SRS-46 had a genome size of 8,587,429 bp with a G + C content of 67.1 %, 7895 coding sequences, and 73 RNA genes. In-depth, genome-wide comparisons between strains SRS-25, SRS-46 and a previously isolated strain from our research (Burkholderia sp. strain SRS-W-2-2016) revealed a common pool of 3128 genes; many were found to be homologues of previously characterized metal resistance genes (e.g., for cadmium, cobalt, and zinc), as well as genes for transporter, stress/detoxification, cytochrome, and drug resistance functions. Furthermore, proteomic analysis of strains with or without U stress revealed the increased expression of 34 proteins from strain SRS-25 and 52 proteins from strain SRS-46; similar to the genomic analyses, many of these proteins have previously been shown to function in stress response, DNA repair, protein biosynthesis and metabolism. Overall, this comparative proteogenomics study confirms the repertoire of metabolic and stress response functions that likely confer ecological competitiveness on the isolated strains for colonization and survival in the heavy-metal-contaminated SRS soil habitat.


2020 ◽  
Vol 16 (12) ◽  
pp. e1008439
Author(s):  
Jennifer Lu ◽  
Steven L. Salzberg

GC skew is a phenomenon observed in many bacterial genomes, wherein the two replication strands of the same chromosome contain different proportions of guanine and cytosine nucleotides. Here we demonstrate that this phenomenon, which was first discovered in the mid-1990s, can be used today as an analysis tool for the 15,000+ complete bacterial genomes in NCBI’s RefSeq library. In order to analyze all 15,000+ genomes, we introduce a new method, SkewIT (Skew Index Test), that calculates a single metric representing the degree of GC skew for a genome. Using this metric, we demonstrate how GC skew patterns are conserved within certain bacterial phyla, e.g. Firmicutes, but show different patterns in other phylogenetic groups such as Actinobacteria. We also discovered that outlier values of SkewIT highlight potential bacterial mis-assemblies. Using our newly defined metric, we identify multiple mis-assembled chromosomal sequences in previously published complete bacterial genomes. We provide a SkewIT web app (https://jenniferlu717.shinyapps.io/SkewIT/) that calculates SkewI for any user-provided bacterial sequence. The web app also provides an interactive interface for the data generated in this paper, allowing users to further investigate the SkewI values and thresholds of the RefSeq-97 complete bacterial genomes. Individual scripts for analysis of bacterial genomes are provided in the following repository: https://github.com/jenniferlu717/SkewIT.
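For readers unfamiliar with the quantity, the sketch below computes the conventional per-window GC skew, (G − C) / (G + C). It is not the SkewI metric, whose definition is not given in the abstract; the function names, window size and toy sequence are assumptions made only for illustration.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Conventional GC skew (G - C) / (G + C); 0.0 when the window has no G or C."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int) -> List[float]:
    """GC skew over consecutive non-overlapping windows; a trailing partial window is dropped."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Hypothetical toy sequence: a G-rich half followed by a C-rich half,
# mimicking the sign change in skew between the two replication strands.
toy = "GGGGACGT" + "CCCCAGCT"
skews = windowed_gc_skew(toy, 8)
print(skews)
```

On a real chromosome the window would typically span thousands of bases, and the point where the skew changes sign is what analyses of this kind examine.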


2020 ◽  
Vol 70 (8) ◽  
pp. 4646-4652 ◽  
Author(s):  
Nadezhda V. Agafonova ◽  
Elena N. Kaparullina ◽  
Denis S. Grouzdev ◽  
Nina V. Doronina

Novel aerobic, restricted facultatively methylotrophic bacteria were isolated from buds of English oak (Quercus robur L.; strain DubT) and northern red oak (Quercus rubra L.; strain KrD). The isolates were Gram-negative, asporogenous, motile short rods that multiplied by binary fission. They utilized methanol, methylamine and a few polycarbon compounds as carbon and energy sources. Optimal growth occurred at 25 °C and pH 7.5. The dominant phospholipids were phosphatidylethanolamine, phosphatidylcholine, diphosphatidylglycerol and phosphatidylglycerol. The major cellular fatty acids were C18 : 1 ω7c, 11-methyl C18 : 1 ω7c and C16 : 0. The major ubiquinone was Q-10. Analysis of 16S rRNA gene sequences showed that the strains were closely related to the members of the genus Hansschlegelia: Hansschlegelia zhihuaiae S113T (97.5–98.0 %), Hansschlegelia plantiphila S1T (97.4–97.6 %) and Hansschlegelia beijingensis PG04T (97.0–97.2 %). The 16S rRNA gene sequence similarity between strains DubT and KrD was 99.7 %, and the DNA–DNA hybridization (DDH) result between the strains was 85 %. The ANI and the DDH values between strain DubT and H. zhihuaiae S113T were 80.1 and 21.5 %, respectively. Genome sequencing of the strain DubT revealed a genome size of 3.57 Mbp and a G+C content of 67.0 mol%. Based on the results of the phenotypic, chemotaxonomic and genotypic analyses, it is proposed that the isolates be assigned to the genus Hansschlegelia as Hansschlegelia quercus sp. nov. with the type strain DubT (=VKM B-3284T=CCUG 73648T=JCM 33463T).


2017 ◽  
Author(s):  
Mircea Cretu Stancu ◽  
Markus J. van Roosmalen ◽  
Ivo Renkens ◽  
Marleen Nieboer ◽  
Sjors Middelkamp ◽  
...  

Abstract: Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline - NanoSV - to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1039
Author(s):  
Xinyan Zhang ◽  
Manali Rupji ◽  
Jeanne Kowalski

We present GAC, a Shiny R-based tool for interactive visualization of clinical associations based on high-dimensional data. The tool provides a web-based suite to perform supervised principal component analysis (SuperPC), an approach that combines high-dimensional data, such as gene expression, with clinical data to infer clinical associations. We extended the approach to address binary outcomes, in addition to continuous and time-to-event data, in our package, thereby increasing the use and flexibility of SuperPC. Additionally, the tool provides an interactive visualization for summarizing results based on a forest plot for both binary and time-to-event data. In summary, the GAC suite of tools provides a one-stop shop for conducting statistical analysis to identify and visualize the association between a clinical outcome of interest and high-dimensional data types, such as genomic data. Our GAC package has been implemented in R and is available via http://shinygispa.winship.emory.edu/GAC/. The developmental repository is available at https://github.com/manalirupji/GAC.


2020 ◽  
Vol 70 (3) ◽  
pp. 1868-1875 ◽  
Author(s):  
Shan-Hui Li ◽  
Jaeho Song ◽  
Yeonjung Lim ◽  
Yochan Joung ◽  
Ilnam Kang ◽  
...  

A Gram-stain-negative, rod-shaped, aerobic, non-flagellated, chemoheterotrophic bacterium, designated IMCC14385T, was isolated from surface seawater of the East Sea, Republic of Korea. The 16S rRNA gene sequence analysis indicated that IMCC14385T represented a member of the genus Halioglobus sharing 94.6–97.8 % similarities with species of the genus. Whole-genome sequencing of IMCC14385T revealed a genome size of 4.3 Mbp and DNA G+C content of 56.7 mol%. The genome of IMCC14385T shared an average nucleotide identity of 76.6 % and digital DNA–DNA hybridization value of 21.6 % with the genome of Halioglobus japonicus KCTC 23429T. The genome encoded the complete poly-β-hydroxybutyrate biosynthesis pathway. The strain contained summed feature 8 (C18 : 1 ω7c and/or C18 : 1 ω6c), summed feature 3 (C16 : 1 ω7c and/or C16 : 1 ω6c) and C17 : 1 ω8c as the predominant cellular fatty acids as well as ubiquinone-8 (Q-8) as the respiratory quinone. The polar lipids detected in the strain were phosphatidylethanolamine, phosphatidylglycerol, diphosphatidylglycerol, five unidentified phospholipids, an unidentified aminolipid, an unidentified aminophospholipid and four unidentified lipids. On the basis of taxonomic data obtained in this study, it is suggested that IMCC14385T represents a novel species of the genus Halioglobus, for which the name Halioglobus maricola sp. nov. is proposed. The type strain is IMCC14385T (=KCTC 72520T=NBRC 114072T).


Author(s):  
Angéline Antezack ◽  
Manon Boxberger ◽  
Mariem Ben Khedher ◽  
Bernard La Scola ◽  
Virginie Monnet-Corti

A Gram-stain-negative bacterium, designated strain Marseille-Q3039T, was isolated from subgingival dental plaque of a woman with gingivitis in Marseille, France. Strain Marseille-Q3039T was found to be an anaerobic, motile and spore-forming crescent-shaped bacterium that grew at 25–41.5 °C (optimum, 37 °C), pH 5.5–8.5 (optimum, pH 7.5) and salinity of 5.0 g l−1 NaCl. The results of 16S rRNA gene sequence analysis revealed that strain Marseille-Q3039T was closely related to Selenomonas infelix ATCC 43532T (98.42 % similarity), Selenomonas dianae ATCC 43527T (97.25 %) and Centipeda periodontii DSM 2778T (97.19 %). The orthologous average nucleotide identity and digital DNA–DNA hybridization relatedness between strain Marseille-Q3039T and its closest phylogenetic neighbours were respectively 84.57 and 28.2 % for S. infelix ATCC 43532T and 83.93 and 27.2 % for C. periodontii DSM 2778T. The major fatty acids were identified as C13 : 0 (27.7 %), C15 : 0 (24.4 %) and specific C13 : 0 3-OH (12.3 %). Genome sequencing revealed a genome size of 2 351 779 bp and a G+C content of 57.2 mol%. On the basis of the results from phenotypic, chemotaxonomic, genomic and phylogenetic analyses and data, we concluded that strain Marseille-Q3039T represents a novel species of the genus Selenomonas, for which the name Selenomonas timonae sp. nov. is proposed (=CSUR Q3039=CECT 30128).

