GenGraph: a Python module for the simple generation and manipulation of genome graphs

2018 ◽  
Author(s):  
Jon Mitchell Ambler ◽  
Shandukani Mulaudzi ◽  
Nicola Mulder

Abstract Background As sequencing technology improves, the concept of a single reference genome is becoming increasingly restrictive. In the case of Mycobacterium tuberculosis, one must often choose between using a genome that is closely related to the isolate and one that is annotated in detail. One promising solution to this problem is the graph-based representation of collections of genomes as a single genome graph. Though there are currently a handful of tools that can create genome graphs and have demonstrated the advantages of this new paradigm, there still exists a need for flexible tools that can be used by researchers to overcome challenges in genomics studies. Results We present the GenGraph toolkit, a tool that uses existing multiple sequence alignment tools to create genome graphs. It is written in Python, one of the most popular coding languages for the biological sciences, and creates the genome graphs as Python NetworkX graph objects. The conceptual model is highly intuitive and, as much as possible, represents the biological relationship between the genomes. This design means that users will quickly be able to start creating genome graphs and using them in their own projects. We outline the methods used in the generation of the graphs, and give some examples of how the created graphs may be used. GenGraph utilises existing file formats and methods in the generation of these graphs, allowing graphs to be visualised and imported with widely used applications, including Cytoscape, R, and JavaScript. Conclusion GenGraph provides a set of tools for generating graph-based representations of sets of sequences with a simple conceptual model in a widely used coding language. It is publicly available on GitHub (https://github.com/jambler24/GenGraph).


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Jon Mitchell Ambler ◽  
Shandukani Mulaudzi ◽  
Nicola Mulder

Abstract Background As sequencing technology improves, the concept of a single reference genome is becoming increasingly restrictive. In the case of Mycobacterium tuberculosis, one must often choose between using a genome that is closely related to the isolate and one that is annotated in detail. One promising solution to this problem is the graph-based representation of collections of genomes as a single genome graph. Though there are currently a handful of tools that can create genome graphs and have demonstrated the advantages of this new paradigm, there still exists a need for flexible tools that can be used by researchers to overcome challenges in genomics studies. Results We present GenGraph, a Python toolkit and accompanying modules that use existing multiple sequence alignment tools to create genome graphs. Python is one of the most popular coding languages for the biological sciences, and by providing these tools, GenGraph makes it easier to experiment with genome graphs and to develop new tools that utilise them. The conceptual model used is highly intuitive, and as much as possible the graph structure represents the biological relationship between the genomes. This design means that users will quickly be able to start creating genome graphs and using them in their own projects. We outline the methods used in the generation of the graphs, and give some examples of how the created graphs may be used. GenGraph utilises existing file formats and methods in the generation of these graphs, allowing graphs to be visualised and imported with widely used applications, including Cytoscape, R, and JavaScript. Conclusions GenGraph provides a set of tools for generating graph-based representations of sets of sequences with a simple conceptual model, written in the widely used coding language Python, and publicly available on GitHub.
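To make the conceptual model concrete, the following is a minimal sketch of a genome graph in which shared sequence is stored once and each genome is a path through the nodes. It is not GenGraph's API: the node names, the `genomes` field, the genome labels and the `genome_sequence` helper are all hypothetical, and plain dictionaries are used in place of a graph library to keep the example dependency-free.

```python
# Hypothetical data model: names and fields are illustrative, not GenGraph's API.
# Each node holds a block of sequence and the set of genomes that contain it.
nodes = {
    "n1": {"sequence": "ATGGC", "genomes": {"reference", "isolate_A"}},
    "n2": {"sequence": "T", "genomes": {"reference"}},      # reference allele
    "n3": {"sequence": "C", "genomes": {"isolate_A"}},      # variant allele
    "n4": {"sequence": "GATTA", "genomes": {"reference", "isolate_A"}},
}

# Directed edges: node -> list of possible successor nodes.
edges = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n4"], "n4": []}


def genome_sequence(genome, start="n1"):
    """Walk the graph from `start`, following only nodes that contain `genome`."""
    parts = []
    node = start
    while node is not None:
        parts.append(nodes[node]["sequence"])
        # Choose the successor that belongs to this genome (at most one here).
        node = next(
            (n for n in edges[node] if genome in nodes[n]["genomes"]), None
        )
    return "".join(parts)


reference = genome_sequence("reference")
isolate = genome_sequence("isolate_A")
```

In this toy graph the two genomes share nodes `n1` and `n4` and differ only at the single-base bubble formed by `n2` and `n3`, which is how a SNP would appear under these assumptions.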



Endocrinology ◽  
2008 ◽  
Vol 149 (8) ◽  
pp. 3860-3869 ◽  
Author(s):  
Scott I. Kavanaugh ◽  
Masumi Nozaki ◽  
Stacia A. Sower

We cloned a cDNA encoding a novel gonadotropin-releasing hormone (GnRH), named lamprey GnRH-II, from the sea lamprey, a basal vertebrate. The deduced amino acid sequence of the newly identified lamprey GnRH-II is QHWSHGWFPG. The architecture of the precursor is similar to that reported for other GnRH precursors, consisting of a signal peptide, decapeptide, a downstream processing site, and a GnRH-associated peptide; however, the gene for lamprey GnRH-II does not have introns, in contrast to the gene organization of all other vertebrate GnRHs. Lamprey GnRH-II precursor transcript was widely expressed in a variety of tissues. In situ hybridization of the brain showed expression and localization of the transcript in the hypothalamus, medulla, and olfactory regions, whereas immunohistochemistry using a specific antiserum showed only GnRH-II cell bodies and processes in the preoptic nucleus/hypothalamus areas. Lamprey GnRH-II was shown to stimulate the hypothalamic-pituitary axis using in vivo and in vitro studies. Lamprey GnRH-II was also shown to activate the inositol phosphate signaling system in COS-7 cells transiently transfected with the lamprey GnRH receptor. These studies provide evidence for a novel lamprey GnRH that has a role as a third hypothalamic GnRH. In summary, the newly discovered lamprey GnRH-II offers a new paradigm of the origin of the vertebrate GnRH family. We hypothesize that due to a genome/gene duplication event, an ancestral gene gave rise to two lineages of GnRHs: the gnathostome GnRH and lamprey GnRH-II.



2018 ◽  
Vol 14 (3) ◽  
pp. 01-11 ◽  
Author(s):  
Fernanda Dolcimasculo ◽  
Alessandra Ferreira Ribas ◽  
Luiz Gonzaga Esteves Vieira ◽  
Tiago Benedito dos Santos

Galactinol synthase (GolS) is the enzyme that catalyzes the first step of the biosynthesis of the raffinose family oligosaccharides (RFOs), and is involved in many biological processes in plants. In the present study, four putative GolS genes were identified in the Musa acuminata genome. We further characterized these MaGolS genes in terms of protein length, molecular weight, theoretical isoelectric point and 3D protein structure. Genomic organization revealed that most MaGolS genes have four exons. The conserved motifs were identified, demonstrating high group-specificity of all MaGolS proteins. Multiple sequence alignment showed that the typical APSAA domain is present in all GolS proteins. Comparative phylogenetic analysis of the MaGolS proteins revealed three distinct groups. These data provide insight to support new studies addressing the role of GolS genes in this important fruit species.



2017 ◽  
Author(s):  
Lena M. Joesch-Cohen ◽  
Max Robinson ◽  
Neda Jabbari ◽  
Christopher Lausted ◽  
Gustavo Glusman

Abstract Background Bacterial genomes have characteristic compositional skews, which are differences in nucleotide frequency between the leading and lagging DNA strands across a segment of a genome. It is thought that these strand asymmetries arise as a result of mutational biases and selective constraints, particularly for energy efficiency. Analysis of compositional skews in a diverse set of bacteria provides a comparative context in which mutational and selective environmental constraints can be studied. These analyses typically require finished and well-annotated genomic sequences. Results We present three novel metrics for examining genome composition skews; all three metrics can be computed for unfinished or partially annotated genomes. The first two metrics (dot-skew and cross-skew) depend on the sequence and gene annotation of a single genome, while the third metric (residual skew) highlights unusual genomes by subtracting a GC content-based model of a library of genome sequences. We applied these metrics to all 7738 available bacterial genomes, including partial drafts, and identified outlier species. A number of these outliers (i.e., Borrelia, Ehrlichia, Kinetoplastibacterium, and Phytoplasma) display similar skew patterns despite being only distantly related phylogenetically. While unrelated, some of the outlier bacterial species share lifestyle characteristics, in particular intracellularity and biosynthetic dependence on their hosts. Conclusions Our novel metrics appear to reflect the effects of biosynthetic constraints and adaptations to life within one or more hosts on genome composition. We provide results for each analyzed genome, software and interactive visualizations at http://db.systemsbiology.net/gestalt/skew_metrics.
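As background for what a compositional skew measures, here is a minimal sketch of the classic windowed GC skew, (G − C) / (G + C). This is the standard textbook quantity, not the dot-skew, cross-skew or residual skew metrics introduced in the abstract, whose definitions are not given here; the function names and the toy sequence are assumptions made for illustration.

```python
def gc_skew(sequence):
    """Return (G - C) / (G + C) for a DNA string, or 0.0 if it has no G or C."""
    seq = sequence.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(sequence, window, step):
    """Compute GC skew in sliding windows; returns a list of (start, skew) pairs."""
    return [
        (start, gc_skew(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Toy sequence: a G-rich first half followed by a C-rich second half.
toy = "GGGGACGT" + "CCCCAGCT"
profile = windowed_gc_skew(toy, window=8, step=8)
```

On a real bacterial chromosome the sign of such a profile typically flips near the origin and terminus of replication, which is the kind of strand asymmetry the abstract refers to.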



2021 ◽  
Author(s):  
Josh Moore ◽  
Chris Allan ◽  
Sebastien Besson ◽  
Jean-marie Burel ◽  
Erin Diel ◽  
...  

Biological imaging is one of the most innovative fields in the modern biological sciences. New imaging modalities, probes, and analysis tools appear every few months and often prove decisive for enabling new directions in scientific discovery. One feature of this dynamic field is the need to capture new types of data and data structures. While there is a strong drive to make scientific data Findable, Accessible, Interoperable and Reusable (FAIR, 1), the rapid rate of innovation in imaging impedes the unification and adoption of standardized data formats. Despite this, the opportunities for sharing and integrating bioimaging data and, in particular, linking these data to other "omics" datasets have never been greater; therefore, to every extent possible, increasing "FAIRness" of bioimaging data is critical for maximizing scientific value, as well as for promoting openness and integrity. In the absence of a common, FAIR format, two approaches have emerged to provide access to bioimaging data: translation and conversion. On-the-fly translation produces a transient representation of bioimage metadata and binary data but must be repeated on each use. In contrast, conversion produces a permanent copy of the data, ideally in an open format that makes the data more accessible and improves performance and parallelization in reads and writes. Both approaches have been implemented successfully in the bioimaging community but both have limitations. At cloud-scale, those shortcomings limit scientific analysis and the sharing of results. We introduce here next-generation file formats (NGFF) as a solution to these challenges.



Author(s):  
Radhakrishnan Sriganesh ◽  
R. Joseph Ponniah ◽  

The article explores the biology of reading and how reading influences the biological relationship among language, cognition, and emotion (LCE). Reading aids in the enhancement of LCE under the precondition that biological predispositions for reading ability and LCE, such as genetic makeup, epigenetic modifications and neuronal development, are favourable. A conceptual model was developed to explain how reading incrementally enhances LCE. The model serves as a tool to understand the biological and pedagogical conditions through which reading helps in progressing through successive LCE levels. The article also proposes that this holistic perspective of reading, considering genetics, epigenetics, neuroscience, neuropsychology and pedagogy, paves the way for targeted clinical and educational interventions for people with language learning difficulties or disabilities.



Agronomy ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 1015
Author(s):  
Yuan Ren ◽  
Dapeng Ge ◽  
Jianmei Dong ◽  
Linhui Guo ◽  
Zhaohe Yuan

The mitogen-activated protein kinase (MAPK) cascade, which is composed of MAPKKKs, MAPKKs, and MAPKs, is involved in the regulation of a series of biological processes in organisms. Although genome-wide analyses of the cascade have been well described in some species, little is known about MAPK and MAPKK genes in pomegranate. In this study, we identified 18 PgMAPKs and 9 PgMAPKKs through a genome-wide search. Chromosome localization showed that the 27 genes are distributed on 7 chromosomes with different densities. Multiple sequence alignment and phylogenetic analysis revealed that PgMAPKs and PgMAPKKs could each be divided into 4 subfamilies (groups A, B, C, and D). In addition, exon-intron structural analysis of each candidate gene indicated high levels of conservation within and between phylogenetic groups. Cis-acting element analysis predicted that PgMAPKs and PgMAPKKs are widely involved in the growth, development, stress and hormone response of pomegranate. Expression profile analyses of PgMAPKs and PgMAPKKs were performed in different tissues (root, leaf, flower and fruit), and PgMAPK13 was significantly expressed in all tissues. To our knowledge, this is the first genome-wide analysis of the MAPK and MAPKK gene families in pomegranate. This study provides valuable information for understanding the classification and functions of pomegranate MAPK signaling.



F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1391
Author(s):  
Evan Biederstedt ◽  
Jeffrey C. Oliver ◽  
Nancy F. Hansen ◽  
Aarti Jajoo ◽  
Nathan Dunn ◽  
...  

Genome graphs are emerging as an important novel approach to the analysis of high-throughput human sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables de novo assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based de novo assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a human genome graph directly from a set of de novo assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from de novo assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped.
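The merging step described above can be illustrated with a small sketch. It assumes the multiple sequence alignment is already available as equal-length strings with `-` for gaps, and treats rows that carry the same base in the same column as sharing one node. The function name, the `(column, base)` node encoding and the assembly labels are hypothetical; this is not NovoGraph's implementation and it does not produce VCF output.

```python
from collections import defaultdict


def msa_to_graph(alignment):
    """Build a toy graph from aligned rows (equal length, '-' for gaps).

    Rows with the same base in the same column share the node (column, base).
    Returns (nodes, edges): nodes maps each node to the set of row names
    passing through it; edges is a set of (node, node) pairs.
    """
    nodes = defaultdict(set)
    edges = set()
    for name, row in alignment.items():
        previous = None
        for column, base in enumerate(row):
            if base == "-":  # gap: this row contributes no base here
                continue
            node = (column, base)
            nodes[node].add(name)
            if previous is not None:
                edges.add((previous, node))
            previous = node
    return dict(nodes), edges


toy_alignment = {
    "asm1": "ACGT-A",
    "asm2": "ACCTGA",
    "asm3": "ACGTGA",
}
nodes, edges = msa_to_graph(toy_alignment)

# Columns with more than one distinct base are substitution sites.
bases_per_column = defaultdict(set)
for column, base in nodes:
    bases_per_column[column].add(base)
variant_columns = sorted(
    column for column, bases in bases_per_column.items() if len(bases) > 1
)
```

In this toy input the substitution in column 2 creates two alternative nodes, while the gap in `asm1` shows up as an edge that skips column 4 rather than as a node.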



2017 ◽  
Author(s):  
Eric S Ho ◽  
Catherine M Newsom-Stewart ◽  
Lysa Diarra ◽  
Caroline S McCauley

Background: Geminivirus (family Geminiviridae) is a prevalent plant virus that imperils agriculture globally, causing serious damage to the livelihood of farmers, particularly in developing countries. The virus evolves rapidly, which is attributed to its single-stranded genome, resulting in worldwide circulation of diverse and viable genomes. Genomics is a prominent approach taken by researchers in elucidating the infectious mechanism of the virus. Currently, the NCBI Viral Genome website is a popular repository of viral genomes that conveniently provides researchers with a centralized source of genomic information. However, unlike the genomes of living organisms, viral genomes most often have peculiar characteristics that fit no single genome architecture. Imposing a unified annotation scheme on the myriad viral genomes may downplay their hallmark features. For example, the virion of Begomovirus prevailing in America encapsulates two similar-sized circular genomes, and both are required to maintain virulence. However, the two genomes are kept separately in NCBI with no explicit association linking them. Thus, our goal is to build a comprehensive Geminivirus genomics database, namely gb4gv, that not only preserves genomic characteristics of the virus, but also supplements biologically relevant annotations that help to interrogate this virus, e.g. the targeted host, putative iterons, siRNA targets, etc. Methods: We have employed manual and automatic methods to curate 508 genomes from four major genera of Geminiviruses, and 161 associated satellites obtained from NCBI RefSeq and PubMed databases. Results: These data are available for free access without registration from our website. Besides genomic content, our website provides visualization capability inherited from UCSC Genome Browser.
Discussion: With the genomic information readily accessible, we hope that our database will inspire researchers to gain a better understanding of this virus, resulting in insightful strategies to conquer the devastation inflicted on agriculture. Availability and Implementation: Database URL: http://gb4gv.lafayette.edu .



2017 ◽  
Vol 114 (35) ◽  
pp. 9391-9396 ◽  
Author(s):  
JaeJin Choi ◽  
Sung-Hou Kim

Fungi belong to one of the largest and most diverse kingdoms of living organisms. The evolutionary kinship within a fungal population has so far been inferred mostly from the gene-information–based trees (“gene trees”), constructed commonly based on the degree of differences of proteins or DNA sequences of a small number of highly conserved genes common among the population by a multiple sequence alignment (MSA) method. Since each gene evolves under different evolutionary pressure and time scale, it has been known that one gene tree for a population may differ from other gene trees for the same population depending on the subjective selection of the genes. Within the last decade, a large number of whole-genome sequences of fungi have become publicly available, which represent, at present, the most fundamental and complete information about each fungal organism. This presents an opportunity to infer kinship among fungi using a whole-genome information-based tree (“genome tree”). The method we used allows comparison of whole-genome information without MSA, and is a variation of a computational algorithm developed to find semantic similarities or plagiarism in two books, where we represent whole-genomic information of an organism as a book of words without spaces. The genome tree reveals several significant and notable differences from the gene trees, and these differences invoke new discussions about alternative narratives for the evolution of some of the currently accepted fungal groups.
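The abstract does not name the algorithm, so the following is only a sketch of one common alignment-free approach in the same spirit: each sequence is reduced to a frequency profile of overlapping "words" (k-mers), and two profiles are compared with the Jensen-Shannon divergence. The choice of k-mers, of this divergence, of `k = 3`, and the toy sequences are all assumptions, not details confirmed by the source.

```python
import math
from collections import Counter


def kmer_profile(sequence, k):
    """Return relative frequencies of all overlapping k-mers in `sequence`."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two frequency dicts."""
    divergence = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = (pi + qi) / 2  # mixture distribution at this k-mer
        if pi:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi:
            divergence += 0.5 * qi * math.log2(qi / mi)
    return divergence


# Toy sequences: b differs from a by one base, c shares no 3-mers with a.
a = kmer_profile("ACGTACGTACGT", k=3)
b = kmer_profile("ACGTACGAACGT", k=3)
c = kmer_profile("TTTTTTTTTTTT", k=3)

d_ab = jensen_shannon_divergence(a, b)
d_ac = jensen_shannon_divergence(a, c)
```

A matrix of such pairwise divergences could then be passed to a standard distance-based tree-building method; no alignment is needed at any step, which is the property the abstract emphasizes.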


