Construction of coffee transcriptome networks based on gene annotation semantics

2012 ◽  
Vol 9 (3) ◽  
pp. 80-92 ◽  
Author(s):  
Luis F. Castillo ◽  
Narmer Galeano ◽  
Gustavo A. Isaza ◽  
Alvaro Gaitan

Summary Gene annotation is a process that encompasses multiple approaches for analyzing nucleic acid or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are described in an organism's genome, constructing and visualizing gene networks poses new challenges for understanding complex expression patterns and generating new knowledge in genomics research. To take advantage of the text data accumulated after conventional gene sequence analysis, this work combined semantics with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of coffee transcriptome sequences, selected by the quality of the sequence comparisons reported by the Basic Local Alignment Search Tool (BLAST) and InterProScan, was filtered by coverage, identity, query length, and e-value. Meanwhile, term descriptors for molecular biology and biochemistry were obtained from the WordNet dictionary in order to construct a Resource Description Framework (RDF) model, using Ruby scripts and Methontology, to find associations between concepts. Relationships between sequence annotations and semantic concepts were represented graphically through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. The result is a large gene network connecting transcripts through relational concepts; its detailed connections remain to be validated for biological significance against current biochemical and genetic frameworks. Besides reusing text information to generate gene connections and for data mining, this tool makes it possible to visualize complex and abundant transcriptome data and can prompt new hypotheses in metabolic pathway analysis.
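The filtering and de-duplication steps described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Ruby implementation: the `Hit` fields, the threshold values, and the `annotatedWith` predicate are hypothetical, because the abstract does not report them. Coverage is taken here as alignment length divided by query length, assuming both are measured in the same units.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    """One similarity-search hit for a transcript (field names are hypothetical)."""
    query: str         # transcript identifier
    subject: str       # matched database entry or annotation term
    identity: float    # percent identity of the alignment
    aln_len: int       # alignment length
    query_len: int     # query length, assumed to be in the same units as aln_len
    evalue: float

# Hypothetical cut-offs; the abstract does not state the values used.
MIN_IDENTITY = 40.0
MIN_COVERAGE = 0.5
MIN_QUERY_LEN = 200
MAX_EVALUE = 1e-10

def passes(hit: Hit) -> bool:
    """Keep a hit only if it meets every identity, coverage, length and e-value cut-off."""
    coverage = hit.aln_len / hit.query_len
    return (hit.identity >= MIN_IDENTITY
            and coverage >= MIN_COVERAGE
            and hit.query_len >= MIN_QUERY_LEN
            and hit.evalue <= MAX_EVALUE)

def to_triples(hits, predicate="annotatedWith"):
    """Turn passing hits into RDF-style (subject, predicate, object) triples,
    dropping exact duplicates while preserving first-seen order."""
    triples = ((h.query, predicate, h.subject) for h in hits if passes(h))
    return list(dict.fromkeys(triples))
```

Collapsing duplicate triples with `dict.fromkeys` is only one simple way to obtain non-redundant associations; it is shown to make the idea of reducing many raw links to fewer unique ones concrete, not to reproduce the reported counts.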

2019 ◽  
Author(s):  
Joel Vizueta ◽  
Alejandro Sánchez-Gracia ◽  
Julio Rozas

Abstract Gene annotation is a critical bottleneck in genomic research, especially for the comprehensive study of very large gene families in the genomes of non-model organisms. Despite recent progress in automatic methods, the tools developed for this task often produce inaccurate annotations, such as fused, chimeric, partial or even completely absent gene models for many family copies, which require considerable extra effort to amend. Here we present BITACORA, a bioinformatics solution that integrates sequence similarity search tools and Perl scripts to facilitate both the curation of these inaccurate annotations and the identification of previously undetected gene family copies directly from DNA sequences. We tested the performance of the BITACORA pipeline in annotating the members of two chemosensory gene families of different sizes in seven available chelicerate genome drafts. Despite the relatively high fragmentation of some of these drafts, BITACORA improved the annotation of many members of these families and detected thousands of new chemoreceptors encoded in the genome sequences. The program generates a general feature format (GFF) file with both curated and novel gene models, and a FASTA file with the predicted proteins. These outputs can be easily integrated into genome annotation editors, greatly facilitating subsequent manual annotation and downstream evolutionary analyses.
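For readers unfamiliar with the GFF output mentioned above, the sketch below formats a single feature line in GFF3 (nine tab-separated columns, 1-based inclusive coordinates). It illustrates the file format only and is not BITACORA's Perl code; the GFF version, the function name, the sequence name, the source label, and the attribute values are assumptions, and attribute values are not percent-escaped.

```python
def gff3_line(seqid, source, ftype, start, end, strand, attributes,
              score=".", phase="."):
    """Format one GFF3 feature line: nine tab-separated columns with
    1-based, inclusive start/end coordinates. Attribute values are not escaped."""
    if not 1 <= start <= end:
        raise ValueError("GFF coordinates must satisfy 1 <= start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError(f"invalid strand: {strand!r}")
    attrs = ";".join(f"{key}={value}" for key, value in attributes.items())
    return "\t".join([seqid, source, ftype, str(start), str(end),
                      str(score), strand, str(phase), attrs])

# Hypothetical gene and CDS records for one gene copy on one scaffold.
print(gff3_line("scaffold_1", "example", "gene", 100, 900, "+", {"ID": "gene1"}))
print(gff3_line("scaffold_1", "example", "CDS", 100, 900, "+",
                {"ID": "cds1", "Parent": "gene1"}, phase=0))
```

The `Parent` attribute is what lets an annotation editor nest a CDS under its gene when the file is loaded.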


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zhi-Qiang Du ◽  
Hao Liang ◽  
Xiao-Man Liu ◽  
Yun-Hua Liu ◽  
Chonglong Wang ◽  
...  

Abstract Successful early embryo development requires the correct reprogramming and configuration of gene networks through the timely and faithful execution of zygotic genome activation (ZGA). However, the regulatory principles of the molecular elements and circuits fundamental to embryo development remain largely obscure. Here, we profiled the transcriptomes of single zygotes and blastomeres obtained from in vitro fertilized (IVF) or parthenogenetically activated (PA) porcine early embryos (1- to 8-cell), focusing on the gene expression dynamics and regulatory networks associated with the maternal-to-zygotic transition (MZT), mainly maternal RNA clearance and ZGA. We found that, in both IVF and PA embryos, minor and major ZGA occur at the 1-cell and 4-cell stages, respectively. Maternal RNAs gradually decay from the 1- to the 8-cell stage. The top abundantly expressed genes identified in both IVF and PA early embryos (CDV3, PCNA, CDR1, YWHAE, DNMT1, IGF2BP3, ARMC1, BTG4, UHRF2 and gametocyte-specific factor 1-like) play vital roles in embryo development. Differentially expressed genes (DEGs) within IVF groups differ from those within PA groups, indicating that bi-parental and maternal-only embryos have specific sets of mRNAs that are distinctly decayed and activated. Pathways enriched among DEGs showed that RNA-associated pathways (RNA binding, processing, transport and degradation) could be important. Moreover, mitochondrial RNAs are actively transcribed and show dynamic expression patterns, as do genes involved in DNA/H3K4 methylation and transcription factors. Taken together, our findings provide an important resource for further investigating the epigenetic and genome regulation of MZT events in early pig embryos.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Kai Zhao ◽  
Song Chen ◽  
Wenjing Yao ◽  
Zihan Cheng ◽  
Boru Zhou ◽  
...  

Abstract Background The bZIP gene family, which is widely present in plants, participates in varied biological processes, including growth, development, and stress responses. How do these genes regulate such biological processes? Systems biology is a powerful approach for understanding gene function mechanistically. However, such studies have not yet been reported in poplar. Results In this study, we identified 86 poplar bZIP transcription factors and described their conserved domains. Based on phylogenetic analysis, we divided these members into 12 groups with specific gene structures and motif compositions. The corresponding genes, which harbor a large number of segmental duplication events, are unevenly distributed across the 17 poplar chromosomes. We also examined collinearity between these genes and related genes from six other species. Transcriptomic data indicated that poplar bZIP genes display different expression patterns in roots, stems, and leaves. Furthermore, we identified 45 bZIP genes that respond to salt stress in the three tissues. We performed co-expression analysis on representative genes, followed by gene set enrichment analysis. The results showed that genes differentially expressed among tissues, especially their co-expressed genes, are mainly involved in secondary metabolic processes and secondary metabolite biosynthetic processes. In contrast, salt-stress-responsive genes and their co-expressed genes mainly participate in the regulation of metal ion transport and in methionine biosynthesis. Conclusions Using comparative genomics and systems biology approaches, we systematically explore, for the first time, the structures and functions of the bZIP gene family in poplar. The bZIP gene family appears to play significant roles in regulating poplar growth, development, and salt stress responses through differential gene networks or biological processes.
These findings provide a foundation for genetic breeding by engineering target regulators and their corresponding gene networks into poplar lines.


2018 ◽  
Author(s):  
Αλέξανδρος Τσακογιάννης

The differences between the sexes and the concept of sex determination have long fascinated, yet troubled, philosophers and scientists. Among sexually reproducing animals, teleost fishes show a very wide repertoire of reproductive modes. In addition to gonochoristic species, fish include hermaphroditic ones; they are the only vertebrates in which hermaphroditism occurs naturally. Hermaphroditism refers to the capability of an organism to reproduce both as a male and as a female during its life cycle, and it takes various forms. In sequential hermaphroditism, an individual first functions as a female and later changes sex to become male (protogyny), or vice versa (protandry). The diverse sex phenotypes of fish are regulated by a variety of sex determination mechanisms along a continuum of environmental and heritable factors. The vast majority of sexually dimorphic traits result from the differential expression of genes present in both sexes. To date, studies of sex-specific differences in gene expression have focused mainly on model fish species that are well characterized at the genomic level, have distinguishable heteromorphic sex chromosomes, and exhibit genetic sex determination and gonochorism. Among teleosts, the family Sparidae is considered one of the most diverse in its reproductive systems, and is thus a unique model for comparative studies of the molecular mechanisms underlying different sexual motifs. In this study, we used RNA sequencing to study the gonad and brain transcriptomes of both sexes in five sparid species representing four different reproductive styles.
Specifically, we explored sex-specific expression patterns in a gonochoristic species (the common dentex, Dentex dentex), two protogynous hermaphrodites (the red porgy, Pagrus pagrus, and the common pandora, Pagellus erythrinus), a rudimentary hermaphrodite (the sharpsnout seabream, Diplodus puntazzo), and a protandrous hermaphrodite (the gilthead seabream, Sparus aurata). We found only minor sex-related expression differences in the brain, indicating a more homogeneous and sexually plastic organ, whereas the gonads showed a plethora of sex-biased gene expression. The functional divergence of the two gonadal types is reflected in their transcriptomic profiles, both in the number of differentially expressed genes and in the expression magnitude (i.e. fold-change differences). Males had almost twice as many up-regulated genes as females, indicating a male-biased expression tendency. Focusing on the pathways and genes implicated in sex determination/differentiation, we aimed to uncover the molecular pathways through which these non-model fish species develop a masculine or feminine character. We identified the pathways and major gene families behind sex-biased expression (e.g. the Wnt/β-catenin and retinoic acid signaling pathways, Notch, TGFβ) and the recruitment of known sex-related genes (e.g. Dmrt1, Sox9, Sox3, Cyp19a, Figla, Ctnnb1, Gsdf9, Stra6) to either the male or the female gonad in these fish. We also carefully investigated the presence of genes reported to be involved in sex determination/differentiation in other vertebrates and fish, and compared their expression patterns in the species under study. Expression profiling revealed known candidate genes establishing the common female (Cyp19a1, Sox3, Figla, Gdf9, Cyp26a, Ctnnb1, Dnmt1, Stra6) and male (Dmrt1, Sox9, Dnmt3aa, Rarb, Raraa, Hdac8, Tdrd7) identity of the gonad in these sparids.
Additionally, we focused on genes contributing in a species-specific manner to female (Wnt4a, Dmrt2a, Foxl2, etc.) or male (Amh, Dmrt3a, Cyp11b, etc.) characters, and discussed the expression patterns of factors belonging to important pathways and/or gene families in the sex determination (SD) context in the gonadal transcriptomes of our species. Taken together, most of the studied genes form part of the cascade of sex determination, differentiation, and reproduction across teleosts. In this study, we focused on genes that are active once sex is established (sex maintainers), revealing the basic "gene toolkit" and gene networks underlying functional sex in these five sparids. Comparing related species with alternative reproductive styles, we found different combinations of genes with conserved sex-linked roles and some "handy" molecular players, in a "partially conserved" or "modulated" network shaping the male and female phenotypes. The knowledge obtained in this study and the tools developed in the process lay the groundwork for future experiments that could improve sex control in these species and deepen our understanding of the complex process of sex differentiation in flexible, multi-component systems such as those studied here.


Author(s):  
David Fichtmueller ◽  
Walter G. Berendsohn ◽  
Gabriele Droege ◽  
Falko Glöckler ◽  
Anton Güntsch ◽  
...  

The TDWG standard ABCD (Access to Biological Collections Data task group 2007) was aimed at harmonizing the terminologies used for modelling biological collection information, and it is used as a comprehensive data format for transferring collection and observation data between software components. The project ABCD 3.0 (A community platform for the development and documentation of the ABCD standard for natural history collections) was funded by the German Research Foundation (DFG). It addressed the transformation of ABCD into a semantic-web-compliant ontology by deconstructing the XML schema into individually addressable RDF (Resource Description Framework) resources published via the TDWG Terms Wiki (https://terms.tdwg.org/wiki/ABCD_2). In a second step, the informal properties and concept relations described by the original ABCD schema were transformed into a machine-readable ontology and revised (Güntsch et al. 2016). The project was successfully completed in January 2019. The ABCD 3 setup allows the creation of standard-conforming application schemas. Compared with version 2.x, the XML variant of ABCD 3.0 was restructured, simplified, and made more consistent in its element names and types. The XML elements are connected to their semantic concepts using the W3C SAWSDL (Semantic Annotations for WSDL and XML Schema) standard. The creation of specialized application schemas is encouraged; the first use case was the application schema for zoology. It will also be possible to generate application schemas that break the traditional unit-centric structure of ABCD. Further achievements of the project include a Wikibase instance serving as the editing platform, with related tools for maintenance queries (such as checking the ontology for inconsistencies) and automated export to RDF. This allows fast iteration on new or updated versions, e.g. when additional mappings to other standards are made.
The setup is agnostic to the data standard being created; it can therefore also be used to create or model other standards. Mappings to other standards such as Darwin Core (https://dwc.tdwg.org/) and Audubon Core (https://tdwg.github.io/ac/) are now machine-readable as well. All XPaths (XML Paths) of ABCD 3.0 XML have been mapped to all variants of ABCD 2.06 and 2.1, which will ease the transition to the new standard. The ABCD 3 ontology will also be uploaded to the GFBio Terminology Server (Karam et al. 2016), where individual concepts can be easily searched or queried, allowing better interactive modelling of ABCD concepts. ABCD documentation now adheres to TDWG's Standards Documentation Standard (SDS, https://www.tdwg.org/standards/sds/) and is located at https://abcd.tdwg.org/. The new site is hosted on GitHub: https://github.com/tdwg/abcd/tree/gh-pages.
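As a rough illustration of how SAWSDL links an XML Schema element to a semantic concept (not the ABCD 3.0 schema itself), the sketch below uses Python's standard `xml.etree.ElementTree` to build an `xs:element` declaration carrying a `sawsdl:modelReference` attribute. The function name, element name, and concept URI are hypothetical placeholders, not real ABCD terms.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"   # XML Schema namespace
SAWSDL = "http://www.w3.org/ns/sawsdl"    # W3C SAWSDL namespace
ET.register_namespace("xs", XS)
ET.register_namespace("sawsdl", SAWSDL)

def annotated_element(name, xsd_type, concept_uri):
    """Return an xs:element declaration whose sawsdl:modelReference
    points at an ontology concept URI."""
    element = ET.Element(f"{{{XS}}}element", {"name": name, "type": xsd_type})
    element.set(f"{{{SAWSDL}}}modelReference", concept_uri)
    return element

# Hypothetical element and concept URI, for illustration only.
decl = annotated_element("ExampleElement", "xs:string",
                         "http://example.org/terms/ExampleConcept")
print(ET.tostring(decl, encoding="unicode"))
```

Because the link lives in an attribute, schema validators that ignore SAWSDL still accept the document, while semantic tools can follow the reference to the ontology concept.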


Author(s):  
Karen E. Boschen ◽  
Travis S. Ptacek ◽  
Matthew E. Berginski ◽  
Jeremy M. Simon ◽  
Scott E. Parnell

Fetal Alcohol Spectrum Disorders (FASD) are a serious public health concern, affecting approximately 5% of live births in the US. The more severe craniofacial and central nervous system malformations characteristic of FASD are caused by alcohol exposure during gastrulation (embryonic day 7 in mice; 3rd week of human pregnancy). Genetics is a known contributor to differences in alcohol sensitivity in humans and in animal models of FASD. Our study profiled gene expression in gastrulation-stage embryos from two commonly used, genetically similar mouse substrains, C57BL/6J and C57BL/6NHsd, that differ in alcohol sensitivity. First, we established normal gene expression patterns at three finely resolved timepoints during gastrulation and developed a web-based interactive tool. Baseline transcriptional differences between strains were associated with immune signaling, indicating their molecular divergence. Second, we examined the gene networks affected by alcohol in each strain. Alcohol had a more pronounced transcriptional effect in 6J than in 6N embryos, consistent with the greater susceptibility of the 6J strain. The 6J strain exhibited down-regulation of cell proliferation and morphogenic signaling pathways and up-regulation of pathways related to cell death and craniofacial defects, while 6N embryos showed enrichment of hypoxia (up-regulated) and cellular metabolism (down-regulated) pathways. Collectively, these datasets 1) provide insight into the changing transcriptional landscape across gastrulation in two commonly used mouse strains, 2) establish a valuable resource for discovering candidate genes that may modify susceptibility to prenatal alcohol exposure and can be validated in humans, and 3) identify novel pathogenic mechanisms potentially involved in alcohol's impact on development.


Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that show similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by their relationships, and (3) classifying samples based on gene expression.
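The three analysis steps listed above can be sketched in plain Python. The specific techniques are assumptions chosen for illustration, not the methods examined in the chapter: genes are ranked by absolute log2 fold change of group means (with a pseudocount of 1), grouped greedily by Pearson correlation against each group's first member, and samples are classified with a nearest-centroid rule. A real analysis would add statistical testing and multiple-testing correction.

```python
import math

def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of mean expression in condition b over condition a."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

def top_de_genes(expr, cond_a, cond_b, k):
    """(1) Select the k genes with the largest |log2 fold change|.
    expr maps gene -> list of values, one per sample; cond_a/cond_b are sample indices."""
    scores = {gene: log2_fold_change([values[i] for i in cond_a],
                                     [values[i] for i in cond_b])
              for gene, values in expr.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)[:k]

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def group_by_correlation(expr, genes, threshold=0.9):
    """(2) Greedy grouping: each gene joins the first group whose seed (first member)
    it correlates with at or above the threshold; otherwise it starts a new group."""
    groups = []
    for gene in genes:
        for group in groups:
            if pearson(expr[gene], expr[group[0]]) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups

def nearest_centroid(train, labels, sample):
    """(3) Classify a sample (one value per selected gene) by the closest class mean."""
    centroids = {}
    for label in set(labels):
        rows = [row for row, lab in zip(train, labels) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids,
               key=lambda label: sum((s - c) ** 2
                                     for s, c in zip(sample, centroids[label])))
```

Greedy seed-based grouping is used here only because it is short; hierarchical or k-means clustering are the more common choices in practice.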


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Tao Xue ◽  
Han Zhang ◽  
Yuanyuan Zhang ◽  
Shuqin Wei ◽  
Qiujie Chao ◽  
...  

Abstract Background Pinellia ternata is native to China and has been used as a traditional herb because of its antiemetic, antitussive, analgesic, and anxiolytic effects. When exposed to strong light and high temperature during reproductive growth, P. ternata withers in a phenomenon known as "sprout tumble", which greatly limits tuber production. Shade was previously found to delay sprout tumble formation (STF); however, nothing is known about this process at the molecular level. Hence, we sought to identify the genes involved in tuber development and STF in P. ternata. Results Compared with natural sunlight (control), shade significantly increased chlorophyll accumulation and chlorophyll fluorescence parameters, including initial fluorescence, maximal fluorescence, and qP, and markedly reduced the chlorophyll a:b ratio and NPQ. Shade also strongly induced catalase (CAT) activity and substantially increased tuber production. Transcriptome profiles of P. ternata grown under natural sunlight and in shade were analyzed by combining next-generation sequencing (NGS) with third-generation single-molecule real-time (SMRT) sequencing. Correcting the SMRT long reads with NGS short reads yielded 136,163 non-redundant transcripts, with an N50 of 2578 bp. In total, 6738 differentially expressed genes (DEGs) were obtained from four comparisons (D5S vs D5CK, D20S vs D20CK, D20S vs D5S, and D20CK vs D5CK), of which 6384 DEGs (94.8%) came from the D20S vs D20CK comparison. Gene annotation and functional analyses revealed that these genes were related to auxin signal transduction, polysaccharide and sugar metabolism, phenylpropanoid biosynthesis, and photosynthesis. Moreover, the expression of genes enriched in photosynthesis appeared to be significantly altered by shade.
The expression patterns of 16 candidate genes were consistent with the changes in transcript abundance identified by RNA-Seq, and these genes might contribute to STF and tuber production. Conclusion The full-length transcripts identified in this study provide a more accurate depiction of P. ternata gene transcription. Further, we identified potential genes involved in STF and tuber growth. These data could serve as a genetic resource and a foundation for further research on this important traditional herb.
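The N50 reported for the corrected transcript set is a single length statistic rather than an average: it is the length L such that transcripts of length L or longer together account for at least half of all assembled bases. A minimal Python sketch of this standard definition follows; it is not the authors' pipeline, and any lengths passed to it in practice would come from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths: the largest length L
    such that sequences of length >= L together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # cumulative length has reached half the total
            return length
    raise ValueError("n50() requires at least one sequence")
```

Walking the lengths from longest to shortest and stopping at the first point where the running sum reaches half the total gives exactly the largest qualifying L.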


2020 ◽  
Vol 36 (9) ◽  
pp. 2649-2656 ◽  
Author(s):  
Van Dinh Tran ◽  
Alessandro Sperduti ◽  
Rolf Backofen ◽  
Fabrizio Costa

Abstract Motivation The identification of disease–gene associations is a task of fundamental importance in human health research. A typical approach consists of first encoding large gene/protein relational datasets as networks, since graphs naturally and intuitively represent relationships between objects, and then using graph-based techniques to prioritize genes for subsequent low-throughput validation assays. Because different types of interactions between genes yield distinct gene networks, heterogeneous sources need to be integrated to improve the reliability of prioritization systems. Results We propose an approach based on three phases: first, we merge all sources into a single network; then we partition the integrated network according to edge density, introducing a notion of edge type to distinguish the parts; finally, we employ a novel node kernel suitable for graphs with typed edges. We show how the node kernel can generate a large number of discriminative features that can be efficiently processed by linear regularized machine learning classifiers. We report state-of-the-art results on 12 disease–gene associations and on a time-stamped benchmark containing 42 newly discovered associations. Availability and implementation Source code: https://github.com/dinhinfotech/DiGI.git. Supplementary information Supplementary data are available at Bioinformatics online.
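The first phase described in this abstract, merging several sources into one network while keeping track of edge types, can be sketched as follows. This is a simplified illustration only: edge types here are just the source names, whereas the paper derives edge types by partitioning the merged network by edge density, and the ranking function is a naive guilt-by-association count, not the proposed node kernel or a trained classifier. All function names, source names, and genes are hypothetical.

```python
from collections import defaultdict

def merge_networks(sources):
    """Merge undirected edge lists from several sources into one graph.
    sources maps a source name (used here as the edge type) to (u, v) pairs.
    Returns adjacency: node -> neighbour -> set of edge types."""
    adj = defaultdict(lambda: defaultdict(set))
    for edge_type, edges in sources.items():
        for u, v in edges:
            if u == v:
                continue  # ignore self-loops
            adj[u][v].add(edge_type)
            adj[v][u].add(edge_type)
    return adj

def typed_degree(adj, node, edge_types):
    """Per-edge-type neighbour counts for one node: a very simple typed-edge feature vector."""
    neighbours = adj.get(node, {})
    return [sum(1 for types in neighbours.values() if t in types) for t in edge_types]

def prioritize(adj, seeds):
    """Rank non-seed genes by how many known disease genes (seeds) they touch,
    breaking ties alphabetically. A naive baseline, not the paper's node kernel."""
    scores = {node: sum(1 for nb in nbrs if nb in seeds)
              for node, nbrs in adj.items() if node not in seeds}
    return sorted(scores, key=lambda node: (-scores[node], node))
```

Keeping a set of types per edge, rather than collapsing the merged graph to plain edges, is what allows typed features such as `typed_degree` to be computed later.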


2020 ◽  
Vol 18 (06) ◽  
pp. 2050038
Author(s):  
Jorge Parraga-Alava ◽  
Mario Inostroza-Ponta

Using prior biological knowledge of relationships and gene functions from repositories such as the Gene Ontology (GO) to measure gene similarity has shown good results in multi-objective gene clustering algorithms. In this setting, obtaining useful clustering results requires knowing which measure of biological similarity between genes yields meaningful clusters with both similar expression patterns (co-expression) and biological homogeneity. In this paper, we studied the influence of the four most widely used GO-based semantic similarity measures on the performance of a multi-objective gene clustering algorithm. We used four publicly available datasets and carried out comparative studies based on multi-objective optimization performance metrics and clustering performance indexes. In most cases, the Jiang–Conrath and Wang similarities stand out in terms of multi-objective metrics. For clustering properties, Resnik similarity achieves the best compactness and separation, and therefore the best co-expression within gene groups. Meanwhile, for biological homogeneity, Wang similarity yields the greatest number of significant GO terms. However, statistical, visual, and biological significance tests showed that no GO-based semantic similarity measure stands out above the rest in significantly improving the performance of the multi-objective gene clustering algorithm.
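For reference, the Resnik and Jiang–Conrath measures named above are commonly defined from information content as follows, where p(t) is the fraction of corpus annotations made to GO term t or any of its descendants, and S(t1, t2) is the set of common ancestors of terms t1 and t2. These are the standard textbook forms; the abstract does not say how term-level scores were combined into gene-level similarities (e.g. maximum or best-match average), nor which conversion of the Jiang–Conrath distance into a similarity was used. Wang's measure is graph-based and is not shown.

```latex
\begin{align*}
\mathrm{IC}(t) &= -\log p(t) \\
\mathrm{sim}_{\mathrm{Res}}(t_1, t_2) &= \max_{t \in S(t_1, t_2)} \mathrm{IC}(t) \\
\mathrm{dist}_{\mathrm{JC}}(t_1, t_2) &= \mathrm{IC}(t_1) + \mathrm{IC}(t_2) - 2\,\mathrm{sim}_{\mathrm{Res}}(t_1, t_2)
\end{align*}
```

Both measures depend on the most informative common ancestor; Jiang–Conrath additionally penalizes how much information each term carries beyond that shared ancestor.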

