An integrated text mining framework for metabolic interaction network reconstruction

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1811 ◽  
Author(s):  
Preecha Patumcharoenpol ◽  
Narumol Doungpan ◽  
Asawin Meechai ◽  
Bairong Shen ◽  
Jonathan H. Chan ◽  
...  

Text mining (TM) in the field of biology is fast becoming a routine analysis for the extraction and curation of biological entities (e.g., genes, proteins, simple chemicals) as well as their relationships. Because TM applies widely to situations involving complex relationships, it is valuable to apply it to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules: one for the extraction of metabolic events (the Metabolic Event Extraction module, MEE) and one for the construction of a metabolic interaction network (the Metabolic Interaction Network Reconstruction module, MINR). The proposed framework performed well on the standard measures of recall, precision and F-score. Evaluation of the MEE module using the constructed Metabolic Entities (ME) corpus yielded F-scores of 59.15% and 48.59% for the detection of metabolic events for production and consumption, respectively. When the entity tagger for Gene and Protein (GP) and metabolite was tested on the test corpus, the F-score was greater than 80% for the Superpathway of leucine, valine, and isoleucine biosynthesis. Mapping of enzyme and metabolite interactions through network reconstruction showed fair performance for the MINR module on the test corpus, with an F-score >70%. Finally, applying the integrated TM framework to large-scale data (i.e., EcoCyc extraction data) to reconstruct a metabolic interaction network yielded precisions of 69.93%, 70.63% and 46.71% for enzyme, metabolite and enzyme–metabolite interaction, respectively. This study presents the first open-source integrated TM framework for reconstructing a metabolic interaction network. The framework can help biologists extract metabolic events for further reconstruction of a metabolic interaction network.
The ME corpus, test corpus, source code, and virtual machine image with pre-configured software are available at www.sbi.kmutt.ac.th/ preecha/metrecon.
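The evaluation described above rests on precision, recall and F-score computed over extracted interactions. The following is a minimal sketch of that bookkeeping, not the authors' MEE or MINR code: the event triples, the function names and all entity names are hypothetical placeholders chosen for illustration.

```python
def interactions_from_events(events):
    """Collapse (enzyme, event_type, metabolite) triples into enzyme-metabolite edges.

    The triple layout is an assumption made for this sketch.
    """
    return {
        (enzyme, metabolite)
        for enzyme, event_type, metabolite in events
        if event_type in {"production", "consumption"}
    }


def precision_recall_f(predicted, gold):
    """Return (precision, recall, F-score) for two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data with placeholder names (not taken from the ME corpus).
toy_events = [
    ("enzyme_A", "production", "metabolite_X"),
    ("enzyme_A", "consumption", "metabolite_Y"),
    ("enzyme_B", "production", "metabolite_Z"),
]
toy_gold = {
    ("enzyme_A", "metabolite_X"),
    ("enzyme_A", "metabolite_Y"),
    ("enzyme_C", "metabolite_Z"),
}

toy_predicted = interactions_from_events(toy_events)
print(precision_recall_f(toy_predicted, toy_gold))
```

Scoring sets of edges rather than individual mentions means a repeated extraction of the same interaction is counted once, which matches a network-level evaluation more closely than a mention-level one.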


2020 ◽  
pp. 1686-1704
Author(s):  
Emna Hkiri ◽  
Souheyl Mallat ◽  
Mounir Zrigui

The event extraction task consists of detecting and classifying events within open-domain text. The task is still very new for the Arabic language, whereas it has reached maturity for languages such as English and French. Event extraction has also been shown to improve the performance of Natural Language Processing tasks such as Information Retrieval, Question Answering, text mining and machine translation. In this article, we present an ongoing effort to build a system for event extraction from Arabic texts using the GATE platform and other tools.



2020 ◽  
Author(s):  
Debarati Roychowdhury ◽  
Samir Gupta ◽  
Xihan Qin ◽  
Cecilia N. Arighi ◽  
K. Vijay-Shanker

Abstract. Motivation: microRNAs (miRNAs) are essential gene regulators and their dysregulation often leads to diseases. Easy access to miRNA information is crucial for interpreting generated experimental data, connecting facts across publications, and developing new hypotheses built on previous knowledge. Here, we present emiRIT, a text mining-based resource, which presents miRNA information mined from the literature through a user-friendly interface. Results: We collected 149,233 miRNA-PubMed ID pairs from Medline between January 1997 and May 2020. emiRIT currently contains miRNA-gene regulation (60,491 relations); miRNA-disease (cancer) (12,300 relations); miRNA-biological process and pathways (23,390 relations); and circulatory miRNAs in extracellular locations (3,782 relations). Biological entities and their relation to miRNAs were extracted from Medline abstracts using publicly available and in-house developed text mining tools, and the entities were normalized to facilitate querying and integration. We built a database and an interface to store and access the integrated data, respectively. Conclusion: We provide an up-to-date and user-friendly resource to facilitate access to comprehensive miRNA information from the literature on a large scale, enabling users to navigate through different roles of miRNA and examine them in a context specific to their information needs. To assess our resource's information coverage, in the absence of gold standards, we have conducted two case studies focusing on the target and differential expression information of miRNAs in the context of diseases. Database URL: https://research.bioinformatics.udel.edu/emirit/



2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Pritam Kundu ◽  
Bharat Manna ◽  
Subham Majumder ◽  
Amit Ghosh

Abstract The structural complexity of lignocellulosic biomass hinders the extraction of cellulose, and it has remained a challenge for decades in the biofuel production process. However, wood-feeding organisms such as termites have developed an efficient natural lignocellulolytic system with the help of specialized gut microbial symbionts. Despite an enormous amount of high-throughput metagenomic data, the specific contribution of each individual microbe to this lignocellulolytic functionality remains unclear. The metabolic cross-communication and interdependence that drive the community structure inside the gut microbiota are yet to be explored. We have constructed a species-wide metabolic interaction network of the termite gut microbiome to gain a system-level understanding of metabolic communication. Metagenomic data of Nasutitermes corniger were analyzed to identify microbial communities in different gut segments. A comprehensive metabolic cross-feeding network of 205 microbes and 265 metabolites was developed using published experimental data. Reconstruction of the inter-species influence network elucidated the role of 37 influential microbes in maintaining a stable and functional microbiota. Furthermore, to understand natural lignocellulose digestion inside the N. corniger gut, the metabolic functionality of each influencer was assessed, which further elucidated 15 crucial hemicellulolytic microbes and their corresponding enzyme machinery.
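One way to picture a metabolic cross-feeding network like the one described above is as a set of directed donor-to-receiver edges, where a microbe that secretes a metabolite is linked to every other microbe that takes it up. The sketch below illustrates only that idea. The input dictionaries, the function names and the out-degree ranking are assumptions for illustration and are not the influence measure used in the study.

```python
from collections import Counter, defaultdict


def cross_feeding_edges(produces, consumes):
    """Return (donor, metabolite, receiver) triples.

    `produces` and `consumes` map a microbe name to the set of metabolites it
    secretes or takes up (a hypothetical input layout).
    """
    consumers_of = defaultdict(set)
    for microbe, metabolites in consumes.items():
        for metabolite in metabolites:
            consumers_of[metabolite].add(microbe)

    edges = set()
    for donor, metabolites in produces.items():
        for metabolite in metabolites:
            for receiver in consumers_of.get(metabolite, ()):
                if receiver != donor:  # ignore self-feeding
                    edges.add((donor, metabolite, receiver))
    return edges


def donors_by_out_degree(edges):
    """Count how many distinct receivers each donor feeds (a crude proxy only)."""
    receivers = defaultdict(set)
    for donor, _metabolite, receiver in edges:
        receivers[donor].add(receiver)
    return Counter({donor: len(targets) for donor, targets in receivers.items()})


# Toy community with placeholder microbe names.
toy_produces = {"microbe_A": {"acetate"}, "microbe_B": {"H2"}}
toy_consumes = {"microbe_B": {"acetate"}, "microbe_C": {"H2", "acetate"}}

toy_edges = cross_feeding_edges(toy_produces, toy_consumes)
print(sorted(toy_edges))
print(donors_by_out_degree(toy_edges).most_common())
```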



2020 ◽  
Author(s):  
Nikhil Maroli ◽  
Balu Bhasuran ◽  
Jeyakumar Natarajan ◽  
Ponmalai Kolandaivel

A novel coronavirus (SARS-CoV-2) has caused a major outbreak in humans all over the world. Several proteins interact during the entry and replication of this virus in humans. Here, we have used text mining and a named entity recognition method to identify co-occurrences of the important COVID-19 genes/proteins and to build an interaction network based on the frequency of each interaction. Network analysis revealed a set of genes/proteins, highly dense gene/protein clusters and sub-networks of Angiotensin-converting enzyme 2 (ACE2), Helicase, spike (S) protein (trimeric), membrane (M) protein, envelope (E) protein, and the nucleocapsid (N) protein. The isolated proteins were screened against procyanidin, a flavonoid from plants, using molecular docking. Further, molecular dynamics simulations of critical proteins such as ACE2, Mpro and spike were performed to elucidate the inhibition mechanism. A strong network of hydrogen bonds and hydrophobic interactions, along with van der Waals interactions, inhibits the receptors that are essential to the entry and replication of SARS-CoV-2. The binding energies, which arise largely from van der Waals interactions and were calculated with the molecular mechanics Poisson-Boltzmann surface area method (ACE2 = -50.21 ± 6.3, Mpro = -89.50 ± 6.32 and spike = -23.06 ± 4.39), also confirm the affinity of procyanidin for the critical receptors.
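A frequency-based co-occurrence network of the kind mentioned above can be approximated by counting how often two entity names appear in the same sentence. The sketch below assumes a simple case-insensitive dictionary lookup in place of a real named entity recognizer. The function name, the entity list and the three example sentences are illustrative, not data from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, entities):
    """Count sentence-level co-occurrences of entity names.

    Matching is a plain case-insensitive substring test, a stand-in for the
    named entity recognition step (an assumption of this sketch).
    """
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        found = sorted(e for e in entities if e.lower() in text)
        for first, second in combinations(found, 2):
            counts[(first, second)] += 1  # edge weight = co-occurrence frequency
    return counts


# Toy sentences written for this example.
toy_sentences = [
    "ACE2 binds the spike protein.",
    "Spike and ACE2 interact at the cell surface.",
    "Mpro cleaves the viral polyproteins.",
]
toy_entities = ["ACE2", "spike", "Mpro"]

toy_counts = cooccurrence_counts(toy_sentences, toy_entities)
for pair, weight in toy_counts.most_common():
    print(pair, weight)
```

Sorting the matched names before pairing keeps each undirected edge under a single key, so ("ACE2", "spike") and ("spike", "ACE2") are not counted separately.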



2015 ◽  
Vol 9 ◽  
pp. BBI.S35237 ◽  
Author(s):  
Apichat Suratanee ◽  
Kitiporn Plaimas

Categorizing human diseases provides higher efficiency and accuracy for disease diagnosis, prognosis, and treatment. Disease-disease association (DDA) is valuable information that indicates the large-scale structure of complex relationships among diseases. However, the number of known and reliable associations is very small. Therefore, identification of DDAs is a challenging task in systems biology and medicine. Here, we developed a novel network-based scoring algorithm called DDA to identify the relationships between diseases in a large-scale study. Our method is based on random walk prioritization in a protein-protein interaction network. This approach considers not only whether two diseases directly share associated genes but also the statistical relationships between two different diseases using known disease-related genes. Predicted associations were validated against known DDAs from a database and against literature support. The method performed well, with an area under the curve of 71%, and outperformed other standard association indices. Furthermore, novel DDAs and relationships among diseases from the cluster analysis were reported. The method identifies disease-disease relationships efficiently on an interaction network and can also be generalized to other association studies to further enhance knowledge in medical studies.
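Random walk prioritization on an interaction network is commonly implemented as a random walk with restart, in which a walker starting from known disease genes repeatedly steps to a neighbor or jumps back to a seed. The sketch below shows that generic technique in plain Python. It is not the authors' DDA scoring algorithm, and the function name, the restart probability of 0.5 and the three-node toy graph are assumptions for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    `adjacency` maps each node to a list of its neighbors (undirected edges
    listed in both directions); every neighbor must also be a key.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # Restart term: jump back to the seed genes with probability `restart`.
        updated = {n: restart * start[n] for n in nodes}
        # Walk term: spread the remaining probability evenly over neighbors.
        for node in nodes:
            neighbors = adjacency[node]
            if not neighbors:
                continue  # isolated nodes pass nothing on
            share = (1.0 - restart) * current[node] / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        if sum(abs(updated[n] - current[n]) for n in nodes) < tol:
            return updated
        current = updated
    return current


# Toy path graph g1 - g2 - g3 with g1 as the only seed (placeholder gene names).
toy_network = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"]}
toy_scores = random_walk_with_restart(toy_network, seeds={"g1"})
print(sorted(toy_scores.items(), key=lambda item: -item[1]))
```

Running the walk once per disease, seeded with that disease's known genes, gives one score vector per disease, and two diseases could then be compared by the similarity of their vectors. That comparison step is a suggestion and is not described in the abstract.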


