biological databases
Recently Published Documents


TOTAL DOCUMENTS

320
(FIVE YEARS 83)

H-INDEX

20
(FIVE YEARS 5)

Information ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 27
Author(s):  
Diego Garat ◽  
Dina Wonsever

To provide open access to data of public interest, it is often necessary to perform several data curation processes. In some cases, such as biological databases, curation involves quality control to ensure reliable experimental support for biological sequence data. In others, such as medical records or judicial files, publication must not interfere with the right to privacy of the persons involved. Published data may also be processed to generate metadata that make querying and navigation easier. In all cases, the curation process is a bottleneck that slows down general access to the data, so automatic or semi-automatic curation processes are of great interest. In this paper, we present a solution aimed at the automatic curation of our National Jurisprudence Database, with special focus on the anonymization of personal information. The anonymization process aims to hide the names of the participants involved in a lawsuit without losing the meaning of the narrative of facts. To achieve this goal, we need not only to recognize person names but also to resolve co-references, so that the same label is assigned to all mentions of the same person. Our corpus shows significant variation in the spelling of person names, so it was clear from the beginning that pre-existing tools would not perform well. The challenge was to find a good way of injecting specialized knowledge about the syntax of person names while taking advantage of the existing capabilities of pre-trained tools. We fine-tuned an NER analyzer and built a clustering algorithm to resolve co-references between named entities. We present our first results, which are promising for both tasks: we obtained an F1-micro score of 90.21% on the NER task (up from 39.99% before retraining the same analyzer on our corpus) and an ARI score of 95.95% in clustering for co-reference resolution.
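The two metrics reported in this abstract can be illustrated with a minimal sketch. It assumes exact-match entity spans pooled across documents for F1-micro and the standard pair-counting definition of the adjusted Rand index (ARI); the function names and the span representation are illustrative choices, not the authors' evaluation code.

```python
from collections import Counter
from math import comb


def micro_f1(gold: set, predicted: set) -> float:
    """Micro-averaged F1 over exact-match entity spans pooled across documents."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def adjusted_rand_index(labels_true: list, labels_pred: list) -> float:
    """Adjusted Rand index between two clusterings of the same mentions."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same mentions")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    # Count pairs of mentions that share a cluster in both, in gold, in predicted.
    joint = Counter(zip(labels_true, labels_pred))
    same_in_both = sum(comb(c, 2) for c in joint.values())
    same_in_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_in_true * same_in_pred / total_pairs
    maximum = (same_in_true + same_in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every mention in a single cluster
    return (same_in_both - expected) / (maximum - expected)


# Hypothetical spans: (document id, start offset, end offset, entity type).
gold_spans = {("doc1", 0, 12, "PER"), ("doc1", 40, 52, "PER")}
predicted_spans = {("doc1", 0, 12, "PER"), ("doc1", 60, 70, "PER")}
print(micro_f1(gold_spans, predicted_spans))

# Cluster ids are arbitrary: only the grouping of mentions matters for ARI.
print(adjusted_rand_index([0, 0, 1, 1], ["a", "a", "b", "b"]))
```

ARI is corrected for chance, so a clustering that is independent of the gold grouping scores near zero (and can be negative), while a relabeled but identical grouping scores 1.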


Author(s):  
Isadora Louise Alves da Costa Ribeiro Quintans ◽  
João Victor Alcoforado de Araújo ◽  
Lívia Noêmia Morais Rocha ◽  
Annie Elisabeth Beltrão de Andrade ◽  
Thaís Gaudencio do Rêgo ◽  
...  

Antimicrobial peptides (AMPs) are small, ribosomally synthesized proteins found in nearly all forms of life. In plants, AMPs play a central role in defense due to their distinct physicochemical properties. Because of their broad-spectrum antimicrobial activity and rapid killing action, plant AMPs have become important candidates for the development of new drugs to control multidrug-resistant plant and animal pathogens. Further research is required to explore the potential uses of these natural compounds. Computational strategies are increasingly used to understand key aspects of antimicrobial peptides and help to minimize the time and cost of "wet-lab" experimentation. Researchers have developed various tools and databases to provide updated information on AMPs. However, despite the increased availability of antimicrobial peptide resources in biological databases, finding AMPs from plants can still be difficult: the number of plant AMP sequences in current databases is still small, and the entries are often redundant. To facilitate further characterization of plant AMPs, we have summarized information on the location, distribution, and annotations of plant AMPs available in the most relevant databases for AMP research. We also mapped and categorized the bioinformatics tools available in these databases. We expect that this will help researchers advance the discovery and development of new plant AMPs with potent biological properties. We hope to provide insights that further expand the application of AMPs in biotechnology, pharmacy, and agriculture.


2021 ◽  
pp. 11-31
Author(s):  
Basant K. Tiwary

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Sunmyoung Lee ◽  
Tamiko Ono ◽  
Kiyoko Aoki-Kinoshita

Abstract. Background: The abundance of accumulated glycomics data has led to the development of many useful databases that aid in understanding the function of glycans and their impact on cellular activity. At the same time, efforts to share data between glycomics databases and other biological databases have contributed to the creation of new knowledgebases. However, differences in how data types are described have impeded data sharing for knowledge integration. To address this, various groups have adopted Semantic Web techniques, including the Resource Description Framework (RDF) and ontology development, to standardize the format for data exchange. These semantic data have contributed to the expansion of knowledgebases and hold the promise of providing data that can be intelligently processed. Bench biologists, who are experts in experimental findings, are both end users and data producers. It is therefore essential to lower the technical barrier that bench biologists face when making their experimental data compatible with standard formats for data sharing. Results: There are many essential concepts and practical techniques for data integration, but there is no method that lets researchers easily apply Semantic Web techniques to their experimental data. We applied our procedure to unformatted information on E. coli O-antigen structures collected from the web and show how this information can be expressed as formatted data that conform to Semantic Web standards. In particular, we described the E. coli O-antigen biosynthesis pathway using the BioPAX ontology, which was developed to support data exchange between pathway databases. Conclusions: The method we implemented to semantically describe O-antigen biosynthesis should help biologists understand how glycan information, including relevant pathway reaction data, can be easily shared. We hope this method can help lower the technical barrier to formulating experimental findings as formal representations and can lead bench scientists to participate readily in the construction of new knowledgebases that are integrated with existing ones. Such integration over the Semantic Web will support future work in artificial intelligence and machine learning that allows computers to infer new relationships and hypotheses in the life sciences.
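As a rough illustration of what expressing unformatted records as RDF can look like, the sketch below serializes simple key/value records as N-Triples using only the Python standard library. It does not use BioPAX and is not the authors' procedure; the `http://example.org/` base IRI, the `record/` and `vocab/` path segments, and the record fields are all placeholders chosen for the example.

```python
from urllib.parse import quote


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples requires inside a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(records: list, base: str = "http://example.org/") -> str:
    """Turn dict records into N-Triples: one subject per record, one triple per field."""
    lines = []
    for record in records:
        # Percent-encode the identifier so the subject is a valid IRI.
        subject = f"<{base}record/{quote(str(record['id']), safe='')}>"
        for field, value in record.items():
            if field == "id":
                continue
            predicate = f"<{base}vocab/{quote(field, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return "\n".join(lines)


# Placeholder record; the field names and values are illustrative only.
records = [{"id": "sample 1", "organism": "Escherichia coli", "note": 'collected "by hand"'}]
print(to_ntriples(records))
```

In practice, the placeholder predicates would be replaced by terms from a shared ontology, which is what makes the resulting triples interoperable with other knowledgebases.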


2021 ◽  
Vol 17 (11) ◽  
pp. e1009550
Author(s):  
Marzia Di Filippo ◽  
Chiara Damiani ◽  
Dario Pescini

Metabolic network models are increasingly being used in health care and industry. As a consequence, many tools have been released to automate their de novo reconstruction. To enable gene deletion simulations and the integration of gene expression data, these networks must include gene-protein-reaction (GPR) rules, which use Boolean logic to describe the relationships between the gene products (e.g., enzyme isoforms or subunits) associated with the catalysis of a given reaction. Nevertheless, the reconstruction of GPRs remains a largely manual and time-consuming process. Aiming to fully automate the reconstruction of GPRs for any organism, we propose the open-source Python-based framework GPRuler. By mining text and data from 9 different biological databases, GPRuler can reconstruct GPRs starting either from just the name of the target organism or from an existing metabolic model. The performance of the tool is evaluated at small scale on a manually curated metabolic model, and at genome scale on three metabolic models of Homo sapiens and Saccharomyces cerevisiae. Using these models as benchmarks, the proposed tool showed its ability to reproduce the original GPR rules with a high level of accuracy. In all the tested scenarios, a manual investigation of the mismatches between the rules proposed by GPRuler and the original ones showed that the proposed approach was in many cases more accurate than the original models. By complementing existing tools for metabolic network reconstruction with the ability to reconstruct GPRs quickly and with few resources, GPRuler paves the way for the study of context-specific metabolic networks, which represent the active portion of the complete network under given conditions, for organisms of industrial or biomedical interest that have not yet been characterized metabolically.
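To make the role of GPR rules in gene deletion simulations concrete, here is a minimal sketch of evaluating one rule against a set of knocked-out genes. It is not part of GPRuler: it assumes the common convention that `and` joins subunits of a complex and `or` joins isoforms, and that gene identifiers are valid Python names (identifiers containing dots or hyphens, or starting with a digit, would need to be mapped first). The function name and the gene names G1 to G3 are hypothetical.

```python
import ast


def reaction_is_active(rule: str, knocked_out: set) -> bool:
    """Return True if the reaction can still be catalyzed after the knockouts."""
    tree = ast.parse(rule, mode="eval")

    def evaluate(node) -> bool:
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.BoolOp):
            values = [evaluate(child) for child in node.values]
            # 'and': every subunit is required; 'or': any isoform suffices.
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            # A gene contributes only if it has not been deleted.
            return node.id not in knocked_out
        raise ValueError(f"unsupported element in GPR rule: {ast.dump(node)}")

    return evaluate(tree)


# Hypothetical rule: a two-subunit complex (G1, G2) or a single isoform (G3).
rule = "(G1 and G2) or G3"
print(reaction_is_active(rule, {"G1"}))        # G3 can still catalyze the reaction
print(reaction_is_active(rule, {"G1", "G3"}))  # neither the complex nor the isoform remains
```

Walking the parsed syntax tree, rather than calling `eval`, restricts the rule to gene names combined with `and` and `or`, so a malformed or unexpected rule raises an error instead of being executed.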


2021 ◽  
Author(s):  
Rajdeep Singh

Bioinformatics is a new, multidisciplinary branch of science. We use bioinformatics to understand biological information and store it in biological databases. By applying data science to biological databases, researchers can discover new drugs and modify existing drugs to improve human life.


Author(s):  
Chuming Chen ◽  
Karen E Ross ◽  
Sachin Gavali ◽  
Julie E Cowart ◽  
Cathy H Wu

Abstract. Summary: The global response to the COVID-19 pandemic has led to a rapid increase in scientific literature on this deadly disease. Extracting knowledge from biomedical literature and integrating it with relevant information from curated biological databases is essential to gain insight into COVID-19 etiology, diagnosis, and treatment. We used the Semantic Web technology RDF to integrate COVID-19 knowledge mined from the literature by iTextMine, PubTator, and SemRep with relevant biological databases, and formalized the knowledge in a standardized and computable COVID-19 Knowledge Graph (KG). We published the COVID-19 KG via a SPARQL endpoint to support federated queries on the Semantic Web and developed a knowledge portal with browsing and searching interfaces. We also developed a RESTful API to support programmatic access and provided RDF dumps for download. Availability and implementation: The COVID-19 Knowledge Graph is publicly available under a CC-BY 4.0 license at https://research.bioinformatics.udel.edu/covid19kg/.


2021 ◽  
Vol 19 (3) ◽  
pp. e27
Author(s):  
Pierre Larmande ◽  
Yusha Liu ◽  
Xinzhi Yao ◽  
Jingbo Xia

Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pre-trained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
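The dictionary-based annotation step described in this abstract can be sketched as a longest-match lookup of known names in text. This is an illustrative sketch, not the OryzaGP pipeline: the dictionary entries, the `DB:` identifiers, and the function name are made up, and matching is case-sensitive and limited to whole tokens.

```python
import re


def annotate(text: str, dictionary: dict) -> list:
    """Return (start, end, name, identifier) for every dictionary name found in text."""
    if not dictionary:
        return []
    # Try longer names first so that a name is never cut short by one of its prefixes.
    names = sorted(dictionary, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    # Lookarounds keep matches from starting or ending inside a longer word.
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")
    return [
        (match.start(), match.end(), match.group(0), dictionary[match.group(0)])
        for match in pattern.finditer(text)
    ]


# Hypothetical gene names mapped to hypothetical database identifiers.
gene_dictionary = {"geneA": "DB:0001", "geneA1": "DB:0002"}
sentence = "geneA1 interacts with geneA but not xgeneA."
for start, end, name, identifier in annotate(sentence, gene_dictionary):
    print(start, end, name, identifier)
```

Returning character offsets together with the database identifier covers both tasks named in the abstract: the span is the NER output, and the identifier is the NEN link.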


2021 ◽  
Vol 19 (3) ◽  
pp. e22
Author(s):  
Oscar Lithgow-Serrano ◽  
Joseph Cornelius ◽  
Vani Kanjirangat ◽  
Carlos-Francisco Méndez-Cruz ◽  
Fabio Rinaldi

Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) Clinical repository, a repository of classified and translated academic articles related to COVID-19 and relevant to clinical practice, where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity recognition (NER) to improve the classification. We processed the literature with OntoGene’s Biomedical Entity Recogniser (OGER) and used the identified Named Entities (NEs) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER-extracted features. In these proof-of-concept experiments, we observed a clear gain in COVID-19 literature classification. In particular, the origin of an NE was useful for classifying document types, and the type of an NE for classifying clinical specialties. Given the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. To estimate this benefit accurately, further experiments with a larger dataset would be needed.


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Alvhild Alette Bjørkum ◽  
Ana Carrasco Duran ◽  
Berven Frode ◽  
Dola Sinha Roy ◽  
Karen Rosendahl ◽  
...  

Abstract. Background: The aim of this study was to discover significantly changed proteins in human blood serum after the loss of 6 h of sleep at night. A further aim was to reveal affected biological process and molecular function categories that might be clinically relevant by exploring systems biology databases. Methods: Eight females were recruited by volunteer request. Peripheral venous whole blood was sampled at 04:00, after 6 h of sleep and after 6 h of sleep deprivation. We used a within-subjects design (each subject served as her own control). Blood serum from each subject was depleted before protein digestion by trypsin and iTRAQ labeling. Labeled peptides were analyzed by mass spectrometry (LTQ Orbitrap Velos Elite) connected to an LC system (Dionex Ultimate NCR-3000RS). Results: We identified 725 proteins in human blood serum. Of these, 34 proteins were significantly differentially expressed after 6 h of sleep deprivation at night: 14 were up-regulated and 20 were down-regulated. We emphasized the functionality of the 16 proteins commonly differentiated in all 8 subjects and their relation to pathological conditions. In addition, we discussed Histone H4 (H4) and protein S100-A6/Calcyclin (S10A6), which were up-regulated more than 1.5-fold. Finally, we discussed the affected biological process and molecular function categories. Conclusions: Overall, our study suggests that acute sleep deprivation, at least in females, affects several known biological process and molecular function categories and is associated with proteins that are also changed under pathological conditions such as impaired coagulation, oxidative stress, immune suppression, neurodegeneration-related disorders, and cancer. Data are available via ProteomeXchange with identifier PXD021004.

