database curation
Recently Published Documents


Total documents: 34 (five years: 8)
H-index: 8 (five years: 0)

Biomolecules, 2021, Vol. 11 (11), pp. 1591
Author(s): Prashant Srivastava, Saptarshi Bej, Kristina Yordanova, Olaf Wolkenhauer

For any molecule, network, or process of interest, keeping up with new publications is becoming increasingly difficult. For many cellular processes, the number of molecules and interactions that need to be considered can be very large. Automated mining of publications can support large-scale molecular interaction maps and database curation. Text mining and Natural Language Processing (NLP)-based techniques are finding applications in mining the biological literature, addressing problems such as Named Entity Recognition (NER) and Relationship Extraction (RE). Both rule-based and Machine Learning (ML)-based NLP approaches have been popular in this context, with multiple research and review articles examining the scope of such models in Biological Literature Mining (BLM). In this review article, we explore self-attention-based models, a special type of Neural Network (NN)-based architecture that has recently revitalized the field of NLP, as applied to biological texts. We cover self-attention models published from 2019 onwards that operate at either the sentence level or the abstract level, in the context of molecular interaction extraction. We conducted a comparative study of these models in terms of their architecture. Moreover, we discuss some limitations in the field of BLM that identify opportunities for extracting molecular interactions from biological text.
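The abstract refers to self-attention without showing the computation it performs. As a rough illustration, not taken from any of the reviewed models, single-head scaled dot-product self-attention can be sketched in plain Python. The token embeddings and the identity projection matrices below are toy assumptions; real models learn these weights and stack many heads and layers.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over token vectors x (n x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K so Q @ K^T is n x n
    scores = matmul(q, k_t)
    # Each row of weights is a probability distribution over all tokens.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy 2-d embeddings for three tokens, e.g. "IL6", "activates", "STAT3" (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(x, identity, identity, identity)
```

Every token attends to every other token in the sentence, which is what lets such models relate an entity mention to a distant interaction verb.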


Author(s): Prashant Srivastava, Saptarshi Bej, Kristina Yordanova, Olaf Wolkenhauer

For any molecule, network, or process of interest, keeping up with new publications is becoming increasingly difficult. For many cellular processes, the number of molecules and interactions that need to be considered can be very large. Automated mining of publications can support large-scale molecular interaction maps and database curation. Text mining and Natural Language Processing (NLP)-based techniques are finding applications in mining the biological literature, addressing problems such as Named Entity Recognition (NER) and Relationship Extraction (RE). Both rule-based and machine learning (ML)-based NLP approaches have been popular in this context, with multiple research and review articles examining the scope of such models in Biological Literature Mining (BLM). In this review article, we explore self-attention-based models, a special type of neural network (NN)-based architecture that has recently revitalized the field of NLP, as applied to biological texts. We cover self-attention models published from 2019 onwards that operate at either the sentence level or the abstract level, in the context of molecular interaction extraction. We conduct a comparative study of these models in terms of their architecture. Moreover, we discuss some limitations in the field of BLM that identify opportunities for extracting molecular interactions from biological text.


2021
Author(s): Jack D. Evans, Volodymyr Bon, Irena Senkovska, Stefan Kaskel

New advanced adsorbents are a crucial driver for the development of energy and environmental applications. Machine learning and data mining techniques offer tremendous potential, as these approaches can identify the most appropriate adsorbent for a particular application. However, the current practice of reporting adsorption isotherms as graphs and figures is not adequate for reproducing the original experimentally measured data.

This report proposes the specification of a new standard adsorption information file (AIF), inspired by the ubiquitous crystallographic information file (CIF) and based on the self-defining text archive and retrieval (STAR) procedure, which is also used to represent biological nuclear magnetic resonance experiments (NMR-STAR). The AIF is a flexible and easily extended free-format archive file that is readily human- and machine-readable and is simple to edit with a basic text editor or to parse for database curation. This format represents a first step toward an open adsorption data format as the basis for a decentralized adsorption data library.

An open format facilitates the electronic transmission of adsorption data between laboratories, journals, and larger databases, which is key to the effort to increase open science in the field of porous materials.
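To make "simple to parse for database curation" concrete, the following is a minimal sketch of a parser for a simplified STAR/CIF-style text: `data_` block names, single-line `_tag value` pairs, and `loop_` tables. It is not the AIF reference parser, and the tag names in the example (`_sample_temperature`, `_adsorbate_name`, `_pressure`, `_amount`) are hypothetical, not taken from the AIF specification. Multi-line (semicolon-delimited) values and multiple data blocks are deliberately not handled.

```python
import shlex


def parse_star(text):
    """Parse simplified STAR/CIF-style text into (block_name, items, loops)."""
    block, items, loops = None, {}, []
    current = None  # the loop being filled: {"tags": [...], "rows": [...]}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            current = None  # blank lines and comments close any open loop
            continue
        if line.startswith("data_"):
            block, current = line[len("data_"):], None
        elif line == "loop_":
            current = {"tags": [], "rows": []}
            loops.append(current)
        elif line.startswith("_"):
            if current is not None and not current["rows"]:
                current["tags"].append(line)  # still inside the loop header
            else:
                current = None
                tag, value = line.split(None, 1)  # raises ValueError if the value is missing
                items[tag] = shlex.split(value)[0]  # strips surrounding quotes
        elif current is not None:
            current["rows"].append(shlex.split(line))  # one whitespace-separated data row
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return block, items, loops


def loop_columns(loop):
    # Turn a numeric loop into {tag: [float, ...]} columns, e.g. an isotherm.
    return {tag: [float(row[i]) for row in loop["rows"]] for i, tag in enumerate(loop["tags"])}


example = """data_example
_sample_temperature 77
_adsorbate_name 'N2'
loop_
_pressure
_amount
0.01 1.2
0.10 3.4
"""
block, items, loops = parse_star(example)
isotherm = loop_columns(loops[0])
```

Keeping the measured points as a tagged table, rather than as a plotted figure, is what allows the original isotherm data to be recovered exactly.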


Metabolites, 2020, Vol. 10 (9), pp. 368
Author(s): Huan Jin, Joshua M. Mitchell, Hunter N. B. Moseley

Metabolic flux analysis requires both a reliable metabolic model and reliable metabolic profiles to characterize metabolic reprogramming. Advances in analytical methodologies enable the production of high-quality metabolomics datasets that capture isotopic flux. However, useful metabolic models can be difficult to derive owing to the lack of relatively complete atom-resolved metabolic networks for a variety of organisms, including humans. Here, we developed a neighborhood-specific graph coloring method that creates unique identifiers for each atom in a compound, facilitating the construction of an atom-resolved metabolic network. Importantly, this method is guaranteed to generate the same identifier for symmetric atoms, enabling automatic identification of possible additional mappings caused by molecular symmetry. Furthermore, a compound coloring identifier derived from the corresponding atom coloring identifiers can be used to harmonize compounds across metabolic network databases, an essential first step in network integration. Using the compound coloring identifiers, we detected 8865 correspondences between KEGG (Kyoto Encyclopedia of Genes and Genomes) and MetaCyc compounds, 5451 of which were confirmed by other identifiers provided by the two databases. In addition, we found that the Enzyme Commission (EC) numbers of reactions can be used to validate possible correspondence pairs: 1848 unconfirmed pairs were validated by shared reaction EC numbers. Moreover, compound coloring identifiers allowed us to detect various issues and errors in compound representation in the KEGG and MetaCyc databases, demonstrating the usefulness of this methodology for database curation.
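The idea of neighborhood-based atom coloring can be illustrated with a generic Weisfeiler-Lehman-style refinement, in which each atom's color is repeatedly rehashed together with the sorted colors of its neighbors. This is a swapped-in, well-known technique, not the authors' neighborhood-specific algorithm. It shares the property that symmetric atoms always receive the same color, but unlike the method described above it does not guarantee distinct colors for every non-equivalent atom. It also ignores bond orders, charges, stereochemistry, and implicit hydrogens; the function names are hypothetical.

```python
import hashlib


def _digest(s):
    # Deterministic short hash; Python's built-in hash() is salted per process.
    return hashlib.sha1(s.encode()).hexdigest()[:12]


def color_atoms(elements, bonds):
    """elements: element symbols per atom; bonds: (i, j) atom index pairs."""
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    colors = list(elements)  # round 0: color = element symbol
    for _ in range(n):  # n rounds are enough for the refinement to stabilize
        colors = [
            _digest(colors[i] + "|" + ",".join(sorted(colors[j] for j in neighbors[i])))
            for i in range(n)
        ]
    return colors


def compound_id(elements, bonds):
    # Order-independent compound identifier built from the multiset of atom colors.
    return _digest(";".join(sorted(color_atoms(elements, bonds))))


# Propane heavy atoms: the two terminal carbons are symmetric.
propane = color_atoms(["C", "C", "C"], [(0, 1), (1, 2)])
# Ethanol heavy atoms listed in two different atom orders.
ethanol_a = compound_id(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = compound_id(["O", "C", "C"], [(0, 1), (1, 2)])
```

Because the compound identifier does not depend on atom order, the same structure stored under different atom numberings in two databases yields the same identifier.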


2020
Author(s): Huan Jin, Joshua M. Mitchell, Hunter N.B. Moseley

Metabolic flux analysis requires both a reliable metabolic model and metabolic profiles to characterize metabolic reprogramming. Advances in analytical methodologies enable the production of high-quality metabolomics datasets that capture isotopic flux. However, useful metabolic models can be difficult to derive owing to the lack of relatively complete atom-resolved metabolic networks for a variety of organisms, including humans. Here, we developed a graph coloring method that creates unique identifiers for each atom in a compound, facilitating the construction of an atom-resolved metabolic network. Importantly, this method is guaranteed to generate the same identifier for symmetric atoms, enabling automatic identification of possible additional mappings caused by molecular symmetry. Furthermore, a compound coloring identifier derived from the corresponding atom coloring identifiers can be used to harmonize compounds across metabolic network databases, an essential first step in network integration. Using the compound coloring identifiers, we detected 8865 correspondences between KEGG and MetaCyc compounds, 5451 of which were confirmed by other identifiers provided by the two databases. In addition, we found that the Enzyme Commission (EC) numbers of reactions can be used to validate possible correspondence pairs: 1848 unconfirmed pairs were validated by shared reaction EC numbers. Moreover, compound coloring identifiers allowed us to detect various issues and errors in compound representation in the KEGG and MetaCyc databases, demonstrating the usefulness of this methodology for database curation.
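The harmonization workflow described in this abstract (match compounds by identifier, confirm with existing cross-references, then validate the rest by shared reaction EC numbers) can be sketched as a simple join. The compound IDs, coloring identifiers, and compound-to-EC assignments below are toy values, and the function names are hypothetical; real KEGG and MetaCyc data would need to be loaded from those databases.

```python
from collections import defaultdict


def match_compounds(db1, db2):
    """db1, db2: {compound_id: coloring_identifier}. Returns candidate (id1, id2) pairs."""
    by_color = defaultdict(list)
    for cid, color in db2.items():
        by_color[color].append(cid)
    # .get() avoids inserting empty lists for colors absent from db2.
    return sorted((id1, id2) for id1, color in db1.items() for id2 in by_color.get(color, []))


def classify_pairs(pairs, cross_refs, ecs1, ecs2):
    """Split pairs into cross-reference-confirmed, EC-validated, and unresolved."""
    confirmed, ec_validated, unresolved = [], [], []
    for pair in pairs:
        id1, id2 = pair
        if pair in cross_refs:
            confirmed.append(pair)
        elif ecs1.get(id1, set()) & ecs2.get(id2, set()):  # any shared reaction EC number
            ec_validated.append(pair)
        else:
            unresolved.append(pair)
    return confirmed, ec_validated, unresolved


# Toy data (hypothetical IDs and EC assignments).
db1 = {"k1": "c_aaa", "k2": "c_bbb", "k3": "c_ccc"}
db2 = {"m1": "c_aaa", "m2": "c_bbb", "m3": "c_ccc", "m4": "c_zzz"}
cross_refs = {("k1", "m1")}
ecs1 = {"k2": {"1.1.1.1"}, "k3": {"2.7.1.1"}}
ecs2 = {"m2": {"1.1.1.1", "1.2.3.4"}, "m3": {"3.1.3.9"}}
pairs = match_compounds(db1, db2)
confirmed, validated, unresolved = classify_pairs(pairs, cross_refs, ecs1, ecs2)
```

Pairs left unresolved are natural candidates for manual curation, since they may reflect either a genuine correspondence or a representation error in one of the databases.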


2020, pp. 194-225
Author(s): Giovanna Ilaria Passeri, Daniela Trisciuzzi, Domenico Alberga, Lydia Siragusa, Francesco Leonetti, ...

Virtual screening (VS) is an effective computational strategy for raising the chances of finding new bioactive compounds and shortening the time needed to move from an initial intuition to the market. Classically, the most widely pursued approaches rely on ligand- and structure-based studies: the former are employed when structural information about the target is missing, the latter when X-ray/NMR-solved structures or homology models of the target are available. The authors will focus on the most advanced techniques applied in this area. In particular, they will survey the key concepts of virtual screening by discussing how to properly select chemical libraries, how to curate databases, how to apply ligand- and structure-based techniques, and how to use post-processing methods wisely. Emphasis will also be given to the most meaningful databases used in VS protocols. To ease the discussion, several examples will be presented.
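As one concrete example of a ligand-based technique, a library can be ranked by fingerprint similarity to a known active compound using the Tanimoto coefficient. The sketch below assumes fingerprints are already computed and represented as sets of "on" bit indices; the fingerprints, compound names, and threshold are toy assumptions, and a real protocol would generate fingerprints with a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_library(query, library, threshold=0.0):
    """Rank library compounds by similarity to the query, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [s for s in scored if s[1] >= threshold]
    return sorted(hits, key=lambda s: (-s[1], s[0]))  # ties broken by name


# Toy fingerprints (hypothetical bit sets).
query = {1, 2, 3, 4}
library = {"cpd_a": {1, 2, 3, 4}, "cpd_b": {1, 2, 5}, "cpd_c": {7, 8}}
hits = rank_library(query, library, threshold=0.3)
```

Here `cpd_b` shares bits 1 and 2 out of five distinct bits, giving a similarity of 2/5, while `cpd_c` shares none and falls below the threshold.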


2018, Vol. 19 (1)
Author(s): Juan Miguel Cejuela, Shrikant Vinchurkar, Tatyana Goldberg, Madhukar Sollepura Prabhu Shankar, Ashish Baghudana, ...
