BioReader: a text mining tool for performing classification of biomedical literature

Abstract: Interpretation of a given variant's pathogenicity is one of the most profound challenges to realizing the promise of genomic medicine. Much of the information on variant-disease associations that curators and researchers use to interpret variant pathogenicity is buried in the biomedical literature. Text-mining tools that can extract relevant information from the literature will speed up and assist the variant interpretation curation process. In this work, we present a text-mining tool, MACE2k, that extracts evidence sentences containing associations between variants and diseases from full-length PMC Open Access articles. We use different machine learning models (classical and deep learning) to identify evidence sentences with variant-disease associations. Evaluation shows promising results, with a best F1-score of 82.9% and AUC-ROC of 73.9%. Classical ML models had better recall (96.6% for Random Forest) than deep learning models. The deep learning model, a Convolutional Neural Network, had the best precision (75.6%), which is essential for any curation task.
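To make the reported metrics concrete, the following minimal sketch computes precision, recall, and F1 for a binary evidence-sentence classifier. The function name and the toy labels are assumptions invented for illustration; they are not taken from MACE2k or its evaluation data.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels; 1 marks an evidence sentence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

High recall means few evidence sentences are missed, while high precision means few irrelevant sentences reach the curator, which is why the abstract singles out precision for curation work.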

ProPheno: An online dataset for completely characterizing the human protein-phenotype landscape in biomedical literature

10.7287/peerj.preprints.27479v1 ◽

2019 ◽

Author(s):

Morteza Pourreza Shahri ◽

Indika Kahanda

Keyword(s):

Text Mining ◽

Predictive Models ◽

Complex Diseases ◽

Biomedical Literature ◽

Human Protein ◽

Mining Tool ◽

Text Mining Tool

Identifying protein-phenotype relations is of paramount importance for applications such as uncovering rare and complex diseases. One of the best resources capturing protein-phenotype relationships is the biomedical literature. In this work, we introduce ProPheno, a comprehensive online dataset composed of human protein/phenotype mentions extracted from the complete corpora of Medline and PubMed. Moreover, it includes co-occurrences of protein-phenotype pairs within different spans of text such as sentences and paragraphs. We use ProPheno for completely characterizing the human protein-phenotype landscape in biomedical literature. ProPheno, the reported findings, and the insight gained have implications for (1) biocurators expediting their curation efforts, (2) researchers quickly finding relevant articles, and (3) text mining tool developers training their predictive models. The RESTful API of ProPheno is freely available at http://propheno.cs.montana.edu.
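As a rough illustration of sentence-level co-occurrence counting, the sketch below matches entity names by case-insensitive substring search and splits sentences with a naive regular expression. Both choices, the function name, and the example text are assumptions made for illustration; this is not ProPheno's extraction pipeline or its API.

```python
import re
from collections import Counter
from itertools import product


def sentence_cooccurrences(text, proteins, phenotypes):
    """Count protein-phenotype pairs that are mentioned in the same sentence."""
    counts = Counter()
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        lowered = sentence.lower()
        found_proteins = [p for p in proteins if p.lower() in lowered]
        found_phenotypes = [ph for ph in phenotypes if ph.lower() in lowered]
        # Every protein found in the sentence pairs with every phenotype found.
        for pair in product(found_proteins, found_phenotypes):
            counts[pair] += 1
    return counts


# Toy text, invented for illustration only.
text = (
    "Mutations in BRCA1 are linked to breast carcinoma. "
    "TP53 was also studied. "
    "Loss of TP53 is associated with breast carcinoma and with sarcoma."
)
counts = sentence_cooccurrences(
    text, ["BRCA1", "TP53"], ["breast carcinoma", "sarcoma"]
)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Widening the span from sentences to paragraphs only changes the splitting step; a real system would also need proper entity recognition and normalization rather than substring matching.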

ProPheno 1.0: An online dataset for accelerating the complete characterization of the human protein-phenotype landscape in biomedical literature

10.7287/peerj.preprints.27479 ◽

2019 ◽

Author(s):

Morteza Pourreza Shahri ◽

Indika Kahanda

Keyword(s):

Open Access ◽

Text Mining ◽

Predictive Models ◽

Complete Characterization ◽

Biomedical Literature ◽

Human Protein ◽

Pubmed Central ◽

Mining Tool ◽

Text Mining Tool

Identifying protein-phenotype relations is of paramount importance for applications such as uncovering rare and complex diseases. One of the best resources capturing protein-phenotype relationships is the biomedical literature. In this work, we introduce ProPheno, a comprehensive online dataset composed of human protein/phenotype mentions extracted from the complete corpora of Medline and PubMed Central Open Access. Moreover, it includes co-occurrences of protein-phenotype pairs within different spans of text such as sentences and paragraphs. We use ProPheno for completely characterizing the human protein-phenotype landscape in biomedical literature. ProPheno, the reported findings, and the insight gained have implications for (1) biocurators expediting their curation efforts, (2) researchers quickly finding relevant articles, and (3) text mining tool developers training their predictive models. The RESTful API of ProPheno is freely available at http://propheno.cs.montana.edu.

On expert curation and sustainability: UniProtKB/Swiss-Prot as a case study

10.1101/094011 ◽

2016 ◽

Cited By ~ 3

Author(s):

Sylvain Poux ◽

Cecilia N. Arighi ◽

Michele Magrane ◽

Alex Bateman ◽

Chih-Hsuan Wei ◽

...

Keyword(s):

Text Mining ◽

Scientific Community ◽

Large Fraction ◽

Scientific Research ◽

Biomedical Literature ◽

Complete Picture ◽

Essential Component ◽

Mining Tool ◽

Text Mining Tool

Abstract Motivation: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized, and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, the question of their sustainability is raised due to the growth of biomedical literature. Results: By using UniProtKB/Swiss-Prot as a case study, we address this question by using different literature triage approaches. With the assistance of the PubTator text-mining tool, we tagged more than 10,000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture. We show that a large fraction of published papers found in PubMed is not relevant for curation in UniProtKB/Swiss-Prot and demonstrate that, despite appearances, expert curation is sustainable. Availability: UniProt is freely available at http://www.uniprot.org/.

ProPheno 1.0: An online dataset for accelerating the complete characterization of the human protein-phenotype landscape in biomedical literature

10.7287/peerj.preprints.27479v2 ◽

2019 ◽

Cited By ~ 2

Author(s):

Morteza Pourreza Shahri ◽

Indika Kahanda

Keyword(s):

Open Access ◽

Text Mining ◽

Predictive Models ◽

Complete Characterization ◽

Biomedical Literature ◽

Human Protein ◽

Pubmed Central ◽

Mining Tool ◽

Text Mining Tool

Identifying protein-phenotype relations is of paramount importance for applications such as uncovering rare and complex diseases. One of the best resources capturing protein-phenotype relationships is the biomedical literature. In this work, we introduce ProPheno, a comprehensive online dataset composed of human protein/phenotype mentions extracted from the complete corpora of Medline and PubMed Central Open Access. Moreover, it includes co-occurrences of protein-phenotype pairs within different spans of text such as sentences and paragraphs. We use ProPheno for completely characterizing the human protein-phenotype landscape in biomedical literature. ProPheno, the reported findings, and the insight gained have implications for (1) biocurators expediting their curation efforts, (2) researchers quickly finding relevant articles, and (3) text mining tool developers training their predictive models. The RESTful API of ProPheno is freely available at http://propheno.cs.montana.edu.

Identification of Treosulfan-Induced Myalgia in Pediatric Hematopoietic Stem Cell Transplantation Using an Electronic Health Record Text Mining Tool

Transplantation and Cellular Therapy ◽

10.1016/s2666-6367(21)00220-7 ◽

2021 ◽

Vol 27 (3) ◽

pp. S178-S179

Author(s):

Eileen van der Stoep ◽

Dagmar Berghuis ◽

Robbert Bredius ◽

Emilie Buddingh ◽

Alex Mohseny ◽

...

Keyword(s):

Stem Cell ◽

Text Mining ◽

Hematopoietic Stem Cell Transplantation ◽

Stem Cell Transplantation ◽

Electronic Health Record ◽

Cell Transplantation ◽

Health Record ◽

Hematopoietic Stem ◽

Mining Tool ◽

Text Mining Tool

Evaluation of carcinogenic modes of action for pesticides in fruit on the Swedish market using a text-mining tool

Frontiers in Pharmacology ◽

10.3389/fphar.2014.00145 ◽

2014 ◽

Vol 5 ◽

Cited By ~ 6

Author(s):

Ilona Silins ◽

Anna Korhonen ◽

Ulla Stenius

Keyword(s):

Text Mining ◽

Modes Of Action ◽

Mining Tool ◽

Text Mining Tool

MedMiner: An Internet Text-Mining Tool for Biomedical Information, with Application to Gene Expression Profiling

BioTechniques ◽

10.2144/99276bc03 ◽

1999 ◽

Vol 27 (6) ◽

pp. 1210-1217 ◽

Cited By ~ 117

Author(s):

L. Tanabe ◽

U. Scherf ◽

L.H. Smith ◽

J.K. Lee ◽

L. Hunter ◽

...

Keyword(s):

Gene Expression ◽

Text Mining ◽

Gene Expression Profiling ◽

Expression Profiling ◽

Mining Tool ◽

Text Mining Tool

SicknessMiner: a deep-learning-driven text-mining tool to abridge disease-disease associations

BMC Bioinformatics ◽

10.1186/s12859-021-04397-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Nícia Rosário-Ferreira ◽

Victor Guimarães ◽

Vítor S. Costa ◽

Irina S. Moreira

Keyword(s):

Text Mining ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Disease Similarity ◽

Disease Associations ◽

Named Entity Normalization ◽

Mining Tool ◽

Or Gene ◽

Text Mining Tool

Abstract Background: Blood cancers (BCs) are responsible for over 720 K yearly deaths worldwide. Their prevalence and mortality rate uphold the relevance of research related to BCs. Despite the availability of different resources establishing Disease-Disease Associations (DDAs), the knowledge is scattered and not accessible in a straightforward way to the scientific community. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared to the DisGeNET resource for qualitative and quantitative comparison. Results: We obtained the DDAs via co-mention using our SicknessMiner or gene- or variant-disease similarity on DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results, and nearly 15% of the SicknessMiner results were specific to our pipeline. Conclusions: SicknessMiner is a valuable tool to extract disease-disease relationships from a raw input corpus.
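The two percentages in the results are overlap statistics between a pipeline's associations and a reference resource. A minimal sketch of how such figures could be computed is shown below, assuming associations are unordered disease pairs; the function name and placeholder disease labels are hypothetical and do not reproduce the authors' comparison procedure.

```python
def compare_associations(pipeline_pairs, reference_pairs):
    """Return (fraction of reference recovered, fraction of pipeline-specific pairs)."""
    # frozenset makes each association unordered: (a, b) equals (b, a).
    pipeline = {frozenset(pair) for pair in pipeline_pairs}
    reference = {frozenset(pair) for pair in reference_pairs}
    shared = pipeline & reference
    recovered = len(shared) / len(reference) if reference else 0.0
    specific = len(pipeline - reference) / len(pipeline) if pipeline else 0.0
    return recovered, specific


# Placeholder labels, invented for illustration only.
pipeline_pairs = [
    ("disease_a", "disease_b"),
    ("disease_b", "disease_c"),
    ("disease_d", "disease_a"),
    ("disease_c", "disease_d"),
]
reference_pairs = [
    ("disease_b", "disease_a"),
    ("disease_c", "disease_b"),
    ("disease_a", "disease_c"),
]
recovered, specific = compare_associations(pipeline_pairs, reference_pairs)
print(f"recovered={recovered:.2f} pipeline_specific={specific:.2f}")
```

Note that the two fractions use different denominators: the first is relative to the reference set, the second to the pipeline's own output.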
