Accelerating Text Mining Using Domain-Specific Stop Word Lists

BO-LSTM: Classifying relations via long short-term memory networks along biomedical ontologies

10.1101/336719 ◽

2018 ◽

Author(s):

Andre Lamurias ◽

Luka A. Clarke ◽

Francisco M. Couto

Keyword(s):

Deep Learning ◽

Text Mining ◽

Drug Interactions ◽

Short Term Memory ◽

Biomedical Ontologies ◽

Short Term ◽

Term Memory ◽

Domain Specific ◽

Learning Techniques ◽

Long Short Term Memory

Abstract

Recent studies have proposed deep learning techniques, namely recurrent neural networks, to improve biomedical text mining tasks. However, these techniques rarely take advantage of existing domain-specific resources, such as ontologies. In Life and Health Sciences there is a vast and valuable set of such resources publicly available, which are continuously being updated. Biomedical ontologies are nowadays a mainstream approach to formalize existing knowledge about entities, such as genes, chemicals, phenotypes, and disorders. These resources contain supplementary information that may not yet be encoded in training data, particularly in domains with limited labeled data.

We propose a new model, BO-LSTM, that takes advantage of domain-specific ontologies by representing each entity as the sequence of its ancestors in the ontology. We implemented BO-LSTM as a recurrent neural network with long short-term memory units, using an open biomedical ontology, which in our case study was Chemical Entities of Biological Interest (ChEBI). We assessed the performance of BO-LSTM on detecting and classifying drug-drug interactions in a publicly available corpus from an international challenge, composed of 792 drug descriptions and 233 scientific abstracts. By using the domain-specific ontology in addition to word embeddings and WordNet, BO-LSTM improved the F1-score of both the detection and the classification of drug-drug interactions, particularly in a document set with a limited number of annotations. Our findings demonstrate that, besides the high performance of current deep learning techniques, domain-specific ontologies can still be useful to mitigate the lack of labeled data.

Author summary

A high quantity of biomedical information is only available in documents such as scientific articles and patents. Due to the rate at which new documents are produced, we need automatic methods to extract useful information from them. Text mining is a subfield of information retrieval which aims at extracting relevant information from text. Scientific literature is a challenge to text mining because of the complexity and specificity of the topics approached. In recent years, deep learning has obtained promising results in various text mining tasks by exploring large datasets. On the other hand, ontologies provide a detailed and sound representation of a domain and have been developed for diverse biomedical domains. We propose a model that combines deep learning algorithms with biomedical ontologies to identify relations between concepts in text. We demonstrate the potential of this model to extract drug-drug interactions from abstracts and drug descriptions. This model can be applied to other biomedical domains using an annotated corpus of documents and an ontology related to that domain to train a new classifier.
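The idea of representing an entity as the sequence of its ancestors can be illustrated with a minimal Python sketch. This is not the authors' implementation: the hierarchy `TOY_PARENTS`, its labels, and the function `ancestor_sequence` are hypothetical, and the sketch assumes each concept has a single parent, whereas a real ontology such as ChEBI may give a concept several parents.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy (child -> parent).
# The labels are illustrative only and are not real ChEBI entries.
TOY_PARENTS: Dict[str, Optional[str]] = {
    "root": None,
    "chemical": "root",
    "drug": "chemical",
    "analgesic": "drug",
}


def ancestor_sequence(entity: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from the root down to `entity`, inclusive.

    Assumes a single parent per concept (a simplification).
    """
    if entity not in parents:
        raise KeyError(f"unknown concept: {entity}")
    path: List[str] = []
    seen = set()
    node: Optional[str] = entity
    while node is not None:
        if node in seen:  # guard against malformed, cyclic input
            raise ValueError(f"cycle detected at: {node}")
        seen.add(node)
        path.append(node)
        node = parents[node]
    path.reverse()  # the walk collected entity -> root; flip to root -> entity
    return path


if __name__ == "__main__":
    print(ancestor_sequence("analgesic", TOY_PARENTS))
```

In a model of the kind the abstract describes, a sequence like this would be mapped to embeddings and fed to a recurrent layer alongside the word-level input; that step is omitted here.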


Incremental Ontology Population and Enrichment through Semantic-based Text Mining

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2015070103 ◽

2015 ◽

Vol 11 (3) ◽

pp. 44-66 ◽

Cited By ~ 9

Author(s):

Saira Gillani ◽

Andrea Ko

Keyword(s):

Text Mining ◽

Domain Knowledge ◽

Learning System ◽

Domain Specific ◽

Automatic Categorization ◽

New Concepts ◽

Domain Specific Knowledge ◽

E Learning ◽

Ontology Population ◽

IT Audit

Higher education and professional training often use innovative e-learning systems in which ontologies structure domain knowledge. To provide students with up-to-date knowledge, the ontology has to be maintained regularly. This is especially true for the IT audit and security domain, because technology changes fast. However, manual ontology population and enrichment is a complex task that requires professional experience and considerable effort. This paper deals with the challenges and possible solutions for semi-automatic ontology enrichment and population. ProMine has two main contributions: one is a semantic-based text mining approach for automatically identifying domain-specific knowledge elements; the other is the automatic categorization of these extracted knowledge elements using Wiktionary. The ProMine ontology enrichment solution was applied in the IT audit domain of an e-learning system. After ten cycles of applying ProMine, the number of automatically identified new concepts had tripled, and ProMine categorized new concepts with high precision and recall.


Optimal stop word selection for text mining in critical infrastructure domain

2015 Resilience Week (RWS) ◽

10.1109/rweek.2015.7287440 ◽

2015 ◽

Cited By ~ 6

Author(s):

Kasun Amarasinghe ◽

Milos Manic ◽

Ryan Hruska

Keyword(s):

Text Mining ◽

Critical Infrastructure ◽

Stop Word ◽

Selection For ◽

Word Selection


Domain specific information retrieval and text mining in medical document

Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '15 ◽

10.1145/2808719.2808726 ◽

2015 ◽

Cited By ~ 2

Author(s):

Sanghoon Lee ◽

Yanjun Zhao ◽

Mohamed Eid Mahmoud Masoud ◽

Maria Valero ◽

Semra Kul ◽

...

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Specific Information ◽

Domain Specific ◽

Medical Document


Some critical remarks on the stop word lists of ISI publications

Journal of Documentation ◽

10.1108/eum0000000007101 ◽

2001 ◽

Vol 57 (6) ◽

pp. 798-808

Author(s):

D.T. Tomov

Keyword(s):

Stop Word ◽

Word Lists


Ontology Maintenance Through Semantic Text Mining

Innovations, Developments, and Applications of Semantic Web and Information Systems - Advances in Web Technologies and Engineering ◽

10.4018/978-1-5225-5042-6.ch013 ◽

2018 ◽

pp. 350-371 ◽

Cited By ~ 1

Author(s):

Andrea Ko ◽

Saira Gillani

Keyword(s):

Text Mining ◽

Learning System ◽

Professional Experience ◽

Complex Task ◽

Domain Specific ◽

New Concepts ◽

Domain Specific Knowledge ◽

E Learning ◽

Ontology Population ◽

IT Audit

Manual ontology population and enrichment is a complex task that requires professional experience and considerable effort. This paper deals with the challenges and possible solutions for semi-automatic ontology enrichment and population. ProMine has two main contributions: one is a semantic-based text mining approach for automatically identifying domain-specific knowledge elements; the other is the automatic categorization of these extracted knowledge elements using Wiktionary. The ProMine ontology enrichment solution was applied in the IT audit domain of an e-learning system. After seven cycles of applying ProMine, the number of automatically identified new concepts had increased significantly, and ProMine categorized new concepts with high precision and recall.
