Exploiting external/domain knowledge to enhance traditional text mining using graph-based methods

2021 ◽  
Author(s):  
Xiaodan Zhang
Author(s):  
Saira Gillani ◽  
Andrea Ko

Higher education and professional training often rely on innovative e-learning systems in which ontologies are used for structuring domain knowledge. To provide students with up-to-date knowledge, the ontology has to be maintained regularly. This is especially true for the IT audit and security domain, because the technology changes fast. However, manual ontology population and enrichment is a complex task that requires professional experience and a lot of effort. This paper deals with the challenges of, and possible solutions for, semi-automatic ontology enrichment and population. ProMine, the solution presented, makes two main contributions: one is a semantic-based text mining approach for automatically identifying domain-specific knowledge elements; the other is the automatic categorization of these extracted knowledge elements using Wiktionary. The ProMine ontology enrichment solution was applied in the IT audit domain of an e-learning system. After ten cycles of applying ProMine, the number of automatically identified new concepts had tripled, and ProMine categorized new concepts with high precision and recall.


2011 ◽  
pp. 2074-2084
Author(s):  
Francisco M. Couto

This chapter introduces the use of Text Mining in scientific literature for biological research, with a special focus on automatic gene and protein annotation. The field has recently become a major topic in Bioinformatics, motivated by the opportunity of tapping the BioLiterature with automatic text processing software. The chapter describes the main approaches adopted and analyzes systems that have been developed for automatically annotating genes or proteins. To illustrate how text-mining tools fit into the curation processes of biological databases, the chapter presents a tool that assists protein annotation. Despite the promising advances in Text Mining of BioLiterature, many problems still need to be addressed. The chapter presents the main open problems in using text-mining tools for automatic annotation of genes and proteins, and discusses how a more efficient integration of existing domain knowledge can improve the performance of these tools.


Author(s):  
Francisco M. Couto ◽  
Mario J. Silva

This chapter introduces the use of Text Mining in scientific literature for biological research, with a special focus on automatic gene and protein annotation. The field has recently become a major topic in Bioinformatics, motivated by the opportunity of tapping the BioLiterature with automatic text processing software. The chapter describes the main approaches adopted and analyzes systems that have been developed for automatically annotating genes or proteins. To illustrate how text-mining tools fit into the curation processes of biological databases, the chapter presents a tool that assists protein annotation. Despite the promising advances in Text Mining of BioLiterature, many problems still need to be addressed. The chapter presents the main open problems in using text-mining tools for automatic annotation of genes and proteins, and discusses how a more efficient integration of existing domain knowledge can improve the performance of these tools.


2021 ◽  
pp. 297-315
Author(s):  
Alireza Tamaddoni-Nezhad ◽  
David Bohan ◽  
Ghazal Afroozi Milani ◽  
Alan Raybould ◽  
Stephen Muggleton

Humanity is facing existential, societal challenges related to food security, ecosystem conservation, antimicrobial resistance, etc., and Artificial Intelligence (AI) is already playing an important role in tackling these new challenges. Most current AI approaches are limited when it comes to ‘knowledge transfer’ with humans, i.e. it is difficult to incorporate existing human knowledge, and the output knowledge is not human-comprehensible. In this chapter we demonstrate how a combination of comprehensible machine learning, text-mining and domain knowledge could enhance human-machine collaboration for the purpose of automated scientific discovery, where humans and computers jointly develop and evaluate scientific theories. As a case study, we describe a combination of logic-based machine learning (which included human-encoded ecological background knowledge) and text-mining from scientific publications (to verify machine-learned hypotheses) for the purpose of automated discovery of ecological interaction networks (food-webs) to detect change in agricultural ecosystems using the Farm Scale Evaluations (FSEs) of genetically modified herbicide-tolerant (GMHT) crops dataset. The results included novel food-web hypotheses, some confirmed by subsequent experimental studies (e.g. DNA analysis) and published in scientific journals. These machine-learned food-webs were also used as the basis of a recent study revealing resilience of agro-ecosystems to changes in farming management using GMHT crops.


2020 ◽  
Vol 33 (5) ◽  
pp. 1357-1380
Author(s):  
Yilu Zhou ◽  
Yuan Xue

Purpose Strategic alliances among organizations are some of the central drivers of innovation and economic growth. However, the discovery of alliances has relied on purely manual search and has had limited scope. This paper proposes a text-mining framework, ACRank, that automatically extracts alliances from news articles. ACRank aims to provide human analysts with a higher coverage of strategic alliances than existing databases while maintaining a reasonable extraction precision. It has the potential to discover alliances involving less well-known companies, a situation often neglected by commercial databases. Design/methodology/approach The proposed framework is a systematic process of alliance extraction and validation using natural language processing techniques and alliance domain knowledge. The process integrates news article search, entity extraction, and syntactic and semantic linguistic parsing techniques. In particular, Alliance Discovery Template (ADT) identifies a number of linguistic templates expanded from expert domain knowledge and extracts potential alliances at sentence level. Alliance Confidence Ranking (ACRank) further validates each unique alliance based on multiple features at document level. The framework is designed to deal with extremely skewed, noisy data from news articles. Findings An evaluation of ACRank on a gold standard data set of IBM alliances (2006–2008) showed that sentence-level ADT-based extraction achieved 78.1% recall and 44.7% precision and eliminated over 99% of the noise in news articles. ACRank further improved precision to 97% for the top 20% of extracted alliance instances. Further comparison with the Thomson Reuters SDC database showed that SDC covered less than 20% of total alliances, while ACRank covered 67%. When applied to Dow 30 company news articles, ACRank is estimated to achieve a recall between 0.48 and 0.95, and only 15% of the alliances appeared in SDC. Originality/value The research framework proposed in this paper indicates a promising direction for building a comprehensive alliance database using automatic approaches. It adds value to academic studies and business analyses that require in-depth knowledge of strategic alliances. It also encourages other innovative studies that use text mining and data analytics to study business relations.
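
The sentence-level, template-based extraction and the precision/recall figures reported in the abstract above can be illustrated with a minimal Python sketch. The two regular-expression templates, the company names and the gold set below are hypothetical and chosen only for illustration; they are not the paper's actual ADT templates or data, and the document-level ACRank ranking step is not modelled.

```python
import re

# Hypothetical linguistic templates (assumptions, not the paper's ADT templates).
TEMPLATES = [
    re.compile(r"(?P<a>[A-Z][\w&]+) (?:has )?(?:partnered|teamed up) with (?P<b>[A-Z][\w&]+)"),
    re.compile(r"(?P<a>[A-Z][\w&]+) and (?P<b>[A-Z][\w&]+) "
               r"(?:announced|formed|signed) an? (?:strategic )?(?:alliance|partnership)"),
]

def extract_alliances(sentences):
    """Return the set of unordered company pairs matched by any template."""
    found = set()
    for sentence in sentences:
        for template in TEMPLATES:
            for match in template.finditer(sentence):
                found.add(frozenset((match.group("a"), match.group("b"))))
    return found

def precision_recall(predicted, gold):
    """Precision = TP / |predicted|, recall = TP / |gold|."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Made-up sentences and gold standard, for illustration only.
sentences = [
    "IBM partnered with Cisco on cloud services.",
    "Acme and Globex announced a strategic alliance.",
    "Initech teamed up with Hooli for a charity lunch.",  # matched, but not in gold
]
gold = {
    frozenset(("IBM", "Cisco")),
    frozenset(("Acme", "Globex")),
    frozenset(("Umbrella", "Stark")),  # never mentioned, so it is missed
}

predicted = extract_alliances(sentences)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting two of the three extracted pairs are in the gold set and two of the three gold pairs are found, so precision and recall are both 2/3. Storing each pair as a `frozenset` makes the alliance unordered, so "A with B" and "B with A" count as the same instance.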


Author(s):  
A. Jimeno-Yepes ◽  
R. Berlanga-Llavori ◽  
D. Rebholz-Schuchmann

Ontologies represent domain knowledge that improves user interaction and interoperability between applications. In addition, ontologies deliver valuable input to text mining techniques in the biomedical domain, which might improve performance in different text mining tasks. This chapter explores the mutual benefits for ontologies and text mining techniques. Ontology development is a time-consuming task. Most effort is spent on the acquisition of terms that represent concepts in real life. This process can use the existing scientific literature and the World Wide Web. The identification of concept labels, i.e. terms, from these sources using text mining solutions improves ontology development, since the literature resources make reference to existing terms and concepts. Furthermore, automatic text processing techniques profit from ontological resources in different tasks, for example in the disambiguation of terms and the enrichment of terminological resources for the text mining solution. Among the most important text mining tasks that exploit ontological resources are the mapping of concepts to terms in textual sources (e.g. named entity recognition, semantic indexing) and the expansion of queries in information retrieval.
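
The two tasks named at the end of the abstract above, mapping concepts to terms in text and expanding queries, can be sketched in Python with a simple dictionary lookup. The two-concept ontology, the concept IDs and the function names are assumptions made for illustration; real systems use far larger terminologies and handle ambiguity, which this sketch does not.

```python
import re

# Toy ontology: concept ID -> labels, preferred term first (illustrative only).
ONTOLOGY = {
    "C001": ["myocardial infarction", "heart attack"],
    "C002": ["aspirin", "acetylsalicylic acid"],
}

def map_concepts(text):
    """Return sorted (start, end, concept_id) spans for every label found in text."""
    hits = []
    lowered = text.lower()
    for concept_id, labels in ONTOLOGY.items():
        for label in labels:
            pattern = r"\b" + re.escape(label) + r"\b"
            for match in re.finditer(pattern, lowered):
                hits.append((match.start(), match.end(), concept_id))
    return sorted(hits)

def expand_query(query):
    """Return the query plus all labels of every concept it mentions."""
    terms = {query.lower()}
    for _, _, concept_id in map_concepts(query):
        terms.update(ONTOLOGY[concept_id])
    return terms

print(map_concepts("Aspirin after a heart attack"))
print(sorted(expand_query("heart attack")))
```

Matching on word boundaries avoids tagging a label inside a longer word, and expanding by concept rather than by string means a query for one synonym also retrieves documents that use the other.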


2019 ◽  
Vol 27 (2) ◽  
pp. 458-482 ◽  
Author(s):  
Shenghua Zhou ◽  
S. Thomas Ng ◽  
Sang Hoon Lee ◽  
Frank J. Xu ◽  
Yifan Yang

Purpose In the architecture, engineering and construction (AEC) industry, technology developers have difficulty fully understanding user needs because of the high domain knowledge threshold and the lack of effective and efficient methods to minimise information asymmetry between technology developers and AEC users. The paper aims to discuss this issue. Design/methodology/approach A synthetic approach combining domain knowledge and text mining techniques is proposed to help capture user needs, which is demonstrated using building information modelling (BIM) apps as a case. The synthetic approach includes: the collection and cleansing of BIM apps’ attribute data and users’ comments; the incorporation of domain knowledge into the collected comments; a sentiment analysis to distinguish positive and negative comments; an exploration of the relationships between user sentiments and BIM apps’ attributes to unveil user preferences; and the establishment of a topic model to identify problems frequently raised by users. Findings The results show that BIM app categories with high user interest but low sentiment or supply, such as “reality capture”, “interoperability” and “structural simulation and analysis”, deserve greater effort and attention from developers. Users prefer BIM apps that are small and continually updated. Problems related to “support for new Revit”, “import & export” and “external linkage” are the ones users complain about most frequently. Originality/value The main contributions of this work include: the innovative application of text mining techniques to identify user needs to drive BIM apps development; and the development of a synthetic approach to orchestrating domain knowledge, text mining techniques (i.e. sentiment analysis and topic modelling) and statistical methods in order to help extract user needs for promoting the success of emerging technologies in the AEC industry.
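
The step of relating user sentiment to app attributes, described in the abstract above, can be sketched in Python as follows. The abstract does not say which sentiment method was used, so a tiny word-list scorer stands in for it here; the word lists, the review texts and the category labels attached to them are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon (an assumption, not the paper's method).
POSITIVE = {"great", "useful", "fast", "easy"}
NEGATIVE = {"crash", "slow", "broken", "missing"}

def sentiment(comment):
    """Score = number of positive words minus number of negative words."""
    words = [w.strip(".,!?").lower() for w in comment.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def mean_sentiment_by_category(reviews):
    """Average the comment scores within each app category."""
    scores = defaultdict(list)
    for category, comment in reviews:
        scores[category].append(sentiment(comment))
    return {category: sum(s) / len(s) for category, s in scores.items()}

# Made-up (category, comment) pairs.
reviews = [
    ("interoperability", "Import is broken and slow."),
    ("interoperability", "Useful but export is missing!"),
    ("modelling", "Great and easy to use."),
]

print(mean_sentiment_by_category(reviews))
```

A category that attracts many comments but a low mean score is the kind of "high interest, low sentiment" case the findings single out; the same grouping could be applied to other attributes such as app size or update frequency.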

