Implicit Knowledge Discovery in Design Semantic Network by Applying Pythagorean Means on Shortest Path Searching

Author(s):  
Feng Shi ◽  
Liuqing Chen ◽  
Ji Han ◽  
Peter Childs

With the advent of the big-data era, the massive textual information stored in electronic and digital documents has become a valuable resource for knowledge discovery in the fields of design and engineering. Ontology technologies and semantic networks have been widely applied with text mining techniques, including Natural Language Processing (NLP), to extract structured knowledge associations from large-scale unstructured textual data. However, most existing works focus on how to construct semantic networks by developing various text mining methods, such as statistical and semantic approaches, while few studies address how to subsequently analyze and fully utilize already well-established semantic networks. In this paper, a specific network analysis method is proposed to discover implicit knowledge associations in an existing semantic network, with the aim of improving knowledge discovery and design innovation. Pythagorean means are applied with Dijkstra’s shortest path algorithm to discover the implicit knowledge associations either around a single knowledge concept or between two concepts. Six criteria are established to evaluate and rank the correlation degree of the implicit associations. Two engineering case studies were conducted to illustrate the proposed knowledge discovery process, and the results showed the effectiveness of the retrieved implicit knowledge associations in helping to provide relevant knowledge from various aspects and in provoking creative ideas for engineering innovation.
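
The abstract above does not give the six ranking criteria or the exact way the means enter the search, so the following is only a minimal sketch of the general idea: find a shortest path with Dijkstra's algorithm, then summarize the edge weights along that path with the three Pythagorean means (arithmetic, geometric, harmonic). The toy graph, the concept names, and the function names are hypothetical and are not taken from the paper.

```python
import heapq
import math


def dijkstra_path(graph, source, target):
    """Return (cost, path) of the cheapest path, or (inf, []) if unreachable.

    graph is an adjacency dict: {node: {neighbour: positive_weight}}.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break  # the target's distance is final once it is popped
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


def path_weights(graph, path):
    """Edge weights along consecutive nodes of a path."""
    return [graph[a][b] for a, b in zip(path, path[1:])]


def pythagorean_means(values):
    """Return (arithmetic, geometric, harmonic) means of positive values."""
    if not values:
        raise ValueError("need at least one edge weight")
    n = len(values)
    arithmetic = sum(values) / n
    geometric = math.exp(sum(math.log(v) for v in values) / n)
    harmonic = n / sum(1.0 / v for v in values)
    return arithmetic, geometric, harmonic


# Hypothetical toy network: lower weight = stronger association (an assumption).
graph = {
    "gear": {"torque": 1.0, "motor": 4.0},
    "torque": {"motor": 2.0},
    "motor": {},
}

cost, path = dijkstra_path(graph, "gear", "motor")
am, gm, hm = pythagorean_means(path_weights(graph, path))
print(path, cost, am, gm, hm)
```

In this toy case the indirect route through "torque" is cheaper than the direct edge, so the intermediate concept is what a search of this kind would surface as an implicit association. How the three means are combined into a ranking is left open here because the source does not specify it.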

2017 ◽  
Vol 139 (11) ◽  
Author(s):  
Feng Shi ◽  
Liuqing Chen ◽  
Ji Han ◽  
Peter Childs

With the advent of the big-data era, the massive information stored in electronic and digital forms on the internet has become a valuable resource for knowledge discovery in engineering design. Traditional document retrieval methods based on document indexing focus on retrieving individual documents related to the query, but are incapable of discovering the various associations between individual knowledge concepts. Ontology-based technologies, which can extract the inherent relationships between concepts by using advanced text mining tools, can be applied to improve design information retrieval in a large-scale unstructured textual data environment. However, few publicly available ontology databases take a design and engineering perspective when establishing the relations between knowledge concepts. This paper develops a “WordNet” focusing on design and engineering associations by integrating text mining approaches to construct an unsupervised learning ontology network. Subsequent probability and velocity network analyses are applied with different statistical behaviors to evaluate the correlation degree between concepts for design information retrieval. The validation results show that the probability and velocity analysis on our constructed ontology network can help recognize highly related complex design and engineering associations between elements. Finally, an engineering design case study demonstrates the use of our constructed semantic network in a real-world project for design relations retrieval.


2008 ◽  
Vol 02 (03) ◽  
pp. 343-364 ◽  
Author(s):  
BRIAN HARRINGTON ◽  
STEPHEN CLARK

Extracting semantic information from multiple natural language sources and combining that information into a single unified resource is an important and fundamental goal for natural language processing. Large scale resources of this kind can be useful for a wide variety of tasks including question answering, word sense disambiguation and knowledge discovery. A single resource representing the information in multiple documents can provide significantly more semantic information than is available from the documents considered independently. The ASKNet system utilises existing NLP tools and resources, together with spreading activation based techniques, to automatically extract semantic information from a large number of English texts, and combines that information into a large scale semantic network. The initial emphasis of the ASKNet system is on wide-coverage, robustness and speed of construction. In this paper we show how a network consisting of over 1.5 million nodes and 3.5 million edges, more than twice as large as any network currently available, can be created in less than 3 days. Evaluation of large-scale semantic networks is a difficult problem. In order to evaluate ASKNet we have developed a novel evaluation metric based on the notion of a network "core" and employed human evaluators to determine the precision of various components of that core. We have applied this evaluation to networks created from randomly chosen articles used by DUC (Document Understanding Conference). The results are highly promising: almost 80% precision in the semantic core of the networks.
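
The abstract mentions spreading activation but does not describe ASKNet's actual procedure, so the following is only a generic sketch of the technique: activation starts at seed nodes and is passed along weighted edges, attenuated by a decay factor at each hop, and nodes below a firing threshold stop propagating. The parameter names (`decay`, `threshold`, `max_steps`) and the toy graph are assumptions made for illustration.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Generic spreading activation over {node: {neighbour: weight}}.

    seeds maps starting nodes to their initial activation.
    Returns the accumulated activation of every node reached.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for node, energy in frontier.items():
            if energy < threshold:
                continue  # too weak to fire
            for nbr, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                next_frontier[nbr] = next_frontier.get(nbr, 0.0) + passed
        if not next_frontier:
            break  # nothing left to propagate
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


# Hypothetical toy network.
graph = {
    "a": {"b": 1.0, "c": 0.5},
    "b": {"d": 1.0},
    "c": {},
    "d": {},
}
result = spread_activation(graph, {"a": 1.0})
print(result)
```

Nodes that end up with high accumulated activation are the ones most strongly connected to the seeds, which is the property such techniques typically exploit when deciding whether two mentions should map to the same network node.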


2021 ◽  
Vol 1 ◽  
pp. 2621-2630
Author(s):  
Ji Han ◽  
Serhad Sarica ◽  
Feng Shi ◽  
Jianxi Luo

Abstract There have been growing uses of semantic networks in the past decade, such as leveraging large-scale pre-trained graph knowledge databases for various natural language processing (NLP) tasks in engineering design research. Therefore, the paper provides a survey of the research that has employed semantic networks in the engineering design research community. The survey reveals that engineering design researchers have primarily relied on WordNet, ConceptNet, and other common-sense semantic network databases trained on non-engineering data sources to develop methods or tools for engineering design. Meanwhile, there are emerging efforts to mine large scale technical publication and patent databases to construct engineering-contextualized semantic network databases, e.g., B-Link and TechNet, to support NLP in engineering design. On this basis, we recommend future research directions for the construction and applications of engineering-related semantic networks in engineering design research and practice.


2021 ◽  
Vol 11 (9) ◽  
pp. 4087
Author(s):  
Yaran Jiao ◽  
Chunming Li ◽  
Yinglun Lin

With the popularization of social networks, the abundance of unstructured data regarding environmental complaints is rapidly increasing. This study established a text mining framework for Chinese civil environmental complaints and analyzed the characteristics of environmental complaints, including keywords, sentiment, and semantic networks, using two years of environmental complaint records from Guangzhou city, China. The results show that the keywords of environmental complaints can be effectively extracted, providing an accurate entry point for solving environmental problems; light pollution complaints are the most negative, and electromagnetic radiation complaints have the most fluctuating emotions, which may be due to the diversity of citizens’ perceptions of pollution; the nodes of the semantic network reveal that citizens pay the most attention to pollution sources but the least attention to stakeholders; the edges of the semantic network show that pollution sources and pollution receptors show the most concerning relationship, and the pollution receptors’ relationships with pollution behaviors, sensory features, stakeholders, and individual health are also highlighted by citizens. Thus, environmental pollution management should not only strengthen the control of pollution sources but also pay attention to these characteristics. This study provides an efficient technical method for unstructured data analysis, which may be helpful for precise and smart environmental management.


Author(s):  
Zhuang Liu ◽  
Degen Huang ◽  
Kaiyu Huang ◽  
Zhuang Li ◽  
Jun Zhao

There is growing interest in the tasks of financial text mining. Over the past few years, Natural Language Processing (NLP) based on deep learning has advanced rapidly, and deep learning has shown promising results in financial text mining models. However, as NLP models require large amounts of labeled training data, applying deep learning to financial text mining is often unsuccessful due to the lack of labeled training data in financial fields. To address this issue, we present FinBERT (BERT for Financial Text Mining), a domain-specific language model pre-trained on large-scale financial corpora. In FinBERT, unlike in BERT, we construct six pre-training tasks covering more knowledge, trained simultaneously on general corpora and financial domain corpora, which enables the FinBERT model to better capture language knowledge and semantic information. The results show that our FinBERT outperforms all current state-of-the-art models. Extensive experimental results demonstrate the effectiveness and robustness of FinBERT. The source code and pre-trained models of FinBERT are available online.


10.28945/4066 ◽  
2018 ◽  
Vol 13 ◽  
pp. 117-135 ◽  
Author(s):  
M. Thangaraj ◽  
M Sivakami

Aim/Purpose: The aim of this paper is to analyze various text classification techniques employed in practice, along with their strengths and weaknesses, to provide an improved awareness of the various knowledge extraction possibilities in the field of data mining. Background: Artificial Intelligence is reshaping text classification techniques to better acquire knowledge. However, in spite of the growth and spread of AI in all fields of research, its role with respect to text mining is not yet well understood. Methodology: For this study, various articles written between 2010 and 2017 on “text classification techniques in AI”, selected from leading journals of computer science, were analyzed. Each article was read in full. The research problems related to text classification techniques in the field of AI were identified, and techniques were grouped according to the algorithms involved. These algorithms were divided based on the learning procedure used. Finally, the findings were plotted as a tree structure for visualizing the relationship between learning procedures and algorithms. Contribution: This paper identifies the strengths, limitations, and current research trends in text classification in an advanced field like AI. This knowledge is crucial for data scientists, who could utilize the findings of this study to devise customized data models. It also helps the industry to understand the operational efficiency of text mining techniques. It further contributes to reducing the cost of projects and supports effective decision making. Findings: It was found to be more important to study and understand the nature of the data before proceeding to mining. Automation of the text classification process is required, given the increasing amount of data and the need for accuracy. Another interesting research opportunity lies in building intricate text data models with deep learning systems. Such systems are able to execute complex Natural Language Processing (NLP) tasks with semantic requirements. Recommendations for Practitioners: Frame analysis, deception detection, narrative science where data expresses a story, healthcare applications to diagnose illnesses, and conversation analysis are some of the recommendations suggested for practitioners. Recommendation for Researchers: Developing simpler algorithms in terms of coding and implementation, better approaches for knowledge distillation, multilingual text refining, domain knowledge integration, subjectivity detection, and contrastive viewpoint summarization are some of the areas that could be explored by researchers. Impact on Society: Text classification forms the base of data analytics and acts as the engine behind knowledge discovery. It supports state-of-the-art decision making, for example, predicting an event before it actually occurs or classifying a transaction as ‘Fraudulent’. The results of this study could be used for developing applications dedicated to assisting decision making processes. These informed decisions will help to optimize resources and maximize benefits to mankind. Future Research: In the future, better methods for parameter optimization will be identified by selecting better parameters that reflect effective knowledge discovery. The role of streaming data processing is still rarely explored when it comes to text classification.


2021 ◽  
pp. 1-45
Author(s):  
Ji Han ◽  
Serhad Sarica ◽  
Feng Shi ◽  
Jianxi Luo

Abstract In the past two decades, there has been increasing use of semantic networks in engineering design for supporting various activities, such as knowledge extraction, prior art search, idea generation and evaluation. Leveraging large-scale pre-trained graph knowledge databases to support engineering design-related natural language processing (NLP) tasks has attracted a growing interest in the engineering design research community. Therefore, this paper aims to provide a survey of the state-of-the-art semantic networks for engineering design and propositions of future research to build and utilize large-scale semantic networks as knowledge bases to support engineering design research and practice. The survey shows that WordNet, ConceptNet and other semantic networks, which contain common-sense knowledge or are trained on non-engineering data sources, are primarily used by engineering design researchers to develop methods and tools. Meanwhile, there are emerging efforts in constructing engineering and technical-contextualized semantic network databases, such as B-Link and TechNet, through retrieving data from technical data sources and employing unsupervised machine learning approaches. On this basis, we recommend six strategic future research directions to advance the development and uses of large-scale semantic networks for artificial intelligence applications in engineering design.


Author(s):  
NESTOR RYCHTYCKYJ ◽  
ROBERT G. REYNOLDS

Evolutionary computation has been successfully applied in a variety of problem domains and applications. In this paper we discuss the use of a specific form of evolutionary computation known as Cultural Algorithms to improve the efficiency of the subsumption algorithm in semantic networks. We identify two complementary methods of using Cultural Algorithms to solve the problem of re-engineering large-scale dynamic semantic networks in order to optimize the efficiency of subsumption: top-down and bottom-up. The top-down re-engineering approach improves subsumption efficiency by reducing the number of attributes that need to be compared for every node without impacting the results. We demonstrate that a Cultural Algorithm approach can be used to identify the defining attributes that are most significant for node retrieval. These results are then utilized within an existing vehicle assembly process planning application that uses a semantic network based knowledge base, in order to improve the performance and reduce the complexity of the network. It is shown that the results obtained by Cultural Algorithms are at least as good as, and in most cases better than, those obtained by the human developers. The advantage of Cultural Algorithms is especially pronounced for those classes in the network that are more complex. The goal of the bottom-up approach is to classify the input concepts into new clusters that are most efficient for subsumption and classification. While the resultant subsumption efficiency for the bottom-up approach exceeds that for the top-down approach, it does so by removing structural relationships that made the network understandable to human observers. Like a Rete network in expert systems, it is a compilation of only those relationships that impact subsumption. A direct comparison of the two approaches shows that bottom-up semantic network re-engineering creates a semantic network that is approximately 5 times more efficient than the top-down approach in terms of the cost of subsumption. In conclusion, we discuss these results and show that some knowledge that is useful to the system users is lost during the bottom-up re-engineering process, and that the best approach for re-engineering a semantic network requires a combination of both of these approaches.


2021 ◽  
Vol 12 ◽  
Author(s):  
Qihui Xu ◽  
Magdalena Markowska ◽  
Martin Chodorow ◽  
Ping Li

The study of code-switching (CS) speech has produced a wealth of knowledge in the understanding of bilingual language processing and representation. Here, we approach this issue by using a novel network science approach to map bilingual spontaneous CS speech. In Study 1, we constructed semantic networks on CS speech corpora and conducted community detections to depict the semantic organizations of the bilingual lexicon. The results suggest that the semantic organizations of the two lexicons in CS speech are largely distinct, with a small portion of overlap such that the semantic network community dominated by each language still contains words from the other language. In Study 2, we explored the effect of clustering coefficients on language choice during CS speech, by comparing clustering coefficients of words that were code-switched with their translation equivalents (TEs) in the other language. The results indicate that words where the language is switched have lower clustering coefficients than their TEs in the other language. Taken together, we show that network science is a valuable tool for understanding the overall map of bilingual lexicons as well as the detailed interconnections and organizations between the two languages.
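
For readers unfamiliar with the measure used in Study 2, the following is a minimal sketch of the standard local clustering coefficient on an unweighted, undirected graph: the fraction of pairs of a node's neighbours that are themselves connected. The abstract does not say which variant the authors used (for example, a weighted one), so this definition, the function name, and the toy bilingual word network are assumptions for illustration only.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Local clustering coefficient of node in an undirected graph.

    adj maps each node to the set of its neighbours (symmetric).
    Returns 0.0 for nodes with fewer than two neighbours.
    """
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count neighbour pairs that are directly linked to each other.
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj.get(u, set()))
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy network mixing English and Spanish words.
adj = {
    "dog": {"cat", "pet", "perro"},
    "cat": {"dog", "pet"},
    "pet": {"dog", "cat"},
    "perro": {"dog"},
}

for word in ("dog", "cat", "perro"):
    print(word, clustering_coefficient(adj, word))
```

A word whose neighbours are densely interlinked scores close to 1, while a word that bridges otherwise unconnected neighbours scores close to 0.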

