Towards the Development of a Knowledge Base for Realizing User-Friendly Data Mining

Author(s):  
Roberto Espinosa ◽  
Diego García-Saiz ◽  
Jose Jacobo Zubcoff ◽  
Jose-Norberto Mazón ◽  
Marta Zorrilla
2016 ◽  
Vol 31 (2) ◽  
pp. 97-123 ◽  
Author(s):  
Alfred Krzywicki ◽  
Wayne Wobcke ◽  
Michael Bain ◽  
John Calvo Martinez ◽  
Paul Compton

Abstract Data mining techniques for extracting knowledge from text have been applied extensively to applications including question answering, document summarisation, event extraction and trend monitoring. However, current methods have mainly been tested on small-scale data sets customised for specific purposes. The availability of large volumes of data and high-velocity data streams (such as social media feeds) motivates the need to extract knowledge automatically from such sources and to generalise existing approaches to more practical applications. Recently, several architectures have been proposed for what we call knowledge mining: integrating data mining for knowledge extraction from unstructured text (possibly making use of a knowledge base) while consistently incorporating the newly extracted information into the knowledge base. After describing a number of existing knowledge mining systems, we review the state-of-the-art literature on both current text mining methods (emphasising stream mining) and techniques for the construction and maintenance of knowledge bases. In particular, we focus on mining entities and relations from unstructured text data sources, entity disambiguation, entity linking and question answering. We conclude by highlighting general trends in knowledge mining research and identifying problems that require further research to enable more extensive use of knowledge bases.


Author(s):  
Sourabh Parmar

Researchers use transcriptomics analyses for biological data mining, interpretation and presentation. User-friendly Galaxy-based tools are used to analyze transcriptomic data from various complex diseases in order to understand their pathogenesis. This work provides simple methods for differential expression analysis and for analyzing the results with gene ontology and pathway enrichment tools such as DAVID and WebGestalt. The approach is effective for analyzing and interpreting transcriptomic data. A transcriptomics analysis was performed on rheumatoid arthritis SRA data. Rheumatoid arthritis (RA) is a systemic autoimmune disease whose pathogenesis is mediated by T cells and autoantibodies. This article discusses the genes that are differentially expressed between healthy (n=50) and diseased (n=51) samples and the functions of those genes in the pathogenesis of RA.
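
The abstract does not say which statistical test was used. Galaxy RNA-seq workflows commonly rely on count-based models such as DESeq2 or edgeR; the minimal sketch below deliberately swaps in a much simpler stand-in (a log2 fold-change filter followed by Welch's t statistic) only to illustrate what "differentially expressed between healthy and diseased samples" means computationally. The sample IDs, gene names and values are hypothetical, and the sketch computes no p-values or multiple-testing correction.

```python
import math
from statistics import mean, variance

def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids log2(0)."""
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))

def welch_t(case, control):
    """Welch's t statistic for two samples with possibly unequal variances."""
    diff = mean(case) - mean(control)
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    if se == 0:
        # Both groups are constant: there is no spread to scale the difference by.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def rank_genes(expr, case_ids, control_ids, min_abs_lfc=1.0):
    """Return (gene, log2FC, t) for genes passing the fold-change filter,
    ordered by decreasing |t|."""
    results = []
    for gene, values in expr.items():
        case = [values[s] for s in case_ids]
        control = [values[s] for s in control_ids]
        lfc = log2_fold_change(case, control)
        if abs(lfc) >= min_abs_lfc:
            results.append((gene, lfc, welch_t(case, control)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)

# Hypothetical normalized expression values (not data from the study).
EXPR = {
    "GENE_A": {"h1": 10, "h2": 12, "h3": 11, "ra1": 40, "ra2": 44, "ra3": 38},
    "GENE_B": {"h1": 20, "h2": 22, "h3": 21, "ra1": 21, "ra2": 19, "ra3": 23},
}

if __name__ == "__main__":
    for gene, lfc, t in rank_genes(EXPR, ["ra1", "ra2", "ra3"], ["h1", "h2", "h3"]):
        print(f"{gene}: log2FC={lfc:.2f}, t={t:.2f}")
```

With these toy values only GENE_A passes the |log2FC| >= 1 filter; GENE_B has equal group means and is dropped. The resulting gene list is the kind of input that enrichment tools such as DAVID or WebGestalt take.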


2011 ◽  
Vol 24 (3) ◽  
pp. 45-60
Author(s):  
Ben Ali ◽  
Samar Mouakket

E-business domains have been considered killer domains for different data analysis techniques. Most researchers have examined data mining (DM) techniques to analyze the databases behind E-business websites. DM has shown interesting results, but the technique has some limitations concerning the content of the database and the level of expertise required of the users interpreting the results. In this paper, the authors show that successful and more sophisticated results can be obtained using other analysis techniques, such as Online Analytical Processing (OLAP) and Spatial OLAP (SOLAP). The authors therefore propose a framework that integrates OLAP with SOLAP techniques in an E-business domain to make data analysis (both non-spatial and spatial) easier and more user-friendly and to improve decision making. In addition, the authors apply the framework to an E-business website for online job seekers in the United Arab Emirates (UAE). Decision makers can use the results to make crucial decisions about the UAE job market.
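
The core OLAP operations the framework builds on can be illustrated with a minimal sketch, assuming a tiny in-memory fact table of job postings. The emirate dimension stands in for the spatial dimension that a SOLAP tool would back with real geometries; all rows, dimension names and counts are hypothetical and are not the paper's data or implementation.

```python
from collections import defaultdict

# Hypothetical fact rows: (emirate, sector, month, postings). Illustrative only.
FACTS = [
    ("Dubai", "IT", "2011-01", 120),
    ("Dubai", "Finance", "2011-01", 80),
    ("Abu Dhabi", "IT", "2011-01", 60),
    ("Abu Dhabi", "Oil & Gas", "2011-02", 90),
    ("Sharjah", "IT", "2011-02", 30),
]

DIMENSIONS = ("emirate", "sector", "month")

def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cube = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        cube[key] += row[-1]  # last column is the measure (postings)
    return dict(cube)

def slice_(facts, dim, value):
    """Slice: keep only rows whose dimension `dim` equals `value`."""
    i = DIMENSIONS.index(dim)
    return [row for row in facts if row[i] == value]

if __name__ == "__main__":
    print(roll_up(FACTS, ["emirate"]))                            # postings per emirate
    print(roll_up(slice_(FACTS, "sector", "IT"), ["emirate"]))    # IT postings per emirate
```

Rolling up to the emirate level gives Dubai 200, Abu Dhabi 150 and Sharjah 30 postings; slicing on the IT sector first gives 120, 60 and 30. A SOLAP front end would render such per-region aggregates on a map instead of a table.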


Axioms ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 49
Author(s):  
Anton Romanov ◽  
Valeria Voronina ◽  
Gleb Guskov ◽  
Irina Moshkina ◽  
Nadezhda Yarushkina

Economic development and the transition to Industry 4.0 create new challenges for artificial intelligence methods. These challenges include processing large volumes of data, analyzing various dynamic indicators, discovering complex dependencies in accumulated data, and forecasting the state of processes. The main point of this study is the development of a set of analytical and prognostic methods. The methods described in this article are based on fuzzy logic, statistics and time series data mining, because data extracted from dynamic systems are initially incomplete and have a high degree of uncertainty. The ultimate goal of the study is to improve the quality of data analysis in industrial and economic systems. The advantages of the proposed methods are their flexibility and their orientation toward high interpretability of dynamic data. This high level of interpretability and interoperability of dynamic data is achieved by combining time series data mining with knowledge base engineering methods. Merging a set of rules extracted from the time series with knowledge base rules makes it possible to produce a forecast even when the time series is insufficient in length or character. The proposed methods also draw on a summary of results from process modeling for diagnosing technical systems, forecasting the economic condition of enterprises, and approaches to the technological preparation of production in a multi-product production program, applying type-2 fuzzy sets to time series modeling. Intelligent systems based on the proposed methods demonstrate an increase in the quality and stability of their functioning. This article contains a set of experiments that support this claim.


Author(s):  
Shamsul I. Chowdhury

Over the last decade, data warehousing and data mining tools have evolved from research into unique and popular applications, ranging from data warehousing and data mining for decision support to business intelligence and other kinds of applications. The chapter presents and discusses data warehousing methodologies, along with the main components of data mining tools and technologies, and shows how they could all be integrated for knowledge management in a broader sense. Knowledge management refers to the set of processes developed in an organization to create, extract, transfer, store and apply knowledge. The chapter also focuses on how data mining tools and technologies could be used to extract knowledge from large databases or data warehouses. Knowledge management increases the ability of an organization to learn from its environment and to incorporate knowledge into its business processes by adapting to new tools and technologies. Knowledge management is also about the reusability of the knowledge that is extracted and stored in the knowledge base. One way to improve reusability is to use this knowledge base as a front end to case-based reasoning (CBR) applications. The chapter further focuses on the reusability issues of knowledge management and presents an integrated framework for knowledge management that combines data mining (DM) tools and technologies with CBR methodologies. The purpose of the integrated framework is to discover, validate, retain, reuse and share knowledge in an organization with both its internal and external users. The framework is independent of the application domain and would be suitable for use in areas such as data mining and knowledge management in e-government.
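
The retain and reuse steps that the chapter's framework emphasizes can be sketched with a minimal nearest-neighbour case base. This is a generic illustration of case-based reasoning under stated assumptions, not the chapter's framework: the attributes (`complexity`, `urgency`), their weights, the [0, 1] scaling and the e-government style solutions are all hypothetical, and the validation (revise) step is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    features: dict  # attribute -> value, assumed already scaled to [0, 1]
    solution: str

@dataclass
class CaseBase:
    weights: dict                                 # attribute -> importance weight
    cases: list = field(default_factory=list)

    def similarity(self, a, b):
        """Weighted mean of per-attribute similarities 1 - |a_i - b_i|."""
        total = sum(self.weights.values())
        return sum(w * (1 - abs(a[k] - b[k])) for k, w in self.weights.items()) / total

    def retrieve(self, query, k=1):
        """Retrieve: the k stored cases most similar to the query."""
        return sorted(self.cases, key=lambda c: self.similarity(query, c.features),
                      reverse=True)[:k]

    def reuse(self, query):
        """Reuse: propose the solution of the nearest case (None if the base is empty)."""
        best = self.retrieve(query, k=1)
        return best[0].solution if best else None

    def retain(self, features, solution):
        """Retain: store a validated case so later queries can reuse it."""
        self.cases.append(Case(features, solution))

if __name__ == "__main__":
    cb = CaseBase(weights={"complexity": 2.0, "urgency": 1.0})
    cb.retain({"complexity": 0.1, "urgency": 0.2}, "answer from FAQ")
    cb.retain({"complexity": 0.9, "urgency": 0.8}, "escalate to case officer")
    print(cb.reuse({"complexity": 0.2, "urgency": 0.1}))
```

In this sketch the cases could just as well be populated from patterns found by a data mining step, which is the kind of DM-to-CBR hand-off the abstract describes. The query above is closest to the first stored case (similarity 0.9 versus 0.3), so its solution is reused.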


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Adriano Ferrasa ◽  
Mayara M Murata ◽  
Teresa D C G Cofre ◽  
Juliana S Cavallini ◽  
Gustavo Peron ◽  
...  

Abstract Citrus canker type A is a serious disease caused by Xanthomonas citri subsp. citri (X. citri), which is responsible for severe losses to growers and to the citrus industry worldwide. To date, no canker-resistant citrus genotypes are available, and there is limited information regarding the molecular and genetic mechanisms involved in the early stages of citrus canker development. Here, we present the CitrusKB knowledge base. This is the first in vivo interactome database for different citrus cultivars, and it was produced to provide a valuable resource of information on citrus and its interaction with the citrus canker bacterium X. citri. CitrusKB provides a user-friendly web interface that lets users search and analyse a large amount of information on eight citrus cultivars with distinct levels of susceptibility to the disease, covering control plants and plants at different stages of infection by X. citri. Currently, CitrusKB comprises a reference citrus genome and its transcriptome, expressed transcripts, pseudogenes and predicted genomic variations (SNPs and SSRs). CitrusKB will continue to be updated with novel annotations and analysis tools. We expect that CitrusKB may contribute substantially to the field of citrus genomics. CitrusKB is accessible at http://bioinfo.deinfo.uepg.br/citrus. Users can download all raw sequences and datasets generated by this study from the CitrusKB website.


2013 ◽  
Vol 645 ◽  
pp. 232-238
Author(s):  
Qing Li ◽  
Wei Yang ◽  
Xiao Nan Ye ◽  
Xiao Xiao Ma

The realization of a device test training system requires a large amount of domain knowledge, so building a knowledge base plays an important role. Given the uncertainty, inaccuracy and incompleteness of test data in the testing process, this paper adopts rough-set-based data mining algorithms for knowledge acquisition and, to address the shortcomings of approximate reduction of rough set knowledge, proposes an improved algorithm based on the tolerance relation of an incomplete information system. On this basis, the paper studies the design and realization of a knowledge base system for the development of a device simulation training system, and validates the method through a design example: the knowledge base of a particular device simulation training system.
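
The abstract does not describe the improved reduction algorithm itself, so the minimal sketch below only illustrates the building block it is said to rest on: the commonly used tolerance relation for incomplete information systems, under which two objects are tolerant when, on every attribute, their values are equal or at least one is missing. It computes tolerance classes, lower and upper approximations, and a brute-force check of whether an attribute subset induces the same classes. The device attributes and records are hypothetical, and the check is not the paper's algorithm.

```python
MISSING = None  # stands for '*', an unknown attribute value

# Hypothetical incomplete test records: object id -> attribute values.
TABLE = {
    1: {"voltage": "high", "temp": "high", "vibration": MISSING},
    2: {"voltage": "high", "temp": MISSING, "vibration": "low"},
    3: {"voltage": "low", "temp": "low", "vibration": "low"},
    4: {"voltage": "low", "temp": MISSING, "vibration": "high"},
    5: {"voltage": MISSING, "temp": "high", "vibration": "high"},
}
ATTRS = ["voltage", "temp", "vibration"]

def tolerant(x, y, attrs):
    """x and y are tolerant if, on every attribute, they agree or one value is missing."""
    return all(x[a] == y[a] or x[a] is MISSING or y[a] is MISSING for a in attrs)

def tolerance_classes(table, attrs):
    """Map each object to the set of objects tolerant with it."""
    return {i: frozenset(j for j, y in table.items() if tolerant(x, y, attrs))
            for i, x in table.items()}

def approximations(table, attrs, target):
    """Lower and upper approximation of a set of objects under the tolerance relation."""
    classes = tolerance_classes(table, attrs)
    lower = {i for i, c in classes.items() if c <= target}   # class entirely inside target
    upper = {i for i, c in classes.items() if c & target}    # class overlaps target
    return lower, upper

def preserves_classes(table, attrs, subset):
    """True if `subset` induces the same tolerance classes as the full `attrs`."""
    return tolerance_classes(table, attrs) == tolerance_classes(table, subset)

if __name__ == "__main__":
    print(approximations(TABLE, ATTRS, {1, 2}))  # suppose devices 1 and 2 are faulty
    print(preserves_classes(TABLE, ATTRS, ["voltage", "vibration"]))
```

With this sample table, the lower approximation of {1, 2} is {2} and the upper approximation is {1, 2, 5}. Dropping `temp` leaves every tolerance class unchanged, so it is dispensable here, whereas dropping `voltage` or `vibration` changes some classes.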


Author(s):  
Md. Redone Hassan ◽  
S.K. Obidul Kadir ◽  
Md. Aminul Islam ◽  
Sheikh Abujar ◽  
Raihana Zannat ◽  
...  
