TtoO

2011 ◽  
pp. 123-144
Author(s):  
Josiane Mothe ◽  
Nathalie Hernandez

This chapter introduces a method that re-uses a thesaurus built for a given domain in order to create new resources of a higher semantic level in the form of an ontology. Considering ontologies for data-mining tasks relies on the intuition that the meaning of textual information depends on the conceptual relations between the objects to which it refers rather than on the linguistic and statistical relations within its content. To put forward such advanced mechanisms, the first step is to build the ontologies. The originality of the method is that it is based both on the knowledge extracted from a thesaurus and on knowledge semi-automatically extracted from a textual corpus. The whole process is semi-automated and the experts' tasks are limited to validating certain steps. In parallel, we have developed mechanisms based on the obtained ontology to accomplish a science-monitoring task. An example is given.
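A minimal sketch of the general thesaurus-to-ontology idea (not the authors' own pipeline): broader/narrower thesaurus links are promoted to candidate subclass relations, and corpus co-occurrence flags extra candidate relations for expert validation. All terms, documents and thresholds below are illustrative assumptions.

```python
# Sketch only: promote broader/narrower thesaurus links to subclass axioms,
# then mine candidate "related-to" links from a toy corpus by co-occurrence.
from collections import Counter
from itertools import combinations

# Hypothetical thesaurus: broader term -> list of narrower terms
thesaurus = {
    "information retrieval": ["indexing", "query expansion"],
    "indexing": ["concept indexing"],
}

# Step 1: broader/narrower links become candidate subclass axioms
subclass_axioms = [(narrow, broad)
                   for broad, narrows in thesaurus.items()
                   for narrow in narrows]

# Step 2: candidate relations extracted from a (toy) corpus by co-occurrence
corpus = [
    "concept indexing improves information retrieval",
    "query expansion relies on concept indexing",
]
terms = {t for pair in subclass_axioms for t in pair}
cooc = Counter()
for doc in corpus:
    present = [t for t in terms if t in doc]
    for a, b in combinations(sorted(present), 2):
        cooc[(a, b)] += 1

# Candidates above a support threshold are handed to the expert for validation
candidates = [pair for pair, n in cooc.items() if n >= 1]
print("subclassOf:", subclass_axioms)
print("candidate relations to validate:", candidates)
```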

Author(s):  
Ioannis N. Kouris

Software development has various stages that can be conceptually grouped into two phases, namely development and production (Figure 1). The development phase includes requirements engineering, architecting, design, implementation and testing. The production phase, on the other hand, includes the actual deployment of the end product and its maintenance. Software maintenance is the last and most difficult stage in the software lifecycle (Sommerville, 2001), as well as the most costly one. According to Zelkowitz, Shaw and Gannon (1979), the production phase accounts for 67% of the cost of the whole process, whereas according to Van Vliet (2000) the actual cost of software maintenance has been estimated at more than half of the total software development cost. The development phase is critical for facilitating efficient and simple software maintenance. The earlier stages should be carried out taking into consideration, apart from any functional requirements, the later maintenance task. For example, the design stage should plan the structure in a way that can be easily altered. Similarly, the implementation stage should create code that can be easily read, understood and changed, and should also keep the code length to a minimum. According to Van Vliet (2000), the final source code length is the determining factor for the total cost of maintenance, since obviously the less code is written, the easier maintenance becomes. According to Erdil et al. (2003), there are four major problems that can slow down the whole maintenance process: unstructured code, maintenance programmers having insufficient knowledge of the system, documentation being absent, out of date or at best insufficient, and software maintenance having a bad image. Thus the success of the maintenance phase relies on these problems being fixed earlier in the life cycle. In real life, however, when programmers decide to perform some maintenance task on a program, such as fixing bugs, making modifications or creating software updates, they usually do so under time and commercial pressure and with the logic of cost reduction, finally resulting in a problematic system of ever increasing complexity. As a consequence, maintainers spend from 50% up to almost 90% of their time trying to comprehend the program (Erdös and Sneed, 1998; Von Mayrhauser and Vans, 1994; Pigoski, 1996). Providing maintainers with tools and techniques to comprehend programs has attracted considerable financial and research interest, given the widespread use of computers and software in all aspects of life. In this work we briefly present some of the most important techniques proposed in the field thus far and focus primarily on the use of data mining techniques in general, and of association rules in particular. Accordingly, we give some possible solutions to problems faced by these methods.
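As a hedged illustration of how association rules can aid program comprehension (not the specific method of this chapter), the sketch below mines rules from hypothetical change sets, i.e. files that were modified together; a rule such as {lexer.c} -> {parser.c} hints at hidden coupling a maintainer should know about. File names and thresholds are invented for the example.

```python
# Toy association-rule mining over a hypothetical maintenance history.
from itertools import combinations
from collections import Counter

change_sets = [                       # files modified in the same commit
    {"parser.c", "lexer.c"},
    {"parser.c", "lexer.c", "ast.h"},
    {"ui.c", "style.css"},
    {"parser.c", "ast.h"},
]

MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.7
n = len(change_sets)

item_count = Counter(f for cs in change_sets for f in cs)
pair_count = Counter()
for cs in change_sets:
    for a, b in combinations(sorted(cs), 2):
        pair_count[(a, b)] += 1

for (a, b), c in pair_count.items():
    support = c / n
    if support < MIN_SUPPORT:
        continue
    for lhs, rhs in ((a, b), (b, a)):          # check both rule directions
        confidence = c / item_count[lhs]
        if confidence >= MIN_CONFIDENCE:
            print(f"{{{lhs}}} -> {{{rhs}}}  support={support:.2f} "
                  f"confidence={confidence:.2f}")
```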


2001 ◽  
Vol 10 (04) ◽  
pp. 691-713 ◽  
Author(s):  
TUBAO HO ◽  
TRONGDUNG NGUYEN ◽  
DUCDUNG NGUYEN ◽  
SAORI KAWASAKI

The problem of model selection in knowledge discovery and data mining—the selection of appropriate discovered patterns/models or of the algorithms to achieve such patterns/models—is generally a difficult task for the user, as it requires meta-knowledge on algorithms/models and model performance metrics. Viewing knowledge discovery as a human-centered process that requires an effective collaboration between the user and the discovery system, our work aims to make model selection in knowledge discovery easier and more effective. For such a collaboration, our solution is to give the user the ability to easily try various alternatives and to compare competing models quantitatively and qualitatively. The basic idea of our solution is to integrate data and knowledge visualization with the knowledge discovery process in order to support the participation of the user. We introduce the knowledge discovery system D2MS, in which several visualization techniques for data and knowledge are developed and integrated into the steps of the knowledge discovery process. The visualizers in D2MS greatly help the user gain better insight into each step of the knowledge discovery process, as well as into the relationship between data and discovered knowledge in the whole process.
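The quantitative side of such model comparison can be sketched as follows; this is a generic illustration, not D2MS itself, and the dataset and candidate algorithms are assumptions chosen for brevity.

```python
# Minimal sketch: try several candidate algorithms on the same data and
# present their cross-validated scores side by side for the user to compare.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
candidates = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:15s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```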


2014 ◽  
Vol 945-949 ◽  
pp. 3369-3375
Author(s):  
Genival Pavanelli ◽  
Maria Teresinha Arns Steiner ◽  
Anderson Roges Teixeira Góes ◽  
Alessandra Memari Pavanelli ◽  
Deise Maria Bertholdi Costa

The process of knowledge management in the various areas of society requires constant attention to the multiplicity of decisions to be made about the activities of the organizations that constitute them. To make these decisions, one should be cautious about relying only on personal knowledge acquired through professional experience, since a process based entirely on this method would be slow, expensive and highly subjective. To assist in this management, it is necessary to use mathematical tools that fulfill the purpose of extracting knowledge from databases. This article proposes the application of the Greedy Randomized Adaptive Search Procedure (GRASP) as a Data Mining (DM) tool within the process known as Knowledge Discovery in Databases (KDD) for the task of extracting classification rules from databases.
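The following is a hedged sketch of the GRASP scheme applied to extracting a single IF-THEN classification rule: a greedy randomized construction picks conditions from a restricted candidate list, and a local search tries dropping conditions. The toy data, scoring function and parameters are assumptions for illustration, not the procedure described in the article.

```python
# GRASP-style rule extraction over a toy tabular dataset (illustrative only).
import random

data = [  # (attributes, class label)
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain",  "windy": "no"}, "play"),
    ({"outlook": "rain",  "windy": "yes"}, "stay"),
]
TARGET = "play"

def score(rule):
    """Rule quality = correctly covered records minus wrongly covered ones."""
    covered = [label for attrs, label in data
               if all(attrs.get(a) == v for a, v in rule)]
    return covered.count(TARGET) - (len(covered) - covered.count(TARGET))

def construct(alpha=0.5):
    """Greedy randomized construction using a restricted candidate list."""
    conditions = {(a, v) for attrs, _ in data for a, v in attrs.items()}
    rule = []
    while conditions:
        ranked = sorted(conditions, key=lambda c: score(rule + [c]), reverse=True)
        rcl = ranked[:max(1, int(alpha * len(ranked)))]
        choice = random.choice(rcl)
        if score(rule + [choice]) <= score(rule):
            break
        rule.append(choice)
        conditions = {c for c in conditions if c[0] != choice[0]}
    return rule

def local_search(rule):
    """Try removing conditions while the rule score improves."""
    improved = True
    while improved and rule:
        improved = False
        for c in list(rule):
            trimmed = [x for x in rule if x != c]
            if score(trimmed) > score(rule):
                rule, improved = trimmed, True
    return rule

random.seed(0)
best = max((local_search(construct()) for _ in range(20)), key=score)
print("IF", " AND ".join(f"{a}={v}" for a, v in best), "THEN", TARGET)
```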


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Konstantinos F. Xylogiannopoulos ◽  
Panagiotis Karampelas ◽  
Reda Alhajj

Background: The first half of 2020 has been marked as the era of the COVID-19 pandemic, which affected the world in almost every aspect of daily life, from the societal to the economic. To prevent the spread of COVID-19, countries implemented diverse policies regarding Non-Pharmaceutical Intervention (NPI) measures, because in the first stage countries had limited knowledge about the virus and its contagiousness, and there was no effective medication or vaccine. This paper studies the effectiveness of the implemented policies and measures against the deaths attributed to the virus between January and May 2020.
Methods: Data from the European Centre for Disease Prevention and Control regarding the identified cases and deaths of COVID-19 from 48 countries were used. Additionally, data concerning the NPI-related policies implemented by the 48 countries and the capacity of their health care systems were collected manually from their national gazettes and official institutes. Data mining, time series analysis, pattern detection, machine learning, clustering methods and visual analytics techniques were applied to analyze the collected data and discover possible relationships between the implemented NPIs and COVID-19 spread and mortality. Further, we recorded and analyzed the responses of the countries against the COVID-19 pandemic, mainly in urban areas, which are over-populated and where COVID-19 accordingly has the potential to spread more easily among humans.
Results: The data mining and clustering analysis of the collected data showed that implementing the NPI measures before the first death case appears to be very effective in controlling the spread of the disease; in other words, delaying the implementation of the NPI measures until after the first death case has practically little effect on limiting the spread of the disease. The success of the NPI measures further depends on the way each government monitored their application: countries with stricter policing of the measures seem to have been more effective in controlling the transmission of the disease.
Conclusions: The conducted comparative data mining study provides insights into the correlation between the early implementation of the NPI measures and controlling COVID-19 contagiousness and mortality. We report a number of observations that could be very helpful to decision makers or epidemiologists regarding the rapid implementation and monitoring of the NPI measures in case of a future wave of COVID-19, or for dealing with other unknown infectious pandemics. Regardless, after the first wave of COVID-19, most countries decided to lift the restrictions and return to normal, which resulted in a severe second wave in some countries, a situation that requires re-evaluating the whole process and drawing lessons for the future.
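A minimal sketch of the clustering step described above, under invented assumptions: each country is summarized by how early NPIs were imposed relative to the first recorded death and by its final mortality, and countries are then grouped with k-means. Country labels, feature values and the number of clusters are hypothetical, not the study's data.

```python
# Illustrative clustering of countries by NPI timing and mortality.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

countries = ["A", "B", "C", "D", "E", "F"]
# Per country: [days from first NPI to first death (negative = NPI came first),
#               deaths per 100k population at the end of the study window]
features = np.array([
    [-14.0, 1.2],
    [-10.0, 0.8],
    [ -2.0, 3.5],
    [  5.0, 9.0],
    [ 12.0, 14.3],
    [  8.0, 11.1],
])

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for country, label in zip(countries, labels):
    print(f"country {country}: cluster {label}")
```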


2019 ◽  
pp. 46-52
Author(s):  
T. I. Makarevich

This paper considers the application of data mining technology in scientific research as one of the methods of intellectual analysis in the domain field of e-Government. The topicality of the issue stems from the current absence of research of this kind in the Republic of Belarus. The paper illustrates how the programme package RapidMiner and the language R have been applied in text mining. Concept indexing is identified as the most effective form of analyzing domain-field ontologies, with formal and linguistic approaches found to be the most effective in this analysis. The paper identifies the problems of word redundancy and word polysemy. Further research is expected to address the interconnectivity of specialized ontologies covering heterogeneous terms on the basis of artificial intelligence (AI).
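The concept-indexing idea can be illustrated with a small sketch: term variants are mapped onto domain concepts before counting, so redundant surface forms collapse into one index entry. The mini-lexicon and documents below are invented for illustration; the paper itself works with RapidMiner and R rather than this code.

```python
# Toy concept indexing: count concept occurrences instead of raw words.
from collections import Counter

# Hypothetical domain lexicon: surface form -> concept
lexicon = {
    "e-government": "E_GOVERNMENT",
    "electronic government": "E_GOVERNMENT",
    "digital services": "PUBLIC_E_SERVICE",
    "online public services": "PUBLIC_E_SERVICE",
}

documents = [
    "electronic government relies on online public services",
    "e-government expands digital services for citizens",
]

def concept_index(doc):
    """Map a document to concept frequencies using the lexicon."""
    counts = Counter()
    text = doc.lower()
    for form, concept in lexicon.items():
        counts[concept] += text.count(form)
    return counts

for i, doc in enumerate(documents):
    print(f"doc {i}: {dict(concept_index(doc))}")
```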


10.28945/2697 ◽  
2003 ◽  
Author(s):  
Krzysztof Hauke ◽  
Mieczyslaw L. Owoc ◽  
Maciej Pondel

Data Mining (DM) is a crucial issue in knowledge discovery processes. The basic facilities for creating data mining models were implemented successfully on Oracle 9i as an extension of the database server. DM tools enable developers to create Business Intelligence (BI) applications, and as a result Data Mining models can be used to support knowledge-based management. The main goal of the paper is to present new features of the Oracle platform for building and testing DM models. The authors characterize the methods of building and testing Data Mining models available on the Oracle 9i platform, stressing the critical steps of the whole process and presenting examples of practical usage of DM models. Verification techniques for the generated knowledge bases are discussed in the mentioned environment.


Author(s):  
Héctor Oscar Nigro ◽  
Sandra Elizabeth González Císaro

Nowadays, one of the most important and challenging problems in the Knowledge Discovery in Databases (KDD) process, or Data Mining, is the definition of prior knowledge, which can originate either from the process or from the domain. This contextual information may help select the appropriate information, features or techniques, decrease the hypothesis space, represent the output in a more comprehensible way and improve the whole process.


2013 ◽  
Vol 303-306 ◽  
pp. 1506-1509
Author(s):  
Xi Lin Bao ◽  
Chen Guo ◽  
Yan Ye ◽  
Qian Yao ◽  
Min Wu

When data mining involves document processing, extracting abstract information from the content of documents becomes an essential procedure. The core idea of the abstract-extraction algorithms represented by Luhn's is to extract the abstract merely from the sentences that contain the document's frequent words. However, these algorithms fail to exploit the full text at a deeper semantic level, so the accuracy of the traditional abstract extraction algorithm needs to be enhanced. In order to improve the accuracy, we propose a method which improves the performance of the candidate key word extraction algorithm by using substitute words and considering the semantic meanings of the candidate key words.
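For context, a hedged sketch of the Luhn-style baseline the paper builds on: sentences are scored by how densely they contain the document's frequent ("significant") words, and the top-scoring sentences form the abstract. The semantic extensions proposed by the authors (substitute words, word senses) are not reproduced here; the text, stopword list and threshold are illustrative.

```python
# Luhn-style sentence scoring for extractive summarization (toy example).
import re
from collections import Counter

text = ("Data mining extracts patterns from documents. "
        "Frequent words indicate the central topic of documents. "
        "The weather was pleasant yesterday. "
        "Sentence scores depend on frequent words and their density.")

sentences = re.split(r"(?<=[.!?])\s+", text.strip())
words = re.findall(r"[a-z]+", text.lower())
stopwords = {"the", "of", "and", "was", "on", "their", "from"}
freq = Counter(w for w in words if w not in stopwords)
significant = {w for w, c in freq.items() if c >= 2}   # "frequent" words

def luhn_score(sentence):
    """Density of significant words in the sentence (squared hits / length)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    hits = [t for t in tokens if t in significant]
    return (len(hits) ** 2) / len(tokens) if tokens else 0.0

# Keep the single best sentence as a one-line abstract
abstract = max(sentences, key=luhn_score)
print(abstract)
```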


Author(s):  
Adam Albert ◽  
Marie Duží ◽  
Marek Menšík ◽  
Miroslav Pajr ◽  
Vojtěch Patschka

In this paper, we deal with supporting the search for appropriate textual sources. Users ask for an atomic concept that is explicated using machine learning methods applied to different textual sources. Next, we process the explications thus obtained to provide even more useful information. To this end, we apply the method of computing association rules, one of the data-mining methods used for information retrieval. Our background theory is the system of Transparent Intensional Logic (TIL); all the concepts are formalised as TIL constructions.

