Knowledge Discovery Practices and Emerging Applications of Data Mining
Advances in Data Mining and Database Management

Published by IGI Global
ISBN 9781609600679, 9781609600693
Total documents: 15 · H-index: 2

Author(s): Giovanni Giuffrida, Diego Reforgiato, Catarina Sismeiro, Giuseppe Tribulato

In most developed countries, competition among mobile phone operators now focuses on switching customers away from competitors with extremely discounted telephony rates. This fierce competitive environment is the result of a saturated market with small or nonexistent growth, and it has caused operators to rely increasingly on Value-Added Services (VAS) for revenue growth. Though mobile phone operators have thousands of different services available to offer to their customers, the contact opportunities to offer these services are limited. In this context, statistical methods and data mining tools can play an important role in optimizing content delivery. In this chapter the authors describe novel methods now available to mobile phone operators to optimize targeting and improve profitability from VAS offers.


Author(s): Anthony Scime, Karthik Rajasethupathy, Kulathur S. Rajasethupathy, Gregg R. Murray

Data mining is a collection of algorithms for finding interesting and unknown patterns or rules in data. However, different algorithms can produce different rules from the same data. The process presented here exploits these differences to find particularly robust, consistent, and noteworthy rules among much larger potential rule sets. More specifically, this research focuses on using association rules and classification mining to select persistent strong association rules, that is, association rules that are verifiable by classification mining the same data set. The process for finding persistent strong rules was executed against two data sets obtained from the American National Election Studies. Analysis of the first data set resulted in one persistent strong rule and one persistent rule, while analysis of the second data set resulted in 11 persistent strong rules and 10 persistent rules. The persistent strong rule discovery process suggests these rules are the most robust, consistent, and noteworthy among the much larger potential rule sets.
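As a sketch of this idea, the toy example below mines single-antecedent association rules targeting a class attribute and then checks which of them a simple 1R-style classifier rediscovers on the same data. The voter-like records, attribute names, and thresholds are invented for illustration; this is not the chapter's exact procedure or its ANES data.

```python
from collections import Counter

# Hypothetical voter records: attributes plus a "vote" class label.
records = [
    {"age": "young", "income": "low",  "vote": "A"},
    {"age": "young", "income": "low",  "vote": "A"},
    {"age": "old",   "income": "high", "vote": "B"},
    {"age": "old",   "income": "low",  "vote": "B"},
    {"age": "young", "income": "high", "vote": "A"},
    {"age": "old",   "income": "high", "vote": "B"},
]

def association_rules(data, target="vote", min_support=0.3, min_conf=0.8):
    """Mine single-antecedent rules (attr=value) -> (target=label)."""
    n = len(data)
    rules = []
    attrs = [a for a in data[0] if a != target]
    for attr in attrs:
        for value in {r[attr] for r in data}:
            matching = [r for r in data if r[attr] == value]
            label, hits = Counter(r[target] for r in matching).most_common(1)[0]
            support, conf = hits / n, hits / len(matching)
            if support >= min_support and conf >= min_conf:
                rules.append(((attr, value), label, support, conf))
    return rules

def one_r_rules(data, target="vote"):
    """1R-style classification mining: majority-class rules of the single
    attribute that classifies the data most accurately."""
    best_attr, best_rules, best_acc = None, None, -1.0
    attrs = [a for a in data[0] if a != target]
    for attr in attrs:
        rules, correct = {}, 0
        for value in {r[attr] for r in data}:
            matching = [r for r in data if r[attr] == value]
            label, hits = Counter(r[target] for r in matching).most_common(1)[0]
            rules[value] = label
            correct += hits
        if correct / len(data) > best_acc:
            best_attr, best_rules, best_acc = attr, rules, correct / len(data)
    return best_attr, best_rules

assoc = association_rules(records)
cls_attr, cls_rules = one_r_rules(records)
# A rule is "persistent strong" if classification mining rediscovers it.
persistent = [r for r in assoc
              if r[0][0] == cls_attr and cls_rules.get(r[0][1]) == r[1]]
```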


Author(s): Adnan I. Al Rabea, Ibrahiem M. M. El Emary

This chapter discusses how Data Mining and Knowledge Discovery techniques can help achieve an acceptable level of quality of service in telecommunication systems. Quality of service is defined here as the metrics predicted using data mining techniques: decision trees, association rules, and neural networks. Digital telecommunication networks are highly complex systems, so their planning, management, and optimization are challenging tasks. User expectations constitute the Quality of Service (QoS), and to gain a competitive edge over other operators, operating personnel have to measure the network in terms of QoS. Three data mining methods have been applied to actual GSM network performance measurements, chosen to help operating staff find the essential information in network quality performance measurements. The results of Pekko (2004) show that analysts can make good use of Rough Sets and Classification and Regression Trees (CART), because their information can be expressed in plain-language rules that preserve the variable names of the original measurements. In addition, CART and the Self-Organizing Map (SOM) provide effective visual means for interpreting the data set.
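The plain-language rules mentioned above can be illustrated with a minimal CART-style split. The KPI names (drop_rate, handover_fail) and thresholds below are hypothetical stand-ins, not the GSM measurements from the study:

```python
# Labeled (hypothetical) network measurements: qos_ok is the class.
samples = [
    {"drop_rate": 0.8, "handover_fail": 2.1, "qos_ok": False},
    {"drop_rate": 0.2, "handover_fail": 0.5, "qos_ok": True},
    {"drop_rate": 0.9, "handover_fail": 1.8, "qos_ok": False},
    {"drop_rate": 0.1, "handover_fail": 0.4, "qos_ok": True},
    {"drop_rate": 0.3, "handover_fail": 0.6, "qos_ok": True},
]

def gini(rows):
    """Gini impurity of the qos_ok labels in rows."""
    if not rows:
        return 0.0
    p = sum(r["qos_ok"] for r in rows) / len(rows)
    return 2 * p * (1 - p)

def best_split(rows, features):
    """Pick the (feature, threshold) minimizing weighted Gini impurity."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r for r in rows if r[f] <= t]
            right = [r for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]

feature, threshold = best_split(samples, ["drop_rate", "handover_fail"])
# The split reads as a rule that preserves the original variable name:
print(f"IF {feature} <= {threshold} THEN qos_ok = True")
```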


Author(s): A. V. Senthil Kumar, Adnan Alrabea, Pedamallu Chandra Sekhar

Over the last couple of years, data mining technology has been successfully applied to various business domains and scientific areas. One of the main unresolved problems arising during the data mining process is treating data that contains temporal information; a thorough understanding of this concept requires that the data be viewed as a sequence of events. Temporal sequences exist extensively in different areas, including economics, finance, communication, engineering, medicine, and weather forecasting. This chapter proposes a technique developed to explore frequent temporal itemsets in a database. The basic idea is to first partition the database into sub-databases according to either a common starting time or a common ending time. Then, for each partition, the proposed technique progressively accumulates the number of occurrences of each candidate 2-itemset. A directed graph is built using the support of these candidate 2-itemsets (combined from all the sub-databases), which generates all candidate temporal k-itemsets in the database. This technique may help researchers understand not only how frequent large temporal itemsets are generated but also how temporal association rules among transactions within relational databases can be found.
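The partition-and-count step can be sketched as follows on a toy transaction set; the timestamps, items, and support threshold are illustrative, not the chapter's data:

```python
from itertools import combinations
from collections import Counter, defaultdict

# Hypothetical timestamped transactions: (start_time, set_of_items).
transactions = [
    (1, {"a", "b", "c"}),
    (1, {"a", "b"}),
    (2, {"b", "c"}),
    (2, {"a", "b", "c"}),
    (3, {"a", "c"}),
]

# Step 1: partition the database by common starting time.
partitions = defaultdict(list)
for start, items in transactions:
    partitions[start].append(items)

# Step 2: per partition, accumulate occurrence counts of candidate 2-itemsets.
counts = Counter()
for part in partitions.values():
    for items in part:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1

# Step 3: build a directed graph over frequent 2-itemsets (combined across
# sub-databases); walking its paths yields candidate temporal k-itemsets.
min_support = 3
graph = defaultdict(set)
for (x, y), c in counts.items():
    if c >= min_support:
        graph[x].add(y)
```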


Author(s): Rahime Belen, Tugba Taskaya Temizel

Many manually populated, very large databases suffer from data quality problems such as missing or inaccurate data and duplicate entries. A recently recognized data quality problem is disguised missing data, which arises when no explicit code for missing data, such as NA (Not Available), is provided and a legitimate data value is used instead. The presence of these values can severely affect the outcome of data mining tasks: association mining algorithms may produce biased, inaccurate association rules, and clustering techniques may produce invalid clusters. Detecting and eliminating these values is necessary but too burdensome to carry out manually. In this chapter, methods to detect disguised missing values by visual inspection are explained first. Then the authors describe methods used to detect these values automatically. Finally, a framework to detect disguised missing data is proposed and demonstrated on spatial and categorical data sets.
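One simple automatic detection heuristic (not necessarily the chapter's exact method) is to flag values whose frequency is wildly out of line with the rest of a column, since disguise values are typically defaults that occur far too often. The column and date values below are invented:

```python
from collections import Counter

# Hypothetical column where "1900-01-01" disguises missing birth dates.
birth_dates = (["1900-01-01"] * 40
               + ["1975-03-12", "1982-07-01", "1990-11-23"] * 5
               + ["1968-02-14", "1988-09-30"] * 3)

def suspected_disguises(values, ratio=5.0):
    """Flag values whose frequency dwarfs the median value frequency.

    A frequency-outlier heuristic: a disguise value (a default the user or
    the software falls back on) tends to be far more frequent than any
    genuine value in the column.
    """
    freqs = Counter(values)
    counts = sorted(freqs.values())
    median = counts[len(counts) // 2]
    return [v for v, c in freqs.items() if c >= ratio * median]
```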


Author(s): Bruno Ohana, Brendan Tierney

Opinion mining is an emerging field of research concerned with applying computational methods to the treatment of subjectivity in text, with applications in fields such as recommendation systems, contextual advertising, and business intelligence. In this chapter the authors survey the area of opinion mining and discuss SentiWordNet, a lexicon of sentiment information for terms derived from WordNet. They also present the results of their research applying this lexicon to sentiment classification of film reviews, along with a novel approach that leverages opinion lexicons to build a data set of features used as input to a supervised learning classifier. The results obtained are in line with other experiments based on manually built opinion lexicons, with further improvements obtained by using the novel approach, and indicate that lexicons built using semi-supervised methods such as SentiWordNet can be an important resource in sentiment classification tasks. Considerations on future improvements are also presented, based on a detailed analysis of classification results.
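A lexicon-based classifier of the kind discussed can be sketched as follows. The term scores are invented stand-ins in the style of SentiWordNet (positivity, negativity per term), not actual SentiWordNet entries:

```python
# Tiny illustrative lexicon: term -> (positivity, negativity) scores.
lexicon = {
    "good":     (0.75, 0.00),
    "great":    (0.88, 0.00),
    "dull":     (0.05, 0.70),
    "terrible": (0.00, 0.90),
    "plot":     (0.00, 0.00),
}

def sentiment(review):
    """Classify by the sign of the summed (positive - negative) term scores;
    unknown words contribute nothing."""
    score = 0.0
    for word in review.lower().split():
        pos, neg = lexicon.get(word, (0.0, 0.0))
        score += pos - neg
    return "positive" if score >= 0 else "negative"

print(sentiment("great plot but dull"))  # -> positive (0.88 - 0.65 > 0)
```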


Author(s): Mikolaj Morzy

An Internet forum is a web application for publishing user-generated content in the form of a discussion. Messages posted to a forum form threads of discussion and contain textual and multimedia content. An important feature of Internet forums is their social aspect: they attract dedicated users who build tight social communities. There is an abundance of Internet forums covering all aspects of human activity: politics, sports, entertainment, science, religion, leisure, hobbies, etc. With large user communities forming around popular forums, it is important to distinguish knowledgeable users, who contribute high-quality content, from other types of users, such as casual users or Internet trolls. Social role discovery therefore becomes an important issue in extracting valuable knowledge from Internet forums. This chapter provides an overview of Internet forum technology. It discusses the architecture of Internet forums, presents an overview of the data volumes involved, and outlines the technical challenges of scraping forum data. A broad summary of research on mining and exploring Internet forums for social role discovery is presented. Next, a multi-tier model for Internet forum analysis (statistical analysis, index analysis, and network analysis) is introduced, in which social roles are automatically attributed to forum users based on egocentric graphs of user activity. The issues discussed in the chapter are illustrated with real-world examples. The chapter concludes with a brief summary and a future work agenda.
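The egocentric-graph idea can be illustrated with a toy reply log. The role heuristic below is a deliberately crude invention for illustration, not the chapter's multi-tier model:

```python
# Hypothetical reply log scraped from a forum: (author, replied_to) pairs.
replies = [
    ("expert", "u1"), ("u1", "expert"), ("u2", "expert"),
    ("u3", "expert"), ("expert", "u2"), ("u4", "expert"),
    ("troll", "u1"), ("troll", "u2"), ("troll", "u3"), ("troll", "u4"),
]

def egocentric_degrees(log, user):
    """In-degree and out-degree of a user's egocentric reply graph."""
    indeg = sum(1 for a, b in log if b == user)
    outdeg = sum(1 for a, b in log if a == user)
    return indeg, outdeg

def crude_role(log, user):
    """Toy role heuristic: users who receive many replies relative to what
    they post look like authorities; users who post widely but get no
    replies back look troll-like."""
    indeg, outdeg = egocentric_degrees(log, user)
    if indeg >= 2 * max(outdeg, 1):
        return "authority"
    if outdeg >= 3 and indeg == 0:
        return "possible troll"
    return "casual"
```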


Author(s): Fernando Alonso, Loïc Martínez, Aurora Pérez, Juan Pedro Valente

Although expert-elicited knowledge and knowledge discovered through data mining appear to be completely opposite, competing solutions to the same problems, they are actually complementary concepts; moreover, together they maximize their individual strengths. This chapter highlights how each profits from the other and illustrates their cooperation in existing systems developed in the medical domain. The authors have identified different types of cooperation: combining elicitation and data mining for knowledge acquisition, using expert knowledge to guide knowledge discovery, using discovered knowledge to validate expert knowledge, and using discovered knowledge to improve the usability of an expert system. The chapter also describes the authors' experience in combining expert and discovered knowledge in the development of a system for processing medical isokinetics data.


Author(s): Udai Shanker, Abhay N. Singh, Abhinav Anand, Saurabh Agrawal

This chapter proposes the Shadow Sensitive SWIFT commit protocol for Distributed Real-Time Database Systems (DRTDBS), in which only an abort-dependent cohort whose deadline lies beyond a specific value (Tshadow_creation_time) forks off a replica of itself, called a shadow, whenever it borrows the dirty value of a data item. Two new dependencies are defined: a Commit-on-Termination external dependency between the final commit operations of the lender and the shadow of its borrower, and a Begin-on-Abort internal dependency between the borrower and its shadow. If a serious problem arises in the commitment of the lender, the borrower is aborted and execution continues from its shadow, which sends a YES-VOTE message piggybacked with the new result to its coordinator; the abort dependency created between lender and borrower by the update-read conflict is reversed to a commit dependency between shadow and lender with a read-update conflict, and the commit operation is governed by the Commit-on-Termination dependency. The performance of Shadow Sensitive SWIFT is compared with the shadow PROMPT, SWIFT, and DSS-SWIFT commit protocols (Haritsa, Ramamritham, & Gupta, 2000; Shanker, Misra, & Sarje, 2006; Shanker, Misra, Sarje, & Shisondia, 2006) for both main-memory-resident and disk-resident databases, with and without communication delay. Simulation results show that the proposed protocol improves system performance by up to 5% in terms of transaction miss percentage.
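The shadow mechanism can be modeled informally as follows. The class names, threshold value, and dependency bookkeeping are illustrative inventions, not the protocol's actual data structures:

```python
# Toy state model of shadow forking and dependency reversal.
class Cohort:
    def __init__(self, name, deadline):
        self.name = name
        self.deadline = deadline
        self.shadow = None
        self.dependencies = {}  # other cohort's name -> dependency type

T_SHADOW_CREATION_TIME = 100  # hypothetical Tshadow_creation_time

def borrow_dirty(borrower, lender):
    """Borrower reads the lender's dirty value: an abort dependency forms,
    and a shadow is forked if the borrower's deadline is beyond the
    threshold."""
    borrower.dependencies[lender.name] = "abort"
    if borrower.deadline > T_SHADOW_CREATION_TIME:
        borrower.shadow = Cohort(borrower.name + "_shadow", borrower.deadline)
        # Begin-on-Abort: the shadow begins only if the borrower aborts.
        borrower.shadow.dependencies[borrower.name] = "begin-on-abort"

def lender_commit_problem(borrower, lender):
    """On a serious commit problem at the lender, abort the borrower and
    continue from its shadow; the abort dependency is reversed into a
    Commit-on-Termination dependency between shadow and lender."""
    shadow = borrower.shadow
    del shadow.dependencies[borrower.name]  # borrower is aborted
    shadow.dependencies[lender.name] = "commit-on-termination"
    return shadow
```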


Author(s): Diego Milone, Georgina Stegmayer, Matías Gerard, Laura Kamenetzky, Mariana López, ...

The volume of information derived from post-genomic technologies is rapidly increasing. Because of the amount of data involved, novel computational methods are needed for analysis and knowledge discovery in the massive data sets produced by these new technologies. Furthermore, data integration is gaining attention as a way to merge signals from different sources in order to discover unknown relations. This chapter presents a pipeline for biological data integration and the discovery of a priori unknown relationships between gene expressions and metabolite accumulations. In this pipeline, two standard clustering methods are compared against a novel neural network approach. The neural model provides a simple visualization interface for identifying coordinated pattern variations, independently of the number of clusters produced. Several quality measures have been defined to evaluate the clustering results obtained in a case study involving transcriptomic and metabolomic profiles from tomato fruits, and a method is proposed for evaluating the biological significance of the clusters found. The neural model showed high performance on most quality measures, with internal coherence in all identified clusters and better visualization capabilities.
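As an illustration of clustering integrated profiles, the sketch below runs plain k-means over hypothetical gene-expression and metabolite-accumulation series measured at the same time points; the chapter's actual model is a neural network, for which k-means is only a simple stand-in:

```python
import math

# Hypothetical integrated profiles: gene and metabolite series over the
# same four time points, clustered together to find coordinated patterns.
profiles = {
    "gene_A":  [0.9, 0.8, 0.1, 0.0],
    "gene_B":  [0.8, 0.9, 0.2, 0.1],
    "metab_X": [1.0, 0.9, 0.1, 0.1],
    "gene_C":  [0.1, 0.0, 0.9, 1.0],
    "metab_Y": [0.0, 0.2, 0.8, 0.9],
}

def kmeans(data, k, iters=10):
    """Plain k-means, deterministically initialized from the first k rows."""
    names = list(data)
    centers = [data[n][:] for n in names[:k]]
    assignment = {}
    for _ in range(iters):
        # Assign each profile to its nearest center.
        for n in names:
            dists = [math.dist(data[n], c) for c in centers]
            assignment[n] = dists.index(min(dists))
        # Recompute centers as the mean of their members.
        for i in range(k):
            members = [data[n] for n in names if assignment[n] == i]
            if members:
                centers[i] = [sum(col) / len(col) for col in zip(*members)]
    return assignment

clusters = kmeans(profiles, 2)
```

Genes and metabolites that rise and fall together land in the same cluster, which is the kind of coordinated gene-metabolite relationship the pipeline looks for.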

