Data Mining and Analysis in the Engineering Field - Advances in Data Mining and Database Management
Latest Publications

Total documents: 17 (five years: 0)
H-index: 2 (five years: 0)

Published by IGI Global
ISBNs: 9781466660861, 9781466660878

Author(s):  
Abdulrahman R. Alazemi ◽  
Abdulaziz R. Alazemi

The advent of information technologies brought with it the availability of huge amounts of data for enterprises to utilize. Data mining technologies are used to search vast amounts of data for vital business insight. Data mining is used to derive business intelligence and to uncover hidden knowledge in large databases or on the Internet. Business intelligence can reveal hidden relations, predict future outcomes, and guide speculation and resource allocation. This uncovered knowledge helps in gaining competitive advantages, better customer relationships, and even fraud detection. In this chapter, the authors describe how data mining is used to achieve business intelligence. Furthermore, they look into some of the challenges in achieving business intelligence.


Author(s):  
Basant Agarwal ◽  
Namita Mittal

Opinion Mining or Sentiment Analysis is the study that analyzes people's opinions or sentiments expressed in text towards entities such as products and services. It has always been important to know what other people think. With the rapid growth in the availability and popularity of online review sites, blogs, forums, and social networking sites, the need to analyse and understand these reviews has arisen. The main approaches for sentiment analysis can be categorized into semantic orientation-based, knowledge-based, and machine learning-based approaches. This chapter surveys the machine learning approaches applied to sentiment analysis-based applications. The main emphasis of this chapter is to discuss the research involved in applying machine learning methods, mostly for sentiment classification at the document level. Machine learning-based approaches work in the following phases, which are discussed in detail in this chapter for sentiment classification: (1) feature extraction, (2) feature weighting schemes, (3) feature selection, and (4) machine-learning methods. This chapter also discusses the standard free benchmark datasets and evaluation methods for sentiment analysis. The authors conclude the chapter with a comparative study of some state-of-the-art methods for sentiment analysis and some possible future research directions in opinion mining and sentiment analysis.
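As an illustration of the four phases listed above, the following is a minimal sketch (not taken from the chapter) of a document-level sentiment classifier built with scikit-learn; the example reviews, labels, and parameter values are placeholders.

```python
# Minimal sketch of a document-level sentiment classification pipeline:
# (1) feature extraction, (2) TF-IDF feature weighting, (3) feature selection,
# (4) a machine-learning classifier. Reviews and labels are placeholders.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

reviews = ["great product, works well", "terrible service, very slow"]
labels = ["positive", "negative"]

clf = Pipeline([
    ("features", TfidfVectorizer(ngram_range=(1, 2))),  # unigram/bigram features with TF-IDF weights
    ("select", SelectKBest(chi2, k=10)),                 # keep the most discriminative features
    ("svm", LinearSVC()),                                # machine-learning classifier
])
clf.fit(reviews, labels)
print(clf.predict(["works great"]))
```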


Author(s):  
Naveen Dahiya ◽  
Vishal Bhatnagar ◽  
Manjeet Singh ◽  
Neeti Sangwan

Data mining has proven to be an important technique for efficient information extraction, classification, clustering, and prediction of future trends from a database. The valuable properties of data mining have been put to use in many applications. One such application is the Software Development Life Cycle (SDLC), where researchers have made effective use of data mining techniques. An exhaustive survey on the application of data mining in SDLC has not been done in the past. In this chapter, the authors carry out an in-depth survey of existing literature focused on the application of data mining in SDLC and propose a framework that classifies the work done by various researchers, identifies the prominent data mining techniques used in the various phases of SDLC, and paves the way for future research in this emerging area.


Author(s):  
N. K. Nagwani ◽  
S. Verma

Software repositories contain a wealth of information that can be analyzed for knowledge extraction. Software bug repositories are one such repository that stores the information about the defects identified during the development of software. Information available in software bug repositories like number of bugs priority-wise, component-wise, status-wise, developers-wise, module-wise, summary-terms-wise, can be visualized with the help of two- or three-dimensional graphs. These visualizations help in understanding the bug distribution patterns, software matrices related to the software bugs, and developer information in the bug-fixing process. Visualization techniques are exploited with the help of open source technologies in this chapter to visualize the bug distribution information available in the software bug repositories. Two-dimensional and three-dimensional graphs are generated using java-based open source APIs, namely Jzy3d (Java Easy 3d) and JFreeChart. Android software bug repository is selected for the experimental demonstrations of graphs. The textual bug attribute information is also visualized using frequencies of frequent terms present in it.
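The chapter itself uses the Java libraries named above; purely as an analogous sketch, the snippet below draws the same kind of priority-wise bug distribution chart with Python's matplotlib, and the bug counts are invented for illustration.

```python
# Analogous sketch (matplotlib instead of the JFreeChart/Jzy3d Java APIs named
# in the chapter): a 2-D bar chart of bug counts by priority. Counts are invented.
import matplotlib.pyplot as plt

priorities = ["Blocker", "Critical", "Major", "Minor", "Trivial"]
bug_counts = [12, 48, 230, 175, 40]  # hypothetical priority-wise totals

plt.bar(priorities, bug_counts)
plt.xlabel("Bug priority")
plt.ylabel("Number of bugs")
plt.title("Priority-wise bug distribution (illustrative data)")
plt.tight_layout()
plt.show()
```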


Author(s):  
Chellammal Surianarayanan ◽  
Gopinath Ganapathy

Web services have become the de facto platform for developing enterprise applications from existing interoperable and reusable services that are accessible over networks. Development of any service-based application involves discovering and combining one or more required services (i.e. service discovery) from the available services, which are quite large in number. With so many services available, manually discovering the required services becomes impractical and time consuming. In applications with composition or dynamic needs, manual discovery can even preclude the use of services altogether. Therefore, effective techniques that extract relevant services from huge service repositories in relatively short intervals of time are crucial. Discovery of service usage patterns and of associations/relationships among atomic services would facilitate efficient service composition. Further, with many services available, it is likely that many matched services will be found for a given query, and hence efficient methods are required to present the results in a useful form that enables the client to choose the best one. Data mining provides well-known exploratory techniques to extract relevant and useful information from huge data repositories. In this chapter, an overview of various issues in service discovery and composition, and of how they can be resolved using data mining methods, is presented. Various research works that employ data mining methods for discovery and composition are reviewed and classified. A case study is presented that serves as a proof of concept for how data mining techniques can enhance semantic service discovery.
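One data mining idea commonly reviewed in this context is clustering service descriptions so that a query is matched only against the nearest cluster rather than the whole repository. The sketch below illustrates that idea with scikit-learn; the service descriptions, query, and cluster count are invented and do not come from the chapter's case study.

```python
# Minimal sketch: cluster textual service descriptions, then rank only the
# services in the cluster closest to the query. All descriptions are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

services = [
    "currency conversion exchange rate service",
    "foreign exchange rate lookup",
    "weather forecast by city",
    "city temperature and humidity report",
]
vec = TfidfVectorizer()
X = vec.fit_transform(services)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

query = vec.transform(["convert currency using exchange rates"])
nearest_cluster = km.predict(query)[0]

# Rank only the services in the nearest cluster against the query.
candidates = [i for i, c in enumerate(km.labels_) if c == nearest_cluster]
scores = cosine_similarity(query, X[candidates]).ravel()
for idx, score in sorted(zip(candidates, scores), key=lambda t: -t[1]):
    print(f"{services[idx]}  (similarity {score:.2f})")
```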


Author(s):  
Shailendra Kumar Sonkar ◽  
Vishal Bhatnagar ◽  
Rama Krishna Challa

Dynamic social networks contain vast amounts of data, which change continuously. A search in a dynamic social network does not guarantee relevant, filtered, and timely information to users all the time. A sequential process is needed that applies suitable techniques and stores the information internally so that relevant, filtered, and timely information can be provided to users. In this chapter, the authors categorize social network users into different age groups, identify suitable and appropriate parameters, assign these parameters to the already categorized age groups, and propose a layered parameterized framework for intelligent information retrieval in dynamic social networks using different techniques of data mining. The primary techniques are clustering, which groups social network users based on similarities between key parameter items; classification, which separates users into classes based on differences among key parameter items; and association rule mining, which finds the frequent social network users among the available users.
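The clustering step described above can be pictured as grouping users by a few key parameters. The following is a minimal sketch under invented assumptions: the parameters (age, posts per day, average session minutes), the user records, and the number of clusters are all illustrative, not taken from the chapter's framework.

```python
# Minimal sketch of grouping social network users by a few illustrative
# key parameters. The user data and the choice of three clusters are invented.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# columns: age, posts_per_day, avg_session_minutes (hypothetical parameters)
users = np.array([
    [16, 25, 120],
    [17, 30, 150],
    [35,  5,  40],
    [40,  3,  30],
    [62,  1,  15],
    [65,  2,  20],
])

X = StandardScaler().fit_transform(users)          # put parameters on a common scale
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for user, group in zip(users, labels):
    print(user, "-> group", group)
```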


Author(s):  
Elena Baralis ◽  
Luca Cagliero ◽  
Saima Jabeen ◽  
Alessandro Fiori ◽  
Sajid Shah

With the diffusion of online newspapers and social media, users can retrieve dozens of news articles covering the same topic in a short time. News article summarization is the task of automatically selecting a worthwhile subset of news sentences that users could easily explore. Promising research directions in this field are the use of semantics-based models (e.g., ontologies and taxonomies) to identify key document topics and the integration of social data analysis to also consider the current user's interests during summary generation. This chapter overviews the most recent research advances in document summarization and presents a novel strategy that combines ontology-based and social knowledge to address the problem of generic (not query-based) multi-document summarization of news articles. To identify the most salient sentences of the news articles, an ontology-based text analysis is performed during the summarization process. Furthermore, the social content acquired from real Twitter messages is separately analyzed to also consider the current interests of social network users during sentence evaluation. The combination of ontological and social knowledge allows the generation of accurate and easy-to-read news summaries. Moreover, the proposed summarizer performs better than the evaluated competitors on real news articles and Twitter messages.
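The chapter's actual strategy is ontology-based; purely to make the sentence-evaluation idea concrete, the toy sketch below scores sentences by term frequency and boosts terms that also appear in Twitter messages, then keeps the top-scoring sentences. The sentences, tweets, and weighting are invented stand-ins, not the authors' method.

```python
# Toy stand-in for extractive summarization with a social boost: rank news
# sentences by term frequency, adding weight for terms frequent in tweets.
from collections import Counter
import re

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

sentences = [
    "The city council approved the new transit plan on Monday.",
    "Officials said construction will begin next spring.",
    "Local cafes reported an ordinary weekend of business.",
]
tweets = ["transit plan finally approved!", "construction of the transit line next spring"]

doc_tf = Counter(t for s in sentences for t in tokens(s))
social_tf = Counter(t for tw in tweets for t in tokens(tw))

def score(sentence, social_weight=0.5):
    toks = tokens(sentence)
    return sum(doc_tf[t] + social_weight * social_tf[t] for t in toks) / len(toks)

summary = sorted(sentences, key=score, reverse=True)[:2]   # keep the two best sentences
print(summary)
```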


Author(s):  
Seyed Jalaleddin Mousavirad ◽  
Hossein Ebrahimpour-Komleh

Classification of biomedical data plays a significant role in the prediction and diagnosis of disease. The existence of redundant and irrelevant features is one of the major problems in biomedical data classification. Excluding these features can improve the performance of the classification algorithm. Feature selection is the problem of selecting a subset of features without reducing the accuracy obtained with the original set of features. Feature selection algorithms are divided into three categories: wrapper, filter, and embedded methods. Wrapper methods use the learning algorithm for the selection of features, while filter methods use statistical characteristics of the data. In embedded methods, the feature selection process is combined with the learning process. Population-based metaheuristics can be applied for wrapper feature selection. In these algorithms, a population of candidate solutions is created and then improved with respect to an objective function using a set of operators. This chapter presents the application of population-based feature selection to deal with the issues of high dimensionality in biomedical data classification. The results show that population-based feature selection achieves acceptable performance in biomedical data classification.
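To illustrate the population-based wrapper idea, here is a minimal sketch (not the chapter's experiment) using a tiny genetic algorithm: each individual is a binary feature mask, and its fitness is the cross-validated accuracy of a classifier trained on the selected features. The dataset, classifier, and GA parameters are illustrative choices.

```python
# Minimal sketch of wrapper feature selection with a population-based method:
# a small genetic algorithm over binary feature masks, with cross-validated
# classifier accuracy as the fitness function. All parameters are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    if not mask.any():
        return 0.0                                  # an empty feature set is useless
    clf = KNeighborsClassifier()
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((12, n_features)) < 0.5            # initial population of random masks
for _ in range(10):                                  # a few generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-6:]]           # keep the fitter half
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_features)             # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.05          # mutation: flip a few bits
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best), "accuracy:", round(fitness(best), 3))
```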


Author(s):  
Manish Kumar ◽  
Sumit Kumar

Web usage mining can extract useful information from Web logs to discover user access patterns of Web pages. Web usage mining itself can be classified further depending on the kind of usage data, which may be Web server data, application server data, or application-level data. Web server data correspond to the user logs collected at Web servers. Typical data collected at a Web server include the requested URL, the IP address from which the request originated, and the timestamp. Web log data must be cleaned, condensed, and transformed in order to retrieve and analyze significant and useful information. This chapter analyzes frequent access patterns by applying the FP-growth algorithm, which is further optimized by using a Genetic Algorithm (GA) and fuzzy logic.
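As a minimal sketch of the FP-growth step only (the GA and fuzzy-logic optimizations described in the chapter are not shown), the snippet below mines frequent page-access patterns from a few invented log sessions using the mlxtend library.

```python
# Minimal sketch: mine frequent page-access patterns from Web log sessions with
# FP-growth (mlxtend). The sessions and the support threshold are invented.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/blog"],
    ["/home", "/products", "/cart", "/checkout"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(sessions).transform(sessions), columns=te.columns_)
patterns = fpgrowth(onehot, min_support=0.5, use_colnames=True)  # pages seen together often
print(patterns)
```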


Author(s):  
Gábor Hosszú

This chapter presents statistical evaluations of script relics. Its concept is to exploit mathematical statistical methods to extract hidden correlations among different script relics. Examining the genealogy of the graphemes of scripts is necessary for exploring the evolution of writing systems, reading undeciphered inscriptions, and deciphering undeciphered scripts. The chapter focuses on cluster analysis as one of the most popular mathematical statistical methods and presents its application to the classification of Rovash (pronounced “rove-ash”; an alternative spelling is Rovas) relics. The various Rovash scripts were used by nations in the Eurasian Steppe and in the Carpathian Basin. The specialty of Rovash paleography is that the Rovash script family shows a vital evolution during the last centuries; therefore, it is ideal for testing models of the evolution of glyphs. The most important Rovash script is the Szekely-Hungarian Rovash. Cluster analysis algorithms are applied to determine the common sets among the significant Szekely-Hungarian Rovash alphabets. The determined Rovash relic ties prove the usefulness of clustering methods in Rovash paleography.
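To make the clustering idea concrete, the sketch below performs hierarchical clustering of relics represented as binary grapheme-presence vectors; the relic names, feature vectors, and choice of two clusters are invented for illustration and are not the chapter's data.

```python
# Minimal sketch: hierarchical clustering of relics described by binary
# grapheme-presence vectors. Names and feature vectors are invented.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

relics = ["Relic A", "Relic B", "Relic C", "Relic D"]
# each column: presence (1) or absence (0) of a particular grapheme form
features = np.array([
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 0],
])

distances = pdist(features, metric="hamming")       # dissimilarity of grapheme sets
tree = linkage(distances, method="average")         # agglomerative clustering
groups = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into two clusters
for name, group in zip(relics, groups):
    print(name, "-> cluster", group)
```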

