Data mining of heterogeneous data with research challenges

Data mining is a widely studied research area that focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in data mining, which makes graph mining one of its most popular subfields. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can reveal important information about a database, and this information can be used to classify, cluster and index data. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to general attributes such as input type, dynamicity of graphs, result type, underlying algorithmic approach, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms relative to each other and the challenges they have recently faced.
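The idea of counting support against a user-defined threshold can be illustrated with a minimal sketch restricted to the smallest possible subgraphs, single labeled edges. This is not an algorithm from the survey: the function name `frequent_edges`, the edge-list representation and the toy database are all assumptions made for illustration, and real frequent subgraph miners also handle candidate generation for larger patterns and isomorphism checks.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return labeled edges that occur in at least min_support graphs.

    Hypothetical helper: each graph is a list of undirected edges,
    and each edge is a pair of vertex labels.
    """
    support = Counter()
    for edges in graphs:
        # Normalize each undirected edge so (a, b) and (b, a) match,
        # and count each pattern at most once per graph.
        patterns = {tuple(sorted(edge)) for edge in edges}
        support.update(patterns)
    # Keep only the patterns that meet the user-defined threshold.
    return {p: c for p, c in support.items() if c >= min_support}


# Toy database of three small graphs (illustrative data only).
database = [
    [("C", "O"), ("C", "H")],
    [("O", "C"), ("C", "N")],
    [("C", "H"), ("N", "H")],
]
result = frequent_edges(database, min_support=2)
```

With a threshold of 2, only the edge patterns that appear in at least two of the three graphs are kept.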

Heterogeneous Text and Numerical Data Mining with Possible Applications in Business and Financial Sectors

Data Mining ◽

10.4018/978-1-4666-2455-9.ch042 ◽

2013 ◽

pp. 816-836

Author(s):

Farid Bourennani ◽

Shahryar Rahnamayan

Keyword(s):

Data Mining ◽

Quantitative Data ◽

World Wide ◽

Numerical Data ◽

Heterogeneous Data ◽

Research Centers ◽

Unified Approach ◽

Data Types ◽

Qualitative And Quantitative ◽

Uniform Manner

Nowadays, many universities, research centers, and companies worldwide share their data electronically. Naturally, these data are of heterogeneous types such as text, numerical data, multimedia, and others. From the user's side, the data should be accessible in a uniform manner, which implies a unified approach for representing and processing data. Furthermore, unified processing of heterogeneous data types can lead to richer semantic results. In this chapter, we present a unified pre-processing approach that leads to the generation of richer semantics from qualitative and quantitative data.

Behavioral Targeting Online Advertising

Advances in Multimedia and Interactive Technologies - Online Multimedia Advertising ◽

10.4018/978-1-60960-189-8.ch012 ◽

2011 ◽

pp. 213-232 ◽

Cited By ~ 2

Author(s):

Jun Yan ◽

Dou Shen ◽

Teresa Mah ◽

Ning Liu ◽

Zheng Chen ◽

...

Keyword(s):

Data Mining ◽

Data Collection ◽

Predictive Modeling ◽

Rapid Growth ◽

Online Advertising ◽

Research Challenges ◽

Behavioral Targeting ◽

The Us ◽

Collection Data ◽

Complex Technology

With the rapid growth of the online advertising market, Behavioral Targeting (BT), which delivers advertisements to users based on an understanding of their needs inferred from their behaviors, is attracting more attention. Spending on behaviorally targeted advertising in the US is projected to reach $4.4 billion in 2012 (Hallerman, 2008). BT is a complex technology that involves data collection, data mining, audience segmentation, contextual page analysis, predictive modeling and so on. This chapter gives an overview of Behavioral Targeting by introducing the Behavioral Targeting business, followed by classic BT research challenges and solution proposals. We also point out BT research challenges that are currently under-explored in both industry and academia.

Introduction to Data Mining Techniques via Multiple Criteria Optimization Approaches and Applications

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch004 ◽

2008 ◽

pp. 26-49 ◽

Cited By ~ 1

Author(s):

Yong Shi ◽

Yi Peng ◽

Gang Kou ◽

Zhengxin Chen

Keyword(s):

Data Mining ◽

Linear Programming ◽

Credit Card ◽

Programming Model ◽

Real Life ◽

Multiple Criteria ◽

Linear Programming Model ◽

Multiple Criteria Optimization ◽

Research Challenges ◽

Challenges And Opportunities

This chapter provides an overview of a series of multiple criteria optimization-based data mining methods, which utilize multiple criteria programming (MCP) to solve data mining problems, and outlines some research challenges and opportunities for the data mining community. To achieve these goals, this chapter first introduces the basic notions and mathematical formulations for multiple criteria optimization-based classification models, including the multiple criteria linear programming model, multiple criteria quadratic programming model, and multiple criteria fuzzy linear programming model. Then it presents the real-life applications of these models in credit card scoring management, HIV-1 associated dementia (HAD) neuronal damage and dropout, and network intrusion detection. Finally, the chapter discusses research challenges and opportunities.

Introduction to Data Mining Techniques via Multiple Criteria Optimization Approaches and Applications

Data Mining Applications for Empowering Knowledge Societies ◽

10.4018/978-1-59904-657-0.ch001 ◽

2009 ◽

pp. 1-25

Author(s):

Yong Shi ◽

Yi Peng ◽

Gang Kou ◽

Zhengxin Chen

Keyword(s):

Data Mining ◽

Linear Programming ◽

Programming Model ◽

Neuronal Damage ◽

Real Life ◽

Multiple Criteria ◽

Linear Programming Model ◽

Multiple Criteria Optimization ◽

Research Challenges ◽

Challenges And Opportunities

This chapter provides an overview of a series of multiple criteria optimization-based data mining methods, which utilize multiple criteria programming (MCP) to solve data mining problems, and outlines some research challenges and opportunities for the data mining community. To achieve these goals, this chapter first introduces the basic notions and mathematical formulations for multiple criteria optimization-based classification models, including the multiple criteria linear programming model, multiple criteria quadratic programming model, and multiple criteria fuzzy linear programming model. Then it presents the real-life applications of these models in credit card scoring management, HIV-1 associated dementia (HAD) neuronal damage and dropout, and network intrusion detection. Finally, the chapter discusses research challenges and opportunities.

Visual Data Mining of Multimedia Data for Social and Behavioral Studies

Information Visualization ◽

10.1057/ivs.2008.32 ◽

2009 ◽

Vol 8 (1) ◽

pp. 56-70 ◽

Cited By ~ 13

Author(s):

Chen Yu ◽

Yiwen Zhong ◽

Thomas Smith ◽

Ikhyun Park ◽

Weixia Huang

Keyword(s):

Data Mining ◽

Visual Analysis ◽

Hybrid Approach ◽

Heterogeneous Data ◽

Multimedia Data ◽

Visual Exploration ◽

Visual Data Mining ◽

Flexible Tool ◽

Visualization System ◽

Behavioral Studies

With advances in computing techniques, a large amount of high-resolution high-quality multimedia data (video and audio, and so on) has been collected in research laboratories in various scientific disciplines, particularly in cognitive and behavioral studies. How to automatically and effectively discover new knowledge from rich multimedia data poses a compelling challenge because most state-of-the-art data mining techniques can only search and extract pre-defined patterns or knowledge from complex heterogeneous data. In light of this challenge, we propose a hybrid approach that allows scientists to use data mining as a first pass, and then forms a closed loop of visual analysis of current results followed by more data mining work inspired by visualization, the results of which can be in turn visualized and lead to the next round of visual exploration and analysis. In this way, new insights and hypotheses gleaned from the raw data and the current level of analysis can contribute to further analysis. As a first step toward this goal, we implement a visualization system with three critical components: (1) a smooth interface between visualization and data mining; (2) a flexible tool to explore and query temporal data derived from raw multimedia data; and (3) a seamless interface between raw multimedia data and derived data. We have developed various ways to visualize both temporal correlations and statistics of multiple derived variables as well as conditional and high-order statistics. Our visualization tool allows users to explore, compare and analyze multi-stream derived variables and simultaneously switch to access raw multimedia data.

A flexible architecture for data mining from heterogeneous data sources in automated production systems

2017 IEEE International Conference on Industrial Technology (ICIT) ◽

10.1109/icit.2017.7915517 ◽

2017 ◽

Cited By ~ 8

Author(s):

Emanuel Trunzer ◽

Iris Kirchen ◽

Jens Folmer ◽

Gennadiy Koltun ◽

Birgit Vogel-Heuser

Keyword(s):

Data Mining ◽

Production Systems ◽

Heterogeneous Data ◽

Data Sources ◽

Automated Production ◽

Heterogeneous Data Sources ◽

Flexible Architecture

Performance Analysis of Student Healthcare Dataset using Classification Algorithm

Journal of Applied and Emerging Sciences ◽

10.36785/buitems.jaes.278 ◽

2019 ◽

pp. 130-137

Keyword(s):

Data Mining ◽

Decision Tree ◽

Heterogeneous Data ◽

Health Data ◽

Classification Algorithm ◽

Parametric Data ◽

Diagnosis Method ◽

Iot Devices ◽

Tools And Techniques

Nowadays, health is considered a backbone of performance, and Internet of Things (IoT) devices have become important in diagnosing a person's health level, the type of disease the person is suffering from, and its severity. IoT sensors operating on medical devices produce large volumes of dynamic data. The fluctuation in health data makes it necessary to use data mining tools and techniques to extract useful information. Before data mining techniques can be applied, heterogeneous data needs to be preprocessed; refining the collected data in this way lets health parametric data mining yield better results. A decision tree is proposed to consolidate the students' health attributes and decide the metrics of a health scale, which could in turn help evaluate a student's level of performance in class. After mining, the students' health data is passed to K-fold cross-validation to determine accuracy, error rate, precision and recall. The proposed method is considered an enhanced diagnosis method with fixed patterns that allow the decision tree to make precise decisions. In a case study of student health prediction based on certain attributes and their levels, pattern-based diagnostics using the K-NN and decision tree algorithms are tested on a trained dataset using the WEKA tool. Finally, the different algorithms are compared in order to generalize toward an optimized classification algorithm.
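The evaluation quantities named in this abstract (K-fold splitting, accuracy, error rate, precision and recall) can be sketched in a few lines. This is a generic illustration only, not the authors' WEKA workflow; the helper names `k_fold_indices` and `metrics`, the contiguous fold layout and the toy labels are assumptions.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def metrics(y_true, y_pred, positive=1):
    """Compute accuracy, error rate, precision and recall for one fold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Each fold serves once as the held-out test set (illustrative sizes).
folds = k_fold_indices(10, 3)
# Toy true labels and predictions for a single held-out fold.
fold_metrics = metrics([1, 0, 1, 1], [1, 0, 0, 1])
```

In a full cross-validation run, a classifier would be trained on the remaining folds for each held-out fold and the per-fold metrics averaged.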

Research Challenges and Technology Progress of Data Mining with Bigdata

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206274 ◽

2019 ◽

pp. 308-315

Author(s):

Pushpa Mannava

Keyword(s):

Data Mining ◽

Big Data ◽

Business Intelligence ◽

Elapsed Time ◽

Research Challenges ◽

The Common

'Big Data' has spread quickly within the fields of Data Mining and Business Intelligence. This new situation can be defined in terms of those problems that cannot be resolved efficiently using the common computing resources currently available. It should be emphasized that Big Data does not simply mean huge volumes of data but also the requirement for scalability, i.e., ensuring a response within an acceptable elapsed time. This paper discusses the research challenges and technology progress of data mining with big data.
