Data mining of heterogeneous data with research challenges

Author(s):  
Monika Kalra ◽  
Niranjan Lal
2018 ◽  
Vol 8 (1) ◽  
pp. 194-209 ◽  
Author(s):  
Büsra Güvenoglu ◽  
Belgin Ergenç Bostanoglu

Data mining is a popular research area, studied by many researchers, that focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in data mining, so graph mining is one of the most popular subdivisions of data mining. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database, and this information can be used to classify, cluster and index the data. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have faced recently.
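
The two phases named above, candidate generation and frequency calculation, can be illustrated with a minimal Apriori-style sketch. It assumes a hypothetical toy representation (each graph is a node-label dictionary plus a set of undirected edges), counts support as the number of database graphs containing a pattern, and stops at 2-edge paths. It is not any specific algorithm from the survey; real miners such as gSpan rely on canonical codes and subgraph isomorphism tests to handle larger patterns.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical toy format: a graph is (labels, edges), where labels maps
# node id -> node label and edges is a set of undirected (u, v) pairs.

def adjacency(edges):
    """Build an undirected adjacency map from an edge set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_patterns(labels, edges):
    """Canonical 1-edge patterns (sorted label pairs) occurring in one graph."""
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def contains_wedge(labels, adj, center, ends):
    """True if a node labeled `center` has two distinct neighbours labeled ends[0] and ends[1]."""
    a, b = ends
    for v, nbrs in adj.items():
        if labels[v] != center:
            continue
        for u in nbrs:
            if labels[u] == a and any(w != u and labels[w] == b for w in nbrs):
                return True
    return False

def frequent_subgraphs(db, min_sup):
    # Frequency calculation, level 1: support = number of graphs containing the edge pattern.
    sup1 = Counter()
    for labels, edges in db:
        sup1.update(edge_patterns(labels, edges))
    freq1 = {p: s for p, s in sup1.items() if s >= min_sup}

    # Candidate generation: join two frequent edges that share a label into a
    # 2-edge path (center, sorted end labels); both sub-edges are frequent by construction.
    candidates = set()
    for (x1, y1), (x2, y2) in combinations_with_replacement(sorted(freq1), 2):
        for c in {x1, y1} & {x2, y2}:
            e1 = y1 if x1 == c else x1
            e2 = y2 if x2 == c else x2
            candidates.add((c, tuple(sorted((e1, e2)))))

    # Frequency calculation, level 2: count graphs containing each candidate.
    sup2 = Counter()
    for labels, edges in db:
        adj = adjacency(edges)
        for center, ends in candidates:
            if contains_wedge(labels, adj, center, ends):
                sup2[(center, ends)] += 1
    freq2 = {p: s for p, s in sup2.items() if s >= min_sup}
    return freq1, freq2
```

Only candidates built from frequent edges are counted, which is the pruning idea (every subgraph of a frequent subgraph is itself frequent) that level-wise approaches exploit.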


Data Mining ◽  
2013 ◽  
pp. 816-836
Author(s):  
Farid Bourennani ◽  
Shahryar Rahnamayan

Nowadays, many universities, research centers, and companies worldwide share their data electronically. Naturally, these data come in heterogeneous types such as text, numerical data, multimedia, and others. From the user's side, the data should be accessed in a uniform manner, which implies a unified approach to representing and processing them. Furthermore, unified processing of heterogeneous data types can lead to richer semantic results. In this chapter, we present a unified pre-processing approach that leads to the generation of richer semantics from qualitative and quantitative data.
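
One simple way to obtain such a unified representation, shown here only as an illustration and not as the chapter's method, is to turn numerical attributes into discrete tokens (equal-width bins) so they can be processed alongside text tokens in a single bag-of-words vector. The record layout, field names and bin count are hypothetical.

```python
import re
from collections import Counter

def numeric_to_token(name, value, lo, hi, bins=3):
    """Map a number to a categorical token via equal-width binning over [lo, hi]."""
    if hi == lo:
        idx = 0
    else:
        idx = min(int((value - lo) / (hi - lo) * bins), bins - 1)
    return f"{name}_bin{idx}"

def unify(records, text_field, numeric_fields, bins=3):
    """Represent text and numeric attributes as term counts over one shared vocabulary."""
    # Per-column ranges used for equal-width binning.
    ranges = {f: (min(r[f] for r in records), max(r[f] for r in records))
              for f in numeric_fields}
    docs = []
    for r in records:
        tokens = re.findall(r"[a-z]+", r[text_field].lower())
        tokens += [numeric_to_token(f, r[f], *ranges[f], bins=bins) for f in numeric_fields]
        docs.append(Counter(tokens))
    vocab = sorted(set().union(*docs))
    return vocab, [[d[t] for t in vocab] for d in docs]
```

Once numbers become tokens, any text-oriented technique (term weighting, clustering, topic models) can be applied to the mixed records uniformly; the cost is the information lost by discretization.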


Author(s):  
Jun Yan ◽  
Dou Shen ◽  
Teresa Mah ◽  
Ning Liu ◽  
Zheng Chen ◽  
...  

With the rapid growth of the online advertising market, Behavioral Targeting (BT), which delivers advertisements to users based on an understanding of their needs derived from their behaviors, is attracting more attention. US spending on behaviorally targeted advertising is projected to reach $4.4 billion in 2012 (Hallerman, 2008). BT is a complex technology that involves data collection, data mining, audience segmentation, contextual page analysis, predictive modeling and more. This chapter gives an overview of Behavioral Targeting by introducing the BT business, followed by classic BT research challenges and proposed solutions. We also point out BT research challenges that are currently under-explored in both industry and academia.
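
Audience segmentation, one of the components listed above, can be sketched in its simplest rule-based form: count each user's behavioral events per ad category and place the user in every category whose count reaches a threshold. The event format and threshold are hypothetical; production BT systems use far richer predictive models.

```python
from collections import Counter, defaultdict

def segment_users(events, min_events=2):
    """Assign each user to every ad category they interacted with at least min_events times.

    events: iterable of (user_id, category) pairs, e.g. derived from page views or searches.
    """
    counts = defaultdict(Counter)
    for user, category in events:
        counts[user][category] += 1
    return {user: sorted(cat for cat, n in c.items() if n >= min_events)
            for user, c in counts.items()}
```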


2008 ◽  
pp. 26-49 ◽  
Author(s):  
Yong Shi ◽  
Yi Peng ◽  
Gang Kou ◽  
Zhengxin Chen

This chapter provides an overview of a series of multiple criteria optimization-based data mining methods, which use multiple criteria programming (MCP) to solve data mining problems, and outlines some research challenges and opportunities for the data mining community. To achieve these goals, the chapter first introduces the basic notions and mathematical formulations of multiple criteria optimization-based classification models, including the multiple criteria linear programming model, the multiple criteria quadratic programming model, and the multiple criteria fuzzy linear programming model. It then presents real-life applications of these models in credit card scoring management, HIV-1 associated dementia (HAD) neuronal damage and dropout, and network intrusion detection. Finally, the chapter discusses research challenges and opportunities.
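
To make the two criteria of the multiple criteria linear programming (MCLP) classifier concrete: in a common two-group form (recalled here as an assumption; the chapter's notation and sign convention may differ), each record A_i gets an overlapping deviation alpha_i when it falls on the wrong side of a boundary A_i · X = b and a distance beta_i when it falls on the right side, and the model minimizes the sum of alpha_i while maximizing the sum of beta_i. The sketch below only evaluates these criteria for a given X and b; finding X itself requires a linear programming solver, and the weighted-sum compromise shown is an assumed scalarization.

```python
# Assumed sign convention: group G should satisfy A_i . X >= b, group B should satisfy A_i . X <= b.

def mclp_deviations(X, b, good, bad):
    """Per-record (alpha_i, beta_i): overlap if on the wrong side, distance if on the right side."""
    def score(a):
        return sum(ai * xi for ai, xi in zip(a, X))
    devs = []
    for a in good:
        d = score(a) - b  # positive when a "good" record lies on its own side
        devs.append((max(-d, 0.0), max(d, 0.0)))
    for a in bad:
        d = b - score(a)  # positive when a "bad" record lies on its own side
        devs.append((max(-d, 0.0), max(d, 0.0)))
    return devs

def mclp_criteria(X, b, good, bad, w_alpha=1.0, w_beta=1.0):
    """Return (sum of alpha, sum of beta, weighted compromise w_alpha*sum_alpha - w_beta*sum_beta)."""
    devs = mclp_deviations(X, b, good, bad)
    sum_alpha = sum(a for a, _ in devs)
    sum_beta = sum(bt for _, bt in devs)
    return sum_alpha, sum_beta, w_alpha * sum_alpha - w_beta * sum_beta
```

A smaller compromise value means fewer and shallower overlaps together with larger margins for correctly placed records, which is the trade-off the two objectives express.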


2009 ◽  
Vol 8 (1) ◽  
pp. 56-70 ◽  
Author(s):  
Chen Yu ◽  
Yiwen Zhong ◽  
Thomas Smith ◽  
Ikhyun Park ◽  
Weixia Huang

With advances in computing techniques, large amounts of high-resolution, high-quality multimedia data (video, audio, and so on) have been collected in research laboratories across scientific disciplines, particularly in cognitive and behavioral studies. Automatically and effectively discovering new knowledge from such rich multimedia data is a compelling challenge, because most state-of-the-art data mining techniques can only search for and extract pre-defined patterns or knowledge from complex heterogeneous data. In light of this challenge, we propose a hybrid approach that allows scientists to use data mining as a first pass and then forms a closed loop: visual analysis of the current results is followed by further data mining inspired by the visualization, whose results can in turn be visualized and lead to the next round of visual exploration and analysis. In this way, new insights and hypotheses gleaned from the raw data and the current level of analysis can contribute to further analysis. As a first step toward this goal, we implement a visualization system with three critical components: (1) a smooth interface between visualization and data mining; (2) a flexible tool to explore and query temporal data derived from raw multimedia data; and (3) a seamless interface between raw multimedia data and derived data. We have developed various ways to visualize temporal correlations and statistics of multiple derived variables, as well as conditional and high-order statistics. Our visualization tool allows users to explore, compare and analyze multi-stream derived variables while simultaneously accessing the raw multimedia data.
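
A temporal correlation between two derived variables can be quantified, for example, with a lagged Pearson correlation, which shows whether changes in one stream tend to precede changes in another. This is a minimal sketch assuming two equally sampled, equal-length streams; it is not the authors' tool, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def lagged_correlation(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for every lag in [-max_lag, max_lag]."""
    n = len(x)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        out[lag] = pearson(xs, ys)
    return out
```

The lag with the highest correlation is a candidate delay between the two behaviors, which a visualization could then let the analyst verify against the raw video or audio.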


Nowadays, health is considered a backbone of performance, and Internet of Things (IoT) devices have turned out to be important in diagnosing a person's health level, the type of disease the person is suffering from, and its severity. IoT sensors operating on medical devices produce large volumes of dynamic data. Fluctuations in health data call for data mining tools and techniques to extract useful information. Before data mining techniques can be applied, however, the heterogeneous data must be preprocessed; by refining the collected data, mining of health parameters yields better results. A decision tree is proposed to consolidate students' health attributes and determine the metrics of a health scale, which could help evaluate a student's level of performance in class. After mining, the students' health data are passed to K-fold cross-validation to determine accuracy, error rate, precision and recall. The proposed method is regarded as an enhanced diagnosis method that uses fixed patterns in the decision tree to make precise decisions. In a case study predicting students' health from certain attributes and their levels, pattern-based diagnostics using the K-NN and decision tree algorithms are tested on a training dataset with the WEKA tool. Finally, the algorithms are compared with a view to introducing an optimized classification algorithm.
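
The K-fold evaluation described above can be sketched in plain Python with a K-NN classifier: each fold is held out once, predictions are pooled, and accuracy, error rate, precision and recall are computed from the resulting confusion counts. This is an illustrative sketch, not the WEKA workflow used in the study; the data format, fold assignment (interleaved, not stratified) and parameter names are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """Majority label among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_fold_metrics(data, folds=5, k=3, positive=1):
    """Pool predictions over K folds, then compute accuracy, error rate, precision and recall.

    data: list of (feature_tuple, label); fold f holds the records whose index i satisfies i % folds == f.
    """
    tp = fp = fn = tn = 0
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        for x, y in test:
            pred = knn_predict(train, x, k)
            if pred == positive and y == positive:
                tp += 1
            elif pred == positive:
                fp += 1
            elif y == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "error_rate": 1 - accuracy,
            "precision": precision, "recall": recall}
```

A decision tree could be dropped in place of `knn_predict` to compare the two classifiers under the same folds, which is the kind of comparison the abstract describes.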


Author(s):  
Pushpa Mannava

'Big Data' has spread quickly in the fields of Data Mining and Business Intelligence. This new scenario can be defined by those problems that cannot be effectively or efficiently solved using the common computing resources we currently have. We must emphasize that Big Data does not simply imply huge volumes of data but also the need for scalability, i.e., ensuring a response within an acceptable elapsed time. This paper discusses the research challenges and technological progress of data mining with big data.
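
Scalability of the kind described above is often pursued by splitting data into independent chunks, processing each chunk separately (map) and merging the partial results (reduce), so that chunks can be spread across machines. The single-process sketch below illustrates that pattern with a word count; the chunk size and function names are hypothetical, and a real deployment would run the map step in parallel on a distributed framework.

```python
from collections import Counter
from functools import reduce
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items without loading everything at once."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

def map_chunk(lines):
    """Map step: partial word counts for one chunk (independent of all other chunks)."""
    return Counter(word for line in lines for word in line.lower().split())

def merge_counts(acc, part):
    """Reduce step: fold one partial result into the accumulator."""
    acc.update(part)
    return acc

def word_count(lines, chunk_size=1000):
    return reduce(merge_counts, (map_chunk(c) for c in chunks(lines, chunk_size)), Counter())
```

Because each `map_chunk` call depends only on its own chunk, adding workers shortens elapsed time roughly in proportion, which is the property the scalability requirement asks for.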

