Solving Heterogeneous Big Data Mining Problems Using Multi-Objective Optimization

Nowadays, we have access to unprecedented quantities of data composed of heterogeneous data types (HDT). Heterogeneous data mining (HDM) is a new research area that focuses on the processing of HDT. Usually, input data is transformed into an algebraic model before data processing. However, how to combine the representations of HDT into a single model for unified processing of big data is an open question. In this article, the authors attempt to answer this question by solving a data integration (DI) problem which involves the processing of seven HDT. They propose to solve the DI problem by combining multi-objective optimization and self-organizing maps to find the optimal parameter settings for the most accurate HDM results. The preliminary results are promising, and a post-processing algorithm is proposed which makes the DI operations much simpler and more accurate.

Noval Stream Data Mining Framework under the Background of Big Data

Cybernetics and Information Technologies ◽ 10.1515/cait-2016-0053 ◽ 2016 ◽ Vol 16 (5) ◽ pp. 69-77 ◽ Cited By ~ 2

Author(s): Wenquan Yi ◽ Fei Teng ◽ Jianfeng Xu

Keyword(s): Data Mining ◽ Big Data ◽ Research Area ◽ Stream Data ◽ Data Set ◽ Stream Data Mining ◽ Big Data Applications ◽ Clustering And Classification ◽ Mining Methods ◽ New Research

Abstract: Stream data mining has been a hot research topic in the data mining area in recent years, as it has extensive application prospects in the big data era. Research on stream data mining mainly focuses on frequent itemset mining, clustering and classification. However, traditional stream data mining methods are not effective enough for handling high-dimensional data sets because they do not fit the characteristics of stream data. These traditional methods therefore need to be enhanced for big data applications. To resolve this issue, a hybrid framework is proposed for big stream data mining. In this framework, online and offline models are organized for different tasks, and the interior of each model is rationally organized according to the different mining tasks. This framework provides a new research idea and a macro perspective for stream data mining under the background of big data.

Big data quality framework: a holistic approach to continuous quality management

Journal of Big Data ◽ 10.1186/s40537-021-00468-0 ◽ 2021 ◽ Vol 8 (1)

Author(s): Ikbal Taleb ◽ Mohamed Adel Serhani ◽ Chafik Bouhaddioui ◽ Rachida Dssouli

Keyword(s): Big Data ◽ Quality Management ◽ Data Quality ◽ Value Added ◽ Holistic Approach ◽ Research Area ◽ Heterogeneous Data ◽ Data Generation ◽ Continuous Quality ◽ Quality Profile

Abstract: Big Data is an essential research area for governments, institutions, and private agencies to support their analytics decisions. Big Data covers everything about data: how it is collected, processed, and analyzed to generate value-added, data-driven insights and decisions. Degradation in Data Quality may result in unpredictable consequences. In such cases, confidence in the worthiness of the data and its source is lost. In the Big Data context, data characteristics such as volume, multiple heterogeneous data sources, and fast data generation increase the risk of quality degradation and require efficient mechanisms to check data worthiness. However, ensuring Big Data Quality (BDQ) is a very costly and time-consuming process, since excessive computing resources are required. Maintaining quality through the Big Data lifecycle requires quality profiling and verification before any processing decision is made. A BDQ Management Framework for enhancing the pre-processing activities while strengthening data control is proposed. The proposed framework uses a new concept called the Big Data Quality Profile. This concept captures the quality outline, requirements, attributes, dimensions, scores, and rules. Using the Big Data profiling and sampling components of the framework, a faster and more efficient data quality estimation is initiated before and after an intermediate pre-processing phase. The exploratory profiling component of the framework plays an initial role in quality profiling; it uses a set of predefined quality metrics to evaluate important data quality dimensions. It generates quality rules by applying various pre-processing activities and their related functions. These rules mainly target the Data Quality Profile and yield quality scores for the selected quality attributes. The framework implementation and dataflow management across the various quality management processes are discussed. Ongoing work on framework evaluation and deployment to support quality evaluation decisions concludes the paper.

A qualitative survey on frequent subgraph mining

Open Computer Science ◽ 10.1515/comp-2018-0018 ◽ 2018 ◽ Vol 8 (1) ◽ pp. 194-209 ◽ Cited By ~ 1

Author(s): Büsra Güvenoglu ◽ Belgin Ergenç Bostanoglu

Keyword(s): Data Mining ◽ Graph Mining ◽ Research Area ◽ Heterogeneous Data ◽ Graph Representation ◽ Frequent Subgraph Mining ◽ Subgraph Mining ◽ Frequent Subgraph ◽ Input Type ◽ Frequent Subgraphs

Abstract: Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in the field of data mining, so graph mining is one of the most popular subdivisions of data mining. Subgraphs that are encountered in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database. Using this information, data can be classified, clustered and indexed. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have recently faced.

Big Data Models and the Public Sector

Web Services ◽ 10.4018/978-1-5225-7501-6.ch007 ◽ 2019 ◽ pp. 105-126

Author(s): N. Nawin Sona

Keyword(s): Machine Learning ◽ Data Mining ◽ Big Data ◽ Public Sector ◽ Predictive Analytics ◽ Data Types ◽ The Public ◽ Emerging Trends ◽ Wide Range ◽ Learning Data

This chapter aims to give an overview of the wide range of Big Data approaches and technologies today. The data features of Volume, Velocity, and Variety are examined against new database technologies. It explores the complexity of data types, methodologies of storage, access and computation, current and emerging trends of data analysis, and methods of extracting value from data. It aims to address the need for clarity regarding the future of RDBMS and the newer systems. It also highlights the methods by which Actionable Insights can be built into public sector domains, such as Machine Learning, Data Mining, Predictive Analytics and others.

Heterogeneous Text and Numerical Data Mining with Possible Applications in Business and Financial Sectors

Data Mining ◽ 10.4018/978-1-4666-2455-9.ch042 ◽ 2013 ◽ pp. 816-836

Author(s): Farid Bourennani ◽ Shahryar Rahnamayan

Keyword(s): Data Mining ◽ Quantitative Data ◽ World Wide ◽ Numerical Data ◽ Heterogeneous Data ◽ Research Centers ◽ Unified Approach ◽ Data Types ◽ Qualitative And Quantitative ◽ Uniform Manner

Nowadays, many universities, research centers, and companies worldwide share their own data electronically. Naturally, these data are of heterogeneous types such as text, numerical data, multimedia, and others. From the user's side, this data should be accessible in a uniform manner, which implies a unified approach for representing and processing data. Furthermore, unified processing of the heterogeneous data types can lead to richer semantic results. In this chapter, we present a unified pre-processing approach that leads to the generation of richer semantics for qualitative and quantitative data.

A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems

Management Decision ◽ 10.1108/md-01-2020-0035 ◽ 2020 ◽ Vol ahead-of-print (ahead-of-print)

Author(s): Roberto Salazar-Reyna ◽ Fernando Gonzalez-Aleu ◽ Edgar M.A. Granda-Gutierrez ◽ Jenny Diaz-Ramirez ◽ Jose Arturo Garza-Reyes ◽ ...

Keyword(s): Machine Learning ◽ Data Mining ◽ Big Data ◽ Literature Review ◽ Systematic Literature Review ◽ Data Analytics ◽ Research Area ◽ Engineering Systems ◽ Content Type ◽ Healthcare Engineering

Purpose: The objective of this paper is to assess and synthesize the published literature related to the application of data analytics, big data, data mining and machine learning to healthcare engineering systems.

Design/methodology/approach: A systematic literature review (SLR) was conducted to obtain the most relevant papers related to the research study from three different platforms: EBSCOhost, ProQuest and Scopus. The literature was assessed and synthesized, conducting analysis associated with the publications, authors and content.

Findings: From the SLR, 576 publications were identified and analyzed. The research area seems to show the characteristics of a growing field, with new research areas evolving and applications being explored. In addition, the main authors and collaboration groups publishing in this research area were identified through a social network analysis. This could help new and current authors identify researchers with common interests in the field.

Research limitations/implications: The use of the SLR methodology does not guarantee that all relevant publications related to the research are covered and analyzed. However, the authors' previous knowledge and the nature of the publications were used to select different platforms.

Originality/value: To the best of the authors' knowledge, this paper represents the most comprehensive literature-based study on the fields of data analytics, big data, data mining and machine learning applied to healthcare engineering systems.

Comprehensive Survey on Privacy Preserving Association Rule Mining: Models, Approaches, Techniques and Algorithms

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213014500043 ◽ 2014 ◽ Vol 23 (05) ◽ pp. 1450004 ◽ Cited By ~ 5

Author(s): Ibrahim S. Alwatban ◽ Ahmed Z. Emam

Keyword(s): Data Mining ◽ Association Rules ◽ Association Rule ◽ Association Rule Mining ◽ Research Area ◽ Privacy Preserving ◽ Rule Mining ◽ Privacy Preserving Data Mining ◽ Comprehensive Survey ◽ New Research

In recent years, a new research area known as privacy preserving data mining (PPDM) has emerged and captured the attention of many researchers interested in preventing the privacy violations that may occur during data mining. In this paper, we provide a review of studies on PPDM in the context of association rules (PPARM). This paper systematically defines the scope of this survey and determines the PPARM models. The problems of each model are formally described, and we discuss the relevant approaches, techniques and algorithms that have been proposed in the literature. A profile of each model and the accompanying algorithms are provided with a comparison of the PPARM models.

A Kind of Evolutionary Multi-Objective Optimization Algorithm Based on AIS

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.442.419 ◽ 2013 ◽ Vol 442 ◽ pp. 419-423

Author(s): Ming Song Li

Keyword(s): Immune System ◽ Optimization Algorithm ◽ Clonal Selection ◽ Research Area ◽ Multi Objective Optimization ◽ Multi Objective ◽ Objective Space ◽ Speed Up ◽ Important Research Area ◽ Intelligent Information

Multi-objective optimization based on the Artificial Immune System (AIS) is an important research area in current evolutionary computing. Starting from the intelligent information processing mechanism of immune theory and the immune system itself, an evolutionary multi-objective optimization algorithm based on AIS is proposed. The algorithm is characterized by clonal selection, scattered crossover, and hypermutation based on the learning mechanism. It performs clonal selection according to the distribution of individuals in the objective space, which helps obtain a more widely distributed Pareto-optimal front and speeds up convergence. Compared with existing algorithms, the algorithm shows greatly improved convergence, diversity, and distribution of solutions.
