Solving Heterogeneous Big Data Mining Problems Using Multi-Objective Optimization

2019 ◽  
Vol 10 (4) ◽  
pp. 18-37
Author(s):  
Farid Bourennani

Nowadays, we have access to unprecedented quantities of data composed of heterogeneous data types (HDT). Heterogeneous data mining (HDM) is a new research area that focuses on the processing of HDT. Usually, input data is transformed into an algebraic model before data processing. However, how to combine the representations of HDT into a single model for unified processing of big data is an open question. In this article, the authors attempt to answer this question by solving a data integration (DI) problem that involves the processing of seven HDT. They propose to solve the DI problem by combining multi-objective optimization and self-organizing maps to find optimal parameter settings for the most accurate HDM results. The preliminary results are promising, and a post-processing algorithm is proposed that makes the DI operations much simpler and more accurate.

2016 ◽  
Vol 16 (5) ◽  
pp. 69-77 ◽  
Author(s):  
Wenquan Yi ◽  
Fei Teng ◽  
Jianfeng Xu

Stream data mining has been a hot research topic in data mining in recent years, as it has extensive application prospects in the big data era. Research on stream data mining mainly focuses on frequent itemset mining, clustering, and classification. However, traditional stream data mining methods are not effective enough for handling high-dimensional data sets because these methods are not suited to the characteristics of stream data. These traditional methods therefore need to be enhanced for big data applications. To resolve this issue, a hybrid framework is proposed for big stream data mining. In this framework, online and offline models are organized for different tasks, and the interior of each model is organized according to the different mining tasks. This framework provides a new research idea and a macro perspective for stream data mining in the context of big data.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Ikbal Taleb ◽  
Mohamed Adel Serhani ◽  
Chafik Bouhaddioui ◽  
Rachida Dssouli

Big Data is an essential research area for governments, institutions, and private agencies to support their analytics decisions. Big Data concerns how data is collected, processed, and analyzed to generate value-added, data-driven insights and decisions. Degradation in data quality may have unpredictable consequences, as confidence in the data and its source is lost. In the Big Data context, data characteristics such as volume, multiple heterogeneous data sources, and fast data generation increase the risk of quality degradation and require efficient mechanisms to check data worthiness. However, ensuring Big Data Quality (BDQ) is a very costly and time-consuming process, since it requires excessive computing resources. Maintaining quality throughout the Big Data lifecycle requires quality profiling and verification before processing decisions are made. A BDQ Management Framework for enhancing pre-processing activities while strengthening data control is proposed. The proposed framework uses a new concept called the Big Data Quality Profile, which captures the quality outline, requirements, attributes, dimensions, scores, and rules. Using the Big Data profiling and sampling components of the framework, a faster and more efficient data quality estimation is carried out before and after an intermediate pre-processing phase. The exploratory profiling component of the framework plays an initial role in quality profiling; it uses a set of predefined quality metrics to evaluate important data quality dimensions. It generates quality rules by applying various pre-processing activities and their related functions. These rules mainly target the Data Quality Profile and yield quality scores for the selected quality attributes.
The framework implementation and dataflow management across various quality management processes are discussed. The paper concludes with ongoing work on framework evaluation and deployment to support quality evaluation decisions.
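To make the idea of sample-based quality scoring concrete, here is a minimal sketch that estimates one quality dimension, completeness, from a random sample of records and checks it against a threshold rule. The function names `completeness_score` and `build_profile`, the dict-per-record layout, and the pass/fail threshold rule are illustrative assumptions, not the framework's actual metrics or API.

```python
import random
from typing import Any


def completeness_score(records: list[dict[str, Any]], attribute: str,
                       sample_size: int, seed: int = 0) -> float:
    """Hypothetical metric: fraction of sampled records whose `attribute`
    is present and non-empty (None or "" count as missing)."""
    rng = random.Random(seed)
    # Sample without replacement; never ask for more records than exist.
    sample = rng.sample(records, min(sample_size, len(records)))
    if not sample:
        return 0.0
    present = sum(1 for r in sample if r.get(attribute) not in (None, ""))
    return present / len(sample)


def build_profile(records: list[dict[str, Any]], attributes: list[str],
                  sample_size: int, threshold: float) -> dict[str, dict[str, Any]]:
    """Hypothetical quality profile: one completeness score per attribute
    plus a simple rule (score >= threshold)."""
    profile = {}
    for attr in attributes:
        score = completeness_score(records, attr, sample_size)
        profile[attr] = {"score": score, "passes": score >= threshold}
    return profile


records = [
    {"age": 30, "name": "a"},
    {"age": None, "name": "b"},
    {"age": 25, "name": ""},
    {"age": 41},
]
print(build_profile(records, ["age", "name"], sample_size=10, threshold=0.7))
```

Sampling keeps the estimate cheap on large data, which is the motivation the abstract gives; on a sample smaller than the data set the score is an estimate rather than an exact value.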


2018 ◽  
Vol 8 (1) ◽  
pp. 194-209 ◽  
Author(s):  
Büsra Güvenoglu ◽  
Belgin Ergenç Bostanoglu

Data mining is a popular research area, studied by many researchers, that focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in data mining, so graph mining is one of the most popular subdivisions of data mining. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about a database; using this information, data can be classified, clustered, and indexed. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation; (ii) to categorize the algorithms according to general attributes such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design, and graph representation; and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have faced recently.
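The frequency calculation phase mentioned above can be illustrated with a minimal sketch that counts support only for single labeled edges, the smallest candidate patterns. Support here is taken to be the number of graphs in the database that contain the pattern at least once; the graph encoding and the name `frequent_edge_patterns` are assumptions for illustration. Real frequent subgraph miners go on to extend candidates edge by edge and need subgraph isomorphism tests, which this sketch omits.

```python
from collections import Counter

# Assumed encoding: (node_id -> node_label, [(u, v, edge_label), ...]), undirected.
Graph = tuple[dict[int, str], list[tuple[int, int, str]]]
Pattern = tuple[str, str, str]


def frequent_edge_patterns(graphs: list[Graph], min_support: int) -> dict[Pattern, int]:
    """Return single-edge patterns contained in at least `min_support` graphs."""
    support: Counter = Counter()
    for node_labels, edges in graphs:
        seen = set()  # count each pattern at most once per graph
        for u, v, edge_label in edges:
            # Canonical form for an undirected edge: sort the endpoint labels.
            a, b = sorted((node_labels[u], node_labels[v]))
            seen.add((a, edge_label, b))
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


database = [
    ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (0, 2, "double")]),
    ({0: "O", 1: "C"}, [(0, 1, "single")]),
    ({0: "N", 1: "C"}, [(0, 1, "single")]),
]
print(frequent_edge_patterns(database, min_support=2))
```

Counting each pattern once per graph (the `seen` set) follows the transaction-based notion of support; single-graph settings use different frequency measures.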


Web Services ◽  
2019 ◽  
pp. 105-126
Author(s):  
N. Nawin Sona

This chapter aims to give an overview of the wide range of Big Data approaches and technologies in use today. The data features of Volume, Velocity, and Variety are examined against new database technologies. It explores the complexity of data types, methodologies of storage, access, and computation, current and emerging trends in data analysis, and methods of extracting value from data. It aims to address the need for clarity regarding the future of RDBMS and the newer systems. It also highlights methods, such as Machine Learning, Data Mining, Predictive Analytics, and others, through which Actionable Insights can be built into public sector domains.


Data Mining ◽  
2013 ◽  
pp. 816-836
Author(s):  
Farid Bourennani ◽  
Shahryar Rahnamayan

Nowadays, many universities, research centers, and companies worldwide share their data electronically. Naturally, these data are of heterogeneous types such as text, numerical data, multimedia, and others. From the user's side, these data should be accessible in a uniform manner, which implies a unified approach to representing and processing data. Furthermore, unified processing of heterogeneous data types can lead to richer semantic results. In this chapter, we present a unified pre-processing approach that leads to the generation of richer semantics for qualitative and quantitative data.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Roberto Salazar-Reyna ◽  
Fernando Gonzalez-Aleu ◽  
Edgar M.A. Granda-Gutierrez ◽  
Jenny Diaz-Ramirez ◽  
Jose Arturo Garza-Reyes ◽  
...  

Purpose: The objective of this paper is to assess and synthesize the published literature related to the application of data analytics, big data, data mining and machine learning to healthcare engineering systems.
Design/methodology/approach: A systematic literature review (SLR) was conducted to obtain the most relevant papers related to the research study from three different platforms: EBSCOhost, ProQuest and Scopus. The literature was assessed and synthesized, conducting analysis associated with the publications, authors and content.
Findings: From the SLR, 576 publications were identified and analyzed. The research area seems to show the characteristics of a growing field, with new research areas evolving and applications being explored. In addition, the main authors and collaboration groups publishing in this research area were identified through a social network analysis. This could help new and current authors identify researchers with common interests in the field.
Research limitations/implications: The use of the SLR methodology does not guarantee that all relevant publications related to the research are covered and analyzed. However, the authors' previous knowledge and the nature of the publications were used to select different platforms.
Originality/value: To the best of the authors' knowledge, this paper represents the most comprehensive literature-based study on the fields of data analytics, big data, data mining and machine learning applied to healthcare engineering systems.


2014 ◽  
Vol 23 (05) ◽  
pp. 1450004 ◽  
Author(s):  
Ibrahim S. Alwatban ◽  
Ahmed Z. Emam

In recent years, a new research area known as privacy-preserving data mining (PPDM) has emerged and captured the attention of many researchers interested in preventing the privacy violations that may occur during data mining. In this paper, we provide a review of studies on PPDM in the context of association rules (PPARM). This paper systematically defines the scope of this survey and identifies the PPARM models. The problems of each model are formally described, and we discuss the relevant approaches, techniques and algorithms that have been proposed in the literature. A profile of each model and its accompanying algorithms is provided, along with a comparison of the PPARM models.
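As background for the PPARM setting, the sketch below computes the two standard measures of an association rule X → Y, support and confidence, and treats a sensitive rule as hidden once either measure drops below its mining threshold (one common goal of rule-hiding approaches). The function names, the thresholds, and the example transactions are hypothetical and do not come from the surveyed algorithms.

```python
def support(transactions: list[set[str]], itemset: set[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions: list[set[str]], antecedent: set[str],
               consequent: set[str]) -> float:
    """conf(X -> Y) = supp(X u Y) / supp(X); 0.0 if X never occurs."""
    base = support(transactions, antecedent)
    return support(transactions, antecedent | consequent) / base if base else 0.0


def is_rule_hidden(transactions: list[set[str]], antecedent: set[str],
                   consequent: set[str], min_sup: float, min_conf: float) -> bool:
    """Hypothetical criterion: the rule can no longer be mined if its
    support or its confidence falls below the corresponding threshold."""
    sup = support(transactions, antecedent | consequent)
    conf = confidence(transactions, antecedent, consequent)
    return sup < min_sup or conf < min_conf


original = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
# A sanitized copy in which "milk" was removed from the third transaction.
sanitized = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "butter"}, {"milk"}]
print(is_rule_hidden(original, {"bread"}, {"milk"}, 0.4, 0.6))   # False
print(is_rule_hidden(sanitized, {"bread"}, {"milk"}, 0.4, 0.6))  # True
```

Removing a single item lowers the rule's support from 0.5 to 0.25, below the 0.4 threshold; deciding which items to alter with the fewest side effects on non-sensitive rules is the hard part that PPARM algorithms address.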


2013 ◽  
Vol 442 ◽  
pp. 419-423
Author(s):  
Ming Song Li

Multi-objective optimization based on the Artificial Immune System (AIS) is an important research area in current evolutionary computing. Starting from immune theory and the intelligent information-processing mechanisms of the immune system itself, an evolutionary multi-objective optimization algorithm based on AIS is proposed. The algorithm is characterized by clonal selection, scattered crossover, and hypermutation based on a learning mechanism. It performs clonal selection according to the distribution of individuals in the objective space, which helps obtain a more widely distributed Pareto-optimal front and speeds up convergence. Compared with existing algorithms, the proposed algorithm shows great improvement in the convergence, diversity, and distribution of solutions.
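The Pareto-optimal front referred to above is defined by Pareto dominance. The following minimal sketch, assuming all objectives are minimized, filters a set of objective vectors down to its non-dominated subset; it illustrates only the dominance test and does not implement the AIS operators (clonal selection, scattered crossover, hypermutation) described in the abstract.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Keep the points that no other point dominates.
    A point never dominates itself, so no self-exclusion is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
print(pareto_front(candidates))  # [(1, 5), (2, 3), (4, 1)]
```

This quadratic filter is fine for small populations; evolutionary algorithms typically pair a dominance test like this with a diversity measure so that the retained front is spread widely, which is the property the abstract emphasizes.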

