Advanced Dimensionality Reduction Method for Big Data

Big Data ◽  
2016 ◽  
pp. 2388-2400 ◽  
Author(s):  
Sufal Das ◽  
Hemanta Kumar Kalita

The growing glut of data in the worlds of science, business, and government creates an urgent need for big data techniques. Big data is a term describing large volumes of high-velocity, complex, and variable data that require advanced techniques and technologies for capture, storage, distribution, management, and analysis. The big data challenge is becoming one of the most exciting opportunities of the coming years. Data mining algorithms such as association rule mining perform an exhaustive search to find all rules satisfying given constraints, so identifying the most effective rules in big data is difficult. A novel method for feature selection and extraction in big data using a genetic algorithm has been introduced. Dimensionality reduction can be considered a problem of global combinatorial optimization in machine learning: it reduces the number of features and removes irrelevant, noisy, and redundant data, improving accuracy, saving computation time, and simplifying the result. A genetic-algorithm-based approach was developed that uses a feedback linkage between feature selection and association rule mining, implemented with MapReduce for big data.
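The genetic-algorithm step can be sketched in miniature (outside MapReduce): candidate feature subsets are binary masks evolved through selection, crossover, and mutation against a fitness function. The `toy_fitness` below is a hypothetical stand-in for the paper's association-rule feedback; this is a minimal sketch, not the authors' implementation.

```python
import random

def genetic_feature_selection(n_features, fitness, generations=30,
                              pop_size=16, mutation_rate=0.1, seed=0):
    """Evolve binary feature masks; fitness scores a mask (higher is better)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]        # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)     # pick two parents
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_features):         # bit-flip mutation
                if rng.random() < mutation_rate:
                    child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: reward selecting features 0 and 2, penalize subset size.
# (A hypothetical stand-in for a rule-quality feedback signal.)
TARGET = {0, 2}
def toy_fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & TARGET) - 0.1 * len(chosen)

best_mask = genetic_feature_selection(6, toy_fitness)
```

Because the fitter half of each generation is carried over unchanged, the best mask found never regresses; in the paper's setting, rule quality measured over MapReduce-partitioned data would play the role of the fitness signal.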



2018 ◽  
Vol 7 (4.36) ◽  
pp. 533
Author(s):  
P. Asha ◽  
T. Prem Jacob ◽  
A. Pravin

Currently, data gathering techniques have expanded, so unstructured data creeps in alongside well-defined data formats. Mining these data and extracting useful patterns is difficult, and various data mining algorithms have been put forth for this purpose. The association patterns generated by association rule mining (ARM) algorithms are large in number. Most ARM work focuses on positive rule mining, and very little of the literature has addressed rare itemset mining. This work aims at retrieving the rare itemsets that are of most interest to the user by utilizing various interestingness measures. Both positive and negative itemset mining are covered in this work.
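A minimal illustration of the interestingness measures such work relies on: support, confidence, and lift for a rule over a set of transactions, where a lift below 1 signals a negative association between the itemsets. The transactions here are toy data, not from the paper.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.
    Lift < 1 indicates a negative association between the two itemsets."""
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return support, confidence, lift

tx = [{"bread", "milk"}, {"bread"}, {"milk"},
      {"bread", "milk"}, {"eggs"}]
support, confidence, lift = rule_measures(tx, {"bread"}, {"milk"})
```

Rare itemset mining would apply such measures below the usual minimum-support threshold, where low-support but high-lift (or very low-lift, i.e., negative) rules are the interesting ones.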


Author(s):  
Heisnam Rohen Singh ◽  
Saroj Kr Biswas ◽  
Monali Bordoloi

Classification is the task of assigning objects to one of several predefined categories. However, developing a classification system is often hampered by the size of the data. With the increase in the dimension of data, the chance of irrelevant, redundant, and noisy features or attributes also increases. Feature selection acts as a catalyst in reducing computation time and dimensionality, enhancing prediction performance or accuracy, and curtailing irrelevant or redundant data. The neuro-fuzzy approach is used for feature selection and classification with better insight by representing knowledge in symbolic forms. The neuro-fuzzy approach combines the merits of neural networks and fuzzy logic to solve many complex machine learning problems. The objective of this article is to provide a generic introduction to and a recent survey of neuro-fuzzy approaches for feature selection and classification across a wide area of machine learning problems. Some of the existing neuro-fuzzy models are also applied to standard datasets to demonstrate their applicability and performance.


Author(s):  
Barak Chizi ◽  
Lior Rokach ◽  
Oded Maimon

Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000). The main reason is that data mining algorithms are computationally intensive. This obstacle is sometimes known as the “curse of dimensionality” (Bellman, 1961). The objective of feature selection is to identify the important features in the data set and discard the rest as irrelevant or redundant. Because feature selection reduces the dimensionality of the data, data mining algorithms can then operate faster and more effectively. In some cases, the performance of the data mining method even improves as a result, mainly because the target concept gains a more compact, easily interpreted representation. There are three main approaches to feature selection: wrapper, filter, and embedded. The wrapper approach (Kohavi, 1995; Kohavi and John, 1996) uses an inducer as a black box along with a statistical re-sampling technique such as cross-validation to select the best feature subset according to some predictive measure. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed subsequently: undesirable features are filtered out of the data before learning begins. These algorithms use heuristics based on general characteristics of the data to evaluate the merit of feature subsets. A sub-category of filter methods, referred to here as rankers, employs some criterion to score each feature and produce a ranking; from this ordering, several feature subsets can be chosen by manually setting a threshold. The embedded approach (see for instance Guyon and Elisseeff, 2003) is similar to the wrapper approach in that features are selected for a specific inducer, but it selects the features during the learning process itself.
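The ranker sub-category can be sketched with a simple univariate filter criterion. The absolute difference of class means used below is a hypothetical stand-in for any scoring measure: each feature is scored independently of the learner and ranked, after which a subset is cut from the ordering.

```python
def rank_features(X, y):
    """Filter-style ranker: score each feature by the absolute difference
    of its per-class means (a simple univariate relevance criterion),
    then return feature indices ordered best first."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        score = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        scores.append((score, j))
    return [j for _, j in sorted(scores, reverse=True)]

# Feature 1 separates the two classes; features 0 and 2 are noise.
X = [[0.1, 5.0, 1.0], [0.2, 5.1, 1.1],
     [0.1, 0.9, 1.0], [0.2, 1.0, 1.1]]
y = [1, 1, 0, 0]
ranking = rank_features(X, y)
```

A wrapper, by contrast, would train the actual inducer on each candidate subset and cross-validate; the filter's per-feature scoring is what makes it cheap but learner-agnostic.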


Author(s):  
Anne Denton

Most data of practical relevance are structured in more complex ways than is assumed in traditional data mining algorithms, which are based on a single table. The concept of relations allows for discussing many data structures such as trees and graphs. Relational data are highly general and of significant importance, as demonstrated by the ubiquity of relational database management systems. It is, therefore, not surprising that popular data mining techniques, such as association rule mining, have been generalized to relational data. An important aspect of the generalization process is the identification of challenges that are new to the generalized setting.


2013 ◽  
Vol 22 (03) ◽  
pp. 1350010 ◽  
Author(s):  
SABEREH SADEGHI ◽  
HAMID BEIGY

Dimensionality reduction is a necessary task in data mining when working with high-dimensional data. One type of dimensionality reduction is feature selection. Feature selection based on feature ranking has received much attention from researchers, mainly because of its scalability, ease of use, and fast computation. Feature ranking methods can be divided into different categories and may use different measures for ranking features. Recently, ensemble methods have entered the field of feature ranking and achieved higher accuracy than other methods. Accordingly, this paper proposes a heterogeneous ensemble-based algorithm for feature ranking. The base ranking methods in this ensemble structure are chosen from different categories, such as information-theoretic, distance-based, and statistical methods. The results of the base ranking methods are then fused into a final feature subset by means of a genetic algorithm. The diversity of the base methods improves the quality of the initial population of the genetic algorithm and thus reduces its convergence time. In most ranking methods, it is the user's task to determine the threshold for choosing an appropriate subset of features, which may force the user to try many different values before finding a good one. The proposed algorithm reduces the difficulty of determining a proper threshold. The performance of the algorithm is evaluated on four different text datasets, and the experimental results show that the proposed method outperforms all five other feature ranking methods used for comparison. One advantage of the proposed method is that it is independent of the classification method used.
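The fusion step can be made concrete with a simpler rank-aggregation scheme than the paper's genetic algorithm: the Borda count below is a stand-in used only to illustrate how diverse base rankings combine into one consensus ordering.

```python
def borda_fuse(rankings):
    """Fuse several feature rankings (each ordered best first) by Borda
    count: a feature at position p in a ranking of length n earns n - p
    points, and features are re-ordered by total points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, feat in enumerate(ranking):
            scores[feat] = scores.get(feat, 0) + (n - pos)
    return sorted(scores, key=scores.get, reverse=True)

# Three hypothetical base rankers (e.g. information-theoretic,
# distance-based, statistical) ranking four features, best first.
fused = borda_fuse([[2, 0, 1, 3], [2, 1, 0, 3], [0, 2, 1, 3]])
```

In the paper's scheme, a genetic algorithm searches over feature subsets instead, with the diverse base rankings seeding its initial population; the consensus effect is the same, but the GA also resolves the subset-size (threshold) question during the search.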

