Evolutionary Approach to Dimensionality Reduction

Author(s):  
Amit Saxena ◽  
Megha Kothari ◽  
Navneet Pandey

The sheer volume of data produced by high-capacity storage and online devices has become a bottleneck in the search for meaningful information: we are rich in information but poor in knowledge. One of the major problems in extracting knowledge from large databases is their dimensionality, i.e. the number of features. More often than not, some features do not affect the performance of a classifier. There may also be features that are detrimental and degrade the performance of classifiers used subsequently for dimensionality reduction (DR). A dataset can thus contain redundant features, bad features and highly correlated features. Removing such features not only improves the performance of the system but also makes the learning task much simpler. Data mining, as a multidisciplinary joint effort of databases, machine learning, and statistics, is leading the way in turning mountains of data into nuggets (Mitra, Murthy, & Pal, 2002).

Author(s):  
P. K. Nizar Banu ◽  
H. Inbarani

As websites increase in complexity, locating needed information becomes a difficult task. Such difficulty is often related to the websites’ design but also to ineffective and inefficient navigation processes. Research in web mining addresses this problem by applying techniques from data mining and machine learning to web data and documents. In this study, the authors examine web usage mining, which applies data mining techniques to web server logs. Web usage mining has gained much attention as a potential approach to meeting the requirements of web personalization. In this paper, the authors propose K-means biclustering, rough biclustering and fuzzy biclustering approaches to disclose the duality between users and pages by grouping them in both dimensions simultaneously. The simultaneous clustering of users and pages discovers biclusters that correspond to groups of users who exhibit highly correlated ratings on groups of pages. The results indicate that the fuzzy C-means biclustering algorithm performs best and is able to detect partial matching of preferences.


2017 ◽  
Vol 14 (S339) ◽  
pp. 201-201
Author(s):  
M. Lochner

Abstract: In the last decade Astronomy has been transformed by a deluge of data that will grow exponentially when near-future telescopes such as LSST and the SKA begin routine observing. Astroinformatics, a broad field encompassing many techniques in statistics, machine learning and data mining, is the key to extracting meaningful information from large amounts of data. This talk outlined Astroinformatics as a field, and gave a few examples of the use of machine learning and Bayesian statistics from my own work in survey Astronomy. The era of massive surveys in which we now find ourselves has the potential to completely revolutionise many fields, including time-domain Astronomy, but only if coupled with the powerful tools of Astroinformatics.


Author(s):  
Niall Rooney

The concept of ensemble learning has its origins in research from the late 1980s/early 1990s into combining a number of artificial neural network (ANN) models for regression tasks. Ensemble learning is now a widely deployed and researched topic within the area of machine learning and data mining. Ensemble learning, as a general definition, refers to the concept of being able to apply more than one learning model to a particular machine learning problem using some method of integration. The desired goal, of course, is that the ensemble as a unit will outperform any of its individual members for the given learning task. Ensemble learning has been extended to cover other learning tasks such as classification (refer to Kuncheva, 2004 for a detailed overview of this area), online learning (Fern & Givan, 2003) and clustering (Strehl & Ghosh, 2003). The focus of this article is to review ensemble learning with respect to regression, where by regression we refer to the supervised learning task of creating a model that relates a continuous output variable to a vector of input variables.
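
The idea of integrating several regression models can be illustrated with a minimal sketch. It is not taken from the article: it assumes bagging (bootstrap resampling) as the way to obtain diverse members, a one-dimensional least-squares line as the base learner, and a plain average as the integration method. The names `fit_line`, `fit_bagged_ensemble` and `predict` are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on 1-D inputs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate sample (all x equal): fall back to a constant model.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def fit_bagged_ensemble(xs, ys, n_models=25, seed=0):
    """Fit each member on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Integrate the members by averaging their individual predictions."""
    return mean(a * x + b for a, b in models)
```

Averaging is only one possible integration method; weighted combinations or a second-level learner are common alternatives, and the choice here is made purely for brevity.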


2021 ◽  
Author(s):  
Neeraj Kumar ◽  
Upendra Kumar

Abstract: Information and Communication Technologies have, to a large extent, a major influence on our social life and economy as well as on worldwide security. Computer networks are an integral part of Information Technology. However, the world is never free from people with malicious intent, such as cyber criminals and network intruders. To counter them, an Intrusion Detection System (IDS) plays a significant role in identifying network intrusions by performing various data analysis tasks. To develop a robust IDS that detects intrusions accurately, various papers have been published over the years using hybrid approaches based on different classification techniques from Data Mining (DM) and Machine Learning (ML). The present paper is an in-depth analysis of two focal aspects of Network Intrusion Detection Systems: various pre-processing methods in the form of dimensionality reduction, and an assortment of classification techniques. The paper also includes a comparative algorithmic analysis of the DM and ML techniques applied to design an intelligent IDS. An experimental comparative analysis has been carried out in support of the findings of this work, using the ‘Python’ language with the ‘kddcup99’ dataset as a benchmark. The experimental analysis found that dimensionality reduction had the greater impact and that MLP performed well at correct classification, which helps to establish a secure network. The motive behind this effort is to detect different kinds of malware as early and as accurately as possible, and to offer a closer look at the various existing techniques that may help interested researchers in potential future work.


2017 ◽  
Vol 10 (2) ◽  
pp. 282-290
Author(s):  
Samir Singha ◽  
Syed Hassan

The performance of data mining and machine learning tasks can be significantly degraded by noisy, irrelevant and high-dimensional data containing a large number of features. A large amount of real-world data contains noise or missing values. During data collection, many irrelevant features may also be gathered by the storage repositories. These redundant and irrelevant feature values distort the classification principle, increase the computational overhead and decrease the predictive ability of the classifier. The high dimensionality of such datasets poses a major bottleneck in the fields of data mining, statistics and machine learning. Among the several methods of dimensionality reduction, attribute (feature) selection is often used. Because the k-NN algorithm is sensitive to irrelevant attributes, its performance degrades significantly when a dataset contains missing values or noisy data. However, this weakness of the k-NN algorithm can be minimized by combining it with feature selection techniques. In this research we combine Correlation-based Feature Selection (CFS) with the k-Nearest Neighbour (k-NN) classification algorithm to obtain better classification results when the dataset contains missing values or noisy data. The reduced attribute set decreases the time required for classification. The research shows that, for datasets with little or no noise, dimensionality reduction with CFS followed by k-NN classification may lower classification accuracy compared with the k-NN algorithm alone. When additional noise is introduced into these datasets, the performance of k-NN degrades significantly. When these noisy datasets are classified using CFS and k-NN together, classification accuracy improves.
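
The combination described above can be sketched as follows. This is an illustrative simplification rather than the authors' implementation: it assumes numeric features and numeric class labels, uses absolute Pearson correlation where CFS implementations typically use other correlation measures, and replaces the usual search strategy with a plain greedy forward search. The subset score follows the commonly cited CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / sqrt(va * vb)


def cfs_merit(cols, labels, subset):
    """Merit of a feature subset: relevant to the class, not redundant."""
    k = len(subset)
    r_cf = sum(abs(pearson(cols[j], labels)) for j in subset) / k
    pairs = [(i, j) for i in subset for j in subset if i < j]
    r_ff = (sum(abs(pearson(cols[i], cols[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(rows, labels):
    """Greedily add the feature that most improves the merit; stop when none does."""
    cols = list(zip(*rows))
    selected, best = [], 0.0
    while True:
        scored = [(cfs_merit(cols, labels, selected + [j]), j)
                  for j in range(len(cols)) if j not in selected]
        if not scored:
            break
        merit, j = max(scored)
        if merit <= best:
            break
        selected.append(j)
        best = merit
    return selected


def knn_predict(train_rows, train_labels, query, features, k=3):
    """Majority vote among the k nearest rows, using only the selected features."""
    def dist(row):
        return sum((row[j] - query[j]) ** 2 for j in features)
    nearest = sorted(range(len(train_rows)), key=lambda i: dist(train_rows[i]))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]
```

Restricting the distance computation to the selected features is what shields k-NN from irrelevant attributes, and it also shortens each distance evaluation.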


2020 ◽  
Author(s):  
Mohammed J. Zaki ◽  
Wagner Meira, Jr

2019 ◽  
Vol 12 (3) ◽  
pp. 171-179
Author(s):  
Sachin Gupta ◽  
Anurag Saxena

Background: The bullwhip effect is an increase in the variability of production or procurement that is larger than the corresponding increase in the variability of demand or sales. It is considered an impediment to supply chain optimization because it causes inadequacies in the supply chain. Operations and supply chain management consultants, managers and researchers are studying the causes behind the dynamic nature of supply chain management rigorously, and have listed shorter product life cycles, changes in technology, changes in consumer preference and globalization, to name a few. Most of the literature on the bullwhip effect is based on simulations and mathematical models. Exploring the bullwhip effect using machine learning is the novel approach of the present study. Methods: The present study explores the operational and financial variables affecting the bullwhip effect on the basis of secondary data. Data mining and machine learning techniques are used to explore the variables affecting the bullwhip effect in Indian sectors. The Rapid Miner tool has been used for data mining, and 10-fold cross validation has been performed. After the classification, a Weka Alternating Decision Tree (w-ADT) has been built to help decision makers mitigate the bullwhip effect. Results: Of the 19 selected variables affecting the bullwhip effect, 7 variables with the highest accuracy level and minimum deviation have been selected. Conclusion: Classification using machine learning provides effective tools and techniques for exploring the bullwhip effect in supply chain management.
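
The 10-fold cross validation mentioned in the Methods can be sketched in plain Python. This is not the study's Rapid Miner workflow: the shuffling, the fold assignment and the `fit`/`score` callables are assumptions made for illustration, and the function names are hypothetical. The sketch reports the mean score together with its deviation across folds, the two quantities the Results refer to.

```python
import random
from statistics import mean, pstdev


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices); each sample is tested exactly once."""
    if n_samples < k:
        raise ValueError("need at least k samples so that no fold is empty")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(fit, score, X, y, k=10, seed=0):
    """Return the mean and population standard deviation of the k fold scores.

    fit(train_X, train_y) -> model and score(model, test_X, test_y) -> float
    are supplied by the caller.
    """
    scores = []
    for train, test in k_fold_splits(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return mean(scores), pstdev(scores)
```

A candidate variable subset could then be compared with another by running `cross_validate` on each and preferring the one with the higher mean and lower deviation.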

