Evolutionary Approach to Dimensionality Reduction

2011 ◽

Vol 4 (1) ◽

pp. 53-66 ◽

Cited By ~ 1

Author(s):

P. K. Nizar Banu ◽

H. Inbarani

Keyword(s):

Machine Learning ◽

Data Mining ◽

Web Mining ◽

Web Usage Mining ◽

Web Personalization ◽

Partial Matching ◽

Web Usage ◽

Needed Information ◽

Highly Correlated ◽

Web Server Logs

As websites increase in complexity, locating needed information becomes a difficult task. Such difficulty is often related not only to the websites’ design but also to ineffective and inefficient navigation processes. Research in web mining addresses this problem by applying techniques from data mining and machine learning to web data and documents. In this study, the authors examine web usage mining, which applies data mining techniques to web server logs. Web usage mining has gained much attention as a potential approach to fulfilling the requirements of web personalization. In this paper, the authors propose K-means biclustering, rough biclustering and fuzzy biclustering approaches that reveal the duality between users and pages by grouping them in both dimensions simultaneously. The simultaneous clustering of users and pages discovers biclusters that correspond to groups of users who exhibit highly correlated ratings on groups of pages. The results indicate that the fuzzy C-means biclustering algorithm performs best and is able to detect partial matching of preferences.
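The abstract does not spell out the biclustering procedure. As a rough illustration only, the Python sketch below runs standard fuzzy C-means separately on the rows (users) and the columns (pages) of a small user-by-page rating matrix; a bicluster can then be read off by pairing a user cluster with a page cluster. This row-then-column scheme, the fuzzifier `m = 2`, the helper names and the toy ratings are all assumptions for the example, not the authors' algorithm. The fractional memberships are what make partial matching possible: a user can belong to several groups to different degrees.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy C-means on the rows of `data`.

    Returns (centers, u) where u[i][j] is the degree to which row i
    belongs to cluster j; each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers are means of the data weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * data[i][k] for i in range(n)) / w_sum
                            for k in range(d)])
        # Memberships: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        for i in range(n):
            dist = [math.dist(data[i], centers[j]) for j in range(c)]
            if 0.0 in dist:
                # The row sits exactly on a center: full membership there.
                j0 = dist.index(0.0)
                u[i] = [1.0 if j == j0 else 0.0 for j in range(c)]
                continue
            u[i] = [1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0))
                              for l in range(c))
                    for j in range(c)]
    return centers, u


def hard_labels(u):
    """Index of the highest-membership cluster for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Hypothetical ratings: 4 users (rows) x 4 pages (columns).
ratings = [
    [5, 5, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
]
pages = [list(col) for col in zip(*ratings)]  # transpose: one row per page

_, user_u = fuzzy_c_means(ratings, c=2)
_, page_u = fuzzy_c_means(pages, c=2)
print("user clusters:", hard_labels(user_u))
print("page clusters:", hard_labels(page_u))
```

Here users 0 and 1 should share a cluster with pages 0 and 1, and users 2 and 3 with pages 2 and 3; the soft memberships in `user_u` show how strongly each user leans toward each group.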

Unlocking the Universe with Astroinformatics

Proceedings of the International Astronomical Union ◽

10.1017/s1743921318002570 ◽

2017 ◽

Vol 14 (S339) ◽

pp. 201-201

Author(s):

M. Lochner

Keyword(s):

Machine Learning ◽

Data Mining ◽

Bayesian Statistics ◽

Time Domain ◽

Meaningful Information ◽

Broad Field ◽

Near Future ◽

The Universe

In the last decade, Astronomy has been transformed by a deluge of data that will grow exponentially when near-future telescopes such as LSST and the SKA begin routine observing. Astroinformatics, a broad field encompassing many techniques in statistics, machine learning and data mining, is the key to extracting meaningful information from large amounts of data. This talk outlined Astroinformatics as a field and gave a few examples of the use of machine learning and Bayesian statistics from my own work in survey Astronomy. The era of massive surveys in which we now find ourselves has the potential to completely revolutionise many fields, including time-domain Astronomy, but only if coupled with the powerful tools of Astroinformatics.

Ensemble Learning for Regression

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch120 ◽

2011 ◽

pp. 777-782

Author(s):

Niall Rooney

Keyword(s):

Machine Learning ◽

Data Mining ◽

Neural Networks ◽

Ensemble Learning ◽

Learning Task ◽

General Definition ◽

Learning Tasks ◽

Continuous Output ◽

Input Variables ◽

The Given

The concept of ensemble learning has its origins in research from the late 1980s and early 1990s into combining a number of artificial neural network (ANN) models for regression tasks. Ensemble learning is now a widely deployed and researched topic within machine learning and data mining. As a general definition, ensemble learning refers to applying more than one learning model to a particular machine learning problem and integrating their outputs by some method. The desired goal, of course, is that the ensemble as a unit will outperform any of its individual members on the given learning task. Ensemble learning has been extended to other learning tasks such as classification (refer to Kuncheva, 2004 for a detailed overview of this area), online learning (Fern & Givan, 2003) and clustering (Strehl & Ghosh, 2003). The focus of this article is to review ensemble learning with respect to regression, where by regression we mean the supervised learning task of creating a model that relates a continuous output variable to a vector of input variables.
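To make the general definition concrete, the Python sketch below integrates several regression models by simple averaging, one of the most basic integration methods. The three member models are hypothetical fixed functions standing in for trained regressors. For squared error, the averaged prediction's error is never larger than the mean error of its members (a consequence of the convexity of the square), although beating every individual member, as happens in this toy case, is not guaranteed in general.

```python
def ensemble_predict(models, x):
    """Integrate member outputs by an unweighted average."""
    return sum(model(x) for model in models) / len(models)


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical members approximating the target y = 2x, each biased differently.
models = [
    lambda x: 2.0 * x + 1.0,  # constant over-estimate
    lambda x: 2.0 * x - 1.0,  # constant under-estimate
    lambda x: 2.2 * x,        # slope slightly too steep
]
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]

member_errors = [mse(model, xs, ys) for model in models]
ensemble_error = mse(lambda x: ensemble_predict(models, x), xs, ys)
print("members:", member_errors)
print("ensemble:", ensemble_error)
```

The opposite biases of the first two members cancel in the average, which is the intuition behind combining diverse models.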

Diverse Analysis of Data Mining and Machine Learning Algorithms to Secure Computer Network

10.21203/rs.3.rs-305354/v1 ◽

2021 ◽

Author(s):

Neeraj Kumar ◽

Upendra Kumar

Keyword(s):

Machine Learning ◽

Data Mining ◽

Intrusion Detection ◽

Dimensionality Reduction ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Algorithms ◽

Classification Techniques ◽

Network Intrusion ◽

Depth Analysis

Information and Communication Technologies have a major influence on our social life, the economy and worldwide security, and computer networks are integral to Information Technology as a whole. However, the world is never free of people with malicious intent, such as cyber criminals and network intruders. To counter them, an Intrusion Detection System (IDS) plays a significant role in identifying network intrusions by performing various data analysis tasks. To develop robust IDSs that detect intrusions accurately, many papers have been published over the years that use classification techniques from Data Mining (DM) and Machine Learning (ML), often in hybrid approaches. The present paper is an in-depth analysis of two focal aspects of a Network Intrusion Detection System: pre-processing methods in the form of dimensionality reduction, and an assortment of classification techniques. It also includes a comparative algorithmic analysis of the DM and ML techniques applied to design an intelligent IDS. An experimental comparative analysis, implemented in Python with the ‘kddcup99’ dataset as a benchmark, supports the findings of this work. The experiments showed that dimensionality reduction had a considerable impact and that the MLP (multilayer perceptron) performed well in correct classification to establish a secure network. The motive behind this effort is to detect different kinds of malware accurately and as early as possible, and to provide better insight into the various existing techniques that may help interested researchers in future work.
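The abstract names dimensionality reduction as a pre-processing step but not which method was used. One simple and widely used option, sketched below in Python purely as an illustration, is to drop features whose variance is at or below a threshold, since near-constant columns carry little information for a classifier. The function name, the threshold and the records are hypothetical and are not taken from the ‘kddcup99’ dataset.

```python
from statistics import pvariance


def variance_threshold(rows, threshold=0.0):
    """Keep only columns whose population variance exceeds `threshold`.

    Returns (kept column indices, rows restricted to those columns).
    """
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols)
            if pvariance([row[j] for row in rows]) > threshold]
    reduced = [[row[j] for j in keep] for row in rows]
    return keep, reduced


# Hypothetical records with four numeric features; columns 1 and 3 never change.
records = [
    [0, 1, 181, 0],
    [0, 1, 239, 0],
    [2, 1, 235, 0],
    [0, 1, 219, 0],
]
keep, reduced = variance_threshold(records)
print(keep)     # [0, 2]
print(reduced)  # [[0, 181], [0, 239], [2, 235], [0, 219]]
```

scikit-learn offers the same filter as `sklearn.feature_selection.VarianceThreshold`; projection methods such as PCA are a common alternative when features are correlated rather than constant.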

Analysis of Click Stream Patterns using Soft Biclustering Approaches

Systems Approach Applications for Developments in Information Technology ◽

10.4018/978-1-4666-1562-5.ch015 ◽

2012 ◽

pp. 212-224

Author(s):

P. K. Nizar Banu ◽

H. Inbarani

Keyword(s):

Machine Learning ◽

Data Mining ◽

Web Mining ◽

Web Server ◽

Web Usage Mining ◽

Web Personalization ◽

Partial Matching ◽

Web Usage ◽

Highly Correlated ◽

Web Server Logs

As websites increase in complexity, locating needed information becomes a difficult task. Such difficulty is often related not only to the websites’ design but also to ineffective and inefficient navigation processes. Research in web mining addresses this problem by applying techniques from data mining and machine learning to web data and documents. In this study, the authors examine web usage mining, which applies data mining techniques to web server logs. Web usage mining has gained much attention as a potential approach to fulfilling the requirements of web personalization. In this paper, the authors propose K-means biclustering, rough biclustering and fuzzy biclustering approaches that reveal the duality between users and pages by grouping them in both dimensions simultaneously. The simultaneous clustering of users and pages discovers biclusters that correspond to groups of users who exhibit highly correlated ratings on groups of pages. The results indicate that the fuzzy C-means biclustering algorithm performs best and is able to detect partial matching of preferences.

Enhancing the Classification Accuracy of Noisy Dataset by Fusing Correlation Based Feature Selection with K-Nearest Neighbour

Oriental Journal of Computer Science and Technology ◽

10.13005/ojcst/10.02.05 ◽

2017 ◽

Vol 10 (2) ◽

pp. 282-290

Author(s):

Samir Singha ◽

Syed Hassan

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Dimensionality Reduction ◽

Classification Accuracy ◽

Missing Values ◽

Noisy Data ◽

Nearest Neighbour ◽

Prediction Ability ◽

Correlation Based Feature Selection

The performance of data mining and machine learning tasks can be significantly degraded by noisy, irrelevant and high-dimensional data containing a large number of features. A large amount of real-world data contains noise or missing values. During data collection, storage repositories may also gather many irrelevant features. These redundant and irrelevant feature values distort the classification principle, increase computational overhead and decrease the prediction ability of the classifier. The high dimensionality of such datasets poses a major bottleneck in data mining, statistics and machine learning. Among the several methods of dimensionality reduction, attribute (feature) selection is often used. Because the k-NN algorithm is sensitive to irrelevant attributes, its performance degrades significantly when a dataset contains missing values or noisy data. However, this weakness of the k-NN algorithm can be mitigated by combining it with feature selection techniques. In this research, we combine Correlation based Feature Selection (CFS) with the k-Nearest Neighbour (k-NN) classification algorithm to obtain better classification results when the dataset contains missing values or noisy data. The reduced attribute set also decreases the time required for classification. The research shows that when dimensionality reduction is performed using CFS and the result is classified with the k-NN algorithm, datasets with little or no noise may suffer a drop in classification accuracy compared with k-NN alone. When additional noise is introduced into these datasets, the performance of k-NN degrades significantly. When these noisy datasets are classified using CFS and k-NN together, classification accuracy improves.
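The combination described above can be sketched in a few lines of Python. The merit of a feature subset S of k features follows the usual CFS heuristic, Merit(S) = k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The abstract does not say which correlation measure or search strategy was used, so this sketch assumes absolute Pearson correlation and a greedy forward search; the helper names and the six training rows are hypothetical, with feature 0 tracking the class and feature 1 acting as irrelevant noise.

```python
import math
from collections import Counter


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(subset, cols, y):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(cols[f], y)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(X, y):
    """Greedy forward search: add the feature that most raises the merit."""
    cols = [list(c) for c in zip(*X)]
    selected, best = [], 0.0
    while len(selected) < len(cols):
        merit, f = max((cfs_merit(selected + [f], cols, y), f)
                       for f in range(len(cols)) if f not in selected)
        if merit <= best:
            break  # no remaining feature improves the subset
        selected.append(f)
        best = merit
    return selected


def knn_predict(X_train, y_train, x, k=3, features=None):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    feats = list(features) if features is not None else list(range(len(x)))
    dists = sorted(
        (math.dist([row[f] for f in feats], [x[f] for f in feats]), label)
        for row, label in zip(X_train, y_train))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: feature 0 tracks the class, feature 1 is noise.
X = [[0.1, 9.0], [0.2, 1.0], [0.3, 7.0],
     [0.9, 2.0], [1.0, 8.0], [0.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]

selected = cfs_forward(X, y)
query = [0.85, 9.0]
print("CFS selects features:", selected)
print("k-NN, all features:", knn_predict(X, y, query))
print("k-NN, CFS features:", knn_predict(X, y, query, features=selected))
```

On this toy data CFS keeps only feature 0; with both features the noisy column pulls the query toward a class-0 neighbour and 3-NN votes 0, while on the CFS subset it votes 1, mirroring the sensitivity of k-NN to irrelevant attributes described in the abstract.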

Data Mining and Machine Learning

10.1017/9781108564175 ◽

2020 ◽

Cited By ~ 2

Author(s):

Mohammed J. Zaki ◽

Wagner Meira, Jr

Keyword(s):

Machine Learning ◽

Data Mining

Instant medical care and drug suggestion service using data mining and machine learning based intelligent self-diagnosis medical system

International Journal of Advanced Life Sciences ◽

10.26627/ijals/2017/10.03.0022 ◽

2017 ◽

Vol 10 (03) ◽

pp. 318-325

Author(s):

Sudha M

Keyword(s):

Machine Learning ◽

Data Mining ◽

Medical Care ◽

Medical System ◽

Using Data

Machine Learning and Data Mining Activity Results when using Projectiles in Different Sports

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/103932020 ◽

2020 ◽

Vol 9 (3) ◽

pp. 3157-3160

Author(s):

Burov Alexey Gennadievich

Keyword(s):

Machine Learning ◽

Data Mining ◽

Mining Activity

Classification of Operational and Financial Variables Affecting the Bullwhip Effect in Indian Sectors: A Machine Learning Approach

Recent Patents on Computer Science ◽

10.2174/2213275911666181012121059 ◽

2019 ◽

Vol 12 (3) ◽

pp. 171-179 ◽

Cited By ~ 6

Author(s):

Sachin Gupta ◽

Anurag Saxena

Keyword(s):

Machine Learning ◽

Data Mining ◽

Supply Chain ◽

Supply Chain Management ◽

Product Life Cycle ◽

Consumer Preference ◽

Bullwhip Effect ◽

Machine Learning Techniques ◽

Chain Management ◽

Financial Variables

Background: The bullwhip effect is the increase in variability in production or procurement relative to the smaller variability in demand or sales. The bullwhip effect is considered an obstacle to supply chain optimization because it causes inefficiencies in the supply chain. Various operations and supply chain management consultants, managers and researchers are conducting rigorous studies to find the causes behind the dynamic nature of supply chain management, and have identified shorter product life cycles, changes in technology, changes in consumer preference and the era of globalization, to name a few. Most of the literature that has explored the bullwhip effect is based on simulations and mathematical models; exploring the bullwhip effect using machine learning is the novel approach of the present study. Methods: The present study explores the operational and financial variables affecting the bullwhip effect on the basis of secondary data. Data mining and machine learning techniques are used to explore the variables affecting the bullwhip effect in Indian sectors. The Rapid Miner tool was used for data mining, and 10-fold cross-validation was performed. After classification, a Weka Alternating Decision Tree (w-ADT) was built to help decision makers mitigate the bullwhip effect. Results: Of the 19 selected variables affecting the bullwhip effect, the 7 variables giving the highest accuracy with minimum deviation were retained. Conclusion: Classification using machine learning provides effective tools and techniques to explore the bullwhip effect in supply chain management.
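The abstract defines the bullwhip effect informally as order variability amplified relative to demand variability. A common way to quantify this, sketched below in Python as background rather than as part of the study's classification method, is the ratio of the variance of orders placed upstream to the variance of customer demand; a ratio above 1 signals amplification. The function name and the weekly series are hypothetical.

```python
from statistics import pvariance


def bullwhip_ratio(orders, demand):
    """Variance of orders divided by variance of demand (> 1 means amplification)."""
    return pvariance(orders) / pvariance(demand)


# Hypothetical weekly figures for one stage of a supply chain.
demand = [100, 102, 98, 101, 99]   # what customers bought
orders = [100, 110, 90, 105, 95]   # what the stage ordered from its supplier
print(bullwhip_ratio(orders, demand))  # 25.0
```

Both series have the same mean, yet the orders swing far more than demand does (variance 50 versus 2), which is exactly the amplification the term describes.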
