Performance Assessment of Learning Algorithms on Multi-Domain Data Sets

Author(s):  
Amit Kumar ◽  
Bikash Kanti Sarkar

Over the last few decades, data mining research has made significant progress across a wide spectrum of applications. Prediction on multi-domain data sets remains a challenging task due to the imbalanced, voluminous, conflicting, and complex nature of the data. Learning algorithms are the principal technique for addressing these problems and are widely used for classification. However, choosing the learners that perform best on data sets from a particular domain is itself a challenging task in data mining. This article provides a comparative performance assessment of various state-of-the-art learning algorithms over multi-domain data sets in order to identify effective classifier(s) for a particular domain, e.g., artificial, natural, or semi-natural. A total of 14 real-world data sets are selected from the University of California, Irvine (UCI) machine learning repository, and experiments are conducted using three competent individual learners and their hybrid combinations.
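The selection procedure the abstract describes can be sketched as a simple bake-off: fit each candidate learner on a domain's training data, score it on held-out data, and keep the winner. This is a minimal illustrative sketch, not the paper's experimental setup; the toy data, the two trivial learners, and the single train/test split are all assumptions for illustration.

```python
# Hypothetical sketch: pick the best-performing learner for a domain
# by comparing held-out accuracy. Learners are (fit -> predict) factories.

def majority_class_learner(train):
    """Predict the most frequent training label for every example."""
    labels = [y for _, y in train]
    mode = max(set(labels), key=labels.count)
    return lambda x: mode

def one_nn_learner(train):
    """1-nearest-neighbour by squared Euclidean distance."""
    def predict(x):
        _, y = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return y
    return predict

def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)

# Tiny synthetic "domain": two separable clusters (3 A's, 2 B's).
train = [((0.0, 0.1), "A"), ((0.2, 0.0), "A"), ((0.1, 0.2), "A"),
         ((1.0, 1.1), "B"), ((0.9, 1.0), "B")]
test = [((0.1, 0.0), "A"), ((1.1, 1.0), "B")]

learners = {"majority": majority_class_learner, "1-NN": one_nn_learner}
scores = {name: accuracy(fit(train), test) for name, fit in learners.items()}
best = max(scores, key=scores.get)
print(scores, best)  # 1-NN wins on this separable toy domain
```

In practice one would replace the single split with cross-validation and the toy learners with the competent individual learners and hybrids the article evaluates.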

2018 ◽  
Vol 26 (1) ◽  
pp. 43-66 ◽  
Author(s):  
Uday Kamath ◽  
Carlotta Domeniconi ◽  
Kenneth De Jong

Many real-world problems involve massive amounts of data. Under these circumstances learning algorithms often become prohibitively expensive, making scalability a pressing issue to be addressed. A common approach is to perform sampling to reduce the size of the dataset and enable efficient learning. Alternatively, one customizes learning algorithms to achieve scalability. In either case, the key challenge is to obtain algorithmic efficiency without compromising the quality of the results. In this article we discuss a meta-learning algorithm (PSBML) that combines concepts from spatially structured evolutionary algorithms (SSEAs) with concepts from ensemble and boosting methodologies to achieve the desired scalability property. We present both theoretical and empirical analyses which show that PSBML preserves a critical property of boosting, specifically, convergence to a distribution centered around the margin. We then present additional empirical analyses showing that this meta-level algorithm provides a general and effective framework that can be used in combination with a variety of learning classifiers. We perform extensive experiments to investigate the trade-off achieved between scalability and accuracy, and robustness to noise, on both synthetic and real-world data. These empirical results corroborate our theoretical analysis, and demonstrate the potential of PSBML in achieving scalability without sacrificing accuracy.


Author(s):  
Hoda Heidari ◽  
Andreas Krause

We study fairness in sequential decision making environments, where at each time step a learning algorithm receives data corresponding to a new individual (e.g. a new job application) and must make an irrevocable decision about him/her (e.g. whether to hire the applicant) based on observations made so far. In order to prevent cases of disparate treatment, our time-dependent notion of fairness requires algorithmic decisions to be consistent: if two individuals are similar in the feature space and arrive during the same time epoch, the algorithm must assign them to similar outcomes. We propose a general framework for post-processing predictions made by a black-box learning model, that guarantees the resulting sequence of outcomes is consistent. We show theoretically that imposing consistency will not significantly slow down learning. Our experiments on two real-world data sets illustrate and confirm this finding in practice.
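The consistency requirement above can be sketched as a post-processing wrapper around a black-box model. This is a hedged illustration of the idea only, not the authors' algorithm: within one time epoch, a new individual whose features fall within a distance `eps` of an already-decided individual simply inherits that earlier (irrevocable) outcome. The names `post_process` and `eps` are illustrative assumptions.

```python
# Sketch of time-dependent consistency: similar individuals arriving in
# the same epoch must receive similar (here: identical) outcomes.

def dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def post_process(stream, black_box, eps=0.1):
    """stream: iterable of (epoch, features); black_box: features -> outcome."""
    decided = []   # (epoch, features, outcome) of past irrevocable decisions
    outcomes = []
    for epoch, x in stream:
        match = next((o for e, f, o in decided
                      if e == epoch and dist(f, x) <= eps), None)
        outcome = match if match is not None else black_box(x)
        decided.append((epoch, x, outcome))
        outcomes.append(outcome)
    return outcomes

# A black box that thresholds the first feature. Two near-identical
# applicants arrive in epoch 0; the raw model would treat them differently,
# but the wrapper forces a consistent decision.
bb = lambda x: int(x[0] > 0.5)
stream = [(0, (0.51,)), (0, (0.49,)), (1, (0.49,))]
print(post_process(stream, bb, eps=0.05))  # [1, 1, 0]
```

Note the third individual arrives in a later epoch, so the consistency constraint no longer binds and the black-box decision stands.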


Author(s):  
YI-CHUNG HU

Flow-based methods based on outranking relation theory are extensively used in multiple criteria classification problems. These methods usually employ an overall preference index, representing the flow, to measure the intensity of preference for one pattern over another. A traditional flow obtained by pairwise comparison may be incomplete, since it considers the differences on each criterion only locally, between two patterns, rather than globally against all the other patterns. In contrast with traditional flows, the newly proposed relationship-based flow employs grey relational analysis to assess the flow from one pattern to another by considering the differences on each criterion between that pattern and all the other patterns. A genetic algorithm-based learning algorithm is designed to determine the relative weights of the respective criteria and derive the overall relationship index of a pattern. The method is tested on several real-world data sets, and its performance is comparable to that of other well-known classifiers and flow-based methods.
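Grey relational analysis, the building block named above, scores how close each pattern is to a reference pattern across all criteria at once. Below is a hedged sketch of the standard grey relational coefficient and grade; the fixed weights are an assumption for illustration, whereas in the paper the criterion weights are learned by a genetic algorithm.

```python
# Grey relational analysis (GRA) sketch: coefficient
#   xi(k) = (d_min + rho * d_max) / (d_i(k) + rho * d_max)
# with d_i(k) = |x0(k) - xi(k)|, aggregated into a weighted grade.

def grey_relational_grades(reference, patterns, weights, rho=0.5):
    """Grade of each pattern w.r.t. the reference (rho = distinguishing coefficient)."""
    deltas = [[abs(r - p) for r, p in zip(reference, pat)] for pat in patterns]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades

ref = (0.9, 0.8)
pats = [(0.9, 0.8), (0.1, 0.2)]  # first pattern is identical to the reference
g = grey_relational_grades(ref, pats, weights=(0.5, 0.5))
print(g)  # the identical pattern attains the maximum grade of 1.0
```

A pattern identical to the reference attains the maximal grade, and grades decrease as criterion-wise differences grow, which is the global comparison the relationship-based flow exploits.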


Author(s):  
Lakshmi Prayaga ◽  
Krishna Devulapalli ◽  
Chandra Prayaga

Wearable devices are contributing heavily to the proliferation of data and creating a rich minefield for data analytics. Recent trends in the design of wearable devices include several embedded sensors, which also provide useful data for many applications. This research presents results obtained from studying human-activity data collected from wearable devices. The activities considered for this study were working at the computer; standing and walking; standing; walking; walking up and down stairs; and talking while walking. A portion of the data is used to train machine learning algorithms and build a model, and the rest is used as test data for predicting the activity of an individual. Details of data collection, processing, and presentation are also discussed. After studying the literature and the data sets, the Random Forest machine learning algorithm was determined to be the most applicable algorithm for analyzing data from wearable devices. The software used in this research includes the R statistical package and the SensorLog app.
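The Random Forest mechanism the study relies on can be illustrated in miniature: bootstrap-resample the training data, fit a simple randomized tree (here reduced to a one-feature threshold stump) on each resample, and classify by majority vote. This is a toy pure-Python sketch of the mechanism only; the study itself uses R's Random Forest implementation, and the one-feature sensor data below is a made-up stand-in for real accelerometer features.

```python
# Toy bagged-stump "forest": each stump thresholds feature 0 at the mean
# of a bootstrap sample; prediction is the majority vote of all stumps.
import random

def train_stump(data):
    """Threshold stump fit on a bootstrap resample of the training data."""
    boot = [random.choice(data) for _ in data]
    thr = sum(x[0] for x, _ in boot) / len(boot)
    return lambda x: "walk" if x[0] > thr else "sit"

def forest_predict(forest, x):
    votes = [stump(x) for stump in forest]
    return max(set(votes), key=votes.count)

random.seed(0)
# feature 0 = mean acceleration magnitude (illustrative values)
data = [((0.1,), "sit"), ((0.2,), "sit"), ((0.8,), "walk"), ((0.9,), "walk")]
forest = [train_stump(data) for _ in range(15)]
print(forest_predict(forest, (0.95,)), forest_predict(forest, (0.05,)))
```

A real Random Forest additionally randomizes the features considered at each split and grows full decision trees, but the bootstrap-plus-vote structure is the same.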


Author(s):  
Sotiris Kotsiantis ◽  
Dimitris Kanellopoulos ◽  
Panayotis Pintelas

In classification learning, the learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples (see Table 1). Formally, the problem can be stated as follows: given training data {(x1, y1), …, (xn, yn)}, produce a classifier h: X → Y that maps an object x ∈ X to its classification label y ∈ Y. A large number of classification techniques have been developed based on artificial intelligence (logic-based techniques, perceptron-based techniques) and statistics (Bayesian networks, instance-based techniques). No single learning algorithm can uniformly outperform other algorithms over all data sets. The concept of combining classifiers has therefore been proposed as a new direction for improving the performance of individual machine learning algorithms. Numerous methods have been suggested for the creation of ensembles of classifiers (Dietterich, 2000). Although, or perhaps because, many methods of ensemble creation have been proposed, there is as yet no clear picture of which method is best.
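The simplest way to combine classifiers h_i: X → Y is plain majority voting over their predictions. The sketch below is a hypothetical minimal example of that combining step only; real ensemble methods such as bagging and boosting additionally vary the training data each base classifier sees.

```python
# Majority-vote combination of a pool of base classifiers.

def majority_vote(classifiers, x):
    votes = [h(x) for h in classifiers]
    return max(set(votes), key=votes.count)

# Three weak threshold rules that disagree on parts of the input space.
h1 = lambda x: "pos" if x > 0 else "neg"
h2 = lambda x: "pos" if x > -1 else "neg"
h3 = lambda x: "pos" if x > 1 else "neg"

print(majority_vote([h1, h2, h3], 0.5))   # h1 and h2 vote "pos"
print(majority_vote([h1, h2, h3], -0.5))  # h1 and h3 vote "neg"
```

The ensemble can be correct even where individual members err, which is the motivation for combining classifiers in the first place.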


Author(s):  
John Yearwood ◽  
Adil Bagirov ◽  
Andrei V. Kelarev

The application of machine learning algorithms to the analysis of DNA sequence data sets is very important. The present chapter is devoted to an experimental investigation of several machine learning algorithms applied to a JLA data set consisting of DNA sequences derived from non-coding segments in the junction of the large single copy region and inverted repeat A of the chloroplast genome in Eucalyptus, collected by Australian biologists. Data sets of this sort represent a new situation, where sophisticated alignment scores have to be used as a measure of similarity. The alignment scores do not satisfy the properties of the Minkowski metric, and new machine learning approaches have to be investigated. The authors' experiments show that machine learning algorithms based on local alignment scores achieve very good agreement with the known biological classes for this data set. A new machine learning algorithm based on graph partitioning performed best for clustering of the JLA data set, and the authors' novel k-committees algorithm produced the most accurate results for classification. Two new examples of synthetic data sets demonstrate that the k-committees algorithm can outperform both the Nearest Neighbour and k-medoids algorithms simultaneously.


The rapid development of cloud computing, big data, machine learning, and data mining has brought information technology and human society into a new era of technology. Statistical and mathematical analysis of data has given research a new way to make predictions and estimates from samples and data sets. Data mining is a mechanism that explores and analyzes large amounts of disorganized data to obtain potentially useful information and model it with different algorithms. Machine learning is an iterative rather than a linear process, requiring each step to be revisited as more is learned about the problem. We discuss different machine learning algorithms that can manipulate data and analyze data sets, chosen case by case for accurate results, and we design and implement a framework associated with these algorithms. This paper expounds the definition, models, development stages, classification, and commercial applications of machine learning, and emphasizes the role of machine learning in data mining by deploying the framework. It then summarizes and analyzes machine learning technology and discusses the use of machine learning algorithms in data mining. Finally, the mathematical analysis is given, along with results and graphical analysis.


2016 ◽  
Vol 41 (1) ◽  
Author(s):  
Manuel J. A. Eugster ◽  
Torsten Hothorn ◽  
Friedrich Leisch

Benchmark experiments are the method of choice for comparing learning algorithms empirically. For collections of data sets, the empirical performance distributions of a set of learning algorithms are estimated, compared, and ordered. Usually this is done for each data set separately. The present manuscript extends this single-data-set-based approach to a joint analysis of the complete collection, the so-called problem domain. This enables one to decide which algorithms to deploy in a specific application, or to compare newly developed algorithms with well-known algorithms on established problem domains. Specialized visualization methods allow easy exploration of huge amounts of benchmark data. Furthermore, we take the benchmark experiment design into account and use mixed-effects models to provide a formal statistical analysis. Two domain-based benchmark experiments demonstrate our methods: the UCI domain, as a well-known domain used when developing a new algorithm; and the Grasshopper domain, as a domain where we want to find the best learning algorithm for a prediction component in an enterprise application software system.
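One elementary way to move from per-data-set results to a domain-level ordering is to rank the algorithms on each data set and average the ranks across the whole domain. The sketch below illustrates only that aggregation step, not the manuscript's mixed-effects analysis; the data-set names and accuracy numbers are made up for illustration.

```python
# Domain-level ordering of algorithms by mean rank across data sets.

def mean_ranks(results):
    """results: {dataset: {algorithm: accuracy}} -> {algorithm: mean rank}."""
    algos = sorted(next(iter(results.values())))
    totals = {a: 0.0 for a in algos}
    for scores in results.values():
        ordered = sorted(algos, key=lambda a: -scores[a])  # rank 1 = best
        for rank, a in enumerate(ordered, start=1):
            totals[a] += rank
    return {a: t / len(results) for a, t in totals.items()}

# Illustrative (made-up) accuracies on a three-data-set "domain".
domain = {
    "iris":  {"svm": 0.96, "tree": 0.93, "knn": 0.95},
    "glass": {"svm": 0.70, "tree": 0.74, "knn": 0.68},
    "wine":  {"svm": 0.98, "tree": 0.91, "knn": 0.95},
}
print(mean_ranks(domain))
```

A mixed-effects model goes further by treating the data sets as random effects and the sampling within each benchmark experiment explicitly, so it yields significance statements rather than just an ordering.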

